Band
Level 3
Job Description Summary
Join our team at data42! The vision of data42 is to inspire collaborative, groundbreaking data and AI initiatives, to reimagine drug discovery at Novartis and accelerate time to market, ultimately transforming healthcare and improving lives. We are a small team dedicated to bringing this vision to life by accelerating secondary research and data-driven decisions: we provide scientific support and analysis-ready data in a collaborative environment, leveraging a secure and governed AI-enabled platform.
As a member of the data42 team, the Data Engineer will work alongside data scientists and domain experts to enable teams to answer scientific questions using multi-modal data on the data42 platform.
They will be involved in gathering use-case requirements, performing engineering activities for data services, building ETL processes/data pipelines in quick iterations to deliver data ready for analysis. The Data Engineer will integrate data engineering best practices and data quality checks and seek to continuously optimise efficiency.
Job Description
Major accountabilities:
- Collaborates with domain experts, data scientists and other stakeholders to fulfil use-case specific data needs
- Designs, develops, tests, and maintains ETL processes/data pipelines to extract, prepare and iterate on data for analysis in close alignment with TA / DA scientific leads and data scientists
- Implements and maintains data checks to ensure accurate, high-quality data in close collaboration with domain experts
- Identifies and rectifies data inconsistencies and irregularities
- Promotes a culture of transparency and communication regarding data modifications and lineage to all stakeholders
- Implements and advocates for data engineering best practices, ensuring ETL processes/data pipelines are efficient, well-documented and well-tested
- Plays a role in knowledge sharing across data42 and the wider data engineering community at Novartis
- Ensures compliance with Security and Governance Principles
Ideal background
- Bachelor’s degree in computer science or other quantitative field (Mathematics, Statistics, Physics, Engineering, etc.) or equivalent practical experience.
- Proven experience as a data engineer, data wrangler or a similar role.
- Exceptional programming skills with expertise in Python, R and Spark.
- Experience and familiarity with a variety of data types, including but not limited to images, tabular, unstructured, and text.
- Experience with scalable data processing engines, data ingestion, extraction and modeling.
- Proficiency in statistics, with the ability to assess data quality, errors and inconsistencies.
- Good knowledge of data engineering best practices.
- Excellent communication and stakeholder management skills.
- Demonstrated ability to work independently and as part of global Agile teams.
Desirable additional skills in two or more of the following areas:
- Hands-on experience with Palantir Foundry (Code Repository, Code Workbook, Contour, Data Lineage, etc.)
- Knowledge of CDISC data standards (SDTM, ADaM)
- Experience using AI (e.g., GenAI/LLMs) for data wrangling.
- Experience with pooling of clinical trial data.
- High-level understanding of the drug discovery and development process.
Novartis is committed to building an outstanding, inclusive work environment and diverse teams representative of the patients and communities we serve.
Skills Desired
Clinical Data Management, Databases, Data Governance, Data Integrity, Data Management, Data Quality, Data Science, Waterfall Model