Principal Data Engineer

Johnson & Johnson

Flexible | Full-time | Lead | $102k–$177k/yr

About the role

About Innovative Medicine

Our expertise in Innovative Medicine is informed and inspired by patients, whose insights fuel our science-based advancements. Visionaries like you work on teams that save lives by developing the medicines of tomorrow.

Join us in developing treatments, finding cures, and pioneering the path from lab to life while championing patients every step of the way. Learn more at

Role Overview

We are seeking a Principal Data Engineer to provide technical leadership within Global Medical Safety (GMS), supporting the Safety Analytics organization. This role is focused on building and enabling modern safety analytics tools using AI, machine learning, and GenAI, underpinned by robust, compliant, and scalable data engineering on Google Cloud Platform.

The Principal Data Engineer is responsible for end-to-end ownership of safety analytics data engineering, spanning data intake, data quality and continuity, pipeline and architecture design, automation, performance optimization, and compliance. The role enables advanced analytical, machine learning, and predictive capabilities for pharmacovigilance and serves as a technical data engineering leader within Global Medical Safety.

This is a Principal-level individual contributor role with broad technical influence, working closely with safety scientists, analytics teams, data scientists, IT, and platform partners to deliver trusted, production-grade analytics capabilities for safety decision-making.

Key Responsibilities

Safety Analytics & Pharmacovigilance Enablement

  • Design and maintain production-grade data pipelines and curated datasets that directly support pharmacovigilance activities, including safety monitoring, analytics, and regulatory reporting.
  • Ensure data engineering solutions produce reproducible, explainable, and trusted analytics outputs suitable for safety decision support and inspection readiness.
  • Enable AI/ML and GenAI workflows for safety analytics, including:
    • Feature engineering and feature store enablement
    • Embeddings, vectorized representations, and semantic retrieval
    • Retrieval-Augmented Generation (RAG) patterns for safety analytics tools

End-to-End Data Architecture & Lifecycle Ownership

  • Own the end-to-end data lifecycle for safety analytics, from source system intake through transformation, serving, and downstream analytical consumption, ensuring data continuity, traceability, and integrity.
  • Lead architectural decisions across ingestion, transformation, storage, and serving layers on Google Cloud Platform (e.g., BigQuery, Dataform, object storage).
  • Design, implement, and automate scalable, reusable data pipelines and architectures to support evolving safety analytics needs.

Data Quality, Governance & Compliance

  • Establish and enforce data quality, validation, lineage, and observability standards for safety analytics datasets.
  • Define and implement data governance practices, including data contracts, schema versioning, access control, stewardship, and lifecycle management.
  • Ensure safety analytics data and systems meet Global Medical Safety requirements for reliability, auditability, and regulatory use.

GxP Validation & Regulatory Readiness

  • Apply GxP validation expertise to data pipelines, analytics services, and supporting infrastructure.
  • Partner with quality and compliance teams to implement controls aligned with Computer System Validation (CSV) and Computer Software Assurance (CSA), along with audit trails, documentation, and organizational change.
  • Balance delivery velocity and innovation with the rigor required for regulated pharmacovigilance systems.

Services, APIs & Microservices

  • Design and build APIs and microservices-based architectures to operationalize safety analytics and ML capabilities (e.g., feature serving, retrieval services, analytics backends).
  • Deploy and operate services on Google Cloud Platform (e.g., Cloud Run, GKE) with a strong focus on security, scalability, and observability.
  • Enforce contract-first integration patterns between producing and consuming systems to ensure reliability and safe evolution.

Infrastructure, CI/CD & Cost Optimization

  • Provision and manage cloud infrastructure using Terraform (Infrastructure as Code) on Google Cloud Platform.
  • Build and maintain CI/CD pipelines (e.g., Jenkins) for data pipelines, analytics services, feature pipelines, and ML data assets.
  • Continuously optimize the performance and cost efficiency of data and analytics infrastructure while maintaining compliance and reliability standards.

Technical Leadership & Stakeholder Engagement

  • Serve as a technical authority and data engineering leader for Safety Analytics within Global Medical Safety.
  • Review and influence designs across pipelines, services, feature stores, and AI/ML integrations to maintain a high technical bar.
  • Collaborate closely with safety scientists, epidemiologists, biostatisticians, analytics teams, IT, and platform partners to translate safety needs into scalable technical solutions.
  • Communicate complex technical concepts and tradeoffs clearly to both technical and non-technical stakeholders.
  • Enable and upskill teams through mentorship, guidance, and knowledge sharing on modern data, cloud, and AI technologies.

Qualifications

  • Master's degree in Computer Science, Engineering, or a related field (or equivalent experience) is required.
  • 5+ years of experience in data engineering or analytics engineering, with increasing responsibility.
  • Proficiency in Python and SQL programming.
  • Deep understanding of data architecture for analytics and ML (e.g., batch/streaming, modeling, performance optimization).
  • Proven ability to translate complex problems into clear, concise, and testable code and tools.
  • Experience implementing data contracts, data validation, schema versioning, and governance practices, as well as a solid understanding of leading cloud concepts (Google Cloud Platform preferred).
  • Experience designing and operating APIs and microservices-based architectures.
  • Excellent written and verbal communication, customer service, interpersonal, and teamwork skills to foster a collaborative team environment.
  • Solid understanding of SDLC and Agile methodologies, alongside basic project management skills.
  • Experience building production workloads on Google Cloud Platform is preferred.
  • Experience provisioning infrastructure using Terraform (Infrastructure as Code) and building CI/CD pipelines (e.g., Jenkins) is preferred.
  • Experience in pharmaceuticals, life sciences, healthcare, or a related regulated domain is preferred.
  • Google Cloud Platform certification is preferred.
  • Experience enabling AI/ML and GenAI workflows (e.g., feature engineering, RAG patterns, semantic retrieval) for analytical applications is preferred.

Benefits

  • Pension plan
  • 401(k) plan
  • Vacation - 120 hours per calendar year
  • Sick time - 40 hours per calendar year
  • Holiday pay, including Floating Holidays - 13 days per calendar year
  • Work, Personal and Family Time - up to 40 hours per calendar year
  • Parental Leave - 480 hours within one year of the birth/adoption/foster care of a child
  • Bereavement Leave - 240 hours for an immediate family member; 40 hours for an extended family member per calendar year
  • Caregiver Leave - 80 hours in a 52-week rolling period
  • Volunteer Leave - 32 hours per calendar year
  • Military Spouse Time-Off - 80 hours per calendar year

Skills

AI, BigQuery, Cloud Run, Dataform, Data Governance, Data Quality, Data Science, GenAI, GKE, Google Cloud Platform, Infrastructure as Code, Jenkins, Machine Learning, Microservices, Object Storage, Python, SQL, Terraform