Senior Data Engineer
PCGI Consulting
Company Description
PCGI is a boutique consulting firm specializing in high-performance business and technology consulting services. We focus on providing a strategic blend of management consulting, technology expertise, and industry-specific insights to drive meaningful business outcomes. Our mission is to empower businesses with robust data management, governance, and analytics solutions to address complex challenges and streamline decision-making processes. By emphasizing business outcomes and fostering alignment across teams, PCGI ensures clients can become more insights-driven and achieve their goals effectively.
Position Overview
PCGI is seeking an experienced Senior Data Engineer to own and evolve our Pharma Commercial Data Warehouse built on Snowflake. This is a critical role that sits at the intersection of commercial data operations and next-generation AI/ML enablement. The ideal candidate brings deep, hands-on experience with pharmaceutical commercial datasets (particularly IQVIA syndicated and patient-level data) and can architect the data foundation that makes downstream analytics, generative AI, and machine-learning workloads production-ready.
You will partner closely with Commercial Insights, Sales & Marketing Operations, Market Access, Medical Affairs, and Data Science teams to ensure data is modeled, governed, and served in a way that accelerates insight delivery across the enterprise.
Core Responsibilities
Commercial Data Warehouse Ownership:
- Snowflake Architecture: Own and evolve the Pharma Commercial Data Warehouse on Snowflake, including schema design, data sharing, role-based access, query optimization, clustering strategies, and zero-copy clone environments for development and testing.
- ETL/ELT Pipeline Engineering: Design, build, and maintain production-grade ingestion pipelines for IQVIA feeds, internal transactional systems, and third-party vendor data using Informatica, Python, and Airflow.
- Informatica MDM & Data Quality: Administer and extend the Informatica MDM hub for mastering key commercial entities (HCP, HCO, Product, Geography). Configure and maintain Informatica Data Quality rules and profiles to enforce standards across all source feeds.
- Orchestration & Scheduling: Manage end-to-end workflow orchestration using Apache Airflow, including DAG design, dependency management, SLA monitoring, failure alerting, and retry strategies.
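To give a flavor of the retry strategies this role involves, here is a minimal, framework-agnostic Python sketch of exponential-backoff retries (the task and delay values are hypothetical; in a real pipeline, Airflow's built-in retry settings would typically handle this):

```python
import time

def run_with_retries(task, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a callable with exponential backoff between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: let the failure surface for alerting
            sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x ... backoff

# Hypothetical flaky extract that succeeds on the third attempt.
calls = {"n": 0}

def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return "rows loaded"

result = run_with_retries(flaky_extract, max_retries=3, base_delay=0.01)
```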
AI-Ready Data Architecture & Enablement:
- Feature Store & ML-Ready Datasets: Design and build curated, governed feature stores and ML-ready datasets on Snowflake to accelerate data science workloads including patient propensity models, next-best-action engines, formulary prediction, and treatment-pathway classifiers.
- Semantic / Knowledge Layer: Build and maintain a semantic layer (metadata catalog, business glossary, entity definitions) that enables generative AI and LLM-based applications to reason over commercial data assets with context and accuracy.
- Embeddings & Vector-Ready Pipelines: Architect pipelines that produce vector embeddings from structured and unstructured pharma data (call notes, medical inquiries, adverse events) for retrieval-augmented generation (RAG) and semantic-search use cases.
- Data Contracts & Schema Governance: Implement data contracts and schema registries to guarantee the stability, versioning, and backward compatibility of datasets consumed by downstream ML and AI systems.
- Lineage, Observability & Trust: Instrument data pipelines with end-to-end lineage tracking, data-quality scoring, freshness monitoring, and anomaly detection so that AI/ML consumers can trust the data they receive.
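As a simplified illustration of the data-contract and freshness checks described above, the following Python sketch validates records against a hypothetical contract (the column names and SLA window are illustrative assumptions, not a PCGI schema):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an ML-ready dataset: required columns and types.
CONTRACT = {"hcp_id": str, "call_date": str, "call_notes": str}

def violates_contract(record, contract=CONTRACT):
    """Return a list of contract violations for one record (empty list = valid)."""
    problems = []
    for column, expected_type in contract.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"bad type for {column}")
    return problems

def freshness_ok(last_loaded_at, max_age_hours=24):
    """Freshness check: data must have landed within the SLA window."""
    return datetime.now(timezone.utc) - last_loaded_at <= timedelta(hours=max_age_hours)

# Toy records (all values fabricated for illustration).
good = {"hcp_id": "H123", "call_date": "2024-05-01", "call_notes": "sample note"}
bad = {"hcp_id": 123, "call_date": "2024-05-01"}
```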
Analytics & Reporting Enablement:
- Power BI Semantic Models: Build and optimize Power BI datasets (Direct Query and Import), DAX measures, and row-level security models that serve Commercial, Market Access, and Medical Affairs dashboards.
- Automation & Self-Service: Drive automation of reporting workflows using Python, shell scripting, and Airflow to reduce manual effort and enable self-service analytics.
Delivery & Leadership:
- Team Leadership: Lead and mentor a team of 3–5 data engineers in an Onshore–Offshore delivery model. Conduct code reviews, define engineering standards, and drive sprint planning.
- Stakeholder Partnership: Serve as the primary data-engineering point of contact for Commercial Analytics, Data Science, Market Access, and IT leadership. Translate business requirements into scalable technical solutions.
- Vendor & Data Provider Management: Manage relationships with IQVIA, MMIT, and other data vendors including feed onboarding, SLA tracking, data-quality issue resolution, and contract-renewal input.
Skills Required: Pharma Commercial Data Domain Expertise (Critical)
This role requires demonstrated, working knowledge of the following IQVIA and third-party pharmaceutical data assets:
Sales & Demand Data:
- IQVIA DDD / NPA / NSP: Deep understanding of sub-national prescriber-level (DDD), national prescription audit (NPA), and national sales perspectives (NSP) data structures, including product hierarchies, outlet types, and projection methodologies.
- IQVIA Xponent / Plantrak: Proficiency in prescription-level demand data, prescriber-specialty mapping, and plan-level tracking for managed care analytics.
- Sales Force Alignment & Territory Data: Experience integrating territory alignment files (e.g., IQVIA OneKey / AMA masterfiles) into warehouse structures to support call-plan and field-force analytics.
Market Access & Managed Care Data:
- Formulary & Coverage Data: Hands-on experience with MMIT / IQVIA formulary data (formulary status, PA/ST requirements, tier positioning) and how it links to payer hierarchies.
- Gross-to-Net & Contract Performance: Understanding of rebate, chargeback, and contract data flows from GPOs, PBMs, and managed Medicaid; ability to model GTN waterfalls in the warehouse.
- Government Pricing & 340B: Familiarity with Medicaid Drug Rebate Program data, AMP/BP calculations, and 340B covered-entity data.
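The GTN waterfall modeling mentioned above reduces to applying deductions in sequence against gross sales; the deduction categories and dollar amounts below are hypothetical examples, not real contract data:

```python
def gross_to_net(gross_sales, deductions):
    """Compute a simple gross-to-net waterfall: apply each deduction in order
    and record the running net at every step."""
    steps = [("gross", gross_sales)]
    net = gross_sales
    for name, amount in deductions:
        net -= amount
        steps.append((name, net))
    return steps, net

# Hypothetical quarter: $1,000,000 gross with typical deduction buckets.
deductions = [
    ("pbm_rebates", 220_000),      # managed care rebates
    ("medicaid_rebates", 90_000),  # Medicaid Drug Rebate Program
    ("chargebacks", 60_000),       # GPO / wholesaler chargebacks
    ("returns_fees", 30_000),      # returns and distribution fees
]
waterfall, net = gross_to_net(1_000_000, deductions)
```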
Claims, Labs & Patient-Level Data (RWD):
- Medical & Pharmacy Claims: Experience with IQVIA PharMetrics Plus, Dx/Rx claims feeds, or comparable claims databases, including diagnosis codes (ICD-10), procedure codes (CPT/HCPCS), NDC-level pharmacy claims, and longitudinal patient linkage.
- Lab / EMR Data: Familiarity with lab-result datasets (e.g., IQVIA Lab Data, Praxis/LabCorp feeds) including LOINC coding, result normalization, and how lab values feed patient-journey and biomarker analyses.
- Patient-Level Data & Longitudinal Linking: Proven experience building patient-centric data models from de-identified or tokenized patient datasets, supporting patient-journey, adherence/persistence, switch analyses, and treatment-pattern analytics.
- APLD / Anonymized Patient Longitudinal Data: Ability to integrate and model IQVIA APLD assets for outcomes research and commercial analytics.
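A minimal sketch of the longitudinal linkage and switch analyses described above, using toy de-identified claims rows keyed by a patient token (all identifiers, dates, and products are fabricated for illustration):

```python
from collections import defaultdict

# Toy tokenized pharmacy claims (all values hypothetical).
claims = [
    {"patient_token": "tokA", "fill_date": "2024-01-05", "product": "DrugX"},
    {"patient_token": "tokA", "fill_date": "2024-03-02", "product": "DrugY"},
    {"patient_token": "tokB", "fill_date": "2024-02-10", "product": "DrugX"},
]

def build_journeys(claims):
    """Group claims by patient token and sort by fill date into a longitudinal journey."""
    journeys = defaultdict(list)
    for claim in claims:
        journeys[claim["patient_token"]].append(claim)
    for token in journeys:
        journeys[token].sort(key=lambda c: c["fill_date"])  # ISO dates sort lexicographically
    return dict(journeys)

def detect_switch(journey):
    """Flag a product switch: any consecutive pair of fills on different products."""
    return any(a["product"] != b["product"] for a, b in zip(journey, journey[1:]))

journeys = build_journeys(claims)
```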
Technical Requirements:
- Data Warehousing: 7+ years hands-on data engineering; 3+ years with Snowflake (administration, performance tuning, data sharing, Snowpark)
- SQL & Python: Expert-level SQL with deep performance-tuning experience; strong Python skills for ETL, data wrangling, and automation
- Informatica MDM & DQ: 3+ years administering Informatica MDM and Data Quality; experience with match/merge rules, survivorship, and DQ scorecards
- Orchestration: Production experience with Apache Airflow (DAG design, custom operators, SLA monitoring)
- BI & Visualization: Hands-on Power BI development: datasets, DAX, RLS, scheduled refresh, and gateway configuration
- Cloud Platforms: Strong AWS experience (S3, Glue, Lambda, EMR, EC2); familiarity with Azure is a plus
- Data Modeling: Expertise in dimensional modeling (star/snowflake schemas), slowly changing dimensions, and data vault concepts
- AI/ML Data Infra: Experience building feature stores, ML pipelines, or vector-embedding workflows; familiarity with tools like dbt, Great Expectations, or MLflow is a plus
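As a flavor of the slowly-changing-dimension expertise listed above, here is a minimal in-memory sketch of a Type 2 SCD upsert (entity and attribute names are illustrative assumptions; in the warehouse this would be a MERGE statement):

```python
from datetime import date

def scd2_upsert(dimension, incoming, today):
    """Type 2 slowly changing dimension: close the current row and open a new
    version whenever a tracked attribute changes."""
    current = next((r for r in dimension
                    if r["hcp_id"] == incoming["hcp_id"] and r["is_current"]), None)
    if current is None:
        # New entity: open its first version.
        dimension.append({**incoming, "valid_from": today, "valid_to": None, "is_current": True})
    elif current["specialty"] != incoming["specialty"]:
        # Tracked attribute changed: expire the old version, open a new one.
        current["valid_to"] = today
        current["is_current"] = False
        dimension.append({**incoming, "valid_from": today, "valid_to": None, "is_current": True})
    return dimension

# Hypothetical HCP whose specialty changes mid-year.
dim = []
scd2_upsert(dim, {"hcp_id": "H1", "specialty": "Oncology"}, date(2024, 1, 1))
scd2_upsert(dim, {"hcp_id": "H1", "specialty": "Hematology"}, date(2024, 6, 1))
```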
Domain & Leadership Requirements:
- Pharma Data: 3+ years working with IQVIA commercial datasets (DDD, Xponent, Plantrak, NPA, APLD, Claims) in a data-engineering or analytics-engineering capacity
- Market Access: Working knowledge of formulary/coverage data, GTN analytics, and government-pricing data flows would be a plus
- Consulting Pedigree: Background in Pharma/Healthcare data consulting strongly preferred
- Team Leadership: Proven experience leading and mentoring data-engineering teams of 3+ in an onshore–offshore model
- Communication: Ability to translate complex technical concepts for business stakeholders and present to senior leadership
Preferred Qualifications
Candidates who also bring any of the following will stand out:
- Experience with Snowflake Cortex, Snowpark ML, or Snowflake’s native AI/ML capabilities.
- Hands-on experience with dbt for transformation layer management and testing.
- Familiarity with data-mesh or data-product architectures in a pharma context.
- Prior work with IQVIA OneKey / AMA Masterfile for HCP/HCO master data.
- Knowledge of HIPAA, de-identification standards (Safe Harbor / Expert Determination), and pharma data-privacy regulations.
- Experience with generative AI application development (RAG pipelines, prompt engineering, LLM evaluation).
- Certifications: Snowflake SnowPro, AWS Solutions Architect, Informatica MDM.
Why Join PCGI?
- Own a mission-critical commercial data asset that directly drives brand strategy, field-force effectiveness, and market-access decisions across a global pharma portfolio.
- Build the AI-ready data foundation: your work will power the next generation of generative AI and predictive analytics in Life Sciences.
- Lead high-impact, enterprise-grade data-transformation programs with direct visibility to executive leadership.
- Shape architecture, standards, and best practices in a fast-growing DataTech and AI/ML practice.
- Competitive compensation, comprehensive benefits, and a culture that values technical depth and intellectual curiosity.
Explore more details and apply here:
https://www.pcgisystems.com/technology-stack/ai-data-engineering-lead