CAREERS

AI & Data Engineering Lead (Manager) - Pharma Commercial Data Warehouse

Experience Level

7+ Years of Experience

Required Qualification

Technical Requirements
Data Warehousing, SQL & Python, Informatica MDM & DQ, Orchestration, BI & Visualization, Cloud Platforms, Data Modeling, AI/ML Data Infra
Domain & Leadership Requirements
Pharma Data, Market Access, Consulting Pedigree, Team Leadership, Communication

Employment Type

Full Time

Industry

Technology

Location

Washington, D.C. Metro area

Duties & Responsibilities

Position Overview

PCGI is seeking an experienced AI & Data Engineering Lead to own and evolve our Pharma Commercial Data Warehouse built on Snowflake. This is a critical role that sits at the intersection of commercial data operations and next-generation AI/ML enablement. The ideal candidate brings deep, hands-on experience with pharmaceutical commercial datasets, particularly IQVIA syndicated and patient-level data, and can architect the data foundation that makes downstream analytics, generative AI, and machine-learning workloads production-ready.

You will partner closely with Commercial Insights, Sales & Marketing Operations, Market Access, Medical Affairs, and Data Science teams to ensure data is modeled, governed, and served in a way that accelerates insight delivery across the enterprise.

Commercial Data Warehouse Ownership

  • Snowflake Architecture: Own and evolve the Pharma Commercial Data Warehouse on Snowflake including schema design, data sharing, role-based access, query optimization, clustering strategies, and zero-copy clone environments for development and testing.
  • ETL/ELT Pipeline Engineering: Design, build, and maintain production-grade ingestion pipelines for IQVIA feeds, internal transactional systems, and third-party vendor data using Informatica, Python, and Airflow.
  • Informatica MDM & Data Quality: Administer and extend the Informatica MDM hub for mastering key commercial entities (HCP, HCO, Product, Geography). Configure and maintain Informatica Data Quality rules and profiles to enforce standards across all source feeds.
  • Orchestration & Scheduling: Manage end-to-end workflow orchestration using Apache Airflow, including DAG design, dependency management, SLA monitoring, failure alerting, and retry strategies.


AI-Ready Data Architecture & Enablement

  • Feature Store & ML-Ready Datasets: Design and build curated, governed feature stores and ML-ready datasets on Snowflake to accelerate data science workloads including patient propensity models, next-best-action engines, formulary prediction, and treatment-pathway classifiers.
  • Semantic / Knowledge Layer: Build and maintain a semantic layer (metadata catalog, business glossary, entity definitions) that enables generative AI and LLM-based applications to reason over commercial data assets with context and accuracy.
  • Embeddings & Vector-Ready Pipelines: Architect pipelines that produce vector embeddings from structured and unstructured pharma data (call notes, medical inquiries, adverse events) for retrieval-augmented generation (RAG) and semantic-search use cases.
  • Data Contracts & Schema Governance: Implement data contracts and schema registries to guarantee the stability, versioning, and backward compatibility of datasets consumed by downstream ML and AI systems.
  • Lineage, Observability & Trust: Instrument data pipelines with end-to-end lineage tracking, data-quality scoring, freshness monitoring, and anomaly detection so that AI/ML consumers can trust the data they receive.


Analytics & Reporting Enablement

  • Power BI Semantic Models: Build and optimize Power BI datasets (DirectQuery and Import), DAX measures, and row-level security models that serve Commercial, Market Access, and Medical Affairs dashboards.
  • Automation & Self-Service: Drive automation of reporting workflows using Python, shell scripting, and Airflow to reduce manual effort and enable self-service analytics.


Delivery & Leadership

  • Team Leadership: Lead and mentor a team of 3–5 data engineers in an Onshore–Offshore delivery model. Conduct code reviews, define engineering standards, and drive sprint planning.
  • Stakeholder Partnership: Serve as the primary data-engineering point of contact for Commercial Analytics, Data Science, Market Access, and IT leadership. Translate business requirements into scalable technical solutions.
  • Vendor & Data Provider Management: Manage relationships with IQVIA, MMIT, and other data vendors including feed onboarding, SLA tracking, data-quality issue resolution, and contract-renewal input.

Skills Required

  • Pharma Commercial Data Domain Expertise (Critical)
    This role requires demonstrated, working knowledge of the following IQVIA and third-party pharmaceutical data assets:

    Sales & Demand Data

    • IQVIA DDD / NPA / NSP: Deep understanding of sub-national prescriber-level (DDD), national prescription audit (NPA), and national sales perspectives (NSP) data structures, including product hierarchies, outlet types, and projection methodologies.
    • IQVIA Xponent / Plantrak: Proficiency in prescription-level demand data, prescriber-specialty mapping, and plan-level tracking for managed care analytics.
    • Sales Force Alignment & Territory Data: Experience integrating territory alignment files (e.g., IQVIA OneKey / AMA masterfiles) into warehouse structures to support call-plan and field-force analytics.


    Market Access & Managed Care Data

    • Formulary & Coverage Data: Hands-on experience with MMIT / IQVIA formulary data (formulary status, PA/ST requirements, tier positioning) and how it links to payer hierarchies.
    • Gross-to-Net & Contract Performance: Understanding of rebate, chargeback, and contract data flows from GPOs, PBMs, and managed Medicaid; ability to model GTN waterfalls in the warehouse.
    • Government Pricing & 340B: Familiarity with Medicaid Drug Rebate Program data, AMP/BP calculations, and 340B covered-entity data.


    Claims, Labs & Patient-Level Data (RWD)

    • Medical & Pharmacy Claims: Experience with IQVIA PharMetrics Plus, Dx/Rx claims feeds, or comparable claims databases. Understanding of diagnosis codes (ICD-10), procedure codes (CPT/HCPCS), NDC-level pharmacy claims, and longitudinal patient linkage.
    • Lab / EMR Data: Familiarity with lab-result datasets (e.g., IQVIA Lab Data, Praxis/LabCorp feeds) including LOINC coding, result normalization, and how lab values feed patient-journey and biomarker analyses.
    • Patient-Level Data & Longitudinal Linking: Proven experience building patient-centric data models from de-identified or tokenized patient datasets, supporting patient-journey, adherence/persistence, switch analyses, and treatment-pattern analytics.
    • APLD / Anonymized Patient Longitudinal Data: Ability to integrate and model IQVIA APLD assets for outcomes research and commercial analytics.


    Technical Requirements:
    Data Warehousing: 7+ years hands-on data engineering; 3+ years with Snowflake (administration, performance tuning, data sharing, Snowpark)
    SQL & Python: Expert-level SQL with deep performance-tuning experience; strong Python skills for ETL, data wrangling, and automation
    Informatica MDM & DQ: 3+ years administering Informatica MDM and Data Quality; experience with match/merge rules, survivorship, and DQ scorecards
    Orchestration: Production experience with Apache Airflow (DAG design, custom operators, SLA monitoring)
    BI & Visualization: Hands-on Power BI development, including datasets, DAX, RLS, scheduled refresh, and gateway configuration
    Cloud Platforms: Strong AWS experience (S3, Glue, Lambda, EMR, EC2); familiarity with Azure is a plus
    Data Modeling: Expertise in dimensional modeling (star/snowflake schemas), slowly changing dimensions, and data vault concepts
    AI/ML Data Infra: Experience building feature stores, ML pipelines, or vector-embedding workflows; familiarity with tools like dbt, Great Expectations, or MLflow is a plus

    Domain & Leadership Requirements:
    Pharma Data: 3+ years working with IQVIA commercial datasets (DDD, Xponent, Plantrak, NPA, APLD, Claims) in a data-engineering or analytics-engineering capacity
    Market Access: Working knowledge of formulary/coverage data, GTN analytics, and government-pricing data flows is a plus
    Consulting Pedigree: Background in Pharma/Healthcare data consulting strongly preferred
    Team Leadership: Proven experience leading and mentoring data-engineering teams of 3+ in an onshore–offshore model
    Communication: Ability to translate complex technical concepts for business stakeholders and present to senior leadership

    Preferred Qualifications
    Candidates who also bring any of the following will stand out:
    • Experience with Snowflake Cortex, Snowpark ML, or Snowflake’s native AI/ML capabilities.
    • Hands-on experience with dbt for transformation layer management and testing.
    • Familiarity with data-mesh or data-product architectures in a pharma context.
    • Prior work with IQVIA OneKey / AMA Masterfile for HCP/HCO master data.
    • Knowledge of HIPAA, de-identification standards (Safe Harbor / Expert Determination), and pharma data-privacy regulations.
    • Experience with generative AI application development (RAG pipelines, prompt engineering, LLM evaluation).
    • Certifications: Snowflake SnowPro, AWS Solutions Architect, Informatica MDM.

    Why Join PCGI?
    • Own a mission-critical commercial data asset that directly drives brand strategy, field-force effectiveness, and market-access decisions across a global pharma portfolio.
    • Build the AI-ready data foundation: your work will power the next generation of generative AI and predictive analytics in Life Sciences.
    • Lead high-impact, enterprise-grade data-transformation programs with direct visibility to executive leadership.
    • Shape architecture, standards, and best practices in a fast-growing DataTech and AI/ML practice.
    • Competitive compensation, comprehensive benefits, and a culture that values technical depth and intellectual curiosity.