Lead Data Engineer (Migration)

Yochana

Jersey City · On-site · Full-time · Lead · 2mo ago

About the role

Role Overview

This role is part of a multi-year enterprise initiative to modernize data platforms by migrating from legacy and on-premises environments to cloud-native, governed, and scalable architectures.

The role focuses on migrating enterprise data workloads to Databricks on strategic cloud platforms. The goal is to enable standardized data engineering, analytics, centralized reporting, reconciliation utilities, and AI/ML use cases while meeting control, security, resilience, and regulatory compliance requirements.

Key Responsibilities

Lead and execute migration of legacy data platforms (on-prem / non-standard tools) to Databricks on cloud under the Olympus program
Perform application, data, and pipeline refactoring to cloud-native Databricks patterns
Drive migration planning, including dependency analysis, sequencing, and cutover strategy
Support coexistence models and the transition from dual-run to cloud-only execution

Databricks Lakehouse Engineering

Design and implement Databricks Lakehouse architecture (Bronze / Silver / Gold)
Build scalable batch and streaming pipelines using PySpark and Spark SQL
Leverage Delta Lake for reliability, versioning, and performance
Optimize compute usage and cost in line with enterprise cloud efficiency goals

Enterprise Data Controls & Governance

Embed data quality, reconciliation, and completeness controls as part of migration
Ensure migrated workloads meet EDO governance, MCA, and audit requirements
Maintain lineage, traceability, and explainability across migrated assets
Support risk-critical use cases (Finance, Ops, Recon, Reporting)

Cloud Security & Resilience

Implement cloud-aligned RBAC, identity controls, and secure access patterns
Enforce data encryption, masking, and classification standards
Ensure workloads meet operational resilience and recovery expectations
Partner with cloud platform and security teams for certification and sign-off

Reporting, Analytics & AI Enablement

Enable downstream BI, regulatory reporting, and MI workloads on Databricks
Support centralized reporting programs (e.g., ARA and GRU-related use cases)
Prepare data foundations for AI/ML and agentic workflows after migration

Required Qualifications

8-12+ years of experience in data engineering / platform modernization
Strong hands-on experience with Databricks in large-scale enterprises
Proven experience delivering cloud migration programs (on prem → cloud)
Deep expertise in Apache Spark, PySpark, Spark SQL
Experience embedding controls, reconciliation, and data quality in migrations
Experience in regulated environments (banking / financial services preferred)

Preferred Qualifications

Experience with Citi Olympus or equivalent enterprise cloud programs
Knowledge of legacy data platforms and modernization patterns
Familiarity with Finance, Ops, Recon, or Balance Sheet data domains
Exposure to MLflow, AI pipelines, or GenAI enablement on cloud
Strong understanding of run-the-bank vs. change-the-bank execution

Behavioral & Delivery Expectations

Strong ownership and execution mindset
Comfortable operating in large, multi-vendor transformation programs
Ability to engage with Technology, Operations, Risk, and Audit stakeholders
Disciplined approach to migration risk, controls, and documentation

Skills

Apache Spark, Databricks, Delta Lake, MLflow, PySpark, Spark SQL
