Lead Data Engineer (Migration)
Yochana
Jersey City · On-site · Full-time · Lead · 3w ago
About the role
Role Overview
This role is part of a multi-year enterprise initiative to modernize data platforms by migrating from legacy and on-prem environments to cloud-native, governed, and scalable architectures.
The role focuses on migrating enterprise data workloads to Databricks on strategic cloud platforms, enabling standardized data engineering, analytics, centralized reporting, reconciliation utilities, and AI/ML use cases, while adhering to controls, security, resilience, and regulatory compliance.
Key Responsibilities
- Lead and execute migration of legacy data platforms (on-prem / non-standard tools) to Databricks on cloud under the Olympus program
- Perform application, data, and pipeline refactoring to cloud-native Databricks patterns
- Drive migration planning including dependency analysis, sequencing, and cutover strategy
- Support coexistence models and transition from dual-run to cloud-only execution
Databricks Lakehouse Engineering
- Design and implement Databricks Lakehouse architecture (Bronze / Silver / Gold)
- Build scalable batch and streaming pipelines using PySpark and Spark SQL
- Leverage Delta Lake for reliability, versioning, and performance
- Optimize compute usage and cost in line with enterprise cloud efficiency goals
Enterprise Data Controls & Governance
- Embed data quality, reconciliation, and completeness controls as part of migration
- Ensure migrated workloads meet EDO governance, MCA, and audit requirements
- Maintain lineage, traceability, and explainability across migrated assets
- Support risk-critical use cases (Finance, Ops, Recon, Reporting)
Cloud Security & Resilience
- Implement cloud-aligned RBAC, identity controls, and secure access patterns
- Enforce data encryption, masking, and classification standards
- Ensure workloads meet operational resilience and recovery expectations
- Partner with cloud platform and security teams for certification and sign-off
Reporting, Analytics & AI Enablement
- Enable downstream BI, regulatory reporting, and MI workloads on Databricks
- Support centralized reporting programs (e.g., ARA, GRU-related use cases)
- Prepare data foundations for AI / ML and agentic workflows post-migration
Required Qualifications
- 8-12+ years in data engineering / platform modernization
- Strong hands-on experience with Databricks in large-scale enterprises
- Proven experience delivering cloud migration programs (on-prem → cloud)
- Deep expertise in Apache Spark, PySpark, and Spark SQL
- Experience embedding controls, reconciliation, and data quality in migrations
- Experience in regulated environments (banking / financial services preferred)
Preferred Qualifications
- Experience with Citi Olympus or equivalent enterprise cloud programs
- Knowledge of legacy data platforms and modernization patterns
- Familiarity with Finance, Ops, Recon, or Balance Sheet data domains
- Exposure to MLflow, AI pipelines, or GenAI enablement on cloud
- Strong understanding of run-the-bank vs. change-the-bank execution
Behavioral & Delivery Expectations
- Strong ownership and execution mindset
- Comfortable operating in large, multi-vendor transformation programs
- Ability to engage with Technology, Operations, Risk, and Audit stakeholders
- Disciplined approach to migration risk, controls, and documentation
Skills
Apache Spark, Databricks, Delta Lake, MLflow, PySpark, Spark SQL