Data Engineer (AI/LLM, Data Lake) – Hedge Fund | Hybrid NYC
Purple Hires Inc.
New York · On-site · Contract · Posted 1 month ago
About the role
Must-Have Skills
- 10+ years of Data Engineering experience
- Strong proficiency in Java or Python, plus SQL
- Hands-on with Data Lakes (S3, Delta Lake, Iceberg)
- Experience with ETL tools (AWS Glue, custom frameworks, etc.)
- AI/LLM data experience (MANDATORY)
- Exposure to RAG, embeddings, or AI pipelines
Key Responsibilities
Data Engineering & Pipelines
- Build and maintain batch & streaming pipelines using Java / Python & SQL
- Ingest and normalize financial datasets from multiple vendors
- Implement strong data validation & quality frameworks
AI / LLM Data Enablement
- Design data models optimized for:
  - LLMs
  - Vector embeddings
  - AI agents
- Build pipelines supporting RAG architectures
- Work with tools like LangChain, Amazon Bedrock, or open-source LLMs
Platform Optimization
- Optimize data lake architecture (S3, Delta Lake)
- Improve performance using Snowflake, Spark, and EMR
- Ensure metadata management & observability
Collaboration
- Partner with Analytics, Security & Platform teams
- Deliver high-quality, production-ready datasets
- Troubleshoot data issues such as schema drift & vendor anomalies
Skills
AWS Glue, Delta Lake, EMR, Iceberg, Java, LangChain, Python, RAG, S3, Snowflake, Spark, SQL