Data Engineer (AI/LLM, Data Lake) – Hedge Fund | Hybrid NYC

Purple Hires Inc.

New York · On-site · Contract · Posted 1mo ago

About the role

Must-Have Skills

  • 10+ years of Data Engineering experience
  • Strong skills in Java or Python, plus SQL
  • Hands-on with Data Lakes (S3, Delta Lake, Iceberg)
  • Experience with ETL tools (AWS Glue, custom frameworks, etc.)
  • AI/LLM data experience (MANDATORY)
  • Exposure to RAG, embeddings, or AI pipelines

Key Responsibilities

Data Engineering & Pipelines

  • Build and maintain batch & streaming pipelines using Java / Python & SQL
  • Ingest and normalize financial datasets from multiple vendors
  • Implement strong data validation & quality frameworks

AI / LLM Data Enablement

  • Design data models optimized for:
    • LLMs
    • Vector embeddings
    • AI agents
  • Build pipelines supporting RAG architectures
  • Work with tools like LangChain, Amazon Bedrock, or open-source LLMs

Platform Optimization

  • Optimize data lake architecture (S3, Delta Lake)
  • Improve performance using Snowflake, Spark, EMR
  • Ensure metadata management & observability

Collaboration

  • Partner with Analytics, Security & Platform teams
  • Deliver high-quality, production-ready datasets
  • Troubleshoot data issues like schema drift & vendor anomalies

Skills

AWS Glue, Delta Lake, EMR, Iceberg, Java, LangChain, Python, RAG, S3, Snowflake, Spark, SQL
