Data Engineer / AI Engineer (Agentic AI Platform – Financial Data)
Jobs via Dice
Philadelphia · Hybrid · Contract · Posted today
About the role
We are building a platform that converts unstructured financial data (emails, corporate actions, index announcements) into high-quality, structured datasets used by financial institutions.
This is not a typical “LLM wrapper” role.
You will work on systems that:
- Extract data from noisy, inconsistent sources
- Validate and reconcile outputs across multiple inputs
- Ensure correctness, traceability, and auditability
The challenge is not just applying LLMs—it’s making them reliable in production for financial workflows.
What You’ll Work On
- Designing pipelines that process high-volume financial documents (batch + near real-time)
- Building LLM-powered extraction workflows (classification, parsing, summarization)
- Implementing validation layers (rule-based + model-based) to reduce hallucinations
- Developing retrieval systems using embeddings and vector search
- Architecting end-to-end systems: ingestion → processing → storage → serving
- Ensuring data quality, observability, and fault tolerance
- Collaborating with product to turn messy data into usable financial intelligence
Core Requirements
- Strong Python and backend/data engineering experience
- Experience building production data pipelines (ETL, streaming, or async systems)
- Solid understanding of distributed systems and failure modes
- Experience working with LLM-based systems in production:
  - Prompt design
  - Output validation
  - Retry/fallback strategies
  - Evaluation and monitoring
- Experience with data storage systems (SQL + NoSQL)
- Familiarity with cloud infrastructure (AWS or similar)
Preferred Experience
- Experience with RAG / vector search systems
- Background in financial data or capital markets
- Experience with streaming systems (Kafka, etc.)
- Experience building multi-step or agent-style workflows
What Makes This Role Interesting
- Work on high-accuracy AI systems where correctness matters
- Solve real problems around:
  - LLM reliability and hallucination mitigation
  - Data consistency across conflicting sources
  - Real-time vs correctness tradeoffs
- Build systems used in financial decision-making workflows
- High ownership over core architecture in an early-stage environment
Nice to Have (but Not Required)
- Experience with orchestration tools (Airflow, etc.)
- Exposure to evaluation frameworks for LLMs
- Experience working with large-scale document processing
Tech Stack (Representative, not exhaustive)
- Python, APIs, async processing
- LLM APIs + embeddings
- SQL / NoSQL databases
- Cloud infrastructure (AWS)
- Data pipelines and streaming systems
- Vector Databases
Skills
AWS, APIs, Async processing, Cloud infrastructure, Data pipelines, Embeddings, ETL, Kafka, LLM, NoSQL, Observability, Python, RAG, SQL, Streaming systems, Vector databases, Vector search