
Founding Engineer - Remote India

shyva AI

Pune · On-site Internship 2w ago

About the role

SHYVA | Founding Engineer · Stealth · Enterprise AI · Remote

About

We are building something that should have existed a decade ago — in a market where the data is fragmented, unverified, and nobody has fixed it yet. The founder has been the customer for 25 years and knows exactly what is broken. Six Fortune 500 enterprises are already committed as design partners. We are in stealth and will stay there for now.

The Role

You will be one of the first engineering hires, working directly with the founder to build the core platform — a large-scale data intelligence system with an AI-native interface. The hard problems are data, not models: ingestion at volume, entity resolution across heterogeneous sources, auditability of every output, and a graph-based data model built to compound over time.
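The auditability requirement described above — every output traceable back to its origin — can be sketched as a record type that carries its own provenance. This is an illustrative sketch only; the class and field names (`SourcedFact`, `source_record_id`) are hypothetical, not taken from the posting.

```python
# Hypothetical sketch of an output record that carries provenance:
# a source record ID, an ingestion timestamp, and a confidence level.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SourcedFact:
    value: str             # the resolved fact surfaced to the user
    source_record_id: str  # ID of the raw record it was derived from
    ingested_at: datetime  # when the source record was ingested
    confidence: float      # extraction/linkage confidence in [0, 1]

fact = SourcedFact(
    value="ACME Corp",
    source_record_id="feed-7/row-10432",
    ingested_at=datetime(2024, 1, 5, tzinfo=timezone.utc),
    confidence=0.97,
)
assert 0.0 <= fact.confidence <= 1.0
```

A frozen dataclass keeps the provenance immutable once emitted, which is the property an audit trail depends on.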

No platform team. No DevOps org. No PM handing you specs. Full architectural ownership from day one.

Must-Have

Full-Stack Engineering

• Python backend (FastAPI/Django) and React/Next.js frontend — you own the entire stack
• Cloud-native: AWS or GCP, Docker/Kubernetes

Large-Scale Data Engineering

• ETL/ELT pipelines at 10M+ record scale — Spark, dbt, Kafka, Airflow
• Experience ingesting and normalising licensed third-party commercial data feeds — bulk files, schema inconsistency, freshness tracking, provenance management
• Data lineage and auditability: every output traceable to a source record, timestamp, and confidence level
• Batch and event-driven ingestion patterns

Graph Data Modeling

• Neo4j or graph layers on relational DBs: node/edge schema design, relationship versioning, provenance preservation
• Graph traversal for network analysis and entity influence ranking

Entity Resolution & Deduplication

• Probabilistic record linkage, fuzzy matching, multi-attribute scoring at volume
• Blocking strategies for large record pools (LSH, phonetic encoding, prefix blocking)
• Canonical entity management with merge history and audit trail

LLM & Agent Orchestration

• LangChain, LangGraph, CrewAI, or custom orchestrators — shipped multi-step agent workflows in production
• RAG pipelines: hybrid retrieval, chunking, reranking
• Guardrail architecture: post-generation validation, uncertainty flagging, stale-data detection

Document Extraction

• OCR pipelines and structured extraction from complex business documents
• Field normalisation across currencies, date formats, and units of measure

Semantic & Vector Search

• Elasticsearch, pgvector, Weaviate, or Pinecone — hybrid retrieval at scale

Background

• CS or Electrical Engineering degree from a strong institution
• 6–10 years hands-on; at least one role with genuine end-to-end ownership
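For candidates unfamiliar with the terms, the blocking-plus-fuzzy-matching pattern named above can be sketched in a few lines. This is a minimal standard-library illustration, not the company's implementation — production systems would use dedicated linkage tooling and richer multi-attribute scoring.

```python
# Minimal sketch: prefix blocking + fuzzy record linkage.
# Blocking limits comparisons to records sharing a short name prefix,
# avoiding the O(n^2) all-pairs comparison over the full record pool.
from collections import defaultdict
from difflib import SequenceMatcher

def prefix_block(records, key_len=3):
    """Group records by a lowercase name prefix."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["name"][:key_len].lower()].append(rec)
    return blocks

def match_pairs(records, threshold=0.7):
    """Score within-block pairs with a fuzzy ratio; emit likely duplicates."""
    pairs = []
    for block in prefix_block(records).values():
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                score = SequenceMatcher(
                    None, block[i]["name"].lower(), block[j]["name"].lower()
                ).ratio()
                if score >= threshold:
                    pairs.append((block[i]["id"], block[j]["id"], score))
    return pairs

records = [
    {"id": 1, "name": "Acme Corporation"},
    {"id": 2, "name": "ACME Corp"},
    {"id": 3, "name": "Globex Ltd"},
]
print(match_pairs(records))
```

At scale, the prefix key would be replaced by LSH or phonetic encodings (also listed above), and matched pairs would feed a canonical-entity merge step with a recorded merge history.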

Strong Plus

• Startup or early-stage experience — comfortable without guardrails
• Supply chain, procurement, or trade finance domain knowledge
• Multi-source data reconciliation across heterogeneous commercial providers
• Enterprise system connectors (SAP Ariba, Oracle, or similar)

What We Offer

• Founding engineer equity
• Direct collaboration with a domain expert founder — no translation layer between you and the customer problem
• Real customers from day one — six Fortune 500 design partners already committed
• Full architectural ownership
• India remote

How to Apply

Skip the cover letter. Answer three questions:

• What is the most technically complex data system you have built? What made it hard?
• Describe an architectural decision you made with incomplete information. What did you decide and why?
• What draws you to a role where the hardest problems are data quality and trust, not model performance?

Include a link to something you built that is running right now.
