Senior ML Engineer — Reinforcement Learning and Prediction
Zillwork
About Zillwork
Zillwork is a Singapore-incorporated AI company building technology that creates economic opportunity at scale. We are pre-launch, founder-led, and operating out of Chennai and Coimbatore.
We are technically ambitious and globally minded. Our stack includes fine-tuned multilingual speech models, real-time geo-intelligent matching at scale, and a homegrown AI engine designed from first principles. We are building for markets and communities that existing technology has never seriously attempted to serve.
We are a smart, focused team. Every hire shapes the culture and the product. If you want your work to matter, and you are energised by hard problems rather than comfort, read on.
The Role
As Senior ML Engineer (RL and Forecasting), you will build two of the platform's most technically complex systems: a Proximal Policy Optimization (PPO) reinforcement learning policy for fair multi-objective matching, and a Temporal Fusion Transformer (TFT) demand forecasting system that predicts platform activity 7 days ahead by geo-cell and category.
What You Will Build
- PPO policy using RLlib (Ray) with a multi-objective reward combining quality, fairness, and satisfaction
- Temporal Fusion Transformer for geo-cell demand forecasting at city scale
- Kafka time-series integration for real-time demand signals
- ClickHouse aggregations for demand-supply heatmaps
- Proactive alert system based on predicted demand shifts
What We Are Looking For
- 5+ years of ML engineering experience, including production deployments
- Hands-on reinforcement learning in production: PPO, DQN, or policy gradient methods
- RLlib (Ray) or Stable Baselines preferred
- Time-series forecasting: TFT, Prophet, DeepAR, or LSTM in production
- Apache Kafka for ML feature pipelines
- Python, PyTorch, Kubeflow
Bonus: Multi-objective RL, geo-cell demand forecasting, ClickHouse real-time analytics
Apply
Subject: Senior ML Engineer RL Forecasting Application