Staff ML Platform Engineer
Mistplay
About the role
Mistplay is the #1 loyalty app for mobile gamers. Our community of millions of engaged mobile gamers comes to Mistplay to discover new games to play and earn rewards. Gamers are rewarded for their time and money spent within the games and can redeem those rewards for gift cards. Mistplay is on a mission to be the best way to play mobile games for everyone, everywhere. Download Mistplay on the Google Play Store and follow us on Instagram, Twitter and Facebook.
📍 Please Note: In Canada 🇨🇦, Mistplay follows a 2 days/week in-office hybrid model in Toronto (400 University Ave) & Montreal (1001 Blvd. Robert-Bourassa)
Reporting to the VP of Data and Machine Learning Platform, the Staff ML Platform Engineer within Mistplay’s Data Team will play a key role in researching and developing machine learning solutions to solve complex business problems. The Staff ML Platform Engineer will work closely with a cross-functional team to identify areas for improvement and to design and implement scalable solutions. Relevant experience includes building infrastructure and software that support machine learning applications such as online recommendation systems, reinforcement learning systems, or other online machine learning applications.
What you’ll do:
Be the main driver and expert for designing, building, and operating:
- Machine and data infrastructure solutions for training models
- Real-time inference systems to operate and serve models in a production environment
- Highly usable, accurate feature platform capabilities for generating, backfilling, and storing user-level features
- A high-accuracy, low-latency feature serving layer and preprocessing solutions to support online serving of models
- Build platform abstractions and golden paths: Airflow DAG templates, CLI/SDKs, cookie-cutter repos, and CI/CD pipelines that take models from notebooks to production predictably.
- Implement end-to-end observability: data/feature freshness checks, drift/quality gates, model performance/latency SLOs, infra health dashboards, tracing, and alerting, plus incident response and postmortems.
- Partner with Security, SRE, and Data Engineering on private networking, policy-as-code, PII handling, least-privilege IAM, and cost-efficient architectures across environments.
- Evaluate, integrate, and rationalize platform tooling (e.g., MLflow registry, feature stores, serving gateways); lead migrations with clear change management and minimal downtime.
What you’ll bring:
- 10+ years building and operating production-grade ML/data platforms with a focus on serving, reliability, and developer experience.
- Strong software engineering skills in Python, Go, or Java; experience building resilient services, APIs, and automation tooling with high test coverage.
- Deep experience with inference solutions: endpoint configuration, containerization, model packaging, autoscaling, serverless vs. real-time trade-offs, multi-model endpoints (MME), and A/B and canary releases.
- Expertise with online feature store paradigms and underlying storage solutions in ML serving contexts.
- Proven Terraform experience managing ML and data infra end-to-end: modules, workspaces, drift detection, change reviews, and safe rollbacks; familiarity with GitOps patterns.
- Airflow orchestration at scale: dependency modeling, sensors, retries, SLAs, backfills, DAG factories, and integrations with registries, artifact stores, and Terraform pipelines.
- Familiarity with ML frameworks (scikit-learn, XGBoost, PyTorch, TensorFlow) from a platform-integration perspective to support diverse runtimes and containers.
- Observability for ML workflows: metrics/logs/traces, performance profiling, capacity planning, cost monitoring, and runbooks.
- Excellent communication and cross-functional collaboration with Data Science, Data Engineering, DevOps and Backend.
Why Mistplay?
We strive to make our work environment as inviting and fun as possible. Working at Mistplay comes with a whole array of perks that we've adopted virtually and in-person:
- Team lunches
- Game nights
- Company-wide events
Our culture is deeply rooted in growth and upheld by a team of smart, dynamic, and enthusiastic people. We utilize data to constantly learn, improve, and adapt. We foster an environment where everyone is encouraged to share their ideas, push boundaries, take calculated risks, and witness their visions come to life.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.