ML Researcher or Engineer, Agentic Evaluation Frameworks

Alberta Machine Intelligence Institute (Amii)

Edmonton · On-site · Contract

About the role

Join us to build key agentic evaluation frameworks with a rising startup at the cutting edge of game AI. We are looking for an ML researcher or engineer with solid ML chops and demonstrated scientific rigor to build an efficient and robust system for evaluating agentic behavior across a variety of simulated worlds and environments. You’ll collaborate with domain experts, engineers and scientists to build and deploy a durable and robust system.

• Dave Staszak, Lead Machine Learning Scientist, Advanced Technology

This is a paid residency that will be undertaken over a 12-month period with the potential to be hired by our client, Artificial Agency, afterwards (note: at the discretion of the client). The Resident will report to an Amii Scientist and regularly consult with the client team to share insights and engage in knowledge transfer activities. Successful candidates will be members of a cross-functional project team with backgrounds in ML research, project management, software engineering, and new product development. This is a rare opportunity to be mentored by world-class scientists and to develop something truly impactful.

About the Client

Artificial Agency is an AI company that created a behavior engine that empowers game developers to easily add generative behavior into any system within a game. Generative behavior enables developers to transform characters and other decision-making systems into individualized agents with perceptions, actions, goals, personalities, and inner lives. The company's vision is to unlock new interactive experiences and genres by enabling creatives to embed run-time intelligence into any aspect of their game.

About the Projects

These projects focus on building an evaluation framework for agentic systems, extending an evaluation initiative that is already underway at Artificial Agency.

The resident will contribute to and expand existing tooling to support structured evaluation across distinct layers of capability. Rather than treating agent quality as a single score or task outcome, the framework explicitly separates foundational competencies from higher-level behaviours and task performance.

In the first project, the core research question is to identify how we can systematically evaluate agent capabilities across multiple levels of abstraction, from foundational model competencies to complex task execution, in interactive game worlds with non-real-world semantics. In the second project, the core research question is to identify how agent behaviours degrade as model capacity, context, and compute budgets are reduced, and what failure modes emerge under real deployment constraints.

For both projects, evaluation will be conducted through a combination of controlled simulation environments, targeted test scenarios, and behavioural unit tests that isolate specific agent capabilities in a repeatable way. The Amii resident will join an active engineering and ML team as an embedded, in-person team member, with weekly check-ins involving the resident, an Amii supervisor, and Artificial Agency staff. The work emphasizes empirical rigor, reproducibility, and close integration with production systems. The outputs are durable evaluation and analysis infrastructure that will continue to be used after the residency.

Required Skills / Expertise

Are you passionate about building great solutions? You’ll have opportunities to develop both personally and professionally as you build your career. We’re looking for a talented and enthusiastic individual with a solid background in machine learning and statistics and a demonstrated interest in agentic systems, sequential decision-making, or game AI.

Key Responsibilities:

Co-create an evaluation framework for agentic systems.
Design and build virtual testing environments, identify robust metrics, and develop reproducible test methodologies to evaluate agentic and sequential decision-making behaviors across a variety of environments.
Undertake applied research on ML and statistical techniques to address the limitations in existing models and approaches.
Optimize ML and evaluation pipelines for efficiency and scalability.
Collaborate with the project team and stakeholders to develop an MVP and client-focused solutions.
Embed with the Artificial Agency team, participating in team meetings and sprints.

Required Qualifications:

Completion of an MSc or PhD in Computer Science or a related scientific or engineering graduate program.
Proficiency in Python and related ML frameworks, libraries, and toolkits.
You’ve dabbled with agentic frameworks (n8n, open-claw).
You’re comfortable storing, manipulating, and analyzing data (pandas, Matplotlib, Chart.js).
You’ve hosted a foundation model before (Ollama, vLLM).
You’ve used Python-based ML training stacks (PyTorch, Transformers, Hugging Face).
Solid understanding of classical statistics and its application in experiment design and model validation.
Familiarity with Linux, Git version control, and writing clean code.
A positive attitude towards learning and understanding a new applied domain.
Must be legally eligible to work in Canada.

Preferred Experiences:

Hands-on experience with structured and unstructured data, including managing imperfect signals buried within complex systems.
Publication record in peer-reviewed academic conferences or relevant journals in machine learning.
Experience/familiarity with software engineering best practices.
Experience deploying machine learning models in production environments, or strong software engineering (or MLE) skills, is a strong plus.
Comfortable designing, running, and analyzing experiments.

Non-Technical Requirements:

An ability to turn data into knowledge and communicate insights with people of varying backgrounds.
Interdisciplinary team player enthusiastic about working together to achieve excellence.
Capable of critical and independent thought.
Able to communicate technical concepts clearly and advise on the application of machine intelligence.
Intellectual curiosity and the desire to learn new things, techniques, and technologies.
Able to operate inside a fast-moving startup environment without sacrificing rigor.

Why You Should Apply

Besides gaining industry experience, additional perks include:

Work under the mentorship of an Amii Scientist for the duration of the project.
Participate in professional development activities.
Gain access to the Amii community and events.
Get paid for your work (a fair and equitable rate of pay will be negotiated at the time of offer).
Build your professional network.
Be considered for an ongoing machine learning role at the client’s organization at the end of the term (at the client’s discretion).

About Amii

Amii is one of Canada’s three main institutes for artificial intelligence (AI) and machine learning. Our world-renowned researchers drive fundamental and applied research at the University of Alberta (and other academic institutions), training some of the world’s top scientific talent. Our cross-functional teams work collaboratively with Alberta-based businesses and organizations to build AI capacity and translate scientific advancement into industry adoption and economic impact.

Skills

Chart.js, Git, Hugging Face, Linux, Matplotlib, n8n, Ollama, Open-claw, pandas, PyTorch, Python, Transformers, vLLM
