Software Engineer – AI Testing Analyst
Alignerr
About the Role
We're seeking experienced software engineers in Toronto to evaluate and improve the performance of frontier AI models. You'll critically assess AI-generated code, identify subtle bugs and hallucinations, and provide expert-level feedback.

- Organization: Alignerr
- Type: Hourly Contract
- Compensation: $50–$100/hour
- Location: Remote
- Commitment: 10–40 hours/week
What You'll Do
- Evaluate the performance of frontier language models on complex software engineering tasks
- Identify bugs, logical errors, hallucinations, and reliability issues in model outputs
- Design and review prompts, test cases, and evaluation scenarios for advanced coding workflows
- Provide precise written feedback explaining model strengths, weaknesses, and edge cases
- Work across multiple languages and codebases to assess generalization and correctness
Who You Are
- 3–4+ years of professional software engineering experience
- Strong proficiency in at least one of: TypeScript, Ruby, Java, or C++
- Excellent written and spoken English
- Demonstrated ability to reason about complex systems and debug non-obvious issues
- Familiarity with modern AI/LLM tooling (Git, CLI workflows, testing frameworks, etc.)
- Ability to critically evaluate model behavior rather than simply use model outputs
Why Join Us
- Competitive pay and flexible remote work
- Work on cutting-edge AI projects with top research labs
- Freelance perks: autonomy, flexibility, and global collaboration
- Potential for ongoing work and contract extension
Application Process (Takes 10–15 min)
- Submit your resume
- Complete a short screening
- Project matching and onboarding
PS: Our team reviews applications daily. Please complete your application steps to be considered for this opportunity.