Software Engineer | AI Evaluation & Benchmarking
Baaraku
About the role
The Mission
We are seeking a versatile developer to help us push the boundaries of LLM capabilities. In this role, you won't just be writing code; you will be designing the "Gold Standard" benchmarks used to evaluate how AI models reason, execute, and solve problems in real-world technical environments.
Your Responsibilities
• Architect Multi-Step Tasks: Design and build complex, high-stakes technical scenarios that models must navigate within a CLI.
• Dockerized Orchestration: Create isolated, reproducible Docker environments for task execution, ensuring security and consistency across benchmarks.
• Implement Reference Solutions: Write "ground truth" solutions in multiple languages to serve as the benchmark for success.
• Automated Verification: Develop robust testing suites to programmatically verify if the model’s output is functionally correct and efficient.
Technical Requirements
• Polyglot Programming: Fluency in at least two major languages (e.g., Python, C++, Java, or JavaScript). You should be comfortable switching contexts and idioms.
• Power User of the CLI: Deep familiarity with command-line environments, shell scripting, and system-level operations.
• Containerization Pro: Strong experience with Docker, specifically building images, managing volumes, and securing runtime environments.
• Testing Mindset: Experience writing unit and integration tests (Pytest, JUnit, Mocha, etc.) to validate complex logic.
Bonus Points
• Experience with LLM evaluation frameworks (e.g., OpenCompass, HELM).
• Background in Competitive Programming or technical curriculum design.
• Strong understanding of CI/CD pipelines.
Job Type: Full-time
Pay: From ₦1,000,000.00 per month
Experience:
• JavaScript: 4 years (Required)
• C++: 4 years (Required)
• Python: 4 years (Required)
• Docker: 4 years (Required)
Work Location: Remote