Software Engineer | AI Evaluation & Benchmarking

Baaraku

Nigeria · On-site · Full-time · 2mo ago

About the role

The Mission

We are seeking a versatile developer to help us push the boundaries of LLM capabilities. In this role, you won't just be writing code; you will be designing the "Gold Standard" benchmarks used to evaluate how AI models reason, execute, and solve problems in real-world technical environments.

Your Responsibilities

• Architect Multi-Step Tasks: Design and build complex, high-stakes technical scenarios that models must navigate within a CLI.
• Dockerized Orchestration: Create isolated, reproducible Docker environments for task execution, ensuring security and consistency across benchmarks.
• Implement Reference Solutions: Write "ground truth" solutions in multiple languages to serve as the benchmark for success.
• Automated Verification: Develop robust testing suites to programmatically verify if the model’s output is functionally correct and efficient.

Technical Requirements

• Polyglot Programming: Fluency in at least two major languages (e.g., Python, C++, Java, or JavaScript). You should be comfortable switching contexts and idioms.
• Power User of the CLI: Deep familiarity with command-line environments, shell scripting, and system-level operations.
• Containerization Pro: Strong experience with Docker, specifically building images, managing volumes, and securing runtime environments.
• Testing Mindset: Experience writing unit and integration tests (Pytest, JUnit, Mocha, etc.) to validate complex logic.

Bonus Points

• Experience with LLM evaluation frameworks (e.g., OpenCompass, HELM).
• Background in Competitive Programming or technical curriculum design.
• Strong understanding of CI/CD pipelines.

Job Type: Full-time

Pay: From ₦1,000,000.00 per month

Experience:

• JavaScript: 4 years (Required)
• C++: 4 years (Required)
• Python: 4 years (Required)
• Docker: 4 years (Required)

Work Location: Remote
