Senior Software Engineer – LLM Evaluation
Turing
About the role
About Us:
At Turing, we are at the forefront of AI research, serving as a leading research accelerator for pioneering AI labs and supporting global enterprises in deploying advanced AI systems. Our mission centers on two core functions: accelerating leading-edge research through high-quality data and exceptional training pipelines, and turning AI from concept into effective proprietary intelligence that yields measurable results on the P&L.
Who We're Looking For:
This opportunity is perfect for seasoned engineers who have successfully developed production systems within renowned companies such as Google, Microsoft, Apple, Amazon, Meta, or other high-scale engineering organizations. We also appreciate candidates from prestigious educational institutions like Harvard, Columbia, Princeton, Yale, and University of Pennsylvania, though we value exceptional experience and skills above all.
Project Overview:
In this role, you will work as a Software Engineering evaluator, tasked with creating innovative datasets for training, benchmarking, and advancing large language models. Your work will involve curating code examples, delivering precise solutions, and correcting code in full-stack environments. This includes backend development in Python and frontend work using JavaScript (React, Node.js), as well as contributions in C/C++, Java, Rust, and Go. You will also evaluate and optimize AI-generated code for effectiveness and reliability, collaborating with teams to improve enterprise-level AI coding solutions.
Expectations for a Typical Day:
- Contribute to AI model training initiatives by curating code, developing solutions, and correcting code in Python and JavaScript (React, Node.js), while also engaging with C/C++, Java, Rust, and Go.
- Assess and enhance AI-generated code across various contexts to maintain efficiency, scalability, and reliability.
- Work alongside cross-functional teams to refine AI-driven coding solutions against industry benchmarks.
- Develop agents capable of verifying code quality and identifying error patterns within full-stack applications.
- Propose hypotheses regarding phases of the software engineering cycle (like prototyping, architecture design, API design, etc.) and evaluate model effectiveness throughout these stages.
- Design verification frameworks that can autonomously validate solutions to software engineering challenges.
Essential Skills:
- Minimum 3 years of software engineering experience.
- Strong proficiency in building full-stack applications utilizing Python and JavaScript (React, Node.js), with expertise in both backend and frontend development.
- Experience in deploying robust, scalable software utilizing modern programming languages and tools.
- Thorough understanding of software architecture, design, development, debugging, and code quality assessment.
- Exceptional verbal and written communication skills, needed to write clear, concise evaluation rationales.
Engagement Details:
- Commitment: Flexible engagement, minimum of 10 hours/week, up to 40 hours/week.
- Type: Contractor (no medical/paid leave).
- Duration: 1 month (with potential for extensions based on performance).
- Location: Candidates must be based in the United States.
Evaluation Process:
- Application process duration: 15-30 minutes.
- Completion of an AI video interview is mandatory.
After applying, you will receive an email with a login link. Please use this link to access the portal and complete your profile.
Know extraordinary talent? Refer them for potential rewards.