Engineering Manager
Affirm
France · flexible · Full-time · Lead · 1w ago
About the role
We are seeking a seasoned Engineering Manager to lead our Resilience Engineering team. This role is critical in ensuring the safety and reliability of our production systems through proactive validation techniques, including production load testing and chaos engineering. You will lead the development of systems and practices that allow engineers to safely test system behavior under stress and failure conditions in production, ensuring issues are discovered and mitigated before they impact real users.
Responsibilities
- Define and drive the vision for resilience engineering at Affirm, with a focus on production load testing and chaos engineering as first-class engineering practices
- Lead and mentor a team of engineers building platforms and tooling for safe production experimentation
- Partner with infrastructure, product, and security leadership to embed resilience validation into the software development lifecycle
- Establish best practices for safely testing system limits and failure scenarios in production
- Own the design and evolution of platforms that enable safe, controlled production load testing and fault injection
- Ensure strong safeguards are in place, including isolation boundaries, approval workflows, and automated rollback mechanisms to protect real users
- Build systems that provide end-to-end observability, traceability, and auditability for all resilience experiments
- Drive reliability improvements by systematically identifying weaknesses through load testing and chaos experiments
- Establish monitoring, alerting, and incident response practices tailored to proactive resilience validation
- Work closely with engineering teams to design and execute production load tests and chaos experiments safely
- Partner with infrastructure teams to build guardrails around tests and experiments
- Enable teams to adopt resilience practices by providing reusable tooling, frameworks, and standardized workflows
- Identify systemic weaknesses and lead cross-functional efforts to improve reliability and fault tolerance
- Evangelize a culture of “test failure before failure tests you” across the organization
Benefits
- Compensation: We have a simple, flexible, and transparent remote-first compensation structure so you can make the best decisions for yourself and your family
- Spending Wallets: Access tech, food, lifestyle, and family planning wallets for your expenses
- Supportive Communities: Get involved with our employee resource groups and community groups
- Remote-first Workforce: If your role is remote, you can set up shop anywhere in your home country
- Generous Time Off: Take the time you need when life happens
- Health Benefits: Get a plan that fits your needs
- Mental Healthcare: Take care of your mind with great mental health programs
- Parental Leave: Birth and non-birth parents get 18 weeks’ paid leave, plus a 4-week return-to-work transition program at full base pay
- Away Days: We offer 20 company-wide paid days off, which help our teams collectively pause and recharge
- Learning & Development: Engage in exciting learning programs to level up your growth
Skills
chaos engineering, load testing