Senior Technology Site Reliability Engineer
Cooley LLP
About the role
Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team.
Position Summary
The Senior Technology Site Reliability Engineer ("SRE") is responsible for ensuring the reliability, scalability, and performance of the firm's critical infrastructure and applications. The SRE blends software engineering with systems engineering to build and maintain automated, resilient, and observable systems that support high availability and operational excellence. In addition to strong technical skills, the SRE will bring a high degree of emotional intelligence and the ability to work as part of a team toward complex, multilayered objectives.
Responsibilities
- Monitor and maintain production systems to ensure high availability and performance
- Implement and manage service-level indicators (SLIs), objectives (SLOs), agreements (SLAs), and error budgets
- Participate in on-call rotations and incident response, including root cause analysis and postmortems
- Develop and maintain infrastructure as code (IaC) using Terraform
- Automate deployment, scaling, and recovery processes to reduce manual intervention
- Partner with DevOps to build and maintain CI/CD pipelines to support safe and efficient software delivery
- Implement observability solutions using metrics, logs, traces, and alerting systems (Prometheus, Grafana, Datadog, etc.)
- Proactively identify and resolve system bottlenecks and reliability risks
- Work closely with Infrastructure, DevOps, Development, and Security teams to embed reliability into the development lifecycle
- Contribute to a culture of blameless postmortems and continuous improvement
- Document operational procedures and share knowledge across teams
- All other duties as assigned or required
Skills and Experience
Required
- After orientation at Cooley LLP, exhibit proficiency in the Microsoft Office suite, iManage and other firm applications
- Ability to work extended and/or weekend hours, as required
- Ability to travel, as required
- 6+ years of directly applicable experience (e.g., in site reliability engineering or a related field)
- Proficiency in Terraform and programming languages such as Python, Go, or Java
- Deep expertise in cloud platforms, particularly AWS, and container orchestration
- Strong background in distributed systems, performance tuning, and automation
- Hands‑on experience with configuration management tools such as Puppet, Chef, or Salt
Preferred
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related discipline
- Experience working with advanced ETL data workflows including technologies such as AWS EMR, Azure Synapse, Azure Data Factory, or Apache Hive/Spark/Airflow
- Experience deploying AKS, EKS, or GKE architectures using IaC
- Experience with enterprise data lake environments using technologies such as Databricks or Snowflake
Competencies
- Expert analytical/quantitative, problem-solving, and deductive reasoning skills, with experience performing advanced troubleshooting and root cause analysis of complex technical issues
- Excellent organizational, planning, and time management skills and ability to work independently and in a team environment to manage competing priorities and meet deadlines
- Advanced verbal and written communication skills with the ability to present findings, conclusions, alternatives, and information clearly and concisely
- Experience working with all levels of business professionals, management, stakeholders, and vendors with the ability to build effective relationships through trust and diplomacy
Compensation & Benefits
- Expected annual pay range for this full‑time position: $140,000 – $205,000 (final offer dependent on geographic location, applicable experience, and skillset)
- Competitive compensation and excellent benefits package
- Full range of elective benefits including medical, health savings account (with applicable medical plan), dental, vision, health and/or dependent care flexible spending accounts, pre‑tax commuter benefits, life insurance, AD&D, long‑term care coverage, backup care for children and/or adults, and other parental support benefits
- Firm‑paid life insurance, AD&D, LTD, short‑term medical benefits
- 21 days of Paid Time Off (PTO) and 10 paid holidays each year
- Generous parental leave and fertility benefits
- Detailed benefit orientation for new employees
Equal Opportunity Employer
Cooley offers a competitive compensation and excellent benefits package and is committed to fair and equitable employment practices. EOE.