Senior Technology Site Reliability Engineer

Cooley LLP

US · On-site · Full-time · Senior · $140k – $205k/yr · 1w ago

About the role

Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team.

Position Summary

The Senior Technology Site Reliability Engineer ("SRE") is responsible for ensuring the reliability, scalability, and performance of the firm's critical infrastructure and applications. The SRE blends software engineering with systems engineering to build and maintain automated, resilient, and observable systems that support high availability and operational excellence. In addition to being technically advanced, the SRE will have a high degree of emotional intelligence and the ability to work as part of a team toward complex, layered objectives.

Responsibilities

  • Monitor and maintain production systems to ensure high availability and performance
  • Implement and manage service-level indicators (SLIs), objectives (SLOs), agreements (SLAs), and error budgets
  • Participate in on-call rotations and incident response, including root cause analysis and postmortems
  • Develop and maintain infrastructure as code (IaC) using Terraform
  • Automate deployment, scaling, and recovery processes to reduce manual intervention
  • Partner with DevOps to build and maintain CI/CD pipelines to support safe and efficient software delivery
  • Implement observability solutions using metrics, logs, traces, and alerting systems (Prometheus, Grafana, Datadog, etc.)
  • Proactively identify and resolve system bottlenecks and reliability risks
  • Work closely with Infrastructure, DevOps, Development, and Security teams to embed reliability into the development lifecycle
  • Contribute to a culture of blameless postmortems and continuous improvement
  • Document procedures and share knowledge across teams
  • All other duties as assigned or required

Skills and Experience

Required

  • After orientation at Cooley LLP, exhibit proficiency in the Microsoft Office suite, iManage and other firm applications
  • Ability to work extended and/or weekend hours, as required
  • Ability to travel, as required
  • 6+ years of directly applicable experience (e.g., site reliability engineering or a related field)
  • Proficiency in Terraform and programming languages such as Python, Go, or Java
  • Deep expertise in cloud platforms, particularly AWS, and container orchestration
  • Strong background in distributed systems, performance tuning, and automation
  • Hands‑on experience with configuration management tools such as Puppet, Chef, or Salt

Preferred

  • Bachelor's Degree in Computer Science, Information Technology, Engineering, or associated discipline
  • Experience working with advanced ETL data workflows including technologies such as AWS EMR, Azure Synapse, Azure Data Factory, or Apache Hive/Spark/Airflow
  • Experience with IaC deployment of AKS/EKS/GKE architecture
  • Experience with enterprise Data Lake environments using technologies such as Databricks or Snowflake

Competencies

  • Expert analytical, quantitative, problem‑solving, and deductive reasoning skills, with experience performing advanced troubleshooting and root cause analysis of complex technical issues
  • Excellent organizational, planning, and time management skills and ability to work independently and in a team environment to manage competing priorities and meet deadlines
  • Advanced verbal and written communication skills with the ability to present findings, conclusions, alternatives, and information clearly and concisely
  • Experience working with all levels of business professionals, management, stakeholders, and vendors with the ability to build effective relationships through trust and diplomacy

Compensation & Benefits

  • Expected annual pay range for this full‑time position: $140,000 – $205,000 (final offer dependent on geographic location, applicable experience, and skillset)
  • Competitive compensation and excellent benefits package
  • Full range of elective benefits including medical, health savings account (with applicable medical plan), dental, vision, health and/or dependent care flexible spending accounts, pre‑tax commuter benefits, life insurance, AD&D, long‑term care coverage, backup care for children and/or adults, and other parental support benefits
  • Firm‑paid life insurance, AD&D, LTD, and short‑term medical benefits
  • 21 days of Paid Time Off (PTO) and 10 paid holidays each year
  • Generous parental leave and fertility benefits
  • Detailed benefit orientation for new employees

Equal Opportunity Employer

Cooley offers competitive compensation and an excellent benefits package and is committed to fair and equitable employment practices. EOE.

Benefits

medical, health savings account, dental insurance, vision insurance, health flexible spending account, dependent care flexible spending account, pre-tax commuter benefits, life insurance, AD&D, long-term care coverage, backup care for children, backup care for adults, parental support benefits, LTD, short-term medical benefits, Paid Time Off, paid holidays, parental leave, fertility benefits

Skills

AWS, AWS EMR, Apache Airflow, Apache Spark, Azure Data Factory, Azure Synapse, Chef, CI/CD, Databricks, Docker, ETL, Go, Grafana, GKE, Hive, IaC, iManage, Java, Kubernetes, Microsoft Office, Prometheus, Puppet, Python, Salt, Site Reliability Engineering, Snowflake, Terraform
