Site Reliability Engineer

Cintrifuse

Springfield · On-site · Full-time · Senior · $100k – $125k/yr · 1mo ago

About the role

About Coterie

Through a partnership-based approach, Coterie helps insurance professionals unlock untapped revenue in the small commercial space. With an innovative quoting platform that delivers accurate pricing and bindable quotes in less than one minute, Coterie makes small business insurance effortless.

We are on a mission to build and foster a world-class team to bring speed, simplicity, and service to commercial insurance. We value integrity, humility, passion, and intelligence. If you want to push yourself and reshape a $200B+ market, we’re excited to talk to you!

What will the Site Reliability Engineer do?

We're looking for a Site Reliability Engineer who's passionate about building and maintaining reliable, scalable infrastructure and who thrives on making systems better every day. In this role, you'll join our SRE team to help keep our platforms running smoothly, improve our observability and incident response capabilities, and partner with development teams to deliver infrastructure that supports high-quality, reliable software.

You’ll play a key role in managing our cloud infrastructure, strengthening our CI/CD pipelines, and helping us get the most out of our monitoring and alerting tools, particularly Grafana. This is a great opportunity for a mid-level engineer ready to take ownership of meaningful infrastructure challenges.

Responsibilities

Manage and maintain cloud infrastructure on Azure, including Azure Kubernetes Service (AKS) clusters and supporting resources
Build, improve, and maintain CI/CD pipelines using GitHub Actions to support reliable and repeatable deployments
Own and enhance our Grafana implementation: designing dashboards, configuring alerts, and supporting incident management workflows
Monitor system health, triage incidents, and drive root cause analysis to prevent recurrence
Collaborate with development teams to define and track SLIs, SLOs, and error budgets that align with business goals
Contribute to infrastructure-as-code practices using Pulumi
Identify and resolve reliability risks through capacity planning, performance tuning, and proactive system improvements
Participate in an on-call rotation to support production systems and respond to incidents
Document runbooks, operational procedures, and architectural decisions to support team knowledge sharing

What we are looking for:

3+ years of experience in a Site Reliability Engineering, DevOps, or Infrastructure role
Strong hands‑on experience with:
- Azure Cloud services and resource management
- Kubernetes and AKS administration, including deployments, networking, and troubleshooting
- GitHub Actions for CI/CD pipeline development and maintenance
Solid experience with Grafana, including dashboard creation, alerting configuration, and incident management
Hands‑on experience with Prometheus, Loki, or other observability tools in the Grafana ecosystem
Proficiency in at least one scripting or programming language such as Python or Bash
Understanding of networking fundamentals, DNS, load balancing, and container orchestration concepts
Strong analytical and communication skills; able to diagnose complex system issues and clearly communicate findings
Demonstrated ability to collaborate across teams and contribute to a culture of reliability
Experience working in an agile environment with modern DevOps practices

What will make you stand out:

Experience working at a startup or in a fast‑paced, cross‑functional environment
Familiarity with the insurance industry or other regulated sectors
Experience with infrastructure-as-code tools such as Terraform or Pulumi
Familiarity with service mesh technologies (e.g., Istio)

Our interview process:

Our hiring process generally consists of 4 phases. The goal is to provide an opportunity for us to learn more about our candidates while allowing them to get to know us as well!

Phase 1

Qualified candidates will first meet with a member of our People Operations team for a phone interview. This discussion is a high‑level conversation to understand more about your background and interests and for us to share more about Coterie and the position.

Phase 2

Selected candidates will be invited to meet with our Hiring Manager for a 2nd…

Skills

AKS, Azure, Bash, CI/CD, Docker, Grafana, GitHub Actions, Infrastructure as Code, Istio, Kubernetes, Loki, Prometheus, Pulumi, Python, Terraform
