
Senior Site Reliability Engineer

Thinkific

Mira Road · On-site Full-time Senior $111k – $167k/yr 2mo ago


About the role

About


Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific.

As a Senior Site Reliability Engineer (SRE), you’ll take ownership of scaling, securing, and optimizing the infrastructure that powers thousands of online course creators around the world.

In this role, you will improve the performance, reliability, and security of our platform by partnering with cross-functional teams, driving SRE best practices, and ensuring operational excellence. As a senior member of the team, you’ll mentor others, lead key reliability initiatives, contribute hands-on to infrastructure projects, and act as a domain expert in modern cloud-native practices, with a specific emphasis on Kubernetes, cloud infrastructure (AWS), observability, and service reliability.

Your goal will be to help guide and execute on projects related to your technical domain. Here’s how you’ll accomplish this:

Responsibilities

Own and improve technical domains across our infrastructure, ensuring high standards of system reliability, performance, scalability, and security.
Design and implement scalable infrastructure using Kubernetes, AWS services (EKS, RDS, S3, IAM, ALB, etc.), and Infrastructure-as-Code tools like Terraform and Helm.
Enhance and maintain deployment pipelines, enabling teams to release with speed, confidence, and security.
Participate in and lead incident response efforts, ensuring blameless postmortems and continuous learning.
Collaborate with development teams to define SLOs, SLIs, and error budgets, promoting reliability-focused design from the start.
Automate operational tasks and improve developer experience with scripts and tools written in Ruby, Python, Node.js, or Bash.
Maintain observability using tools such as Datadog, New Relic, Prometheus, Grafana, and Sentry, ensuring that monitoring and alerting align with meaningful SLOs.
Support and optimize distributed systems including relational and non-relational databases, message queues, and asynchronous architectures.
Mentor and coach other engineers, helping raise the technical bar and foster a culture of collaboration and operational excellence.
Participate in the on-call rotation to help maintain a high level of service reliability.

Qualifications

The person we have in mind likely:

Has 5+ years of software or infrastructure engineering experience, with 3+ years in Site Reliability or DevOps-focused roles
Has strong experience operating Kubernetes in production environments
Has proven AWS experience with infrastructure and services such as EKS, RDS, S3, IAM, and ALB
Is proficient with Infrastructure-as-Code (Terraform, Helm) and automation tools
Has strong scripting or coding ability in Ruby, Python, Node.js, or Bash
Has experience with monitoring and observability tools (New Relic, Datadog, Prometheus, Grafana, etc.)
Has a solid understanding of networking, TLS, encryption protocols, and distributed systems
Has experience improving CI/CD pipelines and secure software supply chains
Is familiar with Cloudflare, CDN configuration, and load balancing strategies
Has strong problem-solving skills
Enjoys collaborating across teams and helping shape engineering roadmaps and architectural direction
Brings a strong ownership mentality, cares deeply about developer experience and operational excellence, and thrives in a fast-paced environment

Nice to Have

These things would also be nice, but we think you could learn them on the job:

Experience working with Ruby on Rails and/or Node.js applications in production
Familiarity with Cloudflare, load balancing strategies, and CDN configuration
Experience improving CI/CD pipelines and secure software supply chains
CKA certification or equivalent Kubernetes expertise

Compensation

We’re committed to fair and transparent pay that reflects both where you are and where you can grow to. This role has a salary range of $111,100 – $138,900 – $166,700 in Canada, designed to capture the full journey from…

Skills

AWS, Bash, CDN, Cloudflare, Datadog, EKS, Grafana, Helm, IAM, Kubernetes, Node.js, New Relic, Prometheus, Python, RDS, Ruby, S3, Sentry, Terraform
