Site Reliability Engineer - Full Remote (EU only)
Jobtome
About the company
At Jobtome, we are building a modern, cloud-native recruitment and marketing platform used at scale across multiple countries and brands.
Our systems power high-traffic job distribution, integrations with external partners, and real-time data pipelines, with a strong focus on reliability, observability, and automation.
Engineering is a core function of the company: we value ownership, pragmatic decision-making, and long-term technical excellence over short-term fixes.
The role
As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our production systems.
You will work closely with Backend, Frontend, and Product teams to:
- design resilient architectures
- define reliability standards
- improve observability and incident response
- reduce operational toil through automation
This is not a pure ops role: you will contribute to codebases, collaborate on system design, and help evolve our engineering culture toward SRE best practices.
What you will do
- Design, implement, and maintain reliable and scalable cloud infrastructure
- Define and evolve SLIs, SLOs, and error budgets
- Improve monitoring, alerting, and observability across services
- Lead and participate in incident response, post-mortems, and root-cause analysis
- Automate repetitive operational tasks to reduce toil
- Collaborate with Backend engineers on service design, scalability, and failure modes
- Improve CI/CD pipelines, deployment strategies, and release safety
- Contribute to infrastructure as code and platform tooling
- Act as a reliability advocate across the engineering organization
Tech stack
- Cloud: Google Cloud Platform (preferred), AWS
- Containers & orchestration: Docker, Kubernetes (GKE)
- Infrastructure as Code: Terraform
- CI/CD: GitLab CI/CD
- Observability: Cloud Monitoring, Logging, Prometheus, Grafana
- Languages: Go, Python, Bash
- Networking & security: IAM, VPCs, service accounts, secrets management
What we expect from a senior SRE
- Strong experience running production systems at scale
- Solid understanding of distributed systems and failure modes
- Proven experience with SLO-driven reliability
- Strong coding skills
- Cloud infrastructure automation experience
- Ability to debug complex cross-system issues
- Ownership mindset and strong communication skills
- Pragmatic approach to reliability, speed, and cost trade-offs
Working model
- Flexible working hours
- Remote-friendly setup
- Small autonomous teams
- Direct collaboration with product and leadership