Senior Site Reliability Engineer

Brainhunter Systems Ltd

Kitchener · Hybrid · Full-time · Senior · 3mo ago

About the role

Senior Site Reliability Engineer (SRE)

Seeking a Senior SRE Consultant who can own reliability, performance, and infrastructure across cloud, on-prem, and off-prem environments: designing resilient systems, automating deployments, and ensuring performance at scale. Must be hands-on with on-prem and off-prem infrastructure. We’re building next-gen sweepstakes gaming experiences that are fast, reliable, and highly scalable. As a Senior SRE, you’ll own the infrastructure that powers everything, primarily across on-prem and hybrid environments, ensuring our systems are resilient, performant, and built to scale.

Key Accountabilities:

Design, build, and operate on-prem and hybrid infrastructure, with potential integration into cloud environments over time
Architect and maintain highly available, resilient systems for real-time, high-traffic gaming workloads
Automate deployments, infrastructure provisioning, and operational workflows (CI/CD, IaC where applicable)
Monitor system performance, uptime, and reliability—proactively identifying and resolving issues
Implement observability best practices (logging, metrics, tracing, alerting)
Improve system resilience through redundancy, failover strategies, and disaster recovery planning
Partner closely with backend and platform teams to optimize system performance and reliability
Own incident response, postmortems, and continuous improvement of system stability

Qualifications and Skillset for this Role:

Strong experience in Site Reliability Engineering, DevOps, or infrastructure engineering, with a focus on on-prem or hybrid environments
Deep understanding of physical infrastructure, networking, and distributed systems
Experience managing servers, virtualization, and data center environments
Hands-on experience with automation, scripting, and deployment workflows
Strong troubleshooting skills across systems, networking, and performance bottlenecks
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
Solid understanding of security, redundancy, and system design for uptime and resilience
Comfortable working autonomously in a fast-paced startup environment
Exposure to cloud platforms (AWS, GCP, Azure) and hybrid infrastructure models
Experience with containers and orchestration (Docker, Kubernetes)
Familiarity with backend systems (Node.js / TypeScript environments)
Experience supporting real-time or high-concurrency systems (gaming, fintech, etc.)

Why This Role:

Own and shape the core infrastructure of a rapidly scaling platform
High ownership: define how systems are built, deployed, and operated
Work on real infrastructure challenges beyond just cloud abstractions
Startup speed: ship fast and see impact immediately
Fully remote, flexible environment

How to Apply:

Please email your up-to-date Resume/CV to

We appreciate all the applicants for their interest in working with us; however, only those candidates shortlisted for the next steps in the hiring process will be contacted.

Thank you, and have a wonderful day!

Skills

AWS, Azure, CI/CD, Datadog, Docker, GCP, Grafana, IaC, Kubernetes, Node.js, Prometheus, TypeScript
