Senior Site Reliability Engineer

Brainhunter Systems Ltd

Kitchener · Hybrid Full-time Senior 4d ago

About the role

Senior Site Reliability Engineer (SRE)

Seeking a Senior SRE Consultant to own reliability, performance, and infrastructure across cloud and off/on-prem environments: design resilient systems, automate deployments, and ensure performance at scale. Must be hands-on with off/on-prem infrastructure. We're building next-gen sweepstakes gaming experiences that are fast, reliable, and highly scalable. As a Senior SRE, you'll own the infrastructure that powers everything, primarily across on-prem and hybrid environments, ensuring our systems are resilient, performant, and built to scale.

Key Accountabilities:

  • Design, build, and operate on-prem and hybrid infrastructure, with potential integration into cloud environments over time
  • Architect and maintain highly available, resilient systems for real-time, high-traffic gaming workloads
  • Automate deployments, infrastructure provisioning, and operational workflows (CI/CD, IaC where applicable)
  • Monitor system performance, uptime, and reliability—proactively identifying and resolving issues
  • Implement observability best practices (logging, metrics, tracing, alerting)
  • Improve system resilience through redundancy, failover strategies, and disaster recovery planning
  • Partner closely with backend and platform teams to optimize system performance and reliability
  • Own incident response, postmortems, and continuous improvement of system stability

Qualifications and Skillset for this Role:

  • Strong experience in Site Reliability Engineering, DevOps, or infrastructure engineering, with a focus on on-prem or hybrid environments
  • Deep understanding of physical infrastructure, networking, and distributed systems
  • Experience managing servers, virtualization, and data center environments
  • Hands-on experience with automation, scripting, and deployment workflows
  • Strong troubleshooting skills across systems, networking, and performance bottlenecks
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
  • Solid understanding of security, redundancy, and system design for uptime and resilience
  • Comfortable working autonomously in a fast-paced startup environment
  • Exposure to cloud platforms (AWS, GCP, Azure) and hybrid infrastructure models
  • Experience with containers and orchestration (Docker, Kubernetes)
  • Familiarity with backend systems (Node.js / TypeScript environments)
  • Experience supporting real-time or high-concurrency systems (gaming, fintech, etc.)

Why This Role:

  • Own and shape the core infrastructure of a rapidly scaling platform
  • High ownership: define how systems are built, deployed, and operated
  • Work on real infrastructure challenges beyond just cloud abstractions
  • Startup speed: ship fast and see impact immediately
  • Fully remote, flexible environment

How to Apply:

Please email your up-to-date Resume/CV to

We appreciate every applicant's interest in working with us; however, only candidates shortlisted for the next steps in the hiring process will be contacted.

Thank you, and have a wonderful day!

Skills

AWS, Azure, CI/CD, Datadog, Docker, GCP, Grafana, IaC, Kubernetes, Node.js, Prometheus, TypeScript
