
Platform / Site Reliability Engineer (SRE)

HRB

Kitchener · On-site · Full-time · Senior · 1mo ago

About the role

Our client is transforming industries through cutting-edge technology. Their platform leverages AI, automation, and scalable systems to solve complex real-world problems.

As a Platform / Site Reliability Engineer (SRE), you will play a key role in establishing and enhancing the engineering platform. You’ll help ensure the reliability, scalability, and efficiency of their systems while developing tools that improve engineering productivity.

You will help define and shape the platform strategy, set best practices, and drive initiatives that enhance developer experience, system performance, and operational efficiency.

What You’ll Be Doing

  • DevOps & Infrastructure: Design, implement, and maintain scalable infrastructure to support engineering needs.
  • CI/CD Optimization: Improve continuous integration and deployment pipelines using AWS CDK, including defining requirements for deployment and database migration tooling.
  • Release Tracking & Deployment: Establish visibility into release cycles, implement automation to streamline deployments, and ensure smooth rollouts.
  • Site Reliability & Observability: Implement monitoring, logging, and alerting systems to ensure high availability and performance.
  • Internal Tooling: Build and maintain tools that improve developer efficiency, automate repetitive tasks, and enhance productivity.
  • Security & Compliance: Ensure infrastructure and deployments align with security best practices, with attention to SOC, ISO, and GDPR requirements.

Experience

  • 7+ years of technical experience, with 5+ years in an SRE or similar role. Startup experience is a plus.
  • Deep expertise in AWS, including Fargate, and in container orchestration with Kubernetes.
  • Strong experience with CI/CD pipelines, particularly using AWS CDK.
  • Proficiency with observability tools (Datadog, Prometheus, Grafana).
  • Strong knowledge of scaling strategies and highly available architectures.
  • Proficiency in scripting/automation with Python, Bash, or TypeScript.
  • Familiarity with security best practices and compliance frameworks (SOC, ISO, GDPR).
  • Strong collaboration skills and ability to work cross-functionally.

Our Tech Stack

  • Infrastructure: AWS, Fargate, Redis, PostgreSQL, SQS, CDK, GitHub, Retool
  • Backend: Django REST framework, Celery
  • Frontend: Next.js, Tailwind CSS
  • LLM Integrations: OpenAI, Claude, AWS Bedrock

Skills

AWS, AWS Bedrock, AWS CDK, AWS Fargate, Bash, Celery, Datadog, Django REST framework, Docker, GDPR, Grafana, GitHub, ISO, Kubernetes, Next.js, OpenAI, PostgreSQL, Prometheus, Python, Redis, Retool, SOC, SQS, Tailwind CSS, TypeScript