Site Reliability Engineer for AI and DevOps Support

PowerToFly

Mississauga · On-site · Full-time · Senior · Posted 1mo ago

About the role

As a dedicated Site Reliability Engineer, you will strengthen the stability of our AI and DevOps platforms, working with cross‑functional teams to resolve incidents and improve operational efficiency. We are looking for an experienced SRE to join our AI and DevOps Platform Support team. The role involves supporting application stability, improving service levels, and coordinating with offshore managed services. Key skills include troubleshooting, communication, and a solid understanding of platform operations.

Key Responsibilities

  • Resolve incidents to maintain platform stability
  • Coordinate daily operational activities and vendor interactions
  • Support onboarding activities using established standards
  • Contribute to performance tuning and cost‑efficiency initiatives
  • Participate in resilience and disaster recovery activities
  • Drive platform enhancement initiatives while ensuring reliability and performance in a fast‑paced environment

Requirements

  • 5–8 years in technical support or platform operations
  • Familiarity with Kubernetes and CI/CD tools
  • Strong communication and documentation skills
  • Experience with database technologies like PostgreSQL or MongoDB
  • Scripting or programming knowledge in languages such as Python or Java

Skills

CI/CD, Docker, Java, Kubernetes, MongoDB, Postgres, Python
