Site Reliability Engineer
Pacer Group
Longueuil · On-site · Full-time · Senior · 1w ago
About the role
Requirements
- 7-8 years of experience in SRE, infrastructure, or operations for large-scale systems
- Experience supporting IaaS platforms
- Experience with infrastructure supporting GenAI applications
- Strong programming/scripting skills (Python, Go, Java)
- Experience with containerization (Docker) and orchestration (Kubernetes, etc.) tools
- Experience with IaC (Terraform, Helm, CloudFormation, Ansible, etc.)
- Knowledge of GPU / AI compute clusters
- Experience with monitoring/alerting tools (Prometheus, Grafana, ELK/EFK, Datadog, etc.)
- Networking and systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
Skills
Ansible, CloudFormation, Docker, Datadog, ELK, EFK, Grafana, Go, Helm, IaaS, IaC, Java, Kubernetes, Prometheus, Python, SRE, Terraform