Site Reliability Engineer (OpenShift & Infrastructure)

Accion Labs

Winnipeg · On-site · Contract · Mid Level · 1mo ago

About the role

Install, configure, upgrade, and administer OpenShift Container Platform (OCP) clusters in on-premises and cloud environments.
Manage OCP internal networking, ingress, egress, and cluster services.
Configure and integrate LDAP authentication and access management.
Implement TLS and mTLS encryption, and manage the certificate lifecycle for secure communications.
Implement GitOps workflows using ArgoCD for continuous delivery and environment consistency.
Automate platform and application provisioning using Terraform and Ansible.
Configure and maintain F5 LTM load balancers.
Configure and manage DNS, networking, and subnets.
Build and manage monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, ELK).
Define and enforce SLIs/SLOs and error budgets for services running on OCP.
Lead incident response, root cause analysis (RCA), and postmortems.
Build automation for self‑healing, scaling, and zero-touch operations.
Ensure that high-availability, disaster-recovery, and failover strategies are implemented.
Secure platform and workloads following enterprise security standards.
Support application deployments and CI/CD pipelines on OpenShift.
Troubleshoot networking, cluster, and deployment issues end-to-end.
Apply SRE best practices to improve reliability, scalability, and performance.
Collaborate with development and platform teams to optimize system operations.

Ansible · ArgoCD · AWS Lambda · Certificate Management · CI/CD · Docker · ELK · F5 LTM · GitOps · Grafana · LDAP · mTLS · Networking · OpenShift · Prometheus · RCA · SRE · Terraform · TLS
