Senior Site Reliability Engineer / DevOps

Tangentia

Toronto · Hybrid · Full-time · Senior · 1w ago

About the role

Role

Senior DevOps & Site Reliability Engineer

Location

Toronto, ON

Interview Mode

Virtual

Key Responsibilities

Oversee the reliability, availability, and performance of Apigee Hybrid and Google Distributed Cloud environments, ensuring robust SRE practices.
Automate and manage certificate processes, including renewals, deployments, and compliance checks.
Plan and execute upgrades and maintenance activities for Apigee Hybrid and distributed cloud infrastructure, minimizing downtime and ensuring seamless transitions.
Implement and maintain monitoring solutions using Dynatrace and Splunk, proactively identifying and resolving issues to ensure system health and performance.
Troubleshoot complex production incidents, perform root cause analysis, and drive incident resolution to restore service quickly and prevent recurrence.
Develop and maintain automation scripts and Ansible playbooks for operational efficiency, including tasks such as Kubernetes context retrieval, proxy configuration, and container management.
Collaborate with cross-functional teams to ensure security, compliance, and best practices are followed across all SRE activities.
Mentor and guide team members in SRE methodologies, fostering a culture of continuous improvement and operational excellence.

Required Skills

3 years of experience in Site Reliability Engineering or related roles.
Experience with Apigee Hybrid, Google Distributed Cloud, Azure, GCP, and Kubernetes.
Advanced DevOps and SRE skills: CI/CD, automation, monitoring, infrastructure as code.
Certificate management scripting and automation.
Proficiency with Ansible for configuration management and orchestration.
Experience with APM tools such as Dynatrace and Splunk.
Programming experience with Python.

Technologies

Ansible (Software)
Apigee Hybrid
API Management
Azure Kubernetes Service (AKS)
CI/CD
Dynatrace APM
Google Anthos
Kubernetes
Public Key Infrastructure
Python (Programming Language)
Red Hat Enterprise Linux (RHEL)
Site Reliability Engineering
Splunk
Terraform
VMware

Skills

Ansible
API Management
Apigee Hybrid
Azure
Azure Kubernetes Service (AKS)
CI/CD
Docker
Dynatrace APM
GCP
Google Anthos
Google Distributed Cloud
Infrastructure as Code
Kubernetes
Monitoring
Python
Public Key Infrastructure
Red Hat Enterprise Linux (RHEL)
Site Reliability Engineering
Splunk
Terraform
VMware
