Mid-Level DevOps Engineer

Kaav, Inc.

Centennial · On-site · Full-time · Mid Level · 2mo ago

About the role

We are seeking a Mid-Level DevOps Engineer with Site Reliability Engineering (SRE) experience to contribute to the transition of Crew Management Applications to a web-based SaaS model hosted on AWS. The successful candidate will work under the guidance of a Senior DevOps Engineer, supporting critical system reliability, automation, and monitoring tasks while actively contributing to the successful implementation of key deliverables.

Job Duties

Support Key Deliverables: Assist in implementing metrics collection, developing dashboards, conducting reliability audits, and creating runbooks as outlined in the project goals.
Collaboration: Work closely with the Senior DevOps Engineer, development teams, and support teams to ensure seamless operations and effective communication between stakeholders.
CI/CD and Automation: Contribute to the development and optimization of CI/CD pipelines and automation scripts to support efficient and consistent deployments.
Observability Implementation: Assist in configuring and maintaining monitoring solutions using OpenTelemetry and Grafana to enhance system visibility.
Production Support: Participate in 24/7 Tier II production support on a rotational basis, addressing technical escalations and contributing to system stability.
Documentation: Collaborate in the preparation of technical documentation, including runbooks, playbooks, and training materials for Tier I and II support teams.
Dashboards and Metrics: Support the development of Grafana dashboards for monitoring services, including Kubernetes platform components and internally developed services.
Issue Investigation: Assist in identifying and resolving issues reported from lower-tier support teams, ensuring timely resolution and minimizing customer impact.
Game Day Scenarios: Participate in the execution of Game Day scenarios to prepare for potential system failures and improve operational readiness.
Reliability Contributions: Work on tasks related to reliability audits, including submitting merge requests for simpler issues and escalating more complex problems to senior team members.

Job Requirements

Experience: 3-5 years in DevOps, SRE, or related roles with a focus on cloud-hosted, microservices-based environments.
Technologies: Familiarity with Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
DevOps Practices: Basic understanding of CI/CD pipelines and infrastructure-as-code (IaC) principles.
Incident Management: Experience in troubleshooting and resolving technical issues in production environments.
Collaboration: Ability to work effectively as part of a team under the direction of senior engineers.
Documentation: Basic skills in technical writing, including the ability to contribute to incident runbooks and operational playbooks.
On-Call Readiness: Willingness to participate in 24/7 rotational production support as required.

Desired Skills & Experience

Exposure to GitOps practices and tools like GitLab.
Experience contributing to dashboards and monitoring systems for production environments.
Familiarity with automated remediation processes and system optimization practices.
Background in supporting SaaS environments or cloud migrations.

Additional Information

This is a high-priority requisition.
This is a proactive requisition.
Background Check: No
Drug Screen: No

Skills

AWS EKS, ArgoCD, CI/CD, DevOps, Grafana, IaC, Kubernetes, OpenTelemetry, SRE, Terraform
