Mid-Level DevOps Engineer

Kaav, Inc.

Centennial · On-site · Full-time · Mid Level · Posted 2w ago

About the role

We are seeking a Mid-Level DevOps Engineer with Site Reliability Engineering (SRE) experience to help transition Crew Management Applications to a web-based SaaS model hosted on AWS. Working under the guidance of a Senior DevOps Engineer, the successful candidate will support critical system reliability, automation, and monitoring tasks and help deliver the project's key deliverables.

Job Duties

  • Support Key Deliverables: Assist in implementing metrics collection, developing dashboards, conducting reliability audits, and creating runbooks as outlined in the project goals.
  • Collaboration: Work closely with the Senior DevOps Engineer, development teams, and support teams to ensure seamless operations and effective communication between stakeholders.
  • CI/CD and Automation: Contribute to the development and optimization of CI/CD pipelines and automation scripts to support efficient and consistent deployments.
  • Observability Implementation: Assist in configuring and maintaining monitoring solutions using OpenTelemetry and Grafana to enhance system visibility.
  • Production Support: Participate in 24/7 Tier II production support on a rotational basis, addressing technical escalations and contributing to system stability.
  • Documentation: Collaborate in the preparation of technical documentation, including runbooks, playbooks, and training materials for Tier I and II support teams.
  • Dashboards and Metrics: Support the development of Grafana dashboards for monitoring services, including Kubernetes platform components and internally developed services.
  • Issue Investigation: Assist in identifying and resolving issues reported by lower-tier support teams, ensuring timely resolution and minimizing customer impact.
  • Game Day Scenarios: Participate in Game Day exercises that simulate potential system failures, to improve operational readiness.
  • Reliability Contributions: Work on tasks related to reliability audits, including submitting merge requests for simpler issues and escalating more complex problems to senior team members.

Job Requirements

  • Experience: 3-5 years in DevOps, SRE, or related roles with a focus on cloud-hosted, microservices-based environments.
  • Technologies: Familiarity with Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
  • DevOps Practices: Basic understanding of CI/CD pipelines and infrastructure-as-code (IaC) principles.
  • Incident Management: Experience in troubleshooting and resolving technical issues in production environments.
  • Collaboration: Ability to work effectively as part of a team under the direction of senior engineers.
  • Documentation: Basic skills in technical writing, including the ability to contribute to incident runbooks and operational playbooks.
  • On-Call Readiness: Willingness to participate in 24/7 rotational production support as required.

Desired Skills & Experience

  • Exposure to GitOps practices and tools like GitLab.
  • Experience contributing to dashboards and monitoring systems for production environments.
  • Familiarity with automated remediation processes and system optimization practices.
  • Background in supporting SaaS environments or cloud migrations.

Additional Information

  • This is a high-priority requisition.
  • This is a proactive requisition.
  • Background Check: No
  • Drug Screen: No

Skills

AWS EKS · ArgoCD · CI/CD · DevOps · Grafana · IaC · Kubernetes · OpenTelemetry · SRE · Terraform