Mid-Level DevOps Engineer
Kaav, Inc.
Centennial · On-site · Full-time · Mid Level · 2w ago
About the role
We are seeking a Mid-Level DevOps Engineer with Site Reliability Engineering (SRE) experience to contribute to the transition of Crew Management Applications to a web-based SaaS model hosted on AWS. The successful candidate will work under the guidance of a Senior DevOps Engineer, supporting critical system reliability, automation, and monitoring tasks while actively contributing to the successful implementation of key deliverables.
Job Duties
- Support Key Deliverables: Assist in implementing metrics collection, developing dashboards, conducting reliability audits, and creating runbooks as outlined in the project goals.
- Collaboration: Work closely with the Senior DevOps Engineer, development teams, and support teams to ensure seamless operations and effective communication between stakeholders.
- CI/CD and Automation: Contribute to the development and optimization of CI/CD pipelines and automation scripts to support efficient and consistent deployments.
- Observability Implementation: Assist in configuring and maintaining monitoring solutions using OpenTelemetry and Grafana to enhance system visibility.
- Production Support: Participate in 24/7 Tier II production support on a rotational basis, addressing technical escalations and contributing to system stability.
- Documentation: Collaborate in the preparation of technical documentation, including runbooks, playbooks, and training materials for Tier I and II support teams.
- Dashboards and Metrics: Support the development of Grafana dashboards for monitoring services, including Kubernetes platform components and internally developed services.
- Issue Investigation: Assist in identifying and resolving issues reported from lower-tier support teams, ensuring timely resolution and minimizing customer impact.
- Game Day Scenarios: Participate in the execution of Game Day scenarios to prepare for potential system failures and improve operational readiness.
- Reliability Contributions: Work on tasks related to reliability audits, including submitting merge requests for simpler issues and escalating more complex problems to senior team members.
Job Requirements
- Experience: 3-5 years in DevOps, SRE, or related roles with a focus on cloud-hosted, microservices-based environments.
- Technologies: Familiarity with Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
- DevOps Practices: Basic understanding of CI/CD pipelines and infrastructure-as-code (IaC) principles.
- Incident Management: Experience in troubleshooting and resolving technical issues in production environments.
- Collaboration: Ability to work effectively as part of a team under the direction of senior engineers.
- Documentation: Basic skills in technical writing, including the ability to contribute to incident runbooks and operational playbooks.
- On-Call Readiness: Willingness to participate in 24/7 rotational production support as required.
Desired Skills & Experience
- Exposure to GitOps practices and tools like GitLab.
- Experience contributing to dashboards and monitoring systems for production environments.
- Familiarity with automated remediation processes and system optimization practices.
- Background in supporting SaaS environments or cloud migrations.
Additional Information
- This is a high PRIORITY requisition.
- This is a PROACTIVE requisition.
- Background Check: No
- Drug Screen: No
Skills
AWS EKS, ArgoCD, CI/CD, DevOps, Grafana, IaC, Kubernetes, OpenTelemetry, SRE, Terraform