
Senior DevOps Engineer

GDH

Remote · US · Full-time · Senior · $41 – $44/hr · 3w ago

About the role

Role Summary

This is a senior-level DevOps Engineer position responsible for supporting and optimizing cloud-based collaboration platforms. The role involves operating, scaling, and maintaining observability platforms, Kubernetes environments, and automated deployment pipelines so that large-scale distributed systems run reliably and efficiently. The ideal candidate has extensive production experience, strong operational discipline, and a focus on automation and reliability.

Responsibilities

  • Design, develop, and maintain observability platforms, including logging, metrics, and tracing solutions for web services.
  • Manage, operate, and optimize multi-region Kubernetes clusters to support high availability and scalability.
  • Own and enhance continuous integration and continuous delivery (CI/CD) pipelines utilizing Argo CD and Helm.
  • Implement infrastructure as code using Terraform on Amazon Web Services (AWS).
  • Operate monitoring and logging ecosystems such as OpenSearch or ELK, Prometheus, Grafana, Splunk, and Kafka.
  • Develop automation tools to proactively detect, troubleshoot, and resolve production issues.
  • Enforce security standards through vulnerability management, platform hardening, and compliance checks.
  • Collaborate with application, platform, and security teams to improve system reliability and performance.
  • Participate in on-call rotations and lead incident response activities to ensure rapid resolution of issues.
  • Contribute to system architecture design, operational best practices, and review processes for distributed systems.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field.
  • Minimum of eight years of experience in DevOps, Site Reliability Engineering, or platform engineering roles.
  • Extensive experience operating large-scale Kubernetes environments, with proficiency in container orchestration and resource tuning.
  • Hands-on expertise with Helm chart management, multi-cluster operations, and pod scheduling.
  • Strong knowledge of observability stacks such as OpenSearch/Elasticsearch, Prometheus/Mimir, Grafana, Loki, Splunk, or Logstash.
  • Proven experience with ingestion pipeline design, query optimization, and capacity planning for telemetry systems.
  • Proficiency with infrastructure as code tools like Terraform or Ansible on AWS.
  • Working knowledge of scripting and automation languages such as Python, Golang, or Bash.
  • Experience supporting 24/7 production environments, including incident management, alert triage, and post-incident review processes.
  • Ability to work in a fast-paced environment with strong problem-solving skills.

Compensation

Published Pay Range: $41.16 - $43.68 hourly

Location

This is a fully remote role and can be performed from an approved location.

Skills

Ansible, Argo CD, AWS, Bash, CI/CD, Docker, ELK, Elasticsearch, Grafana, Golang, Helm, Infrastructure as Code, Kafka, Kubernetes, Logstash, Loki, OpenSearch, Prometheus, Python, Splunk, Terraform
