Senior DevOps Engineer
GDH
Remote · US · Full-time · Senior · $41 – $44/hr · 3w ago
About the role
Role Summary
This is a senior-level DevOps Engineer position responsible for supporting and optimizing cloud-based collaboration platforms. The role involves operating, scaling, and maintaining observability platforms, Kubernetes environments, and automated deployment pipelines to keep large-scale distributed systems reliable and efficient. The ideal candidate has extensive production experience, strong operational discipline, and a focus on automation and reliability.
Responsibilities
- Design, develop, and maintain observability platforms, including logging, metrics, and tracing solutions for web services.
- Manage, operate, and optimize multi-region Kubernetes clusters to support high availability and scalability.
- Own and enhance continuous integration and continuous delivery (CI/CD) pipelines utilizing Argo CD and Helm.
- Implement infrastructure as code using Terraform on Amazon Web Services (AWS).
- Operate monitoring and logging ecosystems such as OpenSearch or ELK, Prometheus, Grafana, Splunk, and Kafka.
- Develop automation tools to proactively detect, troubleshoot, and resolve production issues.
- Enforce security standards through vulnerability management, platform hardening, and compliance checks.
- Collaborate with application, platform, and security teams to improve system reliability and performance.
- Participate in on-call rotations and lead incident response activities to ensure rapid resolution of issues.
- Contribute to system architecture design, operational best practices, and review processes for distributed systems.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- Minimum of eight years of experience in DevOps, Site Reliability Engineering, or platform engineering roles.
- Extensive experience operating large-scale Kubernetes environments, with proficiency in container orchestration and resource tuning.
- Hands-on expertise with Helm chart management, multi-cluster operations, and pod scheduling.
- Strong knowledge of observability stacks such as OpenSearch/Elasticsearch, Prometheus/Mimir, Grafana, Loki, Splunk, or Logstash.
- Proven experience with ingestion pipeline design, query optimization, and capacity planning for telemetry systems.
- Proficiency with infrastructure as code tools like Terraform or Ansible on AWS.
- Working knowledge of scripting and automation languages such as Python, Golang, or Bash.
- Experience supporting 24/7 production environments, including incident management, alert triage, and post-incident review processes.
- Ability to work in a fast-paced environment with strong problem-solving skills.
Compensation
Publishing Pay Range: $41.16 - $43.68 hourly
Location
This is a fully remote role and can be performed from an approved location.
Skills
Ansible, Argo CD, AWS, Bash, CI/CD, Docker, ELK, Elasticsearch, Grafana, Golang, Helm, Infrastructure as Code, Kafka, Kubernetes, Logstash, Loki, OpenSearch, Prometheus, Python, Splunk, Terraform