Site Reliability and Operations Engineer

Compunnel Inc.

Hoboken · On-site · Full-time · 5d ago

About the role

You will manage and optimize Kubernetes-based distributed caching and compute grid systems to ensure high availability and scalability.

Responsibilities

Design, build, and enhance distributed caching and compute grid solutions on Kubernetes and OpenShift platforms.
Orchestrate microservices and container workloads using Docker and Helm.
Implement observability and monitoring frameworks using Prometheus, Grafana, ELK, or OpenTelemetry.
Automate infrastructure provisioning and deployments using Ansible and Helm Charts.
Troubleshoot complex system and infrastructure issues within Kubernetes environments.
Support CI/CD processes using Jenkins, ArgoCD, and GitHub Actions.

Required Skills

5+ years of experience in infrastructure or site reliability engineering.
Deep expertise with Kubernetes and OpenShift in on-prem and cloud environments.
Proficiency in Java, Go, or Python.
Hands-on experience with Docker and Helm.
Proven experience with CI/CD tools and pipeline integration.
Expertise in observability using Prometheus, Grafana, Loki, and Jaeger.
Experience with service meshes such as Istio or Linkerd.
Knowledge of multi-cluster and hybrid cloud Kubernetes deployments.
Solid understanding of networking, security practices, and performance optimization.

Preferred Skills

Experience with high-performance computing platforms or grid computing frameworks.
Familiarity with distributed caching strategies and data sharding.
Relevant certifications such as CKAD, CKA, or Red Hat Certified Specialist in OpenShift.
