Site Reliability and Operations Engineer

Compunnel Inc.

Hoboken · On-site · Full-time · 1mo ago

About the role

Responsibilities

Design, build, and enhance distributed caching and compute grid solutions on Kubernetes and OpenShift platforms.
Orchestrate microservices and container workloads using Docker and Helm.
Implement observability and monitoring frameworks using Prometheus, Grafana, ELK, or OpenTelemetry.
Automate infrastructure provisioning and deployments using Ansible and Helm Charts.
Troubleshoot complex system and infrastructure issues within Kubernetes environments.
Support CI/CD processes using Jenkins, ArgoCD, and GitHub Actions.

Required Skills

5+ years of experience in infrastructure or site reliability engineering.
Deep expertise with Kubernetes and OpenShift in on-prem and cloud environments.
Proficiency in Java, Go, or Python.
Hands-on experience with Docker and Helm.
Proven experience with CI/CD tools and pipeline integration.
Expertise in observability using Prometheus, Grafana, Loki, and Jaeger.
Experience with service meshes such as Istio or Linkerd.
Knowledge of multi-cluster and hybrid cloud Kubernetes deployments.
Solid understanding of networking, security practices, and performance optimization.

Preferred Skills

Experience with high-performance computing platforms or grid computing frameworks.
Familiarity with distributed caching strategies and data sharding.
Relevant certifications such as CKAD, CKA, or Red Hat Certified Specialist in OpenShift.

Skills

Ansible, ArgoCD, CKAD, CKA, Docker, ELK, Git, GitHub Actions, Grafana, Go, Helm, Istio, Jaeger, Java, Jenkins, Kubernetes, Linkerd, Loki, OpenShift, OpenTelemetry, Prometheus, Python, Red Hat Certified Specialist in OpenShift
