Kubernetes Platform Engineer (hybrid) - 2010276

Careerport

France · Hybrid · Full-time · Mid Level · $127k – $182k/yr · 2mo ago

About the role

Meet the Team

Join our Platform Engineering Team of experienced Kubernetes engineers who design, build, and operate large-scale on-premises Kubernetes environments. Our mission is to deliver a highly reliable, scalable, and GPU-enabled platform to support AI/ML workloads, while applying intelligent automation (AIOps) to improve platform operations. As part of this team, you will directly manage the Kubernetes control plane, extend platform capabilities via controllers and operators, and implement automation to detect, predict, and self-heal operational issues. Candidates must have hands-on, on-prem control plane experience and be able to work on site as needed within a hybrid work model.

Your Impact / Responsibilities

As a Kubernetes Platform Engineer, you will:

Kubernetes Control Plane & Platform Engineering

Design, build, and operate self-managed Kubernetes clusters (OpenShift / Anthos)
Manage and maintain etcd (backup, restore, quorum management, defrag)
Perform control plane upgrades and lifecycle management
Tune API server, scheduler, and controller manager for performance and reliability
Debug node-level and control-plane issues across large clusters
Implement networking (CNI), storage (CSI), and ingress integrations

AIOps & Intelligent Automation

Implement and extend runbook automation frameworks to reduce operational toil
Integrate AI agents that monitor cluster telemetry, detect anomalies, and trigger automated workflows (e.g., Slack notifications, remediation scripts)
Apply statistical or ML-based models on operational data from Splunk, Prometheus, and Kubernetes to predict failures, capacity saturation, or workload misbehavior
Build self-healing controllers and automated remediation pipelines
Implement predictive capacity planning and intelligent alert suppression workflows

Platform Extensions & Automation

Build Kubernetes controllers and operators (Go + controller-runtime)
Develop CRDs and admission webhooks to extend platform functionality
Automate cluster lifecycle and multi-cluster operations
Implement policies for workload isolation, governance, and compliance

AI/ML Workload Enablement

Enable GPU and high-performance infrastructure for AI/ML workloads
Optimize scheduler and resource allocation for memory- and compute-intensive workloads
Support orchestration of AI/ML pipelines

Minimum Qualifications

5+ years of software engineering experience
3+ years operating Kubernetes in production with hands-on control plane experience
Experience managing etcd (backup, restore, recovery) and performing control plane upgrades
Strong Go programming skills
Experience building Kubernetes operators/controllers and developing CRDs/webhooks
Deep understanding of scheduler, API server, controller loops, and reconciliation
Experience debugging and troubleshooting large-scale distributed systems
Candidates without on-prem or self-managed Kubernetes control plane experience will not be considered.

Preferred Qualifications

Experience in bare-metal or on-prem infrastructure
Experience supporting GPU-enabled workloads in Kubernetes
Exposure to building internal developer platforms
Contributions to CNCF or Kubernetes open-source projects
Hands-on experience with AI/ML-assisted operational automation (AIOps)
Experience applying statistical or ML techniques to operational data for platform reliability

Why Cisco?

At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.

Message to applicants applying to work in the U.S. and/or Canada:

The starting salary range posted for this position is $126,500.00 to $182,000.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, t

Skills

AI, AIOps, Anthos, CSI, CNI, Docker, Go, GPU, Kubernetes, ML, OpenShift, Prometheus, Splunk
