Kubernetes Platform Engineer (hybrid) - 2010276
Careerport
About the role
Meet the Team
Join our Platform Engineering Team of experienced Kubernetes engineers who design, build, and operate large-scale on-premises Kubernetes environments. Our mission is to deliver a highly reliable, scalable, and GPU-enabled platform to support AI/ML workloads, while applying intelligent automation (AIOps) to improve platform operations. As part of this team, you will directly manage the Kubernetes control plane, extend platform capabilities via controllers and operators, and implement automation to detect, predict, and self-heal operational issues. Candidates must have hands-on, on-prem control plane experience and be able to work on site as needed under a hybrid work model.
Your Impact / Responsibilities
As a Kubernetes Platform Engineer, you will:
Kubernetes Control Plane & Platform Engineering
- Design, build, and operate self-managed Kubernetes clusters (OpenShift / Anthos)
- Manage and maintain etcd (backup, restore, quorum management, defragmentation)
- Perform control plane upgrades and lifecycle management
- Tune API server, scheduler, and controller manager for performance and reliability
- Debug node-level and control-plane issues across large clusters
- Implement networking (CNI), storage (CSI), and ingress integrations
AIOps & Intelligent Automation
- Implement and extend runbook automation frameworks to reduce operational toil
- Integrate AI agents that monitor cluster telemetry, detect anomalies, and trigger automated workflows (e.g., Slack notifications, remediation scripts)
- Apply statistical or ML-based models to operational data from Splunk, Prometheus, and Kubernetes to predict failures, capacity saturation, or workload misbehavior
- Build self-healing controllers and automated remediation pipelines
- Implement predictive capacity planning and intelligent alert suppression workflows
Platform Extensions & Automation
- Build Kubernetes controllers and operators (Go + controller-runtime)
- Develop CRDs and admission webhooks to extend platform functionality
- Automate cluster lifecycle and multi-cluster operations
- Implement policies for workload isolation, governance, and compliance
AI/ML Workload Enablement
- Enable GPU and high-performance infrastructure for AI/ML workloads
- Optimize scheduler and resource allocation for memory- and compute-intensive workloads
- Support orchestration of AI/ML pipelines
Minimum Qualifications
- 5+ years of software engineering experience
- 3+ years operating Kubernetes in production with hands-on control plane experience
- Experience managing etcd (backup, restore, recovery) and performing control plane upgrades
- Strong Go programming skills
- Experience building Kubernetes operators/controllers and developing CRDs/webhooks
- Deep understanding of scheduler, API server, controller loops, and reconciliation
- Experience debugging and troubleshooting large-scale distributed systems
- Candidates without on-prem or self-managed Kubernetes control plane experience will not be considered.
Preferred Qualifications
- Experience in bare-metal or on-prem infrastructure
- Experience supporting GPU-enabled workloads in Kubernetes
- Exposure to building internal developer platforms
- Contributions to CNCF or Kubernetes open-source projects
- Hands-on experience with AI/ML-assisted operational automation (AIOps)
- Experience applying statistical or ML techniques to operational data for platform reliability
Why Cisco?
At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint. Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere. We are Cisco, and our power starts with you.
Message to applicants applying to work in the U.S. and/or Canada:
The starting salary range posted for this position is $126,500.00 to $182,000.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits. Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, t