Senior Azure Cloud Engineer (AKS Specialist)
Jobs via Dice
Role Overview
We are seeking a highly skilled Senior Azure Cloud Engineer with deep, hands-on expertise in Azure Kubernetes Service (AKS). This role is designed for a platform-focused engineer who takes full ownership of the container ecosystem, from initial cluster deployment and complex networking through long-term operational excellence.
You will be a key player in managing our cloud infrastructure, with a specific focus on supporting high-demand AI/ML workloads running within Kubernetes. The ideal candidate is a Dallas-based professional who thrives on optimizing performance, ensuring seamless scaling, and maintaining the "day-to-day" health of a mission-critical Azure environment.
Key Responsibilities
- AKS Lifecycle Management: Lead the end-to-end deployment, configuration, and management of production-grade AKS clusters.
- Networking & Ingress: Design and maintain robust Azure networking architectures, including VNet integration, Azure CNI, and ingress controllers (NGINX, AGIC).
- Scaling & Optimization: Implement and manage auto-scaling strategies (HPA, VPA, and Cluster Autoscaler) to handle fluctuating workloads efficiently.
- AI/ML Support: Provide platform-level support for AI/ML workloads, ensuring GPU node pools and specialized compute resources are optimized for model training and inference.
- Operational Excellence: Drive "Day 2" operations, including patch management, cluster upgrades, observability (Azure Monitor/Log Analytics), and proactive troubleshooting.
- Infrastructure as Code (IaC): Automate all infrastructure provisioning using Terraform or Bicep to ensure repeatable and consistent environments.
- Security & Governance: Enforce Azure Policy, RBAC, and Secret Management (Azure Key Vault) to maintain a secure and compliant container platform.
Required Skills & Qualifications
- Azure Mastery: Extensive experience with the Azure ecosystem (Compute, Storage, Networking, Identity).
- AKS Specialist: Deep technical knowledge of Kubernetes internals, including scheduling, storage classes, and service mesh (Istio/Linkerd).
- Advanced Networking: Hands-on experience with Azure Load Balancer, Application Gateway, Private Link, and DNS resolution within AKS.
- Automation: Strong proficiency in Terraform and CI/CD workflows (GitHub Actions or Azure DevOps).
- Scripting: Fluent in Python or Bash for operational automation and custom tooling.
- Education: Bachelor's degree in Computer Science, Engineering, or a related technology field.
Preferred / Nice To Have
- AI/ML Exposure: Familiarity with running MLOps frameworks or AI platforms (like Azure Machine Learning) on Kubernetes.
- Certifications: Microsoft Certified: Azure Solutions Architect Expert or CKA (Certified Kubernetes Administrator).
- Monitoring: Experience with Prometheus, Grafana, or Azure Managed Grafana.