Senior OpenStack Engineer
About the Role
We are seeking a deeply technical Senior OpenStack Engineer to design, build, automate, scale, and operate large-scale production OpenStack environments powering enterprise private clouds, MSP platforms, and high-performance digital twin lab infrastructures.
This is not a UI-driven admin role. We are looking for engineers who understand OpenStack at the service, database, messaging, hypervisor, and packet‑flow layers — individuals who can troubleshoot RabbitMQ queues, debug Neutron agents, tune Ceph latency, and automate full cloud deployments from bare metal upward.
You will work on multi‑region architectures, high‑availability designs, NVMe storage fabrics, SDN integrations, and hybrid cloud platforms supporting global customers.
Primary Responsibilities
1. OpenStack Architecture & Platform Engineering
- Design production‑grade OpenStack environments across controller, compute, and storage nodes.
- Architect HA control planes using HAProxy, Keepalived, Galera, and RabbitMQ clustering.
- Build scalable cell‑based Nova architectures.
- Implement multi‑region replication strategies.
- Perform platform capacity modeling and growth forecasting.
2. Compute Virtualization (Nova)
- Nova scheduler tuning and filters.
- CPU pinning and isolation.
- NUMA topology alignment.
- HugePages configuration.
- Live migrations and evacuations.
- GPU passthrough and SR‑IOV provisioning.
The hypervisor stack includes KVM, QEMU, libvirt, and virtio.
3. Networking & SDN (Neutron)
- ML2 plugin architecture.
- OVS, OVN, Linux Bridge deployments.
- VXLAN and Geneve overlays, plus VLAN segmentation.
- DVR and L3 routing.
- Floating IP NAT design.
- SR‑IOV and DPDK acceleration.
- Integration with BGP EVPN, MPLS, VRFs, and SD‑WAN.
4. Storage Engineering
Ceph (Primary Requirement)
- RBD block storage.
- CephFS file storage and RGW object storage.
- CRUSH map tuning.
- Placement group optimization.
- BlueStore performance tuning.
- NVMe and SSD tiering.
Additional exposure to Linstor, DRBD, iSCSI, and NVMe‑oF preferred.
5. Image & Lifecycle Services
- Glance image pipelines.
- QCOW2 optimization.
- Cloud‑init automation.
- Golden image lifecycle management.
6. Identity & Access (Keystone)
- RBAC modeling.
- LDAP/AD integration.
- SAML/SSO federation.
- Token lifecycle management.
7. Orchestration & Automation
- Heat orchestration templates.
- Terraform automation.
- Ansible playbooks.
- CI/CD for infrastructure.
Deployment frameworks include Kolla‑Ansible, OpenStack‑Ansible, TripleO, and MAAS/Juju.
8. Kubernetes & Containerized Control Planes
- Operate OpenStack on Kubernetes.
- Helm/Operator‑based deployments.
- Pod and persistent volume troubleshooting.
9. Bare Metal Provisioning (Ironic)
- PXE/iPXE pipelines.
- Hardware introspection.
- Integration with MAAS/Foreman.
10. Observability & Reliability Engineering
- Prometheus and Grafana monitoring.
- ELK logging pipelines.
- Incident response and RCA.
- SLA tracking and alert tuning.
11. Upgrade & Lifecycle Management
- Major version upgrades.
- Rolling compute upgrades.
- Database migrations.
- Zero‑downtime patching.
Required Technical Experience
- 8–12+ years Linux systems engineering.
- 5+ years OpenStack production operations.
- Strong KVM virtualization expertise.
- Networking: BGP, VXLAN, EVPN.
- Storage: Ceph production operations.
- Databases: MariaDB/Galera.
- Messaging: RabbitMQ.
- Automation: Ansible/Terraform.
- Scripting: Python/Bash.
Preferred Skills
- Platform9 / Canonical / Red Hat OpenStack.
- Ironic bare‑metal provisioning.
- DPDK / SR‑IOV acceleration.
- GPU workloads.
- Hybrid cloud integrations.
Work Model Requirements
- Remote within India.
- Mandatory U.S. EST shift overlap.
- Night shift operations required.
- On‑