AI Platform SRE

HMG AMERICA LLC

US · Hybrid · Contract · Senior · Yesterday

About the role

Responsibilities

  • Own end-to-end delivery of major platform initiatives, from architecture and design through deployment and post-launch optimization
  • Take deep technical ownership of Kubernetes environments, including cluster management, networking, operators, container lifecycle, and multi-tenant orchestration
  • Design and build scalable, reliable distributed systems and cloud-native infrastructure on AWS and/or GCP
  • Drive engineering excellence through code quality, design reviews, automation, and CI/CD best practices
  • Collaborate cross-functionally with Product, AI, and Security teams to align technical solutions with business objectives
  • Mentor engineers and guide architectural decisions, trade-offs, and delivery approaches
  • Partner with leadership to shape engineering strategy, roadmap planning, and platform evolution

Required Qualifications

  • 9+ years of experience in software engineering, with a strong focus on backend systems and infrastructure
  • Proficiency in Python and/or Go, with a track record of delivering production-grade systems
  • Deep, hands-on experience with Kubernetes, including building and operating clusters in production environments
  • Proven expertise in designing and managing distributed systems at scale
  • Strong experience with cloud platforms (AWS and/or GCP), including compute, networking, storage, and IAM
  • Experience with Infrastructure as Code (Terraform or similar) and CI/CD pipelines
  • Familiarity with applied AI tools and ecosystems, such as agent frameworks, AI gateways (e.g., LiteLLM), or models like Claude
  • Strong system design skills and architectural decision-making capability
  • Excellent communication and collaboration skills across engineering, product, and security teams

Preferred Qualifications

  • Experience with observability tools such as Prometheus, Grafana, Datadog, and OpenTelemetry
  • Exposure to multi-cloud or hybrid infrastructure environments
  • Knowledge of API gateways, AI gateways, and access-control or policy frameworks (e.g., ABAC, OPA)
  • Experience in service mesh architectures or platform-as-a-service design
  • Demonstrated ability to improve engineering productivity and operational efficiency at scale

Skills

AWS · CI/CD · Cloud Native · Cloud Platforms · Container Orchestration · Docker · GCP · Go · Infrastructure as Code · Kubernetes · Python · Terraform
