SRE-Application/Platform

Ampstek

Bloomfield · On-site · Contract · 3w ago

About the role

Role: SRE-Application/Platform

Location: Bloomfield, CT (Onsite)

Duration: Long-Term Contract

Responsibilities

  • Ensure 24x7 system reliability, incident response, and operational readiness for global applications.
  • Lead troubleshooting efforts during outages/performance incidents; perform root cause analysis (RCA) and implement preventive actions.
  • Define and maintain operational metrics and reliability goals (availability, latency, throughput, resource utilization).
  • Improve system stability via proactive monitoring, alerting, and capacity planning.

Big Data & Streaming Support

  • Support deployments and operations across AWS Cloud, Kubernetes, and containerized environments.
  • Implement and maintain cluster reliability in Kubernetes environments: resource quotas, access control, permissions, and namespace management.

Qualifications

  • Experience in monitoring, troubleshooting, performance tuning, capacity planning, and automation, along with strong exposure to distributed data processing frameworks such as Spark, Flink, and Kafka.
  • Experience with Hadoop cluster administration and operations.

Skills

AWS Cloud, Flink, Hadoop, Kafka, Kubernetes, Spark
