Platform Site Reliability Specialist
Tandym Group
New York · Hybrid · Full-time · Mid Level · 1mo ago
About the role
A recognized media services company is seeking a Platform Site Reliability Specialist to join its growing team. The role focuses on automating the monitoring and observability solutions that keep infrastructure reliable.
This is a hybrid role that requires working onsite at least 3 days a week.
Responsibilities
- Implementing and maintaining automated workflows to streamline manual processes.
- Building comprehensive dashboards for real-time system visibility.
- Developing proactive alerting systems to identify performance bottlenecks.
- Standardizing monitoring protocols for consistent service reliability.
- Providing rapid response and resolution during system disruptions.
- Performing other duties as needed.
Qualifications
- 3+ years of experience in Technical Operations and Automation roles
- High School Diploma / GED
- Proficiency with observability tools like Datadog or Zabbix
- Automation scripting capabilities
- Strong analytical and problem-solving skills
- Ability to participate in a 24/7 on-call rotation
- Attention to detail
- Effective communication skills
Desired Qualifications
- Bachelor's Degree in a Computer-related field
- Experience in Media and Entertainment, particularly live-streaming or CDN management
- Solid understanding of Agile frameworks
Skills
Datadog, Zabbix