Senior SRE/DevOps
Playson
Remote (Global), Full-time, Senior
About
Founded in 2012, Playson is a leading iGaming supplier recognised worldwide. We provide our customers with a high-end, microservice-based platform as a service designed to process billions of financial transactions per day. We run a cross-regional setup and work to push latency as close to zero as possible. We invest heavily in delivering the best game experience and a smooth connection, regardless of the internet coverage and bandwidth available to game clients.
We are currently seeking an experienced Senior SRE/DevOps engineer to join our dynamic Platform Tribe.
Responsibilities
- Manage day‑to‑day alerts, system checks, and issue escalation as necessary.
- Provide 24x7 on‑call support for critical SaaS events.
- Document issues and remediation steps.
- Proactively create monitors within the EKS/K8s ecosystem.
- Deploy to EKS/K8s clusters using Terraform and Helm/Flux.
- Enhance infrastructure health by implementing checks and scripts to address known issues.
- Maintain and develop deployment code.
- Implement/integrate new technologies into our Cloud Infrastructure.
- Collaborate with other teams to provide top‑notch support and assistance.
- Prioritise customer focus in planning deployments/updates, ensuring minimal impact.
- Conduct RCA and take necessary corrective actions to prevent issue recurrence.
- Assign alert‑related actions to the appropriate team after investigation.
- Handle support requests for environment‑specific actions.
Requirements
- Proficiency in Kubernetes (deployment, scaling, troubleshooting).
- Experience with GitOps/configuration management tools like FluxCD or ArgoCD.
- Strong experience with issue processing (RCA, Postmortems).
- Familiarity with AWS, Terraform, Docker, CI/CD.
- Experience with monitoring tools like DataDog, Prometheus, Grafana, and logging solutions like Elasticsearch, Logstash, and Kibana (ELK Stack) or AWS CloudWatch.
- Strong understanding of networking concepts and protocols.
- Proficiency in at least one scripting language (e.g., Python, NodeJS, Go).
- Proficiency in Git or other version control systems.
- Familiarity with incident response and management tools like PagerDuty, Opsgenie, or VictorOps.
- Ownership, proactiveness, persistence, and passion for maintaining a high‑traffic online platform.
Benefits
- Competitive Salary and annual performance/salary reviews
- Realistic and transparent bonus system (~20%), paid quarterly
- Unlimited paid vacation leave & paid sick leave
- Flexible work schedule to accommodate your needs
- 100% Remote
- Medical insurance for you and a plus-one
- Financial Support for Life Events & Extended Parental Leave
- Paid professional development courses and training
- B2B contracts
Recruitment Process
- HR Interview (30‑45 min)
- Meeting with a Product Owner (60 min)
- Technical interview (90 min)
- Final Interview with CTO & Software Architect (60 min)
Skills
AWS, ArgoCD, CI/CD, DataDog, Docker, Elasticsearch, FluxCD, Git, Go, Grafana, Helm, Kibana, Kubernetes, Logstash, NodeJS, PagerDuty, Prometheus, Python, Terraform