Site Reliability Engineer (SRE)

WhatJobs Direct

Jimeta · On-site · Full-time · 4mo ago

About the role

Our client is a leading technology firm focused on building scalable and resilient infrastructure. We are seeking an experienced Site Reliability Engineer (SRE) to join our dedicated operations team in Yola, Adamawa, NG. This critical role involves ensuring the high availability, performance, and scalability of our production systems. The SRE will work on a variety of challenges, from automating operational tasks to designing robust monitoring and alerting systems. The position is on-site and requires a strong understanding of distributed systems and cloud computing.

Responsibilities:

- Design, build, and maintain reliable, scalable, and high-performance production systems.
- Develop automation tools and scripts to streamline operational processes, including deployment, monitoring, and incident response.
- Implement and manage infrastructure as code (IaC) using tools like Terraform or Ansible.
- Monitor system performance, identify bottlenecks, and implement optimizations.
- Respond to production incidents, perform root cause analysis, and implement preventive measures.
- Collaborate with development teams to improve application reliability and operational readiness.
- Manage cloud infrastructure (e.g., AWS, Azure, GCP) and container orchestration platforms (e.g., Kubernetes).
- Develop and maintain comprehensive documentation for systems and processes.
- Participate in an on-call rotation to support production systems.
- Continuously evaluate and recommend new technologies and best practices to enhance system reliability.

Qualifications:

- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Proven experience in a Site Reliability Engineering, Systems Administration, or DevOps role.
- Strong proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
- Hands-on experience with cloud platforms (AWS, Azure, GCP).
- Experience with containerization technologies (Docker, Kubernetes).
- Solid understanding of networking concepts, operating systems (Linux), and distributed systems.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with CI/CD pipelines and tools.
- Excellent troubleshooting and problem-solving skills.
- Strong communication and collaboration abilities.
