Senior Site Reliability Engineer - Remote

Moniepoint Incorporated

Nigeria · On-site Full-time Senior Today

About the role

Job Summary
• We are seeking an experienced Site Reliability Engineer (SRE) responsible for ensuring our systems run smoothly and efficiently while engineering solutions to improve visibility, eliminate repetitive tasks, and increase system resilience.
• The ideal candidate will balance real-time on-call responsibilities with strategic engineering work to achieve sustainable and scalable service reliability.

Responsibilities
• Participate in on-call rotations as the primary technical lead for detecting, triaging, and resolving service degradation, outages, or reliability issues across all environments.
• Act as the Incident Commander during major incidents: initiate war room or bridge calls, coordinate cross-functional teams, provide timely and clear status updates to all stakeholders, and lead and document blameless Root Cause Analyses (RCAs) that identify root causes and drive long-term fixes.
• Develop automation to eliminate manual, repetitive operational tasks (toil) across both applications and infrastructure, improving efficiency and system resilience.
• Create and maintain dashboards and alerts that track application and infrastructure health.
• Participate in feature development discussions to ensure services are built with observability from the ground up.
• Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in collaboration with Product and Engineering teams.
• Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.

Requirements
• Minimum of 4 years of experience supporting enterprise applications in an SRE or similar role.
• Knowledge of distributed systems, microservices architecture, and software design patterns.
• Experience with cloud platforms such as AWS, GCP, or Azure.
• Strong knowledge of Kubernetes and container orchestration tools.
• Experience using application performance monitoring tools, OpenTelemetry, and observability platforms such as New Relic, Datadog, ELK, or SigNoz.
• Excellent problem-solving and troubleshooting skills as an on-call engineer, with the ability to resolve complex infrastructure and application issues.
• Proficient in setting up and maintaining monitoring dashboards and alerts using Grafana and Prometheus.

How to Apply
Interested and qualified candidates should:

