Resiliency and Recovery Engineer - Tech Lead
Jobs via Dice
Charlotte · On-site · Full-time · Lead · Posted 1w ago
About the role
PROLIM Global Corporation is seeking a Resiliency and Recovery Engineer - Tech Lead to take on the following responsibilities.
Responsibilities
- Own end‑to‑end application reliability, availability, and performance for client‑critical systems.
- Define and govern SLIs, SLOs, and error budgets aligned with business and regulatory expectations.
- Lead production support and incident management, acting as Incident Commander for P1/P2 issues.
- Ensure robust monitoring, alerting, logging, and observability across application landscapes.
- Drive automation and self‑healing to reduce manual toil and improve operational efficiency.
- Partner with development and DevOps teams to embed SRE practices into CI/CD and release pipelines.
- Oversee change and release readiness, ensuring risk‑based production deployments.
- Provide on‑site client leadership, serving as the primary SRE point of contact and trusted advisor.
- Conduct and govern post‑incident reviews (RCA/PIR) and ensure preventive actions are implemented.
- Ensure compliance with security, audit, and regulatory controls relevant to the client environment.
- Lead and mentor onshore and offshore SRE/support teams, ensuring SLA adherence and skill uplift.
- Report operational KPIs, reliability trends, and improvement roadmaps to client and internal leadership.