Senior Site Reliability Engineer (Systems Operations Engineer)
Openkyber
About the Role
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to support key Shared Services Operations Technology platforms, including Payment Evaluations, Regulatory Operations, Financial Crimes, and Business & Real Estate Evaluation. You will be part of a team responsible for maintaining availability, performance, and reliability across ~85 applications that support KYC, AML, and other critical financial-crimes-related workloads. This role blends software engineering, systems operations, and cloud-native reliability practices to drive automation, enhance resilience, and support modernization across a large enterprise ecosystem. You will also help evolve AIOps capabilities, including predictive alerting, self-healing workflows, and AI/ML-driven incident analysis. Occasional weekend work or overtime may be required for critical system support.
What You'll Do
Site Reliability & Operations
- Lead SRE practices that enhance system availability, performance, and scalability across multi-cloud environments.
- Support and improve critical applications and customer journeys; lead incident response and blameless postmortems.
- Conduct root-cause analysis and drive long-term remediation of recurrent issues.
- Define and enforce operational readiness and Non-Functional Requirements (NFRs) during platform modernization.
Automation & Tooling
- Design and implement automation to eliminate operational toil and improve service reliability.
- Build frameworks for automated SLO/SLI tracking, availability metrics, error budgeting, and customer impact analysis.
- Implement self-healing and autonomic systems using AI/ML, RPA, and intelligent monitoring.
Monitoring, Observability & AIOps
- Develop and enhance monitoring, alerting, and observability capabilities.
- Drive adoption of AIOps platforms to support anomaly detection, predictive alerting, and automated incident resolution.
Collaboration & Leadership
- Collaborate with platform teams, product owners, and technology partners across the COO Technology organization.
- Mentor peers and champion SRE best practices across engineering teams.
- Identify process gaps across domains and recommend scalable, long-term improvements.
Required Qualifications
- 5+ years in Systems Engineering, Site Reliability Engineering, Technology Architecture, or related fields (or equivalent military/training/education experience).
- 2+ years working as part of an SRE team.
- Strong written and verbal communication skills.
Technical Skills
Software Development
- Proficiency in Python and/or Java/J2EE.
- Experience with REST APIs, microservices, Kafka/MQ, and modern integration patterns.
- Familiarity with front-end frameworks and libraries (React, Bootstrap).
- Strong SQL skills and database schema design experience.
Infrastructure & Cloud
- Expertise with Linux and container orchestration (Kubernetes, OpenShift/OCP strongly preferred).
- Experience with PCF, AWS, Google Cloud Platform, or Azure environments.
CI/CD & Automation Tools:
- Jenkins, GitLab, SonarQube, Artifactory, Ansible.
Observability & AIOps Tools:
- Grafana, Prometheus, Splunk/ELK, AppDynamics, Elastic, ThousandEyes, Aternity, Google Cloud Logging.
AIOps Platforms:
- Moogsoft, AI/ML-based analytics frameworks.
Operations & Data ITSM Tools:
- ServiceNow, Remedy, IBM Netcool.
Databases:
- Oracle, DB2, SQL Server, MongoDB, Hadoop/Cloudera, Spark, Teradata.
Foundational AI Knowledge
- Understanding of common AI/ML concepts (classification, regression, clustering, anomaly detection).
- Ability to work with structured/unstructured data for model evaluation.
- Awareness of ethical/operational considerations in AI systems.
- Experience integrating AI into automation workflows is a plus.
Preferred Qualifications
- Experience with AutoSys.
- Prior experience in corporate banking or financial services.
- Strong interest in AI-driven operations and AIOps.