Senior Site Reliability Engineer (Systems Operations Engineer)

Openkyber

US · Hybrid · Contract · Senior · $61 – $66/hr · Today

About the role

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to support key Shared Services Operations Technology platforms, including Payment Evaluations, Regulatory Operations, Financial Crimes, and Business & Real Estate Evaluation. You will join a team responsible for the availability, performance, and reliability of ~85 applications that support KYC, AML, and other critical financial-crimes workloads. The role blends software engineering, systems operations, and cloud-native reliability practices to drive automation, improve resilience, and support modernization across a large enterprise ecosystem. You will also help evolve AIOps capabilities, including predictive alerting, self-healing workflows, and AI/ML-driven incident analysis. Occasional weekend work or overtime may be required for critical system support.

What You'll Do

Site Reliability & Operations

  • Lead SRE practices that enhance system availability, performance, and scalability across multi-cloud environments.
  • Support and improve critical applications and customer journeys; lead incident response and blameless postmortems.
  • Conduct root-cause analysis and drive long-term remediation of recurrent issues.
  • Define and enforce operational readiness and Non-Functional Requirements (NFRs) during platform modernization.

Automation & Tooling

  • Design and implement automation to eliminate operational toil and improve service reliability.
  • Build frameworks for automated SLO/SLI tracking, availability metrics, error budgeting, and customer impact analysis.
  • Implement self-healing and autonomic systems using AI/ML, RPA, and intelligent monitoring.

Monitoring, Observability & AIOps

  • Develop and enhance monitoring, alerting, and observability capabilities.
  • Drive adoption of AIOps platforms to support anomaly detection, predictive alerting, and automated incident resolution.

Collaboration & Leadership

  • Collaborate with platform teams, product owners, and technology partners across the COO Technology organization.
  • Mentor peers and champion SRE best practices across engineering teams.
  • Identify process gaps across domains and recommend scalable, long-term improvements.

Required Qualifications

  • 5+ years in Systems Engineering, Site Reliability Engineering, Technology Architecture, or related fields (or equivalent military/training/education experience).
  • 2+ years working as part of an SRE team.
  • Strong written and verbal communication skills.

Technical Skills

Software Development

  • Proficiency in Python and/or Java/J2EE.
  • Experience with REST APIs, microservices, Kafka/MQ, and modern integration patterns.
  • Familiarity with JavaScript frameworks (React, Bootstrap).
  • Strong SQL skills and database schema design experience.

Infrastructure & Cloud

  • Expertise with Linux and container orchestration (Kubernetes, OpenShift/OCP strongly preferred).
  • Experience with PCF, AWS, Google Cloud Platform, or Azure environments.

CI/CD & Automation Tools:

  • Jenkins, GitLab, SonarQube, Artifactory, Ansible.

Observability & AIOps Tools:

  • Grafana, Prometheus, Splunk/ELK, AppDynamics, Elastic, ThousandEyes, Aternity, Google Cloud Logging.

AIOps Platforms:

  • Moogsoft, AI/ML-based analytics frameworks.

Operations & Data ITSM Tools:

  • ServiceNow, Remedy, IBM Netcool.

Databases:

  • Oracle, DB2, SQL Server, MongoDB, Hadoop/Cloudera, Spark, Teradata.

Foundational AI Knowledge

  • Understanding of common AI/ML concepts (classification, regression, clustering, anomaly detection).
  • Ability to work with structured/unstructured data for model evaluation.
  • Awareness of ethical/operational considerations in AI systems.
  • Experience integrating AI into automation workflows is a plus.

Preferred Qualifications

  • Experience with AutoSys.
  • Prior experience in corporate banking or financial services.
  • Strong interest in AI-driven operations and AIOps.

Skills

  • Ansible, AppDynamics, AWS, Azure, Bootstrap, DB2, Docker, Elastic, ELK, Google Cloud Platform, Grafana, Hadoop, IBM Netcool, Jenkins, JIRA, Kafka, Kubernetes, Linux, Microservices, MongoDB, Oracle, OpenShift, PCF, Prometheus, Python, RPA, React, Remedy, ServiceNow, Spark, Splunk, SQL, Teradata, ThousandEyes, Unix, VMware, XML