Site Reliability Engineer (SRE)

Wits Innovation Lab

Hyderabad · On-site · Full-time · Mid Level · Posted yesterday

About the role

About

Wits Innovation Lab is a rapidly growing technology company specializing in providing cutting‑edge AI‑powered solutions for the financial services industry. We develop and deploy sophisticated algorithms and platforms that enable our clients to optimize trading strategies, manage risk effectively, and enhance customer experiences. Our solutions are used by leading financial institutions globally, processing billions of transactions daily.

Role Overview

As a Site Reliability Engineer (SRE) at Wits Innovation Lab, you will be instrumental in ensuring the reliability, availability, and performance of our critical AI‑driven financial platforms. You will collaborate closely with development, operations, and security teams to design, implement, and maintain robust infrastructure and automation solutions.

Requirements

Proven ability to design, implement, and manage infrastructure on AWS, including EC2, S3, VPC, and other relevant services.
Deep understanding of Linux system administration, including performance tuning, security hardening, and troubleshooting.
Extensive experience with configuration management tools like Ansible, Chef, or Puppet.
Strong proficiency in scripting languages such as Python, Bash, or Go.
Solid understanding of containerization technologies like Docker and orchestration platforms like Kubernetes.
Hands‑on experience with CI/CD pipelines and related tools like Jenkins, GitLab CI, or CircleCI.
Excellent communication and collaboration skills, with the ability to work effectively in a fast‑paced, agile environment.
Experience with monitoring tools like Prometheus, Grafana, or ELK stack.
Familiarity with Redis or other in‑memory data stores.
Bachelor's degree in Computer Science or a related field.

Responsibilities

Design and implement scalable and resilient infrastructure solutions on AWS to support our AI platforms.
Automate infrastructure provisioning, configuration management, and application deployments using tools like Ansible, Terraform, and Kubernetes to improve efficiency and reduce manual effort.
Monitor system performance, identify bottlenecks, and implement proactive measures to prevent outages and ensure optimal performance for our financial applications.
Develop and maintain CI/CD pipelines to enable rapid and reliable software releases, ensuring continuous delivery of new features and bug fixes.
Participate in on-call rotations to provide 24/7 support for critical systems, ensuring business continuity and minimizing impact to our clients.
Implement and maintain security best practices to protect our infrastructure and data, ensuring compliance with industry regulations and standards.

Skills

AWS, Ansible, Bash, CI/CD, Chef, Docker, EC2, ELK stack, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Puppet, Python, Redis, S3, Terraform, VPC