SRE Specialist

FIS

Pune · On-site · Full-time · Senior · Posted yesterday

About the role

As a Site Reliability Engineer at FIS, you will play a critical role in driving innovation and growth for the Banking Solutions, Payments, and Capital Markets business. You will have the opportunity to make a lasting impact on the company's transformation journey, drive customer‑centric innovation and automation, and position the organization as a leader in the competitive banking, payments, and investment landscape.

Responsibilities

  • Designing and maintaining monitoring solutions for infrastructure, application performance, and user experience.
  • Implementing automation tools to streamline tasks, scale infrastructure, and ensure seamless deployments.
  • Ensuring application reliability, availability, and performance, minimizing downtime and optimizing response times.
  • Leading incident response, including identification, triage, resolution, and post‑incident analysis.
  • Conducting capacity planning, performance tuning, and resource optimization.
  • Collaborating with security teams to implement best practices and ensure compliance.
  • Managing deployment pipelines and configuration management for consistent and reliable app deployments.
  • Developing and testing disaster recovery plans and backup strategies.
  • Collaborating with development, QA, DevOps, and product teams to align on reliability goals and incident response processes.
  • Participating in on‑call rotations and providing 24/7 support for critical incidents.

Required Qualifications

  • 7 to 12 years of experience in development technologies, architectures, and platforms (Web, API, Middleware, Service Bus, Enterprise Application servers).
  • Development experience with at least one of the following: .NET, Java, REST APIs, Microservices, SQL.
  • Proficiency in Unix/Linux command‑line utilities for troubleshooting system and application issues, including process availability and state, stack analysis, network diagnostics, and log inspection.
  • Knowledge of implementing and integrating observability and monitoring tools such as Splunk, SolarWinds (Ignite), Service View, ServiceNow, Wireshark, TFS, Git, AutoFailover Tools, WAF, and Akamai ION/GTM.
  • Demonstrated ability to troubleshoot and resolve complex issues by analyzing system and application metrics, ensuring effective fault identification and resolution.
  • Proficiency in scripting languages (Python, Bash).
  • An ownership approach to engineering and product outcomes: proactively spotting problems, areas for improvement, and performance bottlenecks.
  • Excellent interpersonal communication, negotiation, and influencing skills.
  • Experience with CI/CD pipelines (Harness) and Monitoring (Splunk, Dynatrace).
  • Proven experience in SRE practices, including automation, monitoring, and performance tuning.
  • Flexibility to work in shifts and on‑call rotations.

Preferred Qualifications

  • Experience with hybrid environments integrating on‑prem and cloud platforms.
  • Certifications in SRE, DevOps, or Cloud technologies.
  • Exposure to AI/ML enablement for automation and resiliency.
  • Experience with containerization (Docker, Kubernetes) and cloud‑native architecture.

Skills

.NET, API, Akamai ION/GTM, AutoFailover Tools, Bash, CI/CD, Docker, Dynatrace, Enterprise Application servers, Git, Harness, Java, Kubernetes, Microservices, Middleware, Monitoring, On-call, Python, REST APIs, Service Bus, ServiceNow, SolarWinds (Ignite), Splunk, SQL, TFS, Unix/Linux, WAF, Web, Wireshark
