SRE Specialist
FIS
Pune · On-site · Full-time · Senior · Posted yesterday
About the role
As a Site Reliability Engineer at FIS, you will play a critical role in driving innovation and growth for the Banking Solutions, Payments, and Capital Markets business. You will have the opportunity to make a lasting impact on the company's transformation journey, drive customer‑centric innovation and automation, and position the organization as a leader in the competitive banking, payments, and investment landscape.
Responsibilities
- Designing and maintaining monitoring solutions for infrastructure, application performance, and user experience.
- Implementing automation tools to streamline tasks, scale infrastructure, and ensure seamless deployments.
- Ensuring application reliability, availability, and performance, minimizing downtime and optimizing response times.
- Leading incident response, including identification, triage, resolution, and post‑incident analysis.
- Conducting capacity planning, performance tuning, and resource optimization.
- Collaborating with security teams to implement best practices and ensure compliance.
- Managing deployment pipelines and configuration management for consistent and reliable app deployments.
- Developing and testing disaster recovery plans and backup strategies.
- Collaborating with development, QA, DevOps, and product teams to align on reliability goals and incident response processes.
- Participating in on‑call rotations and providing 24/7 support for critical incidents.
Required Qualifications
- 7 to 12 years of experience in development technologies, architectures, and platforms (Web, API, Middleware, Service Bus, Enterprise Application servers).
- Development experience with at least one of: .NET, Java, REST APIs, Microservices, SQL.
- Proficiency in Unix/Linux command‑line utilities for troubleshooting system and application issues, including process availability and state, stack analysis, network diagnostics, and log inspection.
- Knowledge of implementing and integrating observability and monitoring tools such as Splunk, SolarWinds (Ignite), Service View, ServiceNow, Wireshark, TFS, Git, AutoFailover Tools, WAF, and Akamai ION/GTM.
- Demonstrated ability to troubleshoot and resolve complex issues by analyzing system and application metrics, ensuring effective fault identification and resolution.
- Proficiency in scripting languages (Python, Bash).
- An ownership approach to engineering and product outcomes: proactively spotting problems, areas for improvement, and performance bottlenecks.
- Excellent interpersonal communication, negotiation, and influencing skills.
- Experience with CI/CD pipelines (Harness) and monitoring tools (Splunk, Dynatrace).
- Proven experience in SRE practices, including automation, monitoring, and performance tuning.
- Flexibility to work in shifts and on‑call.
Preferred Qualifications
- Experience with hybrid environments integrating on‑prem and cloud platforms.
- Certifications in SRE, DevOps, or Cloud technologies.
- Exposure to AI/ML enablement for automation and resiliency.
- Experience with containerization (Docker, Kubernetes) and cloud‑native architecture.
Skills
.NET, API, Akamai ION/GTM, AutoFailover Tools, Bash, CI/CD, Docker, Dynatrace, Enterprise Application servers, Git, Harness, Java, Kubernetes, Microservices, Middleware, Monitoring, On-call, Python, REST APIs, Service Bus, ServiceNow, SolarWinds (Ignite), Splunk, SQL, TFS, Unix/Linux, WAF, Web, Wireshark