Manager SRE
Mphasis Limited
Elizabeth · On-site · Full-time · Lead · $80k – $100k/yr · 2d ago
About the role
Role: Manager SRE
Automation Lead – Leads the Automation SRE team and is responsible for delivering end-to-end self-healing automation solutions that reduce manual effort and toil.
Primary Skill – Observability, telemetry, and event correlation
Secondary Skill – Shell scripting, Linux, and monitoring tools (BigPanda, Splunk, AppDynamics, etc.)
Automation Engineer
- 15+ years of experience in leading Automation SRE teams
- Advanced working experience with two or more of the following: Unix/Linux, Windows Server, Oracle, MSSQL, MongoDB.
- Experience scripting with Python, Java, curl, or another scripting language.
- Experience with two or more of the following observability tools: AppDynamics, BigPanda, Elasticsearch (ELK), Google Cloud Logging, Grafana, Prometheus, Splunk, ThousandEyes.
- Experience with logging, monitoring, and event detection on cloud or distributed platforms.
- Experience working with one or more of the following: AutoSys, cron, Windows Task Scheduler, or other batch schedulers.
- Provides technical direction on monitoring and logging to less experienced staff, or develops highly complex original solutions. Acts as an expert technical resource for modeling, simulation, and analysis efforts.
- Experience creating and modifying technical documentation such as environment flows, functional requirements, and non-functional requirements.
- Outstanding problem-solving and analytical skills, with the ability to turn findings into strategic imperatives.
- Technical operations application support experience.
- Minimum of 4-6 years of hands-on SRE experience implementing and developing monitoring systems for application reliability using Splunk, Grafana, AppDynamics, and BigPanda.
- The environment is entirely on-premises, so candidates must be strong in the skills above.
- Overall, we are looking for an Automation Engineer who can reduce toil and make the system more reliable and scalable.
Nature of the Job
- Collaborate with the production support team to identify existing manual activities and automate them.
- Identify areas of toil that can be automated to avoid manual intervention.
- Build monitoring systems and an observability platform that provide more stack traces and dashboards.
- Define SLAs, SLOs, and SLIs, and implement them for better monitoring.
- Scalability, reliability, and observability are the primary goals, with the aim of reducing MTTD and MTTR.
Other details
Deputation Location: US – New Jersey – New Jersey
Skills
AppDynamics, BigPanda, curl scripting, ELK, Grafana, Google Cloud Logging, Java, Linux, MongoDB, Monitoring tools, MSSQL, Observability, Oracle, Prometheus, Python, Shell Script, Splunk, Telemetry, ThousandEyes, Unix, Windows Server