SRE Architect (Splunk | Linux | Python)
Data Nexus AI
New York · On-site · Contract · Senior · 3w ago
About the role
Key Responsibilities
- Design, implement, and maintain enterprise observability solutions using Splunk Enterprise including dashboards, alerts, and data ingestion pipelines
- Develop and enhance monitoring frameworks for infrastructure, applications, and web platforms
- Automate operational processes using Linux shell scripting and Python
- Implement intelligent alerting strategies to reduce noise and improve incident response efficiency
- Provide L3 production support for business-critical applications and infrastructure
- Support cloud and containerized deployments across AWS and Kubernetes environments
- Collaborate with engineering teams to standardize logging and telemetry practices
- Drive root cause analysis, post-incident reviews, and continuous reliability improvements
- Build operational runbooks, disaster recovery procedures, and service continuity plans
- Integrate monitoring and deployment workflows with CI/CD tools such as Jenkins, Git, and TeamCity
- Support database monitoring and performance analysis across SQL Server, Oracle, DB2, and MySQL platforms
- Participate in ITIL-based change, incident, and problem management processes
Required Skills
- Strong hands-on expertise in Splunk engineering, administration, and architecture
- Advanced experience in Linux / Unix environments
- Proficiency in Python, Shell scripting, and automation frameworks
- Experience with AWS cloud services and Kubernetes / Docker platforms
- Knowledge of monitoring tools such as Nagios and custom observability solutions
- Experience supporting high-availability web platforms and distributed systems
- Strong troubleshooting and production incident management skills
- Understanding of CI/CD pipelines and deployment automation
- Familiarity with ITIL processes and service management tools like ServiceNow
Preferred Qualifications
- Splunk certifications (Power User / Admin / Architect)
- Experience building large-scale telemetry platforms
- Background in financial services or high-transaction enterprise environments
- Experience designing intelligent alerting and automated incident workflows
Experience Level
- 15+ years in production engineering / SRE / observability roles
- Prior experience supporting mission-critical enterprise systems
Skills
AWS, DB2, Docker, Git, Jenkins, Kubernetes, Linux, MySQL, Nagios, Oracle, Python, SQL Server, Splunk, TeamCity, Unix