Systems Analyst 3 (Site Reliability Engineer) - 26-03282

NavitasPartners

Round Rock · On-site · Senior · 5d ago

About the role

Job Title

Systems Analyst 3 (Site Reliability Engineer)

Location

Austin, TX

Job Duration

4 months

Position Overview

We are seeking an experienced Systems Analyst with a strong focus on Site Reliability Engineering (SRE). This role involves ensuring the reliability, availability, performance, and scalability of production systems by applying software engineering principles to infrastructure and operations.

The ideal candidate will partner with development teams to build resilient, observable, and automated platforms aligned with defined Service Level Objectives (SLOs).

Key Responsibilities

  • Analyze business objectives and technical requirements to propose effective solutions
  • Perform cost/benefit analysis and evaluate alternative approaches
  • Gather and document user requirements, workflows, and system processes
  • Design, implement, and support highly available distributed systems
  • Collaborate with cross-functional teams to improve system performance and reliability
  • Develop detailed documentation including system designs, runbooks, and reports
  • Monitor system performance and implement improvements for scalability and efficiency
  • Lead incident response, root cause analysis (RCA), and postmortem processes
  • Implement monitoring, alerting, and logging best practices
  • Ensure security and compliance are integrated into system operations

Minimum Requirements

  • 8+ years of experience in Systems Engineering, DevOps, or Site Reliability Engineering
  • Strong expertise in Linux/Unix systems and system internals
  • Proficiency in programming/scripting (Python, Go, Java, Bash)
  • Experience designing and operating highly available distributed systems
  • Hands‑on experience with cloud platforms (AWS or GCP)
  • Experience with containerization and orchestration tools (Docker, Kubernetes)
  • Strong knowledge of monitoring, alerting, and logging frameworks
  • Experience defining and managing SLIs, SLOs, and error budgets
  • Familiarity with incident management, RCA, and postmortem practices
  • Experience integrating security and compliance into operational workflows

Preferred Qualifications

  • Experience with observability tools (Prometheus, Grafana, Datadog, Splunk, etc.)
  • Experience supporting 24x7 production environments and on‑call rotations
  • Familiarity with chaos engineering and resiliency testing
  • Experience with canary deployments, feature flags, and progressive delivery
  • Strong documentation skills (runbooks, dashboards, operational standards)

Contact

For more details, reach out at resumes@navitassols.com.

Skills

AWS, Bash, Docker, GCP, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Splunk, Unix
