Site Reliability Engineer (SRE)

Blankfactor

Hackensack · On-site · Full-time · Mid Level · $100k – $125k/yr · 6d ago

About the role

As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of mission-critical platforms by building scalable systems, robust automation, and data-driven operations. You will partner closely with development, cloud, infrastructure, and security teams to deliver resilient, high-performing services that support the way people live and work today.

What You’ll Do

  • Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
  • Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and rapid incident response.
  • Lead incident management efforts, perform root cause analysis, and implement action-oriented post-mortem improvements.
  • Automate operational workflows using scripting, IaC, and configuration management tools.
  • Analyze capacity, performance, and usage trends to forecast demand and optimize accordingly.
  • Collaborate with engineering teams to embed operability, resilience, and security into application and architecture designs.
  • Support safe, reliable deployments through CI/CD pipelines, release governance, and change control.
  • Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.

Experience

Required:

  • Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and orchestration.
  • Working with public cloud platforms (AWS, Azure, or GCP) across compute, storage, networking, IAM, and cost governance.
  • Using observability and APM tools such as Dynatrace, Splunk, Prometheus, and Grafana.
  • Implementing security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secrets management and vulnerability remediation.
  • Infrastructure as Code experience using Terraform, CloudFormation, Ansible, or similar tools.
  • Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions.
  • Scripting and automation using Bash, PowerShell, or Python.
  • Equivalent combination of education, experience, and/or military background.
  • Experience on high-volume transaction projects with strict zero-data-loss requirements is a must, primarily in banking and payments.

Good to Have

  • Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
  • Experience with Premier applications, IBM iSeries, and/or Unisys systems.
  • Hands-on database operations and performance tuning (Oracle, SQL Server).
  • Proven experience in major incident command and stakeholder communication.
  • Experience with ITIL and ServiceNow (change, problem, and configuration management).

Skills

Ansible, AWS, Azure, Bash, CloudFormation, Dynatrace, GCP, GitHub Actions, GitLab CI, Grafana, IBM iSeries, Jenkins, Kubernetes, Oracle, PowerShell, Prometheus, Python, SQL Server, Splunk, Terraform