Site Reliability Engineer (SRE)
Blankfactor
Hackensack · On-site · Full-time · Mid Level · $100k – $125k/yr · 6d ago
About the role
As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of mission-critical platforms by building scalable systems, robust automation, and data-driven operations. You will partner closely with development, cloud, infrastructure, and security teams to deliver resilient, high-performing services that support the way people live and work today.
What You’ll Do
- Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
- Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and rapid incident response.
- Lead incident management efforts, perform root cause analysis, and implement action-oriented post-mortem improvements.
- Automate operational workflows using scripting, IaC, and configuration management tools.
- Analyze capacity, performance, and usage trends to forecast demand and optimize
- Collaborate with engineering teams to embed operability, resilience, and security into application and architecture designs.
- Support safe, reliable deployments through CI/CD pipelines, release governance, and change control.
- Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience
Required:
- Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and orchestration.
- Working with public cloud platforms (AWS, Azure, or GCP) across compute, storage, networking, IAM, and cost governance.
- Using observability and APM tools such as Dynatrace, Splunk, Prometheus, and Grafana.
- Implementing security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secrets management and vulnerability remediation.
- Infrastructure as Code experience using Terraform, CloudFormation, Ansible, or similar tools.
- Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions.
- Scripting and automation using Bash, PowerShell, or Python.
- Equivalent combination of education, experience, and/or military background.
- Key requirement: experience on high-volume transaction projects, primarily in banking and payments, where zero data loss is a must.
Good to Have
- Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
- Experience with Premier applications, IBM iSeries, and/or Unisys systems.
- Hands-on database operations and performance tuning (Oracle, SQL Server).
- Proven experience in major incident command and stakeholder communication.
- Experience with ITIL and ServiceNow (change, problem, and configuration management).
Skills
Ansible, AWS, Azure, Bash, CloudFormation, Dynatrace, GCP, GitHub Actions, GitLab CI, Grafana, IBM iSeries, Jenkins, Kubernetes, Oracle, PowerShell, Prometheus, Python, SQL Server, Splunk, Terraform