Calypso Site Reliability Engineer (SRE)

Quinnox Inc

New York · Hybrid · Full-time · Senior · $90k – $100k/yr · 1mo ago

About the role

Required Skills & Experience

6–10+ years of experience in Site Reliability Engineering / DevOps / Production Support for capital markets platforms or enterprise applications
[experience with Calypso V17/V18 is an added advantage]
Strong hands-on experience with:
- Amazon Web Services (EC2, S3, networking, IAM, VPC)
- GitLab CI/CD pipelines
- Scripting: PowerShell, Bash/Shell [Python is an added advantage]
Experience with:
- Monitoring tools (e.g., ELK, Prometheus, Grafana, Splunk)
- CI/CD and release automation
- Infrastructure as Code (Terraform, CloudFormation – preferred)
Strong understanding of:
- Linux/Unix systems
- Networking fundamentals and cloud architecture
- Basic database concepts (Oracle/SQL)
Experience supporting high-availability, low-latency enterprise systems

Roles & Responsibilities

Own reliability, availability, and performance of Calypso across production and non-production environments
Design, implement, and operate end-to-end SRE practices, including monitoring, alerting, incident management, and capacity planning
Build and manage CI/CD pipelines using GitLab, enabling automated build, deployment, and release of Calypso components
Automate deployment and environment provisioning on Amazon Web Services (AWS) using Infrastructure as Code (IaC) principles
Develop and maintain automation scripts using PowerShell, Shell (Bash), and Python for operational tasks, deployments, and monitoring
Ensure high availability and resiliency of Calypso services through failover strategies, clustering, and disaster recovery planning
Implement observability frameworks, including logging, metrics, and distributed tracing for proactive issue detection
Define and monitor SLOs/SLIs/SLAs, ensuring system performance meets business expectations in a trading environment
Lead incident management and root cause analysis (RCA), ensuring quick resolution of production issues and prevention of recurrence
Optimize system performance, including JVM tuning, database performance, and application-level optimizations for high-volume trade processing
Manage environment stability, including handling batch jobs, EOD processing, and trade lifecycle events in Calypso
Collaborate with development, QA, and infrastructure teams to ensure smooth releases and production readiness
Implement security best practices, including access controls, secrets management, and compliance with regulatory requirements
Support release management and deployment strategies, including blue-green deployments, canary releases, and rollback mechanisms
Drive continuous improvement and automation, reducing manual intervention and improving system reliability
Maintain runbooks, playbooks, and operational documentation for support and incident handling
Support production releases and provide hypercare support, ensuring system stability during critical business cycles

Pay

$90,000.00 - $100,000.00 per year

Work Location

Hybrid remote in New York, NY 10040

Skills

AWS Lambda, Bash, CloudFormation, Docker, ELK, GitLab CI/CD, Grafana, IAM, Infrastructure as Code, Linux, Monitoring, Oracle, PostgreSQL, Prometheus, Python, PowerShell, SQL, S3, Terraform, Unix, VPC
