Calypso Site Reliability Engineer (SRE)
Quinnox Inc
New York · Hybrid · Full-time · Senior · $90k – $100k/yr · Posted today
About the role
Required Skills & Experience
- 6–10+ years of experience in Site Reliability Engineering, DevOps, or Production Support for capital markets platforms or enterprise applications
- Experience with Calypso V17/V18 is an added advantage
- Strong hands-on experience with:
  - Amazon Web Services (EC2, S3, networking, IAM, VPC)
  - GitLab CI/CD pipelines
  - Scripting: PowerShell, Bash/Shell (Python is an added advantage)
- Experience with:
  - Monitoring tools (e.g., ELK, Prometheus, Grafana, Splunk)
  - CI/CD and release automation
  - Infrastructure as Code (Terraform or CloudFormation preferred)
- Strong understanding of:
  - Linux/Unix systems
  - Networking fundamentals and cloud architecture
  - Basic database concepts (Oracle/SQL)
- Experience supporting high-availability, low-latency enterprise systems
Roles & Responsibilities
- Own reliability, availability, and performance of Calypso across production and non-production environments
- Design, implement, and operate end-to-end SRE practices, including monitoring, alerting, incident management, and capacity planning
- Build and manage CI/CD pipelines using GitLab, enabling automated build, deployment, and release of Calypso components
- Automate deployment and environment provisioning on Amazon Web Services (AWS) using Infrastructure as Code (IaC) principles
- Develop and maintain automation scripts using PowerShell, Shell (Bash), and Python for operational tasks, deployments, and monitoring
- Ensure high availability and resiliency of Calypso services through failover strategies, clustering, and disaster recovery planning
- Implement observability frameworks, including logging, metrics, and distributed tracing for proactive issue detection
- Define and monitor SLOs/SLIs/SLAs, ensuring system performance meets business expectations in a trading environment
- Lead incident management and root cause analysis (RCA), ensuring quick resolution of production issues and prevention of recurrence
- Optimize system performance, including JVM tuning, database performance, and application-level optimizations for high-volume trade processing
- Manage environment stability, including handling batch jobs, EOD processing, and trade lifecycle events in Calypso
- Collaborate with development, QA, and infrastructure teams to ensure smooth releases and production readiness
- Implement security best practices, including access controls, secrets management, and compliance with regulatory requirements
- Support release management and deployment strategies, including blue-green deployments, canary releases, and rollback mechanisms
- Drive continuous improvement and automation, reducing manual intervention and improving system reliability
- Maintain runbooks, playbooks, and operational documentation for support and incident handling
- Support production releases and provide hypercare support, ensuring system stability during critical business cycles
Pay
$90,000.00 - $100,000.00 per year
Work Location
Hybrid remote in New York, NY 10040
Skills
AWS Lambda, Bash, CloudFormation, Docker, ELK, GitLab CI/CD, Grafana, IAM, Infrastructure as Code, Linux, Monitoring, Oracle, PostgreSQL, Prometheus, Python, PowerShell, SQL, S3, Terraform, Unix, VPC