Calypso Site Reliability Engineer (SRE)

Quinnox Inc

New York · Hybrid · Full-time · Senior · $90k – $100k/yr · Posted today

About the role

Required Skills & Experience

  • 6–10+ years of experience in Site Reliability Engineering, DevOps, or Production Support for capital markets platforms or enterprise applications
  • Experience with Calypso V17/V18 is an added advantage
  • Strong hands-on experience with:
    • Amazon Web Services (EC2, S3, networking, IAM, VPC)
    • GitLab CI/CD pipelines
    • Scripting: PowerShell, Bash/Shell (Python is an added advantage)
  • Experience with:
    • Monitoring tools (e.g., ELK, Prometheus, Grafana, Splunk)
    • CI/CD and release automation
    • Infrastructure as Code (Terraform, CloudFormation – preferred)
  • Strong understanding of:
    • Linux/Unix systems
    • Networking fundamentals and cloud architecture
    • Basic database concepts (Oracle/SQL)
  • Experience supporting high-availability, low-latency enterprise systems

Roles & Responsibilities

  • Own reliability, availability, and performance of Calypso across production and non-production environments
  • Design, implement, and operate end-to-end SRE practices, including monitoring, alerting, incident management, and capacity planning
  • Build and manage CI/CD pipelines using GitLab, enabling automated build, deployment, and release of Calypso components
  • Automate deployment and environment provisioning on Amazon Web Services (AWS) using Infrastructure as Code (IaC) principles
  • Develop and maintain automation scripts using PowerShell, Shell (Bash), and Python for operational tasks, deployments, and monitoring
  • Ensure high availability and resiliency of Calypso services through failover strategies, clustering, and disaster recovery planning
  • Implement observability frameworks, including logging, metrics, and distributed tracing for proactive issue detection
  • Define and monitor SLOs/SLIs/SLAs, ensuring system performance meets business expectations in a trading environment
  • Lead incident management and root cause analysis (RCA), ensuring quick resolution of production issues and prevention of recurrence
  • Optimize system performance, including JVM tuning, database performance, and application-level optimizations for high-volume trade processing
  • Manage environment stability, including handling batch jobs, EOD processing, and trade lifecycle events in Calypso
  • Collaborate with development, QA, and infrastructure teams to ensure smooth releases and production readiness
  • Implement security best practices, including access controls, secrets management, and compliance with regulatory requirements
  • Support release management and deployment strategies, including blue-green deployments, canary releases, and rollback mechanisms
  • Drive continuous improvement and automation, reducing manual intervention and improving system reliability
  • Maintain runbooks, playbooks, and operational documentation for support and incident handling
  • Support production releases and provide hypercare support, ensuring system stability during critical business cycles

Pay

$90,000.00 - $100,000.00 per year

Work Location

Hybrid remote in New York, NY 10040

Skills

AWS Lambda, Bash, CloudFormation, Docker, ELK, GitLab CI/CD, Grafana, IAM, Infrastructure as Code, Linux, Monitoring, Oracle, PostgreSQL, Prometheus, Python, PowerShell, SQL, S3, Terraform, Unix, VPC
