Site Reliability Engineer (Bilingual Mandarin)

Comrise

New York · On-site · Full-time · 3mo ago

About the role

Site Reliability Engineer (SRE)

Responsibilities

Global Architecture & Disaster Recovery
Participate in the design and implementation of the company’s global infrastructure architecture. Responsible for cross-region architecture, disaster recovery, and high availability. Enable multi-region deployment, failover, and fault isolation for critical services to improve overall system stability for overseas businesses.
Infrastructure Platform Deployment & Operations
Manage the deployment, operation, and continuous optimization of core infrastructure platforms (e.g., release systems, monitoring and alerting, configuration management, service governance, traffic scheduling) in overseas regions. Ensure consistency and reliability between domestic and international platforms.
Reliability Engineering & Incident Response
Build and maintain reliability systems for overseas services, including observability (monitoring, logging, tracing), incident response, root cause analysis, and postmortem processes. Lead cross-team coordination during major incidents to quickly restore services and drive systemic improvements.
Global Technical Solution Implementation
Understand overseas business requirements and architecture. Drive the implementation of infrastructure capabilities in global environments, including multi-region architecture, network and data architecture optimization, and adaptation of core services.
Cross-team Collaboration & System Standardization
Work closely with domestic infrastructure, product engineering, and platform teams to align overseas systems with internal architecture standards. Develop and promote best practices for global system reliability.

Qualifications

Reliability & SRE Experience
Strong understanding of large-scale system reliability. Experience in high availability architecture design, incident management, capacity planning, and system resilience. Background in SRE, platform engineering, or infrastructure teams is preferred.
Global / Multi-region Architecture Experience
Experience with cross-region architecture and disaster recovery systems, including multi-region deployment, traffic routing, data replication, and failover mechanisms. Experience with global infrastructure or overseas systems is preferred.
Core Technical Skills
Strong knowledge of Linux systems, networking, and common middleware (e.g., MySQL, Redis, Kafka). Familiarity with cloud-native infrastructure (e.g., Kubernetes, Service Mesh) and observability systems (monitoring, logging, tracing).
Development & Automation Skills
Proficient in at least one programming language such as Python, Go, or Java. Experience building automation tools, reliability platforms, or infrastructure systems.
Problem-solving & Collaboration
Strong troubleshooting and analytical skills, with the ability to quickly identify issues in complex systems. Excellent communication and teamwork skills.
Language Skills
Business-level fluency in both English and Mandarin, with the ability to communicate effectively in a global team environment.

Preferred Qualifications

Experience with multi-cloud or global cloud providers (e.g., AWS, GCP, Azure, Alibaba Cloud International, Volcano Engine International)
Experience with cross-region disaster recovery and traffic routing (e.g., DNS, GSLB, Anycast, Global Load Balancing)
Experience with reliability engineering practices (e.g., Chaos Engineering, resilience testing, automated recovery)
Bilingual ability in English and Chinese is preferred.

Skills

Go, Java, Kafka, Kubernetes, Linux, MySQL, Python, Redis, Service Mesh
