Site Reliability Engineer (Bilingual Mandarin)

Comrise

New York · On-site · Full-time · 2d ago

About the role

Site Reliability Engineer (SRE)

Responsibilities

  • Global Architecture & Disaster Recovery
    Participate in the design and implementation of the company’s global infrastructure architecture. Responsible for cross-region architecture, disaster recovery, and high availability. Enable multi-region deployment, failover, and fault isolation for critical services to improve overall system stability for overseas businesses.

  • Infrastructure Platform Deployment & Operations
    Manage the deployment, operation, and continuous optimization of core infrastructure platforms (e.g., release systems, monitoring and alerting, configuration management, service governance, traffic scheduling) in overseas regions. Ensure consistency and reliability between domestic and international platforms.

  • Reliability Engineering & Incident Response
    Build and maintain reliability systems for overseas services, including observability (monitoring, logging, tracing), incident response, root cause analysis, and postmortem processes. Lead cross-team coordination during major incidents to quickly restore services and drive systemic improvements.

  • Global Technical Solution Implementation
    Understand overseas business requirements and architecture. Drive the implementation of infrastructure capabilities in global environments, including multi-region architecture, network and data architecture optimization, and adaptation of core services.

  • Cross-team Collaboration & System Standardization
    Work closely with domestic infrastructure, product engineering, and platform teams to align overseas systems with internal architecture standards. Develop and promote best practices for global system reliability.

Qualifications

  • Reliability & SRE Experience
    Strong understanding of large-scale system reliability. Experience in high availability architecture design, incident management, capacity planning, and system resilience. Background in SRE, platform engineering, or infrastructure teams is preferred.

  • Global / Multi-region Architecture Experience
    Experience with cross-region architecture and disaster recovery systems, including multi-region deployment, traffic routing, data replication, and failover mechanisms. Experience with global infrastructure or overseas systems is preferred.

  • Core Technical Skills
    Strong knowledge of Linux systems, networking, and common middleware (e.g., MySQL, Redis, Kafka). Familiarity with cloud-native infrastructure (e.g., Kubernetes, Service Mesh) and observability systems (monitoring, logging, tracing).

  • Development & Automation Skills
    Proficient in at least one programming language such as Python, Go, or Java. Experience building automation tools, reliability platforms, or infrastructure systems.

  • Problem-solving & Collaboration
    Strong troubleshooting and analytical skills, with the ability to quickly identify issues in complex systems. Excellent communication and teamwork skills.

  • Language Skills
    Business-level fluency in both English and Mandarin, with the ability to communicate effectively in a global team environment.

Preferred Qualifications

  • Experience with multi-cloud or global cloud providers (e.g., AWS, GCP, Azure, Alibaba Cloud International, Volcano Engine International)
  • Experience with cross-region disaster recovery and traffic routing (e.g., DNS, GSLB, Anycast, Global Load Balancing)
  • Experience with reliability engineering practices (e.g., Chaos Engineering, resilience testing, automated recovery)
  • Bilingual proficiency in English and Chinese

Skills

Go, Java, Kafka, Kubernetes, Linux, MySQL, Python, Redis, Service Mesh
