
Site Reliability Engineer

Thales

Nice · On-site · Full-time · Mid Level · 2w ago

About the role

About Thales

Thales is a global leader in high technology, specializing in three business sectors: Defense & Security, Aeronautics & Space, and Cyber & Digital. It develops products and solutions that contribute to a safer, more environmentally friendly, and more inclusive world. The Group invests nearly €4 billion per year in Research & Development, particularly in key innovation areas such as AI, cybersecurity, quantum, cloud technologies, and 6G. Thales has nearly 81,000 employees in 68 countries.

Our Commitments, Your Benefits

  • A success driven by our technological excellence, your experience, and our shared ambition
  • An attractive remuneration package
  • Continuous skills development: training courses, academies, and internal communities
  • An inclusive, supportive environment that respects work-life balance
  • Recognized societal and environmental commitment

Your Daily Life

At the heart of the PACA region's Silicon Valley, our site brings together the development of cutting-edge sonars for submarines and surface vessels with our digital services activities. A pioneer in simulation products, the site draws on in-depth expertise in acoustics and signal processing.

We are seeking a Site Reliability Engineer to ensure a high level of service and operational excellence for an innovative and ambitious telecommunications solution (high availability, strict performance constraints) deployed in the public cloud. This product requires establishing a product-specific SRE team.

Essential Functions

  • Automation & Infrastructure as Code: Design, build, and maintain scalable infrastructure using tools such as Terraform, Ansible, and Kubernetes. Develop automated CI/CD pipelines via GitLab to reduce manual toil.
  • Availability & Reliability Engineering: Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Manage "Error Budgets" to balance the velocity of new features with the stability of the platform.
  • Incident Management & On-Call Support: Participate in 24/7 on-call rotations to provide emergency response and perform deep-dive troubleshooting for production issues.
  • Performance & Capacity Planning: Conduct system performance analysis, identify bottlenecks, and perform capacity planning to ensure the infrastructure can handle growth and peak loads.
  • Observability & Monitoring: Implement and refine symptom-based alerting and comprehensive monitoring strategies using platforms like Datadog to ensure high visibility into system health.
  • Continuous Improvement & Postmortems: Lead blameless postmortems after incidents to identify root causes and implement long-term technical fixes to prevent recurrence.
  • Security & Compliance Collaboration: Partner with Cloud Security teams to implement security best practices, manage access controls, and respond to security breaches or vulnerabilities.
  • Customer Relationship: Support the customer relationship.
  • Stakeholder Coordination: Work with other stakeholders to define the solution improvement plan.
  • Service Ownership: Take ownership of the solution's service availability.

Minimum Requirements

Education:

  • Engineering degree or equivalent

Experience:

  • At least 1 year of experience

Skills and Abilities:

  • Java development skills are required.
  • You are familiar with public cloud platforms (GCP, AWS), containers and microservices (Docker, Kubernetes, Java), CI/CD and automation (Jenkins, GitLab, Helm), and NoSQL databases.

Certification

  • GCP cloud architect certification is a plus

Preferred Qualifications

  • You have already set up product monitoring and the underlying infrastructure
  • You have development experience in a distributed-systems and/or high-availability context
  • You are familiar with microservices development
  • You have participated in defining architectures, data structures, and algorithms under performance, security, and reliability constraints
  • Public cloud architect certification
  • You are interested in the core aspects of Site Reliability Engineering: CI/CD, automation, monitoring and observability, and continuous improvement.
  • You are an accomplished, versatile software engineer who is comfortable multitasking.

Thales, a "Handi-Engagée" company, recognizes all talents. Diversity is our greatest asset. Apply and join us!

Skills

Ansible, AWS, CI/CD, Datadog, Docker, GCP, GitLab, Helm, Infrastructure as Code, Java, Jenkins, Kubernetes, Microservices, NoSQL, Terraform
