Site Reliability Engineer

Thales

Nice · On-site · Full-time · Mid Level

About the role

About Thales

Thales is a global leader in high technology, specializing in three business sectors: Defense & Security, Aeronautics & Space, and Cyber & Digital. It develops products and solutions that contribute to a safer, more environmentally friendly, and more inclusive world. The Group invests nearly €4 billion per year in Research & Development, particularly in key innovation areas such as AI, cybersecurity, quantum, cloud technologies, and 6G. Thales has nearly 81,000 employees in 68 countries.

Our Commitments, Your Benefits

A success driven by our technological excellence, your experience, and our shared ambition
An attractive remuneration package
Continuous skills development: training courses, academies, and internal communities
An inclusive, supportive environment that respects work-life balance
Recognized societal and environmental commitment

Your Daily Life

At the heart of the PACA region's Silicon Valley, our site brings together teams developing cutting-edge sonars for submarines and surface vessels, as well as digital services activities. A pioneer in simulation products, the site draws on in-depth expertise in acoustics and signal processing.

We are seeking a Site Reliability Engineer to ensure a high level of service and operational excellence for an innovative and ambitious telecommunications solution deployed in the public cloud, with high-availability requirements and strict performance constraints. This product requires a dedicated, product-specific SRE team.

Essential Functions

Automation & Infrastructure as Code: Design, build, and maintain scalable infrastructure using tools such as Terraform, Ansible, and Kubernetes. Develop automated CI/CD pipelines via GitLab to reduce manual toil.
Availability & Reliability Engineering: Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Manage "Error Budgets" to balance the velocity of new features with the stability of the platform.
Incident Management & On-Call Support: Participate in 24/7 on-call rotations to provide emergency response and perform deep-dive troubleshooting for production issues.
Performance & Capacity Planning: Conduct system performance analysis, identify bottlenecks, and perform capacity planning to ensure the infrastructure can handle growth and peak loads.
Observability & Monitoring: Implement and refine symptom-based alerting and comprehensive monitoring strategies using platforms like Datadog to ensure high visibility into system health.
Continuous Improvement & Postmortems: Lead blameless postmortems after incidents to identify root causes and implement long-term technical fixes to prevent recurrence.
Security & Compliance Collaboration: Partner with Cloud Security teams to implement security best practices, manage access controls, and respond to security breaches or vulnerabilities.
Customer Relationship: Support the relationship with the customer.
Stakeholder Collaboration: Work with other stakeholders to define the solution's improvement plan.
Service Ownership: You will own the service availability of the solution.

Minimum Requirements

Education:

Engineering degree or equivalent

Experience:

At least 1 year of experience

Skills and Abilities:

Java development skill is required.
You are familiar with public cloud (GCP, AWS), containers and microservices (Docker, Kubernetes, Java), CI/CD and automation (Jenkins, GitLab, Helm), and NoSQL databases.

Certification

GCP cloud architect certification is a plus

Preferred Qualifications

You have already set up product monitoring and the underlying infrastructure
You have development experience in a distributed-systems and/or high-availability context
You are familiar with microservices development
You have participated in defining architectures, data structures, and algorithms under performance, security, and reliability constraints
Public cloud architect certification
You are interested in the core aspects of Site Reliability Engineering: CI/CD, automation, monitoring and observability, and continuous improvement.
You are an accomplished, versatile development engineer who is comfortable handling multiple tasks at once.

Thales, a "Handi-Engagée" company, recognizes all talents. Diversity is our greatest asset. Apply and join us!

Skills

Ansible, AWS, CI/CD, Datadog, Docker, GCP, GitLab, Helm, Infrastructure as Code, Java, Jenkins, Kubernetes, Microservices, NoSQL, Terraform
