
Site Reliability Engineering

HNM Solutions

France · On-site · Contract · Posted 3d ago

About the role

Job Description

Mission Context

To strengthen its observability and operational reliability practices, the client is looking to bring on a Site Reliability Engineer (SRE) specializing in Datadog.

The objective is to consolidate monitoring, improve the quality of operations, anticipate incidents, and ensure the reliability of the critical applications in scope.

This role involves working within the Ops/Platform/Production teams, in direct collaboration with the Dev, Cloud, and Security teams.

Main Responsibilities:

Datadog Observability & Monitoring

Define, implement, and optimize Datadog dashboards.

Configure Application Performance Monitoring (APM).

Implement monitors (alerts, probes, dynamic thresholds).

Set up business and technical dashboards for application teams.

Monitor logs, metrics, traces, and events via Datadog Logs/Metrics/Tracing.

Reliability & Performance (SRE)

Analyze recurring incidents and propose remediation actions.

Define and monitor SLOs/SLIs/SLAs.

Help reduce operational overhead (toil) by automating repetitive tasks.

Participate in blameless post-mortems and the implementation of corrective actions.

Automation & CI/CD Pipeline

Automate Datadog configuration via Terraform, Ansible, and CI/CD pipelines.

Help integrate Datadog into deployment pipelines.

Contribute to making observability practices repeatable and scalable.

Collaboration & Technical Support

Support Dev and Ops teams on proper application instrumentation.

Raise awareness of observability best practices among teams.

Participate in the creation of standardized dashboard/alert templates.

Project Type (support/maintenance/implementation/…): Support

Service Type (Client/Hybrid/Remote): Hybrid
