
Senior SRE Engineer

Aklip Technologies LLC

US · Hybrid · Full-time · Senior · Posted 3 weeks ago

About the role

Role Summary

We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in Unified Observability, proactive detection, AIOps, and GenAI-driven operations to support complex, distributed financial services platforms. The role requires hands-on experience designing SLI/SLO-driven monitoring, dynamic thresholds, intelligent alerting, and AI/ML-based anomaly detection across multi-stream architectures.

Key Responsibilities

Observability & Reliability Engineering

  • Design and implement unified observability dashboards across metrics, logs, traces, events, and topology
  • Define and manage SLIs, SLOs, and error budgets aligned to business outcomes
  • Build actionable dashboards for operations, engineering, and leadership
  • Implement alerting strategies using static and dynamic thresholds
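
For reference, the SLI, SLO, and error-budget arithmetic behind these responsibilities can be sketched as follows. This is a minimal Python illustration, not a prescribed implementation: the `SloWindow` type and `error_budget_report` function are hypothetical names, and the sketch assumes a request-based availability SLI (successful requests divided by total requests) measured over a single window.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (hypothetical data model)."""
    total_requests: int
    failed_requests: int


def error_budget_report(window: SloWindow, slo_target: float) -> dict:
    """Return the availability SLI, allowed failures, and remaining budget.

    slo_target is a fraction, e.g. 0.999 for a 99.9% availability SLO.
    Assumes a request-based SLI: successful requests / total requests.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # SLI: fraction of requests that succeeded in this window.
    sli = 1.0 - window.failed_requests / window.total_requests
    # Error budget: number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * window.total_requests
    # Fraction of the budget still unspent (negative once the SLO is breached).
    budget_remaining = 1.0 - window.failed_requests / allowed_failures
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
    }


# Example: 1,000,000 requests, 250 failures, 99.9% SLO.
report = error_budget_report(SloWindow(1_000_000, 250), slo_target=0.999)
print(report)  # SLI ~0.99975, ~1000 allowed failures, ~75% of budget left
```

Alerting on how quickly the remaining budget is being consumed (burn rate), rather than on raw error counts, is one common way to tie alerts back to the SLO.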

Proactive Detection & AIOps

  • Leverage AI/ML/AIOps to detect anomalies, predict incidents, and reduce MTTR
  • Transition monitoring from reactive alerts to proactive insights
  • Implement noise reduction, alert correlation, and root cause analysis
  • Apply baseline modeling, seasonality detection, and anomaly scoring
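
Baseline modeling, seasonality, and anomaly scoring can be illustrated with a simple seasonal z-score, which is also one way to derive a dynamic threshold. The sketch below is a hypothetical Python example that assumes evenly spaced samples indexed by an integer time step and a known season length (`period`). Commercial AIOps platforms, Dynatrace included, use their own more sophisticated baselining, which this does not attempt to reproduce.

```python
import statistics
from collections import defaultdict


def build_seasonal_baseline(history, period):
    """Per-slot (mean, population std dev) baseline.

    history: iterable of (t, value), where t is an integer time-step index.
    period: season length in steps, e.g. 24 for hourly data with a daily cycle.
    """
    buckets = defaultdict(list)
    for t, value in history:
        buckets[t % period].append(value)
    return {
        slot: (statistics.fmean(values), statistics.pstdev(values))
        for slot, values in buckets.items()
    }


def anomaly_score(baseline, period, t, value, min_std=1e-6):
    """z-score of value against the baseline for its seasonal slot."""
    mean, std = baseline[t % period]
    # Guard against perfectly flat history, where std would be zero.
    return (value - mean) / max(std, min_std)


def is_anomalous(baseline, period, t, value, k=3.0):
    """Dynamic threshold: flag points more than k std devs from the slot mean."""
    return abs(anomaly_score(baseline, period, t, value)) > k


# Hypothetical week of hourly request rates: busy 09:00-17:00, quiet otherwise.
history = [
    (day * 24 + hour, (100 if 9 <= hour <= 17 else 10) + day % 3)
    for day in range(7)
    for hour in range(24)
]
baseline = build_seasonal_baseline(history, period=24)
next_day = 7 * 24
print(is_anomalous(baseline, 24, next_day + 10, 101))  # False: normal load at 10:00
print(is_anomalous(baseline, 24, next_day + 3, 100))   # True: daytime load at 03:00
```

A static threshold set high enough to stay quiet during business hours (say, 150 requests per hour) would not fire on the 03:00 spike, which is the usual argument for seasonal, dynamic thresholds.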

Distributed Systems & Dependency Analysis

  • Monitor and troubleshoot multi-service architectures involving:
    • Microservices
    • Downstream APIs
    • Kafka / streaming platforms
    • Cloud infrastructure provisioned as code (IaC, e.g. Terraform)
  • Identify whether issues originate from:
    • Upstream/downstream dependencies
    • Streaming platform
    • Infrastructure
    • Application code

Tooling & Platforms

  • Deep hands-on experience with Dynatrace (mandatory)
  • Experience with:
    • OpenTelemetry
    • Prometheus / Grafana
    • ELK / EFK
    • Cloud-native monitoring (AWS/Azure/GCP)
  • Strong skills in manipulating and enriching JSON-based telemetry
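
As an illustration of JSON-based telemetry enrichment, the sketch below parses a JSON log event, normalizes its severity, and attaches ownership metadata from a lookup table. The catalog contents, the field names (`service`, `severity`, `owner_team`, `tier`), and the `enrich_event` helper are all hypothetical; real pipelines often perform this step in a collector or log shipper (for example, an OpenTelemetry Collector processor or a Logstash filter) rather than in application code.

```python
import json

# Hypothetical service catalog; in practice this might come from a CMDB or resource tags.
SERVICE_CATALOG = {
    "payments-service": {"owner_team": "payments-sre", "tier": "tier-1"},
    "checkout-api": {"owner_team": "checkout", "tier": "tier-1"},
}

UNKNOWN = {"owner_team": "unknown", "tier": "unknown"}


def enrich_event(raw: str, catalog: dict) -> str:
    """Parse one JSON log event, normalize it, and add ownership metadata."""
    event = json.loads(raw)
    # Normalize severity so "error", "Error", and "ERROR" group together.
    event["severity"] = str(event.get("severity", "INFO")).upper()
    # Add owner/tier fields without overwriting values the emitter already set.
    for key, value in catalog.get(event.get("service"), UNKNOWN).items():
        event.setdefault(key, value)
    return json.dumps(event, sort_keys=True)


print(enrich_event(
    '{"service": "payments-service", "severity": "error", "msg": "upstream timeout"}',
    SERVICE_CATALOG,
))
```

Using `setdefault` keeps any ownership fields the emitting service already provides, so the catalog only fills gaps.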

GenAI & LLM Enablement

  • Apply GenAI / LLMs for:
    • Incident summarization
    • Root cause explanation
    • Runbook recommendations
    • Auto-remediation suggestions
  • Collaborate with platform teams to operationalize GenAI safely
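
One practical part of operationalizing GenAI safely for incident summarization is keeping sensitive data out of the model's input. The sketch below uses hypothetical helper names and deliberately simple regular expressions to redact e-mail addresses, bearer tokens, and card-number-like digit runs from an incident timeline before assembling a summarization prompt. It does not call any LLM API, and pattern-based redaction like this is a starting point rather than a complete data-protection control.

```python
import re

# Deliberately simple, illustrative patterns; real deployments need broader coverage.
_REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer [TOKEN]"),
    (re.compile(r"\b\d{13,19}\b"), "[CARD]"),
]


def redact(text: str) -> str:
    """Replace sensitive substrings with placeholder tags."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def build_summary_prompt(incident_id: str, timeline: list[tuple[str, str]]) -> str:
    """Assemble a redacted prompt asking an LLM to summarize an incident."""
    events = "\n".join(f"- {ts} {redact(message)}" for ts, message in timeline)
    return (
        f"Summarize incident {incident_id} for the on-call engineer.\n"
        "Describe customer impact, the most likely root cause, and next steps.\n"
        "Use only the events below; say so if the evidence is insufficient.\n\n"
        f"{events}"
    )


prompt = build_summary_prompt("INC-1234", [
    ("10:02Z", "payments-service p99 latency > 2s; user jane.doe@example.com reported failures"),
    ("10:05Z", "retry with header Authorization: Bearer eyJhbGciOi.abc123 failed"),
    ("10:09Z", "kafka consumer lag on topic payments.events rising"),
])
print(prompt)
```

Constraining the model to the supplied events, and asking it to say when evidence is insufficient, is one way to reduce confident but unsupported root-cause explanations.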

Skills

AWS, Azure, Cloud-native monitoring, Dynatrace, ELK, EFK, GenAI, GCP, Grafana, IaC, JSON, Kafka, LLM, Microservices, OpenTelemetry, Prometheus, Terraform, Unified Observability
