AI Infrastructure Engineer

High Trail

Mundelein · On-site · Full-time · 3mo ago

About the role

Overview

Build and own the observability and diagnostics layer for a real-time AI assistant platform. You’ll make complex AI systems transparent, debuggable, and reliable by enabling end-to-end tracing, rapid root-cause analysis, and real-time monitoring.

Responsibilities

Design event tracing across AI decision-making, workflows, and real-time communication systems
Build automated pipelines to detect, classify, and analyze system failures
Create dashboards for real-time and post-session visibility (timelines, decision paths, errors)
Monitor live sessions and surface alerts for anomalies (latency, loops, failed actions)
Build tools that let humans intervene and resolve issues during live sessions
Identify recurring failure patterns and drive system improvements
Implement automated triage and alerting to route issues to the right teams