Senior AWS AgentCore Platform Engineer

Jobs via Dice

Exton · Hybrid · Contract · Senior · Today

About the role

Role

Senior AWS AgentCore Platform Engineer

Position Type

Contract to hire after initial 6 months

Location

Reading, PA or Exton, PA (hybrid; 2‑3 days a week in the office)

Responsibilities

Observability & Distributed Tracing

  • Gap Analysis: Assess AWS CloudWatch, X‑Ray, Bedrock logging, and AgentCore traces against agentic workflow requirements; produce a comprehensive gap analysis and lead the setup of observability within Dynatrace.
  • Validation Pipelines: Design and implement post‑deployment validation pipelines for agents and Model Context Protocol (MCP) servers, ensuring deployment health and successful tool registration.
  • Tracing & Logging: Implement distributed tracing and structured logging to capture LLM decision logic, tool selections, sub‑agent calls, and MCP interactions.
  • Architecture Strategy: Evaluate LangFuse and LiteLLM proxies against AWS‑native solutions; deliver a target‑state observability architecture recommendation.

Cost Tracking & TCO (Total Cost of Ownership)

  • Taxonomy Expansion: Extend tagging taxonomy to capture costs across agent runtimes, MCP servers, vector databases, and Bedrock token consumption per namespace.
  • Cost Modeling: Design a granular cost visibility model to aggregate expenses for agents, MCPs, and LLM tokens by team and department.
  • Dashboards & Alerting: Build CloudWatch (or equivalent) dashboards for per‑team spending; configure AWS Budgets with proactive alerting thresholds.
  • Automation: Automate cost reporting via email and Microsoft Teams, incorporating anomaly detection rules to identify spend spikes.

Monitoring & Incident Management

  • Alerting Framework: Define and implement P1‑P4 alerting rules covering deployment failures, runtime errors, tool invocation failures, and MCP connectivity issues.
  • Incident Integration: Integrate alert notifications with Microsoft Teams and email, utilizing resource ownership tags for intelligent routing.
  • Operational Excellence: Author detailed runbooks for every alert; publish and maintain these in Confluence to facilitate developer self‑service resolution.
  • Stack Evaluation: Compare AWS‑native vs. third‑party monitoring stacks to deliver a long‑term recommendation aligned with the broader observability architecture.

Security & Governance

  • Risk Assessment: Evaluate current IAM and tagging strategies for multi‑team isolation; identify scalability gaps and potential security risks.
  • Policy Engines: Assess the Cedar policy engine (AgentCore) for fine‑grained tool access control and document gaps for enterprise‑scale deployment.
  • Identity Architecture: Design a scalable Attribute‑Based Access Control (ABAC) identity model to ensure multi‑team isolation without IAM policy sprawl; deliver production‑ready Terraform modules.

Contact

Isaac Rajiv – Kutir Corporation

Skills

AWS, AWS Budgets, AWS CloudWatch, AWS IAM, ABAC, AgentCore, Cedar, Confluence, Dynatrace, LangFuse, LiteLLM, LLM, Microsoft Teams, MCP, Terraform, X-Ray
