Senior AWS AgentCore Platform Engineer

Jobs via Dice

Exton · Hybrid · Contract · Senior · 3mo ago

About the role

Contract-to-hire after an initial 6 months

Reading, PA or Exton, PA (Hybrid: 2‑3 days a week in the office)

Gap Analysis: Assess AWS CloudWatch, X‑Ray, Bedrock logging, and AgentCore traces against agentic workflow requirements; produce a comprehensive gap analysis and lead the setup of observability within Dynatrace.
Validation Pipelines: Design and implement post‑deployment validation pipelines for agents and Model Context Protocol (MCP) servers, ensuring deployment health and successful tool registration.
Tracing & Logging: Implement distributed tracing and structured logging to capture LLM decision logic, tool selections, sub‑agent calls, and MCP interactions.
Architecture Strategy: Evaluate LangFuse and LiteLLM proxies against AWS‑native solutions; deliver a target‑state observability architecture recommendation.

Taxonomy Expansion: Extend tagging taxonomy to capture costs across agent runtimes, MCP servers, vector databases, and Bedrock token consumption per namespace.
Cost Modeling: Design a granular cost visibility model to aggregate expenses for agents, MCP servers, and LLM tokens by team and department.
Dashboards & Alerting: Build CloudWatch (or equivalent) dashboards for per‑team spending; configure AWS Budgets with proactive alerting thresholds.
Automation: Automate cost reporting via email and Microsoft Teams, incorporating anomaly detection rules to identify spend spikes.

Alerting Framework: Define and implement P1‑P4 alerting rules covering deployment failures, runtime errors, tool invocation failures, and MCP connectivity issues.
Incident Integration: Integrate alert notifications with Microsoft Teams and email, utilizing resource ownership tags for intelligent routing.
Operational Excellence: Author detailed runbooks for every alert; publish and maintain these in Confluence to facilitate developer self‑service resolution.
Stack Evaluation: Compare AWS‑native vs. third‑party monitoring stacks to deliver a long‑term recommendation aligned with the broader observability architecture.

Risk Assessment: Evaluate current IAM and tagging strategies for multi‑team isolation; identify scalability gaps and potential security risks.
Policy Engines: Assess the Cedar policy engine (AgentCore) for fine‑grained tool access control and document gaps for enterprise‑scale deployment.
Identity Architecture: Design a scalable Attribute‑Based Access Control (ABAC) identity model to ensure multi‑team isolation without IAM policy sprawl; deliver production‑ready Terraform modules.

Isaac Rajiv – Kutir Corporation

AWS, AWS Budgets, AWS CloudWatch, AWS IAM, ABAC, AgentCore, Cedar, Confluence, Dynatrace, LangFuse, LiteLLM, LLM, Microsoft Teams, MCP, Terraform, X-Ray
