Senior AWS AgentCore Platform Engineer
Jobs via Dice
Exton · Hybrid · Contract · Senior · Today
About the role
Role
Senior AWS AgentCore Platform Engineer
Position Type
Contract to hire after initial 6 months
Location
Reading, PA or Exton, PA (Hybrid 2‑3 days a week from office)
Responsibilities
Observability & Distributed Tracing
- Gap Analysis: Assess AWS CloudWatch, X‑Ray, Bedrock logging, and AgentCore traces against agentic workflow requirements; produce a comprehensive gap analysis and lead the setup of observability within Dynatrace.
- Validation Pipelines: Design and implement post‑deployment validation pipelines for agents and Model Context Protocol (MCP) servers, ensuring deployment health and successful tool registration.
- Tracing & Logging: Implement distributed tracing and structured logging to capture LLM decision logic, tool selections, sub‑agent calls, and MCP interactions.
- Architecture Strategy: Evaluate LangFuse and LiteLLM proxies against AWS‑native solutions; deliver a target‑state observability architecture recommendation.
Cost Tracking & TCO (Total Cost of Ownership)
- Taxonomy Expansion: Extend tagging taxonomy to capture costs across agent runtimes, MCP servers, vector databases, and Bedrock token consumption per namespace.
- Cost Modeling: Design a granular cost visibility model to aggregate expenses for agents, MCPs, and LLM tokens by team and department.
- Dashboards & Alerting: Build CloudWatch (or equivalent) dashboards for per‑team spending; configure AWS Budgets with proactive alerting thresholds.
- Automation: Automate cost reporting via email and Microsoft Teams, incorporating anomaly detection rules to identify spend spikes.
Monitoring & Incident Management
- Alerting Framework: Define and implement P1‑P4 alerting rules covering deployment failures, runtime errors, tool invocation failures, and MCP connectivity issues.
- Incident Integration: Integrate alert notifications with Microsoft Teams and email, utilizing resource ownership tags for intelligent routing.
- Operational Excellence: Author detailed runbooks for every alert; publish and maintain these in Confluence to facilitate developer self‑service resolution.
- Stack Evaluation: Compare AWS‑native vs. third‑party monitoring stacks to deliver a long‑term recommendation aligned with the broader observability architecture.
Security & Governance
- Risk Assessment: Evaluate current IAM and tagging strategies for multi‑team isolation; identify scalability gaps and potential security risks.
- Policy Engines: Assess the Cedar policy engine (AgentCore) for fine‑grained tool access control and document gaps for enterprise‑scale deployment.
- Identity Architecture: Design a scalable Attribute‑Based Access Control (ABAC) identity model to ensure multi‑team isolation without IAM policy sprawl; deliver production‑ready Terraform modules.
Contact
Isaac Rajiv – Kutir Corporation
Skills
AWS, AWS Budgets, AWS CloudWatch, AWS IAM, ABAC, AgentCore, Cedar, Confluence, Dynatrace, LangFuse, LiteLLM, LLM, Microsoft Teams, MCP, Terraform, X-Ray