The Agentic Stack
A 7-layer architecture reference for the post-API enterprise. From infrastructure to governance — every layer an agentic system needs to survive production.
The enterprise stack is inverting. For decades, architects thought in terms of presentation → business logic → data. In the agentic era, the stack is: infrastructure → data → models → runtime → communication → orchestration → governance. Each layer depends on the ones below it. Skip a layer, and your agents fail in production.
This reference maps all seven layers — the technologies, the patterns, the trade-offs, and the key decisions at each level. It is not a tutorial. It is an architecture map for technical leaders building systems that will still be running in 2028.
The Seven Layers
Each layer builds on the one below. Together, they form the complete architecture for agentic enterprise systems.
Infrastructure
The foundation beneath every agentic system. Traditional API gateways are evolving into AI Gateways — adding token metering, prompt sanitization, and model-aware rate limiting. Service mesh is being repurposed for agent-to-agent traffic with identity-aware routing. Edge compute brings inference closer to the user for sub-100ms latency.
Key Capabilities
- API Gateways evolving into AI Gateways with token metering and prompt sanitization
- Service mesh for agent-to-agent traffic with mTLS identity (Istio, Linkerd)
- Edge compute for low-latency inference and real-time agent interactions
- GPU orchestration and inference optimization at the infrastructure level
“The AI Gateway is the new API Gateway. If your gateway can't meter tokens, sanitize prompts, and enforce model-level policies — it's already legacy.”
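The gateway duties named above (prompt sanitization, token metering, model-aware rate limiting) can be sketched in a few lines. This is a minimal illustration, not any vendor's gateway: the model names, per-model budgets, redaction patterns, and the four-characters-per-token estimate are all assumptions made for the example.

```python
import re
import time
from dataclasses import dataclass, field

# Hypothetical per-model budgets in tokens per window; not taken from any real gateway.
MODEL_TOKEN_LIMITS = {"large-model": 1_000, "small-model": 10_000}

# Illustrative redaction patterns; a real deployment would need a far broader set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),    # API-key-like strings
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like numbers
]


def sanitize_prompt(prompt: str) -> str:
    """Replace anything matching a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); assumed, not a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class TokenMeter:
    """Sliding-window token budget tracked per (caller, model) pair."""
    window_seconds: float = 60.0
    usage: dict = field(default_factory=dict)  # (caller, model) -> [(timestamp, tokens)]

    def admit(self, caller: str, model: str, tokens: int, now: float) -> bool:
        key = (caller, model)
        # Keep only the entries that still fall inside the window.
        recent = [(t, n) for t, n in self.usage.get(key, []) if now - t < self.window_seconds]
        used = sum(n for _, n in recent)
        allowed = used + tokens <= MODEL_TOKEN_LIMITS[model]
        if allowed:
            recent.append((now, tokens))
        self.usage[key] = recent
        return allowed


def handle_request(meter: TokenMeter, caller: str, model: str, prompt: str, now=None) -> dict:
    """Sanitize first, then meter the sanitized prompt against the model's budget."""
    now = time.monotonic() if now is None else now
    clean = sanitize_prompt(prompt)
    tokens = estimate_tokens(clean)
    if not meter.admit(caller, model, tokens, now):
        return {"status": 429, "reason": "token budget exceeded"}
    return {"status": 200, "prompt": clean, "metered_tokens": tokens}
```

Metering tokens rather than requests is the design point: one oversized prompt can cost more than hundreds of small ones, so a request-count limit alone would miss it.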
Data & Memory
Agents need memory — both short-term context and long-term knowledge. Vector databases enable semantic retrieval over unstructured data. Knowledge graphs encode structured relationships for precise reasoning. Context stores give agents persistent memory across sessions. The key shift: data architecture must be agent-readable, not just human-queryable.
Key Capabilities
- Vector databases for semantic retrieval (Pinecone, Weaviate, Qdrant, pgvector)
- Knowledge graphs for structured relationships (Neo4j, Amazon Neptune)
- Context stores for persistent agent memory across sessions and handoffs
- RAG 2.0: hybrid retrieval with re-ranking, moving beyond naive vector search
“RAG 1.0 was 'embed and pray.' RAG 2.0 combines dense + sparse retrieval, cross-encoder re-ranking, and knowledge graph grounding. Your retrieval pipeline is your agent's IQ ceiling.”
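One common way to combine dense and sparse retrieval is reciprocal rank fusion (RRF), followed by a re-ranking pass over the fused candidates. The sketch below assumes the two rankings already exist as lists of document IDs, and it takes `rerank_score` as a caller-supplied stand-in for a cross-encoder; neither is tied to a specific vector database or model.

```python
from collections import defaultdict
from typing import Callable, Iterable, List


def reciprocal_rank_fusion(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_retrieve(
    dense_ranking: List[str],
    sparse_ranking: List[str],
    rerank_score: Callable[[str], float],
    top_k: int = 3,
    candidates: int = 10,
) -> List[str]:
    """Fuse dense and sparse results, then re-rank only the fused candidates."""
    fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])[:candidates]
    # rerank_score is a hypothetical stand-in for a cross-encoder relevance score.
    return sorted(fused, key=rerank_score, reverse=True)[:top_k]
```

RRF works on ranks rather than raw scores, so the dense and sparse retrievers do not need comparable score scales. The re-ranker is applied only to the short fused list because it is usually the most expensive step.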
Model Layer
The foundation models powering agentic reasoning. Model selection is no longer about picking 'the best model' — it's about routing tasks to the most cost-effective model for each capability. Semantic routers analyze intent and dispatch to specialized models. Multi-modal capabilities span text, vision, audio, and code generation.
Key Capabilities
- Foundation models: GPT-4o, Claude 3.5/4, Gemini 2, Llama 4, Mistral Large
- Model routing and semantic routing — send tasks to the most cost-effective model
- Fine-tuning vs RAG decision framework for enterprise customization
- Multi-modal capabilities across text, vision, audio, and code generation
“Models are commoditizing. GPT-4o, Claude, and Gemini are converging in capability. The differentiator is now architecture — how you route, orchestrate, and govern models — not which model you pick.”
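Cost-aware routing can be reduced to a small selection rule: among the models that can handle the task, pick the cheapest. The sketch below assumes an upstream classifier has already turned the request into a set of required capabilities and a complexity score from 1 to 5; the catalog entries, prices, and complexity ceilings are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ModelProfile:
    name: str
    capabilities: frozenset
    cost_per_1k_tokens: float  # made-up prices, for illustration only
    max_complexity: int        # highest task complexity (1-5) this model is trusted with


# Hypothetical catalog; the names do not refer to real models.
CATALOG = [
    ModelProfile("small-text", frozenset({"text"}), 0.10, 2),
    ModelProfile("mid-multimodal", frozenset({"text", "vision"}), 0.50, 3),
    ModelProfile("large-reasoning", frozenset({"text", "vision", "code"}), 2.00, 5),
]


def route(required: Set[str], complexity: int, catalog: List[ModelProfile] = CATALOG) -> ModelProfile:
    """Return the cheapest model that covers the required capabilities and complexity."""
    eligible = [
        model for model in catalog
        if required <= model.capabilities and complexity <= model.max_complexity
    ]
    if not eligible:
        raise LookupError(f"no model supports {sorted(required)} at complexity {complexity}")
    return min(eligible, key=lambda model: model.cost_per_1k_tokens)
```

For example, `route({"text"}, 1)` selects `small-text`, while the same capability at complexity 4 falls through to `large-reasoning`.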
Agent Runtime
The execution environment where agents come alive. The runtime provides the primitives: tool use, structured output, function calling, code interpretation, and guardrails. OpenAI's Agents SDK organizes these around its Agents, Tools, Handoffs, Guardrails, and Tracing abstractions. Microsoft's Semantic Kernel brings enterprise-grade patterns. Salesforce's Agentforce adds governed enterprise deployment.
Key Capabilities
- OpenAI Agents SDK: Agents, Tools, Handoffs, Guardrails, Tracing primitives
- Semantic Kernel (Microsoft): enterprise-grade agent framework with planner architecture
- Salesforce Agentforce: governed enterprise agents with CRM-native deployment
- The Assistants API sunset (August 2026) and migration path to the Responses API and Agents SDK
“OpenAI sunsetting the Assistants API in August 2026 is a signal: the industry is moving from 'stateful assistant threads' to 'composable agent primitives.' If you're still building on Assistants, start migrating now.”
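The runtime primitives described in this layer (tool registration, a guardrail in front of execution, and a structured result) can be illustrated without any SDK. The sketch below is framework-neutral and does not reproduce the OpenAI Agents SDK, Semantic Kernel, or Agentforce APIs; the `tool` decorator, `GuardrailViolation`, and the `get_order_status` tool are hypothetical names.

```python
import json
from typing import Callable, Dict

# Registry of callable tools, keyed by function name.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(fn: Callable[..., object]) -> Callable[..., object]:
    """Register a function so the runtime may call it on the model's behalf."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order_status(order_id: str) -> dict:
    """Hypothetical tool; returns a canned answer instead of calling a real system."""
    return {"order_id": order_id, "status": "shipped"}


class GuardrailViolation(Exception):
    """Raised when a proposed tool call fails validation."""


def check_tool_call(call: object) -> None:
    """Guardrail: reject malformed calls and calls to unregistered tools."""
    if not isinstance(call, dict):
        raise GuardrailViolation("tool call must be a JSON object")
    if call.get("tool") not in TOOLS:
        raise GuardrailViolation(f"unknown tool: {call.get('tool')!r}")
    if not isinstance(call.get("arguments"), dict):
        raise GuardrailViolation("arguments must be a JSON object")


def run_tool_call(model_output: str) -> dict:
    """Parse the model's structured output, validate it, then execute the tool."""
    call = json.loads(model_output)
    check_tool_call(call)
    result = TOOLS[call["tool"]](**call["arguments"])
    return {"tool": call["tool"], "result": result}
```

The guardrail runs before the tool does, so a hallucinated tool name is rejected rather than executed.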
Agent Communication
The protocols that let agents talk to tools and to each other. MCP (Model Context Protocol) is the USB port for AI — a universal standard for connecting agents to data sources, tools, and services. A2A (Agent-to-Agent) is the LinkedIn for agents — enabling capability discovery and task delegation across organizational boundaries.
Key Capabilities
- MCP: 97M monthly downloads — context passing, session memory, agent identity, access scope
- MCP 2026 roadmap: Streamable HTTP, metadata discovery via .well-known, task lifecycle
- A2A (Google): 150+ partners — Agent Cards for capability discovery, gRPC support (v0.3)
- MCP vs A2A: complementary, not competing. MCP = tool access. A2A = agent coordination.
“MCP and A2A are not competing — they're complementary. MCP defines how agents access tools and context. A2A defines how agents find and coordinate with each other. Together, they form the communication backbone of the agentic enterprise.”
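On the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The sketch below only builds and parses such messages; it does not open a transport or use an MCP SDK, and the `search_tickets` tool in the usage note is a hypothetical example.

```python
import itertools
import json

# Monotonically increasing request IDs, so responses can be matched to requests.
_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def mcp_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the named tool."""
    return jsonrpc_request("tools/call", {"name": tool_name, "arguments": arguments})


def parse_response(raw: str, expected_id: int) -> dict:
    """Return the result of a JSON-RPC response, raising on mismatch or error."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0" or message.get("id") != expected_id:
        raise ValueError("response does not match the pending request")
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]
```

A call such as `mcp_tool_call("search_tickets", {"query": "refund"})` yields a dictionary ready to serialize with `json.dumps` and send over whichever transport the server exposes.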
Orchestration
The patterns and frameworks that coordinate multi-agent systems. The supervisor-worker pattern has emerged as the dominant enterprise deployment model — a supervisor decomposes tasks, delegates to specialized workers, and synthesizes results. The framework landscape is maturing: LangGraph leads production workloads, CrewAI dominates rapid prototyping, and AutoGen excels at research and negotiation scenarios.
Key Capabilities
- Supervisor-Worker pattern: the dominant enterprise deployment model
- Framework landscape: LangGraph (40% production), CrewAI (prototyping), AutoGen (research)
- Deterministic vs probabilistic workflow routing for different risk tolerances
- Anti-patterns: God Agent, context window stuffing, synchronous chains
“The three anti-patterns killing agentic projects: the God Agent (one agent doing everything), context window stuffing (dumping everything into the prompt), and synchronous chains (agents waiting on agents waiting on agents). Avoid all three.”
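The supervisor-worker pattern can be sketched with plain `asyncio`: the supervisor decomposes the task, fans the subtasks out to specialized workers concurrently (sidestepping the synchronous-chain anti-pattern), and synthesizes the results. The worker functions and the fixed two-step plan are placeholders; a real supervisor would typically ask a model to produce the plan.

```python
import asyncio
from typing import List, Tuple


async def research_worker(subtask: str) -> str:
    """Placeholder worker; a real one would call a model or a tool."""
    await asyncio.sleep(0)
    return f"research({subtask})"


async def summary_worker(subtask: str) -> str:
    """Placeholder worker for summarization."""
    await asyncio.sleep(0)
    return f"summary({subtask})"


WORKERS = {"research": research_worker, "summarize": summary_worker}


def decompose(task: str) -> List[Tuple[str, str]]:
    """Hypothetical fixed plan: one research subtask and one summary subtask."""
    return [("research", task), ("summarize", task)]


async def supervise(task: str) -> str:
    """Delegate all subtasks concurrently, then combine the results in plan order."""
    plan = decompose(task)
    results = await asyncio.gather(*(WORKERS[role](subtask) for role, subtask in plan))
    return " | ".join(results)


def run(task: str) -> str:
    """Synchronous entry point for callers that are not already in an event loop."""
    return asyncio.run(supervise(task))
```

Each worker stays narrow and the supervisor holds only the plan and the results, which also keeps the design away from the God Agent anti-pattern.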
Governance & Compliance
The layer that determines whether your agentic systems survive contact with reality. Most EU AI Act obligations become applicable on August 2, 2026, with some high-risk requirements phasing in later. Dynamic Agent Authorization (DAA) is replacing traditional IAM, because agents don't have job titles, they have capabilities. Every agent action needs an audit trail. Every decision needs explainability. And only 28% of organizations have mature governance structures.
Key Capabilities
- EU AI Act general application: August 2, 2026, with penalties up to €35M or 7% of global turnover for the most serious violations
- Dynamic Agent Authorization (DAA) replacing traditional IAM for autonomous systems
- Agent audit trails and decision explainability for regulatory compliance
- Risk classification frameworks for agentic systems under the EU AI Act
“Only 28% of organizations have mature governance structures for AI agents. With penalties up to €35M or 7% of global annual turnover, governance isn't a nice-to-have — it's existential. Build it into Layer 0, not as an afterthought at Layer 6.”
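One way to read "agents have capabilities, not job titles" is as short-lived, capability-scoped grants, with every authorization decision written to an audit trail. The sketch below is an interpretation under that assumption, not a standard DAA implementation; the class names, capability strings such as `invoice:read`, and the JSON log format are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class Grant:
    capabilities: frozenset
    expires_at: datetime


@dataclass
class CapabilityAuthorizer:
    grants: dict = field(default_factory=dict)     # agent_id -> Grant
    audit_log: list = field(default_factory=list)  # append-only JSON lines

    def grant(self, agent_id: str, capabilities: Iterable[str], ttl: timedelta, now: datetime) -> None:
        """Issue a time-boxed grant; it lapses without any explicit revocation."""
        self.grants[agent_id] = Grant(frozenset(capabilities), now + ttl)

    def authorize(self, agent_id: str, capability: str, reason: str, now: datetime) -> bool:
        """Decide, and record the decision whether or not it was allowed."""
        held = self.grants.get(agent_id)
        allowed = held is not None and capability in held.capabilities and now < held.expires_at
        self.audit_log.append(json.dumps({
            "time": now.isoformat(),
            "agent": agent_id,
            "capability": capability,
            "reason": reason,  # stated purpose, kept for later explainability
            "allowed": allowed,
        }))
        return allowed
```

Denied requests are logged alongside allowed ones, since an auditor usually needs to see what an agent attempted, not only what it was permitted to do.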
How the Layers Connect
This isn't a waterfall. Agents traverse the stack dynamically. A single user request can touch all seven layers in one round trip:
- Infrastructure routes the request through the AI Gateway, applies rate limits, and sanitizes the prompt.
- Data & Memory retrieves relevant context from vector stores and knowledge graphs via hybrid RAG.
- Model Layer routes the enriched prompt to the optimal model based on task complexity and cost.
- Agent Runtime executes tool calls, applies guardrails, and generates structured output.
- Communication connects to external tools via MCP and discovers peer agents via A2A.
- Orchestration coordinates the supervisor-worker flow, manages handoffs, and synthesizes results.
- Governance logs every decision, enforces authorization policies, and maintains the audit trail.
The key insight: governance isn't just the top layer. It permeates every layer. Authorization checks happen at the infrastructure layer (Layer 0). Data access controls live at the memory layer (Layer 1). Model usage policies are enforced at the model layer (Layer 2). Every layer has a governance surface.
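The idea that every layer has a governance surface maps naturally onto a decorator: each layer's handler is wrapped with its own policy check, and every decision lands in a shared audit trail. The sketch below covers only the first three layers, and the handler names, policies, and request fields are assumptions made for the example.

```python
import functools
from typing import Callable

# Shared audit trail of (layer, handler name, allowed) tuples.
AUDIT_TRAIL = []


def governed(layer: str, policy: Callable[[dict], bool]):
    """Wrap a layer handler with a policy check and an audit record."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request: dict) -> dict:
            allowed = policy(request)
            AUDIT_TRAIL.append((layer, handler.__name__, allowed))
            if not allowed:
                raise PermissionError(f"{layer} policy denied {handler.__name__}")
            return handler(request)
        return wrapper
    return decorator


@governed("layer-0-infrastructure", lambda req: req.get("authenticated", False))
def route_request(request: dict) -> dict:
    return {**request, "routed": True}


@governed("layer-1-data", lambda req: "crm" in req.get("data_scopes", ()))
def retrieve_context(request: dict) -> dict:
    return {**request, "context": ["doc-1"]}  # placeholder retrieval result


@governed("layer-2-model", lambda req: req.get("model") in {"small-text", "large-reasoning"})
def call_model(request: dict) -> dict:
    return {**request, "answer": "stub"}  # placeholder model output


def handle(request: dict) -> dict:
    """Pass the request up through the governed layers in order."""
    for step in (route_request, retrieve_context, call_model):
        request = step(request)
    return request
```

Because the check lives in the wrapper rather than in a final governance stage, a request that fails the data-access policy never reaches the model layer at all.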
Architects who treat this as a linear stack will build fragile systems. Architects who understand the cross-cutting concerns — observability, security, governance — will build systems that scale to production.