The Agent Stack I Actually Use (And the Pieces Still Missing)
Every few weeks someone pings me asking what an "agent" really is, and what you need to build one. I used to pull up a slide. Now I pull up a list.
Here's the stack I reach for when I'm building. Each layer is a primitive. Some are mature, some are duct tape, and three of them don't really exist yet.
1. Model / Model Router
The foundation. You don't pick a model anymore — you pick a routing strategy. LiteLLM gives me an OpenAI-compatible shim across every provider. OpenRouter gives me access to 500+ models behind one key. For cost-sensitive routing I've been watching RouteLLM. The lesson I keep relearning: lock yourself to a single model and you're writing a migration into your next quarter's roadmap.
2. Agentic Loop
The thing that actually makes an agent an agent. Read the goal → pick the next step → act → observe → repeat. LangGraph, CrewAI, the Claude Agent SDK, and OpenAI's new server-side primitives all implement variations of this. Most teams either write their own 40-line loop or adopt one of these frameworks. For production, I still think "write your own" is the right call — frameworks bury the control flow you'll eventually need to debug.
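For reference, the write-your-own version really is about this small. A sketch, with `call_model` standing in for your model client (whatever turns the transcript into the next action):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    kind: str                        # "tool" or "final"
    name: str = ""                   # which tool to call
    args: dict = field(default_factory=dict)
    content: str = ""                # final answer when kind == "final"

def call_model(history: list[dict]) -> Action:
    """Stand-in: call your model, parse its reply into an Action."""
    raise NotImplementedError

def agent_loop(goal: str, tools: dict[str, Callable], max_steps: int = 20) -> str:
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        action = call_model(history)                  # pick the next step
        if action.kind == "final":
            return action.content                     # goal reached
        result = tools[action.name](**action.args)    # act
        history.append({"role": "tool", "name": action.name,
                        "content": str(result)})      # observe, repeat
    raise RuntimeError("step budget exhausted")
```

Everything a framework adds (retries, parallel tool calls, human interrupts) hangs off this skeleton, which is exactly why owning the skeleton pays off at debug time.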
3. Tools
Tools are the atomic unit of execution: bash, grep, read_file, patch_file, web_fetch. One tool = one verb. The best agents I've built have fewer, more powerful tools — not a tool per API endpoint. MCP is quietly becoming the standard wiring here, and that's a good thing.
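Concretely, a tool is just a verb plus a schema. This is the OpenAI-style function-calling shape; MCP tool descriptors carry essentially the same information:

```python
# One verb, one tool. A single general-purpose `bash` tool like this
# replaces a dozen per-endpoint wrappers.
bash_tool = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command in the sandbox; returns stdout and stderr.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to execute."},
            },
            "required": ["command"],
        },
    },
}
```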
4. Skills
Skills are the newest primitive, and the one most people underestimate. Anthropic formalized them as SKILL.md — a folder of instructions the agent loads only when the task matches. It's the difference between "here's a hammer" (tool) and "here's how a carpenter frames a wall" (skill). I split mine into technical skills (how to use the Google Workspace CLI, how to write a Dockerfile) and business skills (how we write LinkedIn posts, how we structure a pitch). If tools are the what, skills are the how.
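Mechanically, a skill loader is simple. A sketch, assuming the published SKILL.md layout (YAML frontmatter carrying a name and description, instructions in the body); the frontmatter split here is deliberately crude, so use a real YAML parser in anything serious:

```python
from pathlib import Path

def load_skill_index(root: str = "skills") -> dict[str, dict]:
    """Index every skill's metadata; bodies stay on disk until a task matches."""
    index = {}
    for path in Path(root).glob("*/SKILL.md"):
        _, frontmatter, body = path.read_text().split("---", 2)
        meta = dict(line.split(":", 1) for line in frontmatter.strip().splitlines())
        index[meta["name"].strip()] = {
            "description": meta["description"].strip(),  # always in context
            "body": body.strip(),                        # loaded only on match
        }
    return index
```

The two-phase load is the whole point: the agent sees every description all the time but only pays context for a body when it picks that skill.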
5. Prompts
The unsexy one. System prompts are still where 70% of an agent's personality lives. Prompt drift is real — version them, diff them, and when something regresses, bisect by prompt before you bisect by model.
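The cheapest version of prompt versioning I know: keep prompts as files in the repo and log a content hash with every run, so "bisect by prompt" is literally git bisect. A sketch:

```python
import hashlib
from pathlib import Path

def load_prompt(name: str, root: str = "prompts") -> tuple[str, str]:
    """Return the prompt text plus a short digest to attach to every trace."""
    text = Path(root, f"{name}.md").read_text()
    digest = hashlib.sha256(text.encode()).hexdigest()[:12]
    return text, digest
```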
6. Scheduler
This is where I think the stack is most underbuilt. Agents that only run when a human types are assistants, not agents. You need cron, triggers, event queues — something that can say "wake up every hour and check my inbox." Temporal and Inngest handle this well in prod. Most agent frameworks don't ship a scheduler and you end up bolting one on.
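The bolted-on version usually looks like this: an asyncio tick loop, with `check_inbox` standing in for your agent entrypoint. Temporal or Inngest buy you the parts this sketch lacks, namely durable state, retries with backoff, and visibility.

```python
import asyncio

async def every(seconds: int, job) -> None:
    """Wake up on an interval; one failed run must not kill the schedule."""
    while True:
        try:
            await job()
        except Exception as err:
            print(f"run failed, retrying next tick: {err}")
        await asyncio.sleep(seconds)

async def check_inbox() -> None:
    ...  # hand the agent loop a goal like "triage anything new in my inbox"

asyncio.run(every(3600, check_inbox))
```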
7. Interfaces
TUI, web, Slack, Discord, email, iMessage. The interface is a thin skin on top of everything else — pick two, not six. I keep seeing teams burn a quarter building a custom web UI when a Slack app would have shipped in a week.
8. Memory
The hardest unsolved layer. A Memory.md file gets you 80% of the way for a single user. Honcho goes further with a peer-centric, theory-of-mind layer — it builds a behavioral profile of the user, not just a fact store. The trap most teams fall into: shelling out to an LLM on every memory recall adds 500–2000ms and non-determinism to every single step. Treat memory as infrastructure, not prompt engineering.
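Here's the Memory.md version, sketched. The discipline that matters lives in the comments: nothing in the hot path ever calls a model.

```python
from pathlib import Path

MEMORY = Path("Memory.md")

def load_memory() -> str:
    """Read once at session start and pin into the system prompt."""
    return MEMORY.read_text() if MEMORY.exists() else ""

def remember(fact: str) -> None:
    """Cheap append in the hot path; no model call, no added latency."""
    with MEMORY.open("a") as f:
        f.write(f"- {fact}\n")

# Distillation (dedupe, summarize, build a profile) runs as a nightly job,
# not inside the agent step. That offline job is where an LLM, or a layer
# like Honcho, belongs.
```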
9. Tracing
LangSmith if you're deep in LangChain. Langfuse if you want open-source and OpenTelemetry-native. Braintrust and Arize if you lean evals-first. Non-negotiable — you cannot debug a multi-step agent without a trace viewer. Print statements stop scaling past five steps.
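If you want the vendor-neutral starting point, wrap each step in an OpenTelemetry span; Langfuse and friends can ingest those. This sketch uses only the OTel API, so you still need the SDK configured with an exporter for the spans to go anywhere:

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent")

def traced_step(step_no: int, tool: str, run):
    """One span per agent step, tagged with the tool and result size."""
    with tracer.start_as_current_span(f"agent.step.{step_no}") as span:
        span.set_attribute("agent.tool", tool)
        result = run()
        span.set_attribute("agent.result_bytes", len(str(result)))
        return result
```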
10. Evals / RL
This is where the moat is forming. Evals were a nice-to-have in 2024; by 2026 they're the product. The loop I'm building toward: production traces → curated golden sets → offline evals → RL environments → model updates. Vendors like HUD, Mercor, and Invisible are building the RL-environment layer. If you don't have evals, you don't have a feedback loop, and you're flying blind on every prompt change.
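The "curated golden sets → offline evals" stage can start embarrassingly small. A sketch, with `run_agent` and the cases as placeholders for your own; even substring checks catch real regressions before users do:

```python
GOLDEN = [
    # Curated from production traces; keep each case small and unambiguous.
    {"goal": "Summarize this support thread.", "must_contain": "refund"},
]

def run_evals(run_agent, threshold: float = 0.9) -> None:
    """Replay the golden set and fail the build on regression."""
    passed = sum(
        case["must_contain"] in run_agent(case["goal"]).lower()
        for case in GOLDEN
    )
    score = passed / len(GOLDEN)
    assert score >= threshold, f"eval regression: {score:.0%} < {threshold:.0%}"
```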
11. Infra (Sandboxes, Secrets)
e2b is the default for code-execution sandboxes; Modal, Daytona, and Northflank are all credible alternatives. Secret management for agents is still weirdly manual — I haven't seen a clean "Vault for agents" yet. Which brings me to what's missing.
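Before the gaps, one concrete note on the sandbox layer, because the happy path really is short. A sketch of the e2b flow (method names per their Python code-interpreter SDK as I've used it; assumes an E2B_API_KEY in the environment):

```python
from e2b_code_interpreter import Sandbox

# Each sandbox is an isolated cloud VM; model-written code runs there,
# not on your box.
with Sandbox() as sandbox:
    execution = sandbox.run_code("print(2 + 2)")
    print(execution.logs.stdout)
```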
What's still missing
Three gaps the stack hasn't filled yet:
1. Agent identity and auth. RSAC 2026 shipped five "agent identity" frameworks and every one of them punted on the delegation-chain problem. When an agent spawns a sub-agent that makes a purchase at 3 AM, who authorized what? Short-lived tokens expire mid-run; nobody has a clean story for 90-minute workflows. (One possible shape for that chain is sketched after this list.)
2. Action governance. Identity tells you who ran an action. Governance tells you whether they should have. Today only ~28% of teams can trace an agent action back to a human sponsor across all environments. That's a non-starter for regulated industries.
3. Continuous learning in prod. Everyone talks about RL. Very few teams have the pipeline that actually closes the loop from production trace → eval → reward signal → model update. The teams that build that pipeline first are going to compound faster than everyone else.
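On that first gap, here's the shape I keep sketching. None of this is a standard; it's just what a delegation chain could look like: every hop narrows scopes, every hop carries a hard expiry, and authorization walks the chain back toward a human.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    principal: str                  # "alice" or "agent:researcher"
    scopes: frozenset[str]
    expires_at: float
    parent: "Grant | None" = None   # link back toward the human sponsor

def delegate(parent: Grant, child: str, scopes: set[str], ttl: float) -> Grant:
    """A sub-agent can only narrow, never widen, what it inherited."""
    assert scopes <= parent.scopes, "child cannot exceed parent scopes"
    return Grant(child, frozenset(scopes),
                 min(time.time() + ttl, parent.expires_at), parent)

def authorized(grant: "Grant | None", scope: str) -> bool:
    """Every link in the chain must be alive and in-scope at action time."""
    while grant is not None:
        if time.time() > grant.expires_at or scope not in grant.scopes:
            return False
        grant = grant.parent
    return True
```

The missing product is the infrastructure version of this: issuance, revocation, and an audit log that survives the 3 AM purchase.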
If you're building today, the 11 primitives above are table stakes. If you're investing, I'd bet on the teams solving those three gaps — not on the 47th "agent framework."
That's the stack. Eleven layers, most of them mature, and three gaps still wide open.