MCP Has Three Problems. Here's the Architecture That Fixes All of Them.
TL;DR: MCP's token bloat, serial round-trip chaining, and portability constraints are three distinct problems that no single tool solves cleanly. CLI agents solve some of them but introduce new ones. A synthesis approach — converting MCP servers into typed Python classes delivered via Agent Skills, run in a code sandbox — cherry-picks the best of all three paradigms. That's what mcp-skill is. This post explains the problem structure, why each partial solution falls short, and how the architecture hangs together.
There's a debate running through the agent infrastructure community right now about whether MCP is worth using at all.
The criticisms aren't wrong. MCP has real problems, and people hitting those problems in production are right to look for alternatives. But most of the debate is treating a set of distinct, tractable problems as if they were a single fundamental flaw in the protocol. That framing leads to solutions that fix one thing while making others worse.
Before proposing anything, I want to be precise about what the actual problems are.
Problem 1: The Token Tax at Load Time
Load the GitHub MCP server and you've consumed 55,000 tokens before your agent does a single unit of work. That's not a rough estimate — it's documented in Anthropic's own tool search documentation, and it matches what I've seen in production. Add Slack, Sentry, Grafana, and Splunk on top of that, and a typical multi-server setup burns through a substantial portion of your context window just defining what tools are available.
The mechanism is straightforward: every tool schema gets injected into the LLM's context on every turn, whether or not the model uses those tools. At 30 tools averaging roughly 120 tokens of schema each, that's around 3,600 tokens per turn spent on definitions alone. Scale that over a long agentic session and you're looking at hundreds of thousands of tokens doing nothing.
There's a compounding effect beyond the raw token cost. Claude's tool selection accuracy degrades meaningfully once you exceed 30–50 available tools. More tools don't give the model more capability — past a certain threshold, they hurt it.
The solution to this specific problem already exists. Anthropic's Tool Search API (tool_search_tool_bm25_20251119) implements dynamic loading: tools are marked with defer_loading: true and only loaded on demand, with the agent running a BM25 search to retrieve the 3–5 most relevant tools for any given request. According to Anthropic's documentation, this approach typically reduces token usage by over 85% while maintaining selection accuracy even across thousands of tools. This feature is now available in the API and in Claude Code.
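The request shape looks roughly like the following sketch. Field names follow Anthropic's tool-search documentation, but treat the exact structure as illustrative rather than authoritative, and check the current API reference before relying on it:

```python
# Sketch of a Messages API request body using deferred tool loading.
# The search tool is always in context; deferred tools contribute only
# their name until BM25 search pulls their full schema in on demand.
request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [
        # The BM25 tool-search tool itself.
        {"type": "tool_search_tool_bm25_20251119", "name": "tool_search"},
        # A regular tool, defined but deferred: its full schema stays
        # out of context until search surfaces it as relevant.
        {
            "name": "github_list_pull_requests",
            "description": "List pull requests for a repository",
            "input_schema": {
                "type": "object",
                "properties": {"repo": {"type": "string"}},
            },
            "defer_loading": True,
        },
    ],
    "messages": [{"role": "user", "content": "Which PRs are open?"}],
}
```

With hundreds of deferred tools, only the handful retrieved per request ever pay their full schema cost.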
This is a clean, simple fix for Problem 1. It doesn't require rethinking the MCP architecture — it just changes when tool definitions enter the context.
Problem 2: Serial Round-Trip Chaining
The token cost is the most visible problem, but this one matters more at scale.
Standard MCP tool calling is serial by design. The agent calls Tool 1, the result comes back into the model's context, the model decides what to do next, then it calls Tool 2. Every tool invocation is a full model round-trip: request, execute, full result into context, next decision. If you need to chain five operations, you have five round-trips. If you need to check budget compliance across 20 employees, you have 20 round-trips, each pulling potentially large result payloads into the context.
The compounding effect is both token cost and latency. Large intermediate results accumulate in the context window. The model has to reason over increasing amounts of data it doesn't need. Sequential operations that could be handled in a tight loop are instead serialized through the model's inference loop.
This is the problem that Programmatic Tool Calling addresses. Rather than having the model call tools one at a time, the model writes code — a Python script — that calls tools directly inside a code execution sandbox. The script handles the chaining, filtering, and aggregation. Only the final, processed result comes back to the model. For the 20-employee budget check, that means one script execution instead of 20 round-trips, returning a handful of lines instead of hundreds of kilobytes.
Anthropic's programmatic tool calling documentation gives a concrete benchmark: adding this approach on top of basic search tools was the key factor in fully unlocking agent performance on BrowseComp and DeepSearchQA — multi-step web research and complex information retrieval benchmarks. The same pattern holds in production. When you're doing sequential lookups, filtering, or aggregation across tool results, the token reduction and latency improvement are substantial.
Cloudflare's "Code Mode" implements the same underlying idea differently: MCP tool schemas are converted to TypeScript interfaces, the model writes TypeScript code, and execution happens in V8 isolates. The architectural logic is identical — eliminate model round-trips for sequential operations by pushing the chaining into code.
Problem 2 is solved by code execution. The question is how to make that ergonomic.
Problem 3: Portability vs. Discovery
The response to MCP's overhead issues that comes up most often is: just use CLI tools instead.
The argument has real merit. An agent with bash access can run gh --help and discover everything the GitHub CLI can do without any upfront token cost. It can pipe output from one CLI to another. It learns about tool behavior progressively by running commands and observing results. There's no schema to load, no upfront definition cost, and chaining is genuinely elegant once you're working in a Unix pipeline.
The problems with this approach are also real.
CLI tools are non-portable. They require bash, they require the right binaries installed in the right environment, and they don't exist in many of the contexts where agents need to run — browser-based sandboxes, constrained enterprise environments, WebAssembly runtimes. If you're building for broad deployment, "just use CLI" is a constraint you can satisfy in some environments and not others.
Progressive discovery also has a cost. An agent using --help flags has to make multiple tool calls to understand a tool's interface before it can use it effectively. This is fine for exploration but it's not efficient for well-defined, repeatable workflows. And bash chaining, while powerful for simple pipelines, gets difficult fast when you need structured data transformation, error handling, or conditional logic across multiple tool outputs.
There's also a more fundamental concern: giving an agent unrestricted access to bash is a meaningful security surface. The same capability that makes CLI attractive — the ability to run arbitrary commands — is the reason you want careful constraints around it in production environments.
So CLI is not a clean answer to the MCP problem. It trades one set of constraints for another.
What Agent Skills Get Right
Before proposing an architecture, one more concept is worth grounding precisely: Agent Skills.
The Agent Skills format — now an open standard supported across Claude Code, GitHub Copilot, Cursor, VS Code, Gemini CLI, and a growing list of other platforms — uses progressive disclosure to manage context efficiently.
At startup, an agent loads only the name and description of each available skill. This is enough to know when a skill might be relevant. When a task matches, the agent reads the full SKILL.md into context. From there, it can follow instructions and execute bundled scripts as needed.
The key thing Agent Skills solve is the initial load problem for capabilities. Rather than defining everything upfront, the agent discovers what's available from lightweight summaries and loads depth on demand. This is the same principle as dynamic tool loading, applied to a richer container: skills can carry instructions, context, guardrails, and executable scripts alongside their tool definitions.
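A minimal skill illustrates the two-tier structure. The frontmatter is all the agent sees at startup; the body loads only when the task matches. The skill name and wording below are hypothetical:

```markdown
---
name: github-operations
description: GitHub operations — file access, PR management, issue
  tracking. Use when the task involves GitHub repositories.
---

# GitHub Operations

Load the bundled client and chain calls in Python rather than making
one tool call per step. See scripts/github_client.py for the typed
methods available.
```

At startup this costs a couple of dozen tokens; the full instructions and bundled scripts cost nothing until they're needed.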
The limitation is that skills, on their own, don't address how tool calls are made once a skill is active. If you're using MCP inside a skill, you still have the chaining and efficiency problems. Skills manage context access. They don't change the round-trip structure of tool invocation.
The Synthesis: mcp-skill
The three problems are now in focus:
- Token bloat at load time → solved by dynamic loading with BM25 search
- Serial round-trip chaining → solved by code execution (programmatic tool calling)
- Portability vs. discovery → not cleanly solved by CLI
And Agent Skills give us a pattern for progressive disclosure that works across environments.
The question is whether you can combine these into something that keeps MCP's genuine advantages — standardized auth, broad ecosystem availability, portability across platforms that lack bash — while fixing the efficiency problems.
That's what mcp-skill attempts to do.
The approach: take any MCP server, use a compiler to introspect its tools, and generate a typed Python class where each tool becomes an async method. Package that class into an Agent Skill with a SKILL.md that tells the agent when to use it and how. When the agent needs to work with that MCP server, it loads the skill, writes Python that chains method calls directly, and runs in a sandbox.
This is how the pieces fit together in practice:
No upfront loading cost. The skill is discovered via its description, not by loading all tool schemas. The agent sees "GitHub operations — file access, PR management, issue tracking" and loads the skill when that's what it needs.
Chaining in code, not model round-trips. Once the skill is loaded, the agent writes a Python script. github.list_prs(), filter by label, github.get_pr_diff(pr_number) for each — the entire multi-step operation runs in one sandbox execution. Intermediate results never touch the model's context window.
MCP auth, not bash. The generated Python class retains the MCP connection and auth configuration. You don't need bash access. You need a Python sandbox — which is available in a far wider range of environments than a full shell, including browser-based runtimes.
Python's standard library is in scope. The agent can use json, requests, list comprehensions, standard Python data manipulation alongside the generated tool methods. This is more expressive than bash pipelines for anything involving structured data.
Security through isolation. Everything runs in a sandboxed execution environment with no ambient access to the rest of the system.
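Concretely, the PR-triage flow above might look like this inside the sandbox. `GitHubClient` and its method names are illustrative stand-ins for a generated class (shown synchronous with stubbed data to keep the sketch self-contained; the real generated methods are async and call the MCP server):

```python
# Illustrative agent-written script against a generated client class.
# Stubs replace real MCP calls so the example runs standalone.
class GitHubClient:
    def list_prs(self) -> list[dict]:
        return [
            {"number": 1, "labels": ["bug"]},
            {"number": 2, "labels": ["docs"]},
            {"number": 3, "labels": ["bug"]},
        ]

    def get_pr_diff(self, pr_number: int) -> str:
        return f"diff for PR #{pr_number}"

github = GitHubClient()

# Chain: list PRs -> filter by label -> fetch diffs, one execution.
bug_prs = [pr for pr in github.list_prs() if "bug" in pr["labels"]]
diffs = {pr["number"]: github.get_pr_diff(pr["number"]) for pr in bug_prs}

print(f"{len(diffs)} bug PRs need review: {sorted(diffs)}")
```

The full PR list and every diff stay inside the sandbox; only the final summary line reaches the model.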
How mcp-skill Works in Practice
The compilation step: mcp-skill create <server-url> connects to the MCP server, calls list_tools(), converts each tool's JSON Schema into Python type annotations, and generates a typed class with async methods. It then validates the output through AST parsing and type checking, and generates a SKILL.md with usage examples designed for agents.
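The schema-to-annotation step can be sketched as a small mapping function. This is a simplified illustration, not mcp-skill's actual compiler, which also handles nested objects, optional fields, and method-body generation:

```python
# Minimal sketch: map a JSON Schema property to a Python annotation
# string. Hypothetical helper, not the real mcp-skill compiler.
JSON_TO_PY = {
    "string": "str",
    "integer": "int",
    "number": "float",
    "boolean": "bool",
}

def annotation_for(prop: dict) -> str:
    """Return a Python type annotation for one JSON Schema property."""
    if "enum" in prop:
        values = ", ".join(repr(v) for v in prop["enum"])
        return f"Literal[{values}]"
    if prop.get("type") == "array":
        return f"list[{annotation_for(prop.get('items', {}))}]"
    return JSON_TO_PY.get(prop.get("type"), "Any")

print(annotation_for({"type": "string", "enum": ["open", "closed", "all"]}))
```

An enum property like a PR `state` field becomes a `Literal[...]` annotation, which is how the generated methods end up with precise signatures instead of untyped dicts.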
The resulting class follows this pattern (`PullRequest` and `FileContents` are typed result models generated alongside the client):

```python
from typing import Literal

class GitHubClient:
    async def list_pull_requests(
        self,
        repo: str,
        state: Literal["open", "closed", "all"] = "open",
    ) -> list[PullRequest]:
        ...

    async def get_file_contents(
        self,
        repo: str,
        path: str,
        ref: str | None = None,
    ) -> FileContents:
        ...
```
The agent writes Python targeting these methods, chains them as needed, and the sandbox handles execution.
Where This Falls Short
Being direct about the limitations:
Pre-compilation overhead. Unlike CLI dynamic discovery, mcp-skill requires a compilation step when you bring a new MCP server in. This is a one-time cost, but it means you can't improvise with a new tool at runtime.
Sandbox availability. The approach requires a code execution environment. This is broadly available — ChatGPT, Claude's API, many enterprise platforms all support it — but it's not universal. Where no sandbox exists, this approach doesn't apply.
MCP server stability. If the upstream MCP server changes its tool schemas, the compiled Python class needs to be regenerated. This is a maintenance consideration that static MCP tool calling doesn't have in the same way.
The Design Choice Behind This
The reason these three problems get conflated is that they're all visible symptoms of the same underlying challenge: agents need access to a large number of capabilities without paying the cost of loading all of them upfront, and they need to compose those capabilities efficiently once loaded.
MCP solves the standardization and auth problem. It does not solve the load-time or chaining problems. Dynamic tool search (BM25) solves the load-time problem without requiring a change to MCP. Programmatic tool calling solves the chaining problem. Agent Skills solves the discovery and packaging problem. CLI solves discovery and portability in specific environments.
mcp-skill is an attempt to stack these solutions — taking the auth and ecosystem from MCP, the progressive disclosure from Agent Skills, and the code-execution approach from programmatic tool calling — into something that's portable, efficient, and usable in production without requiring bash access.
It's an early-stage project and the edges show. But the architectural logic is, I think, the right direction: rather than choosing between MCP, CLI, and Skills, there's a design that keeps what's genuinely good about each.
Further Reading
- Anthropic Tool Search documentation — BM25 and regex-based dynamic tool loading
- Anthropic Programmatic Tool Calling — code execution-based tool chaining
- Cloudflare Code Mode — the TypeScript/V8 isolates implementation of the same pattern
- agentskills.io — the Agent Skills open standard and adopters
- mcp-skill on GitHub — the project referenced in this post