Tool Sprawl Has a Token Tax: When 50 MCP Tools Eat 7% of Context
Most platform teams cannot answer, in five seconds, how many tools their agent actually has registered. They can answer the headcount of the team. They can answer the cloud bill. They cannot answer the tool count.
That answer matters more than it sounds. The LeanIX engineering team published a measurement last week that turns the question from sloppy to quantitative: at fifty or more MCP tools registered to an agent, the tool schemas — names, descriptions, parameter shapes — consume between five and seven percent of the model’s context window before the user has typed a single character. Every request pays the tax. Every step of every workflow pays the tax. Nobody put it on a budget.
This is the cheap part of the problem.
The Tax Nobody Budgeted
When you register a tool with an MCP-compliant agent, three artifacts go into the model’s prompt every time it makes a decision: the tool’s name, its natural-language description, and its full parameter schema. At one tool, this is invisible. At ten, it is rounding error. At fifty, it is a measurable share of the context window gone before the agent ever sees the user’s question.
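To make the tax concrete, here is a rough sketch of what one registered tool contributes to the prompt and how fifty of them add up. The tool definition, the four-characters-per-token heuristic, and the 128K window are illustrative assumptions, not figures from the LeanIX measurement.

```typescript
// Hypothetical shape of one MCP tool definition as it enters the prompt.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the parameters
}

const updateUser: ToolDefinition = {
  name: "update_user",
  description: "Update an existing user's profile fields by user ID.",
  inputSchema: {
    type: "object",
    properties: {
      userId: { type: "string", description: "Unique identifier of the user" },
      email: { type: "string", description: "New email address" },
      displayName: { type: "string", description: "New display name" },
    },
    required: ["userId"],
  },
};

// Rough heuristic: about four characters per token for English-heavy JSON.
const approxTokens = (tool: ToolDefinition): number =>
  Math.ceil(JSON.stringify(tool).length / 4);

const toolCount = 50;
const perTool = approxTokens(updateUser); // ~90 tokens for this small schema
const schemaTax = toolCount * perTool;    // paid on every request, before the user types
const contextWindow = 128_000;            // assumed window size

console.log(
  `~${schemaTax} tokens of tool schemas, ` +
    `${((100 * schemaTax) / contextWindow).toFixed(1)}% of the window`,
);
```

A schema this small lands near a hundred tokens; production schemas with nested objects, enums, and usage examples run several times larger, which is the regime where the measured five to seven percent becomes plausible.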
Five to seven percent is not a catastrophe in isolation. It is a catastrophe in compounding. The same agent has system prompts, retrieved documents, prior turns, scratchpad reasoning, and the actual user message all competing for the same window. Every percentage point given to tool schemas is a percentage point not given to the work. Long conversations degrade faster. Retrieval gets aggressively trimmed. The agent loses earlier context to make room for the question it is trying to answer.
But the schema tax is the bookkeeping problem. The behavior problem is worse.
The Hallucination Tax Beneath It
Semantically similar tools — update_user, update_user_record, update_user_profile — confuse the model in ways that are not visible in any single trace. The agent invents a fourth tool name that does not exist. It invokes the right tool with parameters borrowed from another tool’s schema. It conflates two parameter names that mean different things. The error rate climbs as the surface area grows, and it climbs nonlinearly. You do not get a 50% increase in tool count and a 50% increase in tool errors. You get a 50% increase in tool count and a regime change in how the agent reasons about which tool to call.
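To make those failure shapes concrete, here is a hypothetical sketch of the kind of near-duplicate surface that produces them. These definitions and failing calls are illustrative assumptions, not traces from the LeanIX post.

```typescript
// A hypothetical near-duplicate tool surface of the kind described above.
const nearDuplicates = [
  { name: "update_user", params: ["userId", "email", "displayName"] },
  { name: "update_user_record", params: ["recordId", "fields"] },
  { name: "update_user_profile", params: ["profileId", "displayName", "avatarUrl"] },
];

// Failure shape 1: a fourth tool name that does not exist anywhere.
const hallucinatedTool = {
  tool: "update_user_account",
  args: { userId: "u_123", email: "new@example.com" },
};

// Failure shape 2: the right tool, with a parameter borrowed from a sibling schema.
const crossWiredCall = {
  tool: "update_user",
  args: { recordId: "u_123", email: "new@example.com" }, // recordId belongs to update_user_record
};
```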
The schema tax is recoverable: prune tools, reclaim tokens. The hallucination tax is structural: it does not shrink until the surface area shrinks, and it is hardest to detect on the workflows that worked yesterday and quietly stopped working today.
The author’s diagnosis is the right one to absorb: regardless of architectural pattern, you still need to maintain a reasonable number of tools. Tool sprawl is the disease; everything below is the treatment.
Code Mode as One Treatment
Cloudflare’s pattern, which the LeanIX post traces in detail, treats the schema tax and the hallucination tax as the same disease. Instead of registering fifty tools, you fetch the upstream MCP server’s schema, generate a TypeScript SDK from it with documentation comments, and expose exactly two tools to the model: one to search the SDK, one to execute generated code in a sandbox.
The mechanics are worth understanding precisely:
- Search. The SDK ships with a vector index over its endpoints, types, and examples. The model issues a natural-language query — “find the function that updates a user’s billing address” — and gets back a small set of relevant SDK fragments. Only those fragments enter the context window. The fifty-tool schema is no longer there to tax every request.
- Execute. The model writes a JavaScript async function that uses the SDK fragments it found, and the agent runs it inside a Cloudflare Dynamic Worker isolate. Loops, conditionals, batching, error handling — all collapse into one generated program and one execution call. Multi-step workflows that previously required ten round-trips of “model picks tool, tool returns, model picks next tool” become a single round-trip of “model writes program, sandbox runs program, model reads result.”
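A minimal sketch of that two-tool surface helps fix the shape in mind. Every name and signature below is an assumption for illustration, not Cloudflare's or LeanIX's actual API, and the generated program uses hypothetical SDK functions.

```typescript
// Interfaces for the two capabilities exposed to the model.
interface SdkIndex {
  // Semantic search over the generated SDK's functions, types, and examples.
  query(text: string, topK: number): Promise<string[]>;
}

interface Sandbox {
  // Runs model-written code in an isolate with resource limits and,
  // by default, no network access.
  run(source: string, opts: { timeoutMs: number; allowNetwork: boolean }): Promise<unknown>;
}

// Tool 1: search. Only the returned fragments enter the context window;
// the fifty-tool schema never does.
const searchSdk = (index: SdkIndex) => (query: string) => index.query(query, 3);

// Tool 2: execute. One generated program replaces many tool-call round-trips.
const executeCode = (sandbox: Sandbox) => (source: string) =>
  sandbox.run(source, { timeoutMs: 5_000, allowNetwork: false });

// What the model might write after a search hit: the loop lives in code,
// not in model turns. `listUsers` and `updateBillingAddress` are hypothetical
// SDK functions.
const generatedProgram = `
  export default async function run(sdk) {
    const users = await sdk.listUsers({ plan: "legacy" });
    for (const user of users) {
      await sdk.updateBillingAddress(user.id, user.pendingAddress);
    }
    return users.length;
  }
`;
```

The design point is that iteration and branching happen inside the generated program, while the model's context only ever holds the search fragments it asked for and the final result.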
The token math improves twice. The schema tax drops because only two tool schemas sit in the prompt instead of fifty. The conversation tax drops because workflows compress. The hallucination problem shrinks because the model is no longer choosing between forty-eight near-identical tool names; it is writing code against typed function signatures with documentation comments next to them.
This is the same architectural move we have catalogued in adjacent essays: the agent runtime is becoming a syscall surface where governance lives at the boundary between model and execution, not at every individual tool. Code Mode is one expression of that move. The sandbox is not optional infrastructure; it is the substrate that lets the boundary exist.
When Code Mode Is the Wrong Answer
Code Mode is not free. The decision matrix the LeanIX post offers is the right one to keep on hand:
Use Code Mode when the agent is registered against fifty or more tools with repetitive call patterns, when workflows want loops and conditionals and batching, when round-trip count is the bottleneck, and when you have a sandbox infrastructure already in production. The sandbox part is non-negotiable. If you are running JavaScript that the model wrote, you are running it inside an isolate with strict resource limits and no network access by default. A sandbox you stand up only for Code Mode is a sandbox you have not hardened.
Stay with traditional MCP when debugging transparency matters more than token efficiency — every tool call is a log line a human can read, whereas Code Mode produces one program execution that is harder to inspect step by step. Stay with MCP when single-call operations dominate; you do not need a code generator to call one function. Stay with MCP when you cannot run a code execution sandbox; the alternative is unsafe. And stay with MCP when interoperability with other MCP-compliant agents matters more than per-agent efficiency.
There is a third path that the post does not name explicitly but that follows from its argument: stay with MCP and prune. Cut the tool surface. Filter agent-side based on the task. Redesign server-side around use cases instead of mirroring upstream APIs one-to-one. The author’s point that you still need a reasonable tool count survives every architectural choice. Code Mode is one way to absorb the cost of having many tools. Reducing the tool count is another, and it does not require a sandbox.
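Agent-side filtering can be as simple as tagging tools at registration and exposing only the relevant subset on each request. The sketch below assumes a framework that lets you select tools per call; the tag scheme and the cap are illustrative.

```typescript
// A registered tool, tagged at registration time (tags are an assumed convention).
interface RegisteredTool {
  name: string;
  description: string;
  tags: string[]; // e.g. ["billing"], ["identity"]
}

// Expose only the tools relevant to the current task, under a hard cap,
// instead of sending all fifty schemas on every request.
function selectTools(
  registry: RegisteredTool[],
  taskTags: string[],
  maxTools = 12,
): RegisteredTool[] {
  const relevant = registry.filter((tool) =>
    tool.tags.some((tag) => taskTags.includes(tag)),
  );
  return relevant.slice(0, maxTools);
}

// Usage: a billing workflow sees billing tools only, so the schema tax
// scales with the task rather than with the whole registry.
// const toolsForRequest = selectTools(registry, ["billing"]);
```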
The Question Behind the Architecture
Tool sprawl is a quantifiable governance problem dressed up as a technical detail. The five-to-seven percent number is useful because it makes the conversation budgetable. You can walk into the platform meeting and ask whether your agents are paying that tax, and you can make decisions with the answer.
But the deeper question is the one most teams cannot answer in five seconds: how many tools does your production agent actually have registered? Who added the last three? When was the last time anyone audited the surface? If those questions take an hour to answer, you have already lost the boundary. The schema tax is the symptom that tells you the boundary is missing, not the cause of its loss.
Code Mode is one way to take the boundary back. Pruning is another. Designing tools around use cases instead of upstream APIs is a third. The architectures differ; the discipline does not. Whether your agent’s tool surface is governable, observable, and bounded is a question with a yes-or-no answer. Five to seven percent of the context window is what “no” costs you on every single request, before the conversation about hallucinated tool calls has even started.
If your platform team cannot answer “how many tools does our agent have?” in five seconds, that is the work. The architecture comes after.
This analysis synthesizes Why Your AI Agent Is Drowning in Tools (and How Code Mode Saves It) (LeanIX Engineering, April 2026).
Victorino Group helps engineering teams audit and bound their agent tool surface before it eats their context window. Let’s talk.