Operating AI

The Operations Tax: What Happens When AI Agents Hit Production Without Governance

Thiago Victorino
9 min read

Kan Yilmaz ran the numbers on MCP token consumption this month. His finding: the Model Context Protocol dumps the entire tool catalog into the context window at session start. Each tool costs roughly 185 tokens. With 84 tools registered, that is 15,540 tokens consumed before the agent does anything useful.

The obvious reaction is: strip it out. Use CLI lazy-loading. Drop the token count from 15,540 to about 300. A 98% reduction. Problem solved.

Except the tokens you just stripped away were JSON Schema definitions. Type constraints. Required field specifications. Enum validations. The protocol overhead is the governance layer. Remove it, and your agent is parsing unstructured --help text to decide how to call tools. The structured contract between agent and tool disappears.

This is the operations tax in miniature. The cost you see (tokens) and the cost you do not see (lost validation) pull in opposite directions. Optimize for one, and you pay the other.

The Tradeoff Nobody Discusses

Most organizations encountering MCP overhead for the first time frame it as waste. It feels like waste. Fifteen thousand tokens of boilerplate before a single productive action.

But consider what those schemas do. They define the contract between the agent and every tool it can call. A schema says: this tool accepts a string in this field, an integer in that field, and one of three enum values in the third. The agent does not guess. It does not hallucinate parameter names. The schema constrains the completion space, the same way a type system constrains a codebase.
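To make the contrast concrete, here is a minimal sketch of schema-backed validation. The tool name, fields, and validator are invented for illustration; real MCP servers publish JSON Schema definitions like the dict below, and the host validates calls against them before execution.

```python
# A hypothetical MCP-style tool schema (tool name and fields are
# illustrative, not from any real MCP server).
TICKET_TOOL_SCHEMA = {
    "name": "create_ticket",
    "inputSchema": {
        "type": "object",
        "required": ["title", "priority"],
        "properties": {
            "title": {"type": "string"},
            "points": {"type": "integer"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        },
    },
}

def validate_call(schema: dict, args: dict) -> list[str]:
    """Check a proposed tool call against the schema; return any violations."""
    spec = schema["inputSchema"]
    errors = []
    for field in spec["required"]:
        if field not in args:
            errors.append(f"missing required field: {field}")
    type_map = {"string": str, "integer": int}
    for field, rules in spec["properties"].items():
        if field not in args:
            continue
        if not isinstance(args[field], type_map[rules["type"]]):
            errors.append(f"{field}: expected {rules['type']}")
        elif "enum" in rules and args[field] not in rules["enum"]:
            errors.append(f"{field}: must be one of {rules['enum']}")
    return errors

# An agent that guessed a plausible-but-wrong enum value is caught
# before the tool ever runs:
print(validate_call(TICKET_TOOL_SCHEMA, {"title": "Fix login", "priority": "urgent"}))
# → ["priority: must be one of ['low', 'medium', 'high']"]
```

Strip the schema, and this check has nothing to check against: "urgent" looks perfectly reasonable in a help string, and the failure surfaces downstream instead of at the call boundary.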

Remove the schema, and you are back to unstructured text parsing. The agent reads a CLI help string and infers the interface. It works most of the time. “Most of the time” is fine in a demo. In production, “most of the time” compounds into incidents.

An academic paper (arXiv 2602.14878v1) quantified one dimension of this tension: augmenting MCP tool descriptions with richer metadata improves agent accuracy by 5.85 percentage points but increases execution steps by 67%. Better governance costs more to run. That is not a bug in the system. It is the fundamental tradeoff of operating AI agents.

Anthropic’s own answer to MCP overhead is Tool Search, which loads tool schemas on demand rather than all at once. It reduces token consumption by roughly 77% while preserving the validation layer. This is the engineering middle path: keep governance, reduce cost, accept that you cannot eliminate both.
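The shape of that middle path can be sketched in a few lines. This is not Anthropic's Tool Search implementation; it is a conceptual registry, with invented names, showing why deferred loading preserves the contract: the cheap catalog is what enters every context window, while the full schema is paid for only when a tool is actually selected.

```python
# Conceptual sketch of on-demand schema loading (registry and tools are
# illustrative, not Anthropic's actual Tool Search implementation).
class LazyToolRegistry:
    def __init__(self, tools: dict[str, dict]):
        self._tools = tools            # full schemas, held outside the context
        self._loaded: set[str] = set()

    def catalog(self) -> list[str]:
        # Upfront cost: one short line per tool instead of a full schema.
        return [f"{name}: {t['summary']}" for name, t in self._tools.items()]

    def load(self, name: str) -> dict:
        # The full schema enters the context only when the agent selects
        # the tool, so validation is preserved for tools actually used.
        self._loaded.add(name)
        return self._tools[name]["schema"]

registry = LazyToolRegistry({
    "search_docs": {"summary": "full-text search", "schema": {"type": "object"}},
    "send_email": {"summary": "send an email", "schema": {"type": "object"}},
})
print(registry.catalog())           # cheap index: a few tokens per tool
print(registry.load("send_email"))  # full contract, paid only on use
```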

One caveat worth noting: Yilmaz built CLIHub, the tool that benefits from the “just use CLI” conclusion. Commercial interest shapes framing. The pattern he identified is real. The solution he recommends serves his product.

Production Is Not a Demo

This week, Anthropic launched enterprise agent plugins with 13 new MCP connectors: Gmail, DocuSign, FactSet, S&P Global, and others. Kate Jensen, their head of product, said something honest: “2025 was meant to be the year agents transformed the enterprise, but the hype turned out to be mostly premature.”

The reason the hype was premature is that demos do not have operations costs. A demo runs once. It has no recurring token spend. It has no metric drift. It has no scheduled tasks running at 3 AM without oversight. Production has all of these, and each one compounds.

The Larridin survey from February 2026 found that 45% of enterprise AI adoption happens outside formal IT governance. The average enterprise runs 23 AI tools. Only 38% maintain an inventory of what AI is running and where. There is a 16-point visibility differential between executives who believe they understand their AI footprint and directors who actually manage it.

IDC’s FutureScape 2026 projects that G1000 organizations will face a 30% rise in underestimated AI infrastructure costs by 2027. The FinOps Foundation has created a dedicated working group specifically for token and GPU cost governance. These are not theoretical concerns. They are the beginnings of institutional response to costs that were invisible twelve months ago.

Three Layers of the Operations Tax

The operations tax has three layers, each compounding on the one below it.

Layer 1: Protocol overhead. Every agent session begins with a fixed cost. MCP schemas, system prompts, tool definitions. This layer is visible and measurable. It is also the layer most organizations try to optimize first, because it shows up in the invoice. The danger is optimizing it by removing governance.

Layer 2: Metric misdirection. Technical SLOs (latency, error rate, uptime) pass while business outcomes fail. Dunya Kirkali illustrated this with a scenario that any operations team will recognize: an Uber backend with 150ms response times, zero errors, stable database connections. Every dashboard is green. But no drivers are available within range. The business metric (riders matched with nearby drivers at 99.5% reliability) fails completely while every technical metric passes.

This principle, well-established in SRE practice since Google’s original SRE book in 2016, becomes acute with AI agents. An agent can execute tools successfully, return results within latency bounds, and produce zero errors, while delivering outputs that are wrong, irrelevant, or harmful. Technical success and business success diverge. The operations tax here is not tokens. It is the cost of measuring the wrong thing and believing you have visibility when you do not.
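The divergence is easy to state in code. A minimal sketch of the ride-matching scenario, with invented thresholds and field names: the technical checks and the business check read different metrics, and one can pass while the other fails completely.

```python
# Sketch of separating technical SLOs from a business SLO, using the
# ride-matching scenario above (thresholds and field names are invented).
def technical_slos_pass(m: dict) -> bool:
    return m["p50_latency_ms"] <= 150 and m["error_rate"] == 0.0

def business_slo_pass(m: dict) -> bool:
    # The metric the business actually cares about: riders matched
    # with a nearby driver, at 99.5% reliability.
    return m["riders_matched_nearby"] / m["ride_requests"] >= 0.995

metrics = {
    "p50_latency_ms": 150, "error_rate": 0.0,           # dashboards are green
    "ride_requests": 1000, "riders_matched_nearby": 0,  # no drivers in range
}
print(technical_slos_pass(metrics), business_slo_pass(metrics))  # True False
```

The fix is not better infrastructure monitoring. It is defining `business_slo_pass` at all, and alerting on it with the same urgency as the technical checks.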

Layer 3: Ungoverned automation. Anthropic’s Cowork product now supports recurring scheduled tasks for Claude: hourly, daily, weekly. The capability exists. But it runs only on desktop, only while the laptop is awake. This is not enterprise orchestration. It is a prototype of enterprise orchestration.

The revealing detail is not that Cowork scheduling is limited. It is that even Anthropic, the company building the models, does not yet have production-grade scheduling infrastructure for its own agents. The capability to automate has outrun the infrastructure to govern automation.

When scheduled AI tasks run without monitoring, without business-level success criteria, without cost attribution, the operations tax compounds silently. Nobody sees it until the quarterly bill arrives or an ungoverned task produces a production incident.

The Compound Effect

Our own data from C.J. Roth’s aggregation earlier this month shows the compound effect in practice. AI-assisted teams complete 21% more tasks. They merge 98% more pull requests. But review time increases 91%. Incidents per PR rise 23.5%. Change failure rates climb roughly 30%.

The output increased. The operations infrastructure did not. The tax compounds.

At scale, the numbers become material. A conservative estimate: 1,000 agent sessions per day at 15,000 tokens of protocol overhead each, at $3 per million input tokens. That is 15 million overhead tokens and $45 per day, or $16,425 per year, on protocol overhead alone. Not on productive work. On the structural cost of the protocol.
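The estimate above is straightforward to reproduce, which also makes it easy to re-run against your own session counts and pricing:

```python
# Reproducing the back-of-envelope protocol-overhead estimate from the text.
sessions_per_day = 1_000
overhead_tokens = 15_000       # fixed protocol cost per session
price_per_million = 3.00       # USD per million input tokens

daily_cost = sessions_per_day * overhead_tokens / 1_000_000 * price_per_million
annual_cost = daily_cost * 365
print(daily_cost, annual_cost)  # 45.0 16425.0
```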

Strip the protocol to save money, and you lose the validation layer. Keep the protocol, and you pay the tax. Build selective loading (Anthropic’s approach), and you reduce the tax while preserving governance, but you add engineering complexity.

There is no free option. There is only the choice of which cost you pay, and whether you pay it deliberately or discover it in a postmortem.

What This Means for Operations Teams

Organizations moving AI agents from demo to production need to account for three things.

Budget for protocol overhead as an operating cost, not a bug. MCP token consumption is the cost of structured tool contracts. Treat it like you treat TLS overhead or authentication middleware. It is the price of operating safely.

Measure business outcomes, not just technical metrics. If your agent monitoring dashboard shows green while your business process fails, you are paying the operations tax in the most expensive currency: undetected failure. Define what success means at the business level before you measure anything at the infrastructure level.

Govern automation before scaling it. Scheduled AI tasks without monitoring, cost attribution, and success criteria are technical debt with compound interest. Every ungoverned automated task is a liability that grows silently until it does not.
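The three requirements above can be enforced mechanically at registration time. A minimal sketch, with invented field names and no real scheduler behind it: the gate refuses any scheduled task that arrives without an owner, a budget, and a business-level success check.

```python
# A minimal governance gate for scheduled agent tasks: refuse to register
# a task lacking an owner, a budget, and a business success check.
# (Field names and the gate itself are illustrative, not from any product.)
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ScheduledTask:
    name: str
    cron: str
    owner: Optional[str] = None
    monthly_budget_usd: Optional[float] = None
    success_check: Optional[Callable[[dict], bool]] = None

def register(task: ScheduledTask) -> None:
    missing = [f for f, v in [("owner", task.owner),
                              ("monthly_budget_usd", task.monthly_budget_usd),
                              ("success_check", task.success_check)] if v is None]
    if missing:
        raise ValueError(f"refusing ungoverned task {task.name!r}: missing {missing}")

# A governed task registers cleanly:
register(ScheduledTask("nightly-summary", "0 3 * * *",
                       owner="ops@example.com", monthly_budget_usd=50.0,
                       success_check=lambda m: m.get("summaries_delivered", 0) > 0))

# register(ScheduledTask("rogue-job", "0 * * * *"))  # would raise ValueError
```

The point is not this particular gate. It is that "every automated task has an owner, a budget, and a success criterion" is a policy you can encode, not just a slide in a governance deck.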

The operations tax is real. It is not a reason to avoid production AI. It is a reason to build governance into the operating model from the start, before the compound interest makes the bill unmanageable.


Sources

  • Yilmaz, Kan. “MCP Token Cost Analysis.” February 2026. Analysis of MCP protocol overhead across tool configurations.
  • arXiv 2602.14878v1. Augmented MCP descriptions: +5.85 pp accuracy, +67% execution steps.
  • Anthropic. “Tool Search.” Documentation on selective tool loading for MCP overhead reduction.
  • TechCrunch. “Anthropic Launches Enterprise Agent Plugins.” February 24, 2026. Kate Jensen quote on 2025 enterprise AI expectations.
  • Kirkali, Dunya. “Business SLOs.” February 2026. Illustrative scenario on technical vs. business metric divergence.
  • Anthropic. “Cowork Scheduled Tasks.” February 25, 2026. Recurring task support for Claude desktop.
  • Larridin. Enterprise AI Adoption Survey. February 2026. 45% shadow AI, 23 average tools, 16-point visibility differential.
  • IDC FutureScape 2026. 30% rise in underestimated AI infrastructure costs by 2027.
  • FinOps Foundation. Dedicated AI cost governance working group.
  • Roth, C.J. AI Engineering Productivity Aggregation. February 2026. 21% more tasks, 98% more PRs, +91% review time, +23.5% incidents per PR, ~30% higher change failure rate.

Victorino Group helps organizations build governance and operational infrastructure for AI systems. If your AI agents are running in production without cost attribution, business SLOs, or automation governance, the operations tax is compounding. Reach out at contact@victorinollc.com or visit www.victorinollc.com.
