Context Is the New Perimeter: Why AI Agent Governance Starts in the Context Window
The developer community is having the wrong argument.
MCP versus CLI. Protocol versus shell. Structured tools versus raw commands. Scroll through any technical forum this month, and you will find engineers debating which mechanism should connect agents to the outside world.
The debate misses the point entirely. The mechanism is plumbing. The real question is this: who controls what the agent sees, and what happens when that control degrades?
David Cramer, CTO and co-founder of Sentry, published an analysis in February 2026 that cuts closer to the right question than most of the discourse. His argument deserves serious engagement, not because everything in it is correct, but because one insight in it is genuinely important and two others are productively wrong.
Steering: The Insight That Matters
Cramer’s central claim is that MCP’s value is not in the protocol. It is in what tool descriptions do to model behavior.
This is correct, and it is underappreciated.
A tool description in MCP is not documentation for a developer. It is a behavioral instruction for a language model. When you write “Use this tool to search error logs for a specific issue. Provide the issue ID and the time range. Returns structured error data with stack traces,” you are not explaining an API. You are programming a decision-making process. You are telling the model when to act, what to provide, and what to expect.
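To make the point concrete, here is a minimal sketch of what such a tool definition looks like. The general shape (`name`, `description`, `inputSchema`) follows the MCP tool schema, but the specific tool and its fields are hypothetical, built from the example description above:

```python
# Hypothetical MCP-style tool definition. The description field is read by
# the model, not by a developer: every clause steers behavior, telling the
# model when to call the tool, what to provide, and what to expect back.
search_error_logs = {
    "name": "search_error_logs",
    "description": (
        "Use this tool to search error logs for a specific issue. "
        "Provide the issue ID and the time range. "
        "Returns structured error data with stack traces."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "issue_id": {"type": "string", "description": "The issue to search for."},
            "start": {"type": "string", "description": "ISO 8601 start of the time range."},
            "end": {"type": "string", "description": "ISO 8601 end of the time range."},
        },
        "required": ["issue_id", "start", "end"],
    },
}
```

Read it as a policy document rather than an API doc: the `description` is the clause that governs behavior, and the `required` list is the clause that constrains it.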
Cramer describes this as steering. Tool descriptions govern which tools the agent selects, what parameters it provides, how it interprets results, and when it decides to stop. The description is the control surface.
A Tenable study found that steering effectiveness varies significantly by model. The same tool description produces different behavioral patterns in Claude, GPT-4o, and Gemini. This means tool descriptions are not universal instructions. They are model-specific governance mechanisms that must be tuned per model.
This has an implication that most teams building MCP servers have not internalized: your tool descriptions are policy documents. They define the boundaries of agent behavior as concretely as any access control list. And unlike access control lists, they degrade.
Context Rot Is a Governance Failure
Every team building with agents has encountered context rot: the phenomenon where model performance degrades as the context window fills. We covered the technical mechanics in our earlier analysis of Azure SRE and Manus techniques. The standard framing treats it as a performance problem. Optimize your tokens, compress your history, keep utilization below 40%.
That framing is incomplete.
Context rot is a governance failure. Here is why.
When context degrades, the model does not simply get slower or less accurate. It loses the ability to follow instructions reliably. The steering effect of tool descriptions weakens. Behavioral boundaries that held at 20,000 tokens dissolve at 120,000. The agent does not crash. It drifts. It starts selecting the wrong tools. It hallucinates parameters. It ignores constraints that were clearly stated at the top of the context.
This is not a performance curve. It is a compliance curve. And the difference matters enormously for enterprises that need agents to operate within defined boundaries.
Sixty-three percent of organizations report they cannot enforce purpose limitations on AI agents in production. When context rot is understood as a technical optimization problem, this statistic is puzzling. When it is understood as a governance problem, it makes perfect sense. The control surface erodes in real time, invisibly, and no one designed for it.
The Progressive Disclosure Tension
Cramer makes a second claim that is worth examining: that progressive disclosure, hiding tools from agents until they are needed, is counterproductive. His argument is that hiding context to reveal it later creates indirection and increases failure modes.
This is where the analysis gets interesting, because Cramer’s position directly contradicts Anthropic’s own guidance.
Anthropic’s Agent Skills framework, documented in their context engineering guidelines from September 2025, recommends exactly the pattern Cramer dismisses: on-demand knowledge loaded when needed rather than everything visible at all times. The Vercel passive context study from January 2026 complicates this further. AGENTS.md, which loads documentation passively into every prompt, achieved a 100% pass rate. Skills, which require agents to actively retrieve documentation, achieved 53%. Same as having no documentation at all.
Both sides have evidence. Both sides are partially right. The resolution is not that one approach wins. It is that the choice between passive and active context is itself a governance decision.
Loading everything passively means the agent always has access. No retrieval failures. But it means the context fills faster, the steering effect of earlier instructions weakens sooner, and you cannot selectively restrict what the agent sees based on the current task. Progressive disclosure gives you granular control over the agent’s information environment, but introduces the activation problem: agents fail to retrieve what they need 56% of the time.
Neither approach is universally correct. The right choice depends on what you are governing for. If reliability of tool selection matters most, load passively. If information containment matters most, disclose progressively. If both matter, you need a different architecture entirely.
Subagent Boundaries Are Governance Boundaries
This is where Cramer’s most operationally useful contribution appears. Sentry’s MCP implementation went from approximately 14,000 tokens of tool descriptions to approximately 720 tokens by wrapping their tools in subagent boundaries. A 95% reduction.
The mechanism: instead of exposing all tools to a single agent context, they created subagents that encapsulate tool sets. The parent agent sees a small number of high-level capabilities. Each subagent sees only the tools relevant to its domain. Context is contained. Steering remains effective within each boundary.
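The structural effect of that mechanism can be sketched in a few lines. The agent names, tools, and descriptions below are invented for illustration (this is not Sentry's implementation); the point is that isolation falls out of the structure itself:

```python
# In a flat architecture, every tool description lands in one shared context.
flat_tools = {
    "query_database": "Use this tool to run read-only SQL queries against the warehouse.",
    "send_email": "Use this tool to send an email to a user.",
    # ...dozens more tools, thousands of tokens in total
}

# In a subagent architecture, each subagent sees only its domain's tools,
# and the parent sees only short, high-level delegation capabilities.
subagents = {
    "database_agent": {"query_database": flat_tools["query_database"]},
    "email_agent": {"send_email": flat_tools["send_email"]},
}
parent_capabilities = {
    "database_agent": "Delegate database questions to this subagent.",
    "email_agent": "Delegate email tasks to this subagent.",
}

def visible_tools(agent: str) -> set[str]:
    """What an agent can see is determined by structure, not by policy checks."""
    if agent == "parent":
        return set(parent_capabilities)   # high-level capabilities only
    return set(subagents.get(agent, {}))  # this subagent's domain tools only
```

Nothing in this sketch enforces isolation at runtime; the database subagent simply has no path to the email tool's description, which is the governance property the section describes.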
The tradeoff is real. Sentry’s latency went from 11 seconds to 24 seconds. Every subagent invocation is a separate inference call. You pay twice for every tool interaction: once for the parent to decide which subagent to invoke, once for the subagent to execute.
But here is the governance insight that the performance discussion obscures: subagent boundaries are not just an optimization technique. They are isolation boundaries. They determine what each agent can see, what tools it can access, and what information it can leak across domains.
In a single-agent architecture with 14,000 tokens of tool descriptions, every tool description influences every decision. A database query tool’s description can affect how the agent uses an email-sending tool. There is no isolation. Context is shared, and shared context means shared failure modes.
In a subagent architecture, a database subagent cannot see email tools. An email subagent cannot see database schemas. The parent agent orchestrates at a higher level of abstraction. Each boundary constrains what is possible within it.
This maps directly to how enterprises think about data governance. Role-based access. Least privilege. Blast radius containment. Subagent boundaries are the agent-native implementation of these principles.
The Security Dimension the Developer Debate Misses
The MCP-versus-CLI debate is happening almost entirely within a performance and developer experience frame. Security is a footnote, when it is mentioned at all.
The data suggests it should be the headline.
Security researchers have identified 1,862 MCP servers running without authentication. Tool poisoning attacks, where malicious tool descriptions manipulate agent behavior, have a 72.8% success rate against advanced models. This is not theoretical. It is measured.
Connect this to the steering insight. If tool descriptions govern agent behavior, then compromised tool descriptions govern agent behavior toward attacker goals. A poisoned tool description does not exploit a software vulnerability. It exploits the same steering mechanism that makes tools work in the first place. The governance surface is the attack surface.
Context rot amplifies this. As context degrades, the agent’s ability to distinguish between legitimate and malicious instructions weakens. The same drift that causes governance failures creates security vulnerabilities. A model at 80% context utilization is more susceptible to tool poisoning than one at 20%.
This creates a compound risk that neither the MCP advocates nor the CLI advocates are addressing: context management is not just about performance or developer experience. It is about maintaining the integrity of the agent’s decision-making environment against both degradation and adversarial manipulation.
What This Means for Enterprise AI Governance
The practical implications are specific.
Context budgets are governance budgets. When you define how many tokens an agent can consume, you are defining how long your governance controls remain effective. This is not a DevOps decision. It is a risk decision. Treat it accordingly.
Tool descriptions require the same change management as security policies. If a tool description governs agent behavior, then changing a tool description changes agent behavior. In production, these changes need review, versioning, and rollback capability. Most teams treat tool descriptions as code comments. They are policy.
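What treating descriptions as policy might look like in practice, reduced to a sketch: a registry that refuses unreviewed changes, versions every description, and can roll back. The class and its methods are hypothetical, not a real library API:

```python
import hashlib

class DescriptionRegistry:
    """Illustrative change management for tool descriptions: review-gated
    updates, full version history, and one-step rollback."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def propose(self, tool: str, description: str, reviewed: bool) -> str:
        if not reviewed:
            # A description change is a behavior change; block unreviewed edits.
            raise PermissionError("tool description changes require review")
        self._history.setdefault(tool, []).append(description)
        # A content hash serves as a version identifier for audit trails.
        return hashlib.sha256(description.encode()).hexdigest()[:12]

    def current(self, tool: str) -> str:
        return self._history[tool][-1]

    def rollback(self, tool: str) -> str:
        self._history[tool].pop()
        return self._history[tool][-1]
```

The specific mechanics matter less than the posture: the same review, versioning, and rollback discipline applied to security policies, applied here.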
Subagent boundaries should map to data governance boundaries. If your organization has data classification tiers, your subagent architecture should reflect them. An agent that can access both PII and financial data in a single context is an agent that can leak PII into financial workflows. Isolation is not optional for regulated industries.
Monitor context utilization as a governance metric. A context window at 90% utilization is not just slow. It is ungoverned. The steering effect of your tool descriptions is degraded. The agent is more susceptible to hallucination, instruction drift, and adversarial manipulation. Context utilization belongs on governance dashboards, not just performance dashboards.
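A governance-oriented utilization check is trivially simple to implement, which is part of the argument for putting it on the dashboard. The thresholds below are assumptions for illustration (the 40% figure echoes the heuristic cited earlier in this piece; the 80% ceiling is invented):

```python
def governance_status(used_tokens: int, window_tokens: int) -> str:
    """Classify context utilization as a governance state, not a speed metric.
    Thresholds are illustrative assumptions, not published guidance."""
    utilization = used_tokens / window_tokens
    if utilization < 0.40:
        return "governed"    # steering from tool descriptions still reliable
    if utilization < 0.80:
        return "degrading"   # instruction drift becomes increasingly likely
    return "ungoverned"      # treat as an incident, not a slowdown
```

The same number that a performance dashboard renders as a latency predictor, a governance dashboard renders as a control-integrity alarm.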
Assume the attack surface is the governance surface. Every mechanism you use to steer agent behavior is a mechanism an attacker can exploit. Design accordingly. Validate tool descriptions. Authenticate MCP servers. Treat tool registries as security-critical infrastructure.
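As one narrow example of "validate tool descriptions," a naive keyword scan can catch the crudest poisoning attempts before a description ever enters an agent's context. The patterns are invented examples; real tool poisoning defenses need far more than keyword matching, and this sketch only shows where such a check sits in the pipeline:

```python
import re

# Illustrative patterns resembling common prompt-injection phrasing.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?previous instructions",
    r"do not (tell|inform) the user",
    r"secretly|hidden instruction|exfiltrate",
]

def validate_description(description: str) -> list[str]:
    """Return the suspicious patterns matched; empty list means it passed
    this (deliberately shallow) screen."""
    return [
        pattern
        for pattern in SUSPICIOUS_PATTERNS
        if re.search(pattern, description, re.IGNORECASE)
    ]
```

A registry that runs even this shallow check on ingest is treating tool descriptions as security-critical input, which is the posture the recommendation calls for.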
The Wrong Debate and the Right Question
MCP versus CLI is a plumbing argument. Both approaches have merits. Neither addresses the fundamental challenge.
The fundamental challenge is this: AI agents make decisions based on what is in their context. Whoever controls the context controls the agent. As context degrades, control degrades. As tool ecosystems expand, the attack surface for context manipulation grows.
Context is the new perimeter. Not in the marketing-buzzword sense of “the new X.” In the literal, architectural sense. The boundary of what an agent can see is the boundary of what it can do, what it can leak, and what it can be manipulated into doing.
Enterprise AI governance that does not address context management is incomplete. It is like network security that does not address the network boundary.
The organizations that will operate agents reliably at scale will be the ones that treat context as a governed resource: budgeted, monitored, isolated, and defended.
The rest will discover, slowly and expensively, that the agents they deployed are only as governed as the context they see.
Sources
- David Cramer. “Context Management and MCP.” February 2026.
- Anthropic. “Effective Context Engineering for AI Agents.” September 2025.
- Vercel. Passive context study: AGENTS.md vs Skills evaluation. January 2026.
- Sentry. MCP implementation benchmarks: 14,000 to 720 tokens, 11s to 24s latency.
- Tenable. Tool description steering effectiveness across models (Claude, GPT-4o, Gemini).
- Security research: 1,862 unauthenticated MCP servers identified.
- Tool poisoning attack study: 72.8% success rate against advanced models.
- Industry survey: 63% of organizations cannot enforce purpose limitations on AI agents.
Victorino Group helps mid-market companies build AI systems that are fast because they are governed. If your organization needs agents that stay within their boundaries, reach out at contact@victorinollc.com or visit www.victorinollc.com.