The Cage Pattern: What Agent Fleet Governance Actually Looks Like
Most conversations about AI agents focus on what agents can do. The interesting question is what agents are not allowed to do --- and how that constraint is enforced.
Three companies --- Stripe, Cloudflare, and OpenAI --- are running agent fleets in production at meaningful scale. They arrived at their architectures independently. They converged on the same principle: give agents maximum autonomy inside strict structural boundaries. Not supervised freedom. Not gated approvals. Full permissions inside a cage.
This is not a metaphor. It is an architecture pattern. And it may be the most important thing happening in enterprise AI right now --- not because these companies are succeeding, but because of what their success reveals about what governance actually means when machines are doing the work.
Before we examine the pattern, a necessary disclaimer: these three companies are the exception. Only 14% of AI pilots scaled to production by mid-2025. Forty-two percent of companies abandoned most of their AI initiatives entirely (Gartner). What follows is a study of what the survivors built. Survivorship bias is the dominant force in every AI success story you read. Including this one.
Three Architectures, One Principle
Stripe: Blueprint-Based Orchestration. Stripe’s agent system merges more than 1,300 pull requests per week. The architecture uses hybrid graphs where deterministic nodes enforce invariants --- the parts that must never vary --- while agentic nodes handle the ambiguous work. The agents have access to roughly 400 internal MCP tools, but any given task receives a curated subset of about 15. The constraint is not “use whatever tools you want.” It is “here are exactly the tools this task requires, and nothing else.”
The execution environment is equally constrained. Agents run in Devboxes --- hot-started environments on AWS EC2 that spin up in about 10 seconds. These environments are QA-only. No production access. The agents can do anything they want inside the sandbox. They cannot touch anything outside it.
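The hybrid-graph idea can be sketched in a few lines. This is an illustrative shape only, assuming a tag-based tool registry; none of the names below come from Stripe's actual system. Deterministic nodes enforce invariants, agentic nodes receive a small curated tool subset, and anything outside that subset is simply absent:

```python
# Hypothetical sketch of a hybrid task graph. Deterministic nodes enforce
# invariants; agentic nodes handle ambiguous work with a curated tool subset.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]
    deterministic: bool = True              # invariant-enforcing steps never vary
    allowed_tools: frozenset = frozenset()  # agentic nodes see only a curated subset

def curate_tools(registry: dict, task_tags: set, limit: int = 15) -> frozenset:
    """Select only the tools tagged for this task, capped at a small subset."""
    matches = [name for name, tags in registry.items() if tags & task_tags]
    return frozenset(sorted(matches)[:limit])

# A registry of ~400 tools would live here; three stand in for illustration.
TOOL_REGISTRY = {
    "run_linter": {"code-quality"},
    "query_billing_api": {"billing"},
    "format_code": {"code-quality"},
}

tools = curate_tools(TOOL_REGISTRY, {"code-quality"})
agentic_node = Node("fix_lint_errors", run=lambda s: s,
                    deterministic=False, allowed_tools=tools)

# The billing tool was never presented, so it cannot be misused.
assert "query_billing_api" not in agentic_node.allowed_tools
```

The point of the sketch is that curation happens before the agent runs, not as a check on each call.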
Cloudflare: Sandbox Isolation. Cloudflare took the cage metaphor literally. Their agents run inside V8 Worker isolates --- the same technology that powers their edge computing platform. No filesystem access. No environment variables. Fetch is disabled by default and must be explicitly enabled per endpoint. OAuth 2.1 downscopes permissions to the minimum required for each operation.
The result is an environment where agents have full computational autonomy and zero ambient authority. They can execute any logic. They cannot reach anything they were not explicitly granted access to. This is the principle of least privilege implemented at the runtime level, not the policy level.
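The "zero ambient authority" idea reduces to a deny-by-default capability check. A minimal sketch, with all class and method names hypothetical rather than Cloudflare's API: network access starts empty, and every reachable endpoint is an explicit grant.

```python
# Sketch of deny-by-default network access: every endpoint must be explicitly
# granted, mirroring "fetch disabled unless enabled per endpoint."
from urllib.parse import urlparse

class ScopedFetch:
    def __init__(self):
        # Zero ambient authority: the agent starts with no reachable endpoints.
        self._allowed: set[tuple[str, str]] = set()

    def grant(self, host: str, path_prefix: str) -> None:
        self._allowed.add((host, path_prefix))

    def fetch(self, url: str) -> str:
        parsed = urlparse(url)
        for host, prefix in self._allowed:
            if parsed.hostname == host and parsed.path.startswith(prefix):
                return f"GET {url}"   # a real implementation would perform the request
        raise PermissionError(f"no capability grants access to {url}")
```

Everything not granted fails structurally, not by policy review after the fact.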
OpenAI Codex: Organizational Governance. OpenAI’s Codex team, roughly 40 people with minimal hierarchy, uses AI to shape planning, execution, and review. This is the softest of the three architectures --- governance through organizational structure rather than technical isolation. It is also the hardest to evaluate, because the source material is self-reported promotional content from a company with obvious resource asymmetry (they built the model their agents run on).
The pattern across all three: governance is not a layer added after deployment. It is the deployment architecture itself.
The Cage Is the Innovation
The conventional approach to agent governance looks like supervised tooling. An agent proposes an action. A human or system reviews it. The action is approved or rejected. This is how most enterprise AI pilots work, and it is why most of them do not scale.
Supervised tooling creates a bottleneck at exactly the point where agents are supposed to create leverage. If every tool call requires approval, the agent’s throughput is bounded by the approval system’s throughput. You have built an expensive way to do work at human speed.
The cage pattern inverts this. Instead of supervising each action, you build an environment where dangerous actions are structurally impossible. The agent does not need permission to write to a file because the only files it can access are disposable. It does not need permission to call an API because the only APIs available are pre-approved for this task. It does not need a human reviewer because the output goes through the same CI pipeline that reviews human code.
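The "disposable files" half of this can be made concrete with a small sketch, assuming nothing beyond the standard library: a workspace where every path is resolved against a sandbox root, so escaping via `../` is impossible by construction, and the whole directory is thrown away afterward.

```python
# Sketch: a disposable workspace where writes are structurally confined.
import tempfile
from pathlib import Path

class DisposableWorkspace:
    def __init__(self):
        self._dir = tempfile.TemporaryDirectory()
        self.root = Path(self._dir.name).resolve()

    def write(self, relative_path: str, content: str) -> Path:
        target = (self.root / relative_path).resolve()
        # Check confinement before creating anything.
        if self.root not in target.parents:
            raise PermissionError(f"{relative_path} escapes the sandbox")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        return target

    def discard(self) -> None:
        self._dir.cleanup()   # everything the agent touched is gone
```

No write needs approval because no write can land anywhere that matters.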
Stripe’s formulation captures it precisely: “What’s good for humans is good for agents.” Their agents run the same linters, the same test suites, the same CI checks that human engineers face. The governance mechanism is not agent-specific. It is the engineering quality infrastructure that already existed --- extended to cover a new category of contributor.
This is a genuinely important insight. Organizations that build separate governance systems for AI agents are doing twice the work and getting worse results than organizations that extend their existing quality infrastructure to include agents as first-class participants.
The Token Economics of Constraint
Cloudflare discovered something that deserves more attention: reducing the information available to an agent is itself a governance mechanism.
Their Code Mode compresses more than 2,500 API endpoints into roughly 1,000 tokens of context. WorkOS independently verified an 81% token reduction for complex tasks using similar approaches. (Cloudflare’s own claim of 99.9% reduction compares worst-case naive prompting against their optimized approach --- real-world numbers are closer to what WorkOS measured.)
The governance insight: fewer tokens means less attack surface. An agent that can “see” 2,500 endpoints has 2,500 possible targets for prompt injection, confused deputy attacks, or simple hallucination-driven misuse. An agent that sees 15 curated tools has 15.
This is constraint as security. Not by blocking actions after the fact, but by never presenting the possibility. You cannot misuse a tool you do not know exists.
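A toy sketch makes the token arithmetic concrete. The word-count proxy below is crude (not a real tokenizer) and the endpoint names are invented, but the mechanism is the one described: the prompt contains descriptions only for curated tools, so everything else is invisible to the agent.

```python
# Curation shrinks both context and attack surface: tools outside the curated
# subset never appear in the prompt at all.

def build_tool_context(descriptions: dict, visible: set) -> str:
    lines = [f"{name}: {desc}" for name, desc in sorted(descriptions.items())
             if name in visible]
    return "\n".join(lines)

# Stand-in for a ~2,500-endpoint API surface.
ALL_TOOLS = {f"endpoint_{i}": f"does something with resource {i}"
             for i in range(2500)}
CURATED = {"endpoint_3", "endpoint_7"}

full = build_tool_context(ALL_TOOLS, set(ALL_TOOLS))
curated = build_tool_context(ALL_TOOLS, CURATED)

# Orders of magnitude less context, and orders of magnitude fewer targets.
assert len(curated.split()) < len(full.split()) / 100
```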
The parallel to human organizations is direct. A new employee does not receive the admin credentials on day one --- not because the company distrusts them, but because access should match scope. The same principle applies to agents, except agents make it architecturally enforceable rather than policy-dependent.
What the Success Stories Leave Out
The 1,300 pull requests Stripe merges per week deserve context. These are primarily low-complexity changes: linting fixes, formatting standardization, CI configuration, migration scripts. With roughly 8,000 engineers, agent-generated PRs represent about 5% of total output. This is valuable --- it frees engineers from repetitive work --- but it is not the autonomous feature development that headlines imply.
More importantly, none of the three companies address the maintenance cost of agent-generated code. Research from CodeRabbit shows that AI-assisted pull requests have 1.7 times more issues than human-written code. Technical debt increases 30-41%. Cognitive complexity rises 39%. The agents produce code that passes tests and CI checks --- the cage works --- but the long-term carrying cost of that code is an open question.
This is the gap in every agent governance architecture today. The cage ensures that output meets minimum quality standards at the time of creation. It does not ensure that the output remains maintainable, readable, or architecturally coherent over time. Governance solves the production problem. It does not yet solve the maintenance problem.
Gartner projects that 40% of agentic AI projects will be canceled by 2027 specifically because of governance failures. Only 29% of organizations report being prepared to secure agentic AI systems (Help Net Security, February 2026). The companies in this essay solved the governance problem. Most will not.
The Production Data Problem
Both Stripe and Cloudflare solved governance through isolation. Their agents operate in sandboxes with no access to production data. This is elegant and effective --- for their use cases.
Most enterprise agent deployments do not have this luxury. An agent processing insurance claims needs real customer data. An agent managing supply chain logistics needs live inventory systems. An agent handling financial reconciliation needs production databases. These agents cannot run in a cage that excludes the data they need to do their jobs.
This is the unsolved problem in agent governance. Isolation works when the agent’s task can be completed with synthetic or limited data. When the task requires production data --- which describes most enterprise use cases --- the cage must be permeable in exactly the right places. Permeable cages are harder to build, harder to verify, and harder to maintain than sealed ones.
The organizations that solve selective permeability --- agents that can access exactly the production data they need and nothing else, with audit trails, data minimization, and purpose limitation enforced at the architecture level --- will define the next generation of enterprise AI. Nobody has demonstrated this convincingly yet.
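Nobody has demonstrated this at scale, but the shape of a permeable cage is describable. A minimal sketch, with every name hypothetical: a broker that releases production records only for a declared purpose, applies field-level minimization, and writes an audit entry on every access.

```python
# Sketch of selective permeability: purpose-limited, minimized, audited access.
import datetime

# purpose -> fields the agent may see for that purpose
PURPOSE_FIELDS = {
    "claims_processing": {"claim_id", "policy_number", "incident_date"},
}
AUDIT_LOG: list = []

def broker_fetch(record: dict, purpose: str, agent_id: str) -> dict:
    allowed = PURPOSE_FIELDS.get(purpose)
    if allowed is None:
        raise PermissionError(f"purpose {purpose!r} is not registered")
    # Data minimization: strip every field the declared purpose does not cover.
    minimized = {k: v for k, v in record.items() if k in allowed}
    AUDIT_LOG.append({
        "agent": agent_id,
        "purpose": purpose,
        "fields": sorted(minimized),
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return minimized
```

The hard part, which the sketch hides, is enforcing this at the architecture level rather than trusting callers to route through the broker.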
Feedback Loops Replace Approval Gates
One more pattern from Stripe that deserves extraction: their agents run a subset of the linters locally, then get exactly one CI iteration. If the code passes, it merges. If it fails, the agent gets one chance to fix it. Then a human reviews.
This is a feedback loop, not an approval gate. The difference matters.
An approval gate says: “A human must review this before it proceeds.” It is a checkpoint. It creates a queue. It scales linearly with human availability.
A feedback loop says: “The system will tell you whether this works. Fix it if it doesn’t.” It is a conversation between the agent and the quality infrastructure. It scales with compute.
The shift from approval gates to feedback loops is what makes agent fleets viable at scale. You cannot have a human approve 1,300 pull requests per week. You can have a CI system evaluate them. The governance is in the CI system’s design --- the tests it runs, the standards it enforces, the thresholds it applies --- not in human judgment applied to each individual output.
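The described flow is a small control loop. A sketch under that description, with the function names illustrative rather than Stripe's actual pipeline: run the checks, allow exactly one automated fix attempt, then merge or escalate.

```python
# Feedback loop, not approval gate: one automated fix attempt on CI feedback,
# then either merge or hand off to a human.
from typing import Callable

def feedback_loop(patch: str,
                  ci_check: Callable[[str], tuple],
                  agent_fix: Callable[[str, str], str]) -> str:
    passed, report = ci_check(patch)
    if passed:
        return "merge"
    patch = agent_fix(patch, report)      # exactly one iteration on CI feedback
    passed, _ = ci_check(patch)
    return "merge" if passed else "human_review"
```

Human judgment enters only on the escalation path, so throughput scales with the CI system rather than with reviewer availability.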
This is why test infrastructure is governance infrastructure. We have written about this before: when agents are autonomous, your tests literally define what gets built. The cage pattern adds a corollary: your tests also define what gets approved.
Building the Cage
For organizations evaluating agent deployment, the cage pattern suggests a specific sequence.
Start with the boundary, not the agent. Before selecting an agent framework or model, define the execution environment. What can the agent access? What is structurally excluded? What quality checks must pass before output is accepted? The boundary design is the governance design.
Extend existing infrastructure rather than building parallel systems. If you have CI/CD pipelines, linters, and test suites, those are your agent governance mechanisms. The work is extending them to treat agent output as a first-class input --- not building a separate “AI governance” layer.
Reduce the attack surface through curation. Do not give agents access to every tool and endpoint. Curate task-specific subsets. Stripe’s ratio --- 15 tools selected from 400+ for each task --- is a useful benchmark. The constraint is not a limitation. It is a design choice that reduces both error rates and security exposure.
Design feedback loops, not approval gates. Every approval gate you build is a scaling bottleneck you will eventually need to remove. Build quality checks that agents can respond to autonomously. Reserve human review for the cases where automated checks are insufficient.
Plan for maintenance from the start. The cage ensures production quality. It does not ensure long-term maintainability. Build code review processes, architectural fitness functions, and technical debt tracking that account for the specific patterns of agent-generated code.
The Real Question
The cage pattern is not complicated. It is architecturally straightforward: define boundaries, enforce them structurally, let agents operate freely within them. The principle is the same one that makes containers, sandboxes, and capability-based security effective.
The difficulty is not technical. It is organizational.
Building the cage requires that an organization has strong engineering infrastructure --- comprehensive tests, reliable CI/CD, well-defined quality standards --- before adding agents. Organizations that lack this infrastructure cannot build the cage. They will default to supervised tooling, which does not scale, or unsupervised deployment, which does not govern.
This is the uncomfortable truth about agent governance: it is a trailing indicator of engineering maturity. The organizations that can govern agent fleets are the organizations that already had strong engineering practices. The cage pattern does not create discipline. It requires it.
For the 86% of organizations whose AI pilots have not scaled to production, the path forward is not better agents or better models. It is better infrastructure. Build the cage first. The agents are the easy part.
Sources
- Stripe Engineering Blog. “How Stripe builds with LLMs.” stripe.com, 2026.
- Cloudflare. “Introducing AI Agent SDK and MCP support.” cloudflare.com, 2026.
- OpenAI. “How the Codex team builds with Codex.” openai.com, 2026.
- WorkOS. “MCP Token Efficiency Benchmarks.” workos.com, 2026.
- CodeRabbit. “AI-Assisted Pull Request Quality Analysis.” coderabbit.ai, 2025.
- Gartner. “Agentic AI Project Governance Forecast.” gartner.com, 2025.
- Help Net Security. “Only 29% of organizations prepared to secure agentic AI.” helpnetsecurity.com, February 2026.
- METR. “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” metr.org, July 2025.
Victorino Group helps organizations design the governance architecture that makes autonomous AI agents production-viable. If you are running agent pilots that need to scale --- or evaluating whether your infrastructure is ready for agent fleets --- reach out.
If this resonates, let's talk
We help companies implement AI without losing control.
Schedule a Conversation