The Containment Pattern: Four Approaches to Sandboxing AI Agents in Production
In the second week of February 2026, four companies independently shipped solutions to the same problem. Cursor released OS-level sandboxing for coding agents. Docker launched microVM-based shell sandboxes with credential isolation. Zenity published a peer-reviewed paper on detecting malicious intent inside model activations. And Entire, founded by the former GitHub CEO, emerged from stealth with $60 million to build governance infrastructure for AI-generated code.
None of these companies coordinated. They converged.
The convergence tells us something important about where production AI is heading. The industry has recognized that the way we currently govern AI agents --- asking humans to approve every action --- does not scale. A new pattern is emerging: containment. Let agents operate freely inside defined boundaries. Move the trust decision from per-action to per-environment.
The Problem: Approval Fatigue
Cursor names the core problem directly. When agents require human approval for every terminal command, users initially review each request carefully. Then they stop. As approvals accumulate, especially when running multiple parallel agents, humans rubber-stamp without reading. The security gate becomes theater.
This is not a user education problem. It is a design problem. Humans are not built to maintain sustained attention across hundreds of sequential approval decisions. The more gates you add, the less each gate is worth. At some point, more security infrastructure produces less actual security.
Cursor’s data confirms the intuition. Sandboxed agents --- those running inside OS-level containment with no per-command approval --- stop 40% less often than unsandboxed agents. One-third of all requests on supported platforms now run sandboxed. The agents are more productive because they are not constantly pausing to ask permission. And the security is better because the boundary is architectural, not behavioral.
Four Layers of Containment
What makes this moment interesting is not that containment is happening. It is that four genuinely different approaches have appeared simultaneously, each addressing a distinct layer of the security stack.
Layer 1: OS-Level Sandboxing (Cursor)
Cursor constrains agent behavior at the operating system level. On macOS, this means Seatbelt --- Apple’s kernel-level sandboxing mechanism, introduced in 2007, deprecated in 2016, and still used by Chrome. On Linux, it means Landlock, a Linux Security Module that restricts filesystem access, and seccomp, a kernel facility that filters syscalls.
The implementation is precise. Seatbelt policies are generated dynamically at runtime, based on workspace settings, admin configuration, and .cursorignore files. Agents cannot write to .git/config, .vscode directories, or any file pattern the user has excluded. They can read and write freely within the project workspace. The boundary is the project, not the command.
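The decision logic behind such a policy can be sketched in a few lines. This is a hypothetical model, not Cursor's actual Seatbelt profile generation --- their implementation emits real kernel-level rules --- but it shows the shape of the idea: ignore patterns and a workspace root combine into a deny-by-default write policy.

```python
import fnmatch

# Hypothetical sketch: deriving a per-workspace write policy from ignore
# patterns, in the spirit of Cursor's dynamically generated profiles.
# The pattern list stands in for .cursorignore plus admin configuration.

DENY_PATTERNS = [".git/config", ".vscode/*", "*.pem"]

def allowed_to_write(path: str, workspace: str = "project/") -> bool:
    """Allow writes only inside the workspace and outside denied patterns."""
    if not path.startswith(workspace):
        return False  # the boundary is the project, not the command
    rel = path[len(workspace):]
    return not any(fnmatch.fnmatch(rel, pat) for pat in DENY_PATTERNS)

print(allowed_to_write("project/src/main.py"))   # True
print(allowed_to_write("project/.git/config"))   # False
print(allowed_to_write("/etc/passwd"))           # False
```

The point of the sketch is where the check lives: in the environment, evaluated on every access, rather than in a human approving each command.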
On Linux, Cursor uses an overlay filesystem approach. User workspaces are mapped into overlays in which ignored files are replaced with Landlock-protected copies that cannot be read or modified. The team notes this is the slowest part of their Linux implementation --- a limitation of the platform, not the design.
Windows runs the Linux sandbox inside WSL2, because native Windows sandboxing primitives are built for browsers, not developer tooling. Cursor is working with Microsoft on better primitives.
The most revealing detail is how Cursor taught agents to work within the sandbox. They updated shell tool descriptions to document the constraints, built an internal benchmark (Cursor Bench) to evaluate sandboxed versus unsandboxed performance, and changed how shell results render so agents can see which constraint caused a failure. The result: agents recover “far more gracefully from sandbox-related failures.”
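Cursor's actual failure rendering is internal, but the general idea can be sketched: map the error text of a denied command to a hint naming the constraint, so the agent can retry within its boundary instead of giving up. The hint strings here are illustrative assumptions.

```python
# Hypothetical sketch: annotate a denied shell result so an agent can see
# which sandbox constraint caused the failure.

SANDBOX_HINTS = {
    "Operation not permitted": "[sandbox] syscall blocked by seccomp policy",
    "Permission denied": "[sandbox] path blocked by filesystem policy; stay inside the workspace",
}

def annotate(returncode: int, output: str) -> str:
    """Append a constraint hint to failed shell output when one applies."""
    if returncode == 0:
        return output
    for marker, hint in SANDBOX_HINTS.items():
        if marker in output:
            return output + "\n" + hint  # tell the agent *why*, so it can recover
    return output

print(annotate(1, "sh: /etc/hosts: Permission denied"))
```

A bare "Permission denied" and "the sandbox denied this path; work inside the workspace" are the same event, but only the second gives the agent something to act on.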
Layer 2: MicroVM Isolation (Docker)
Docker Sandboxes takes a different approach. Instead of constraining the host OS, it creates an entirely separate environment: a lightweight microVM running Ubuntu with Node.js 20, Python, git, and common development tools pre-installed.
The security model is elegant in its simplicity. The agent can only see the workspace directory you mount --- not your home directory, not your system files, nothing else. API keys are never stored inside the sandbox. Instead, Docker’s network proxy intercepts outgoing API calls and injects credentials at request time. The agent uses a sentinel value (proxy-managed) that the proxy swaps for the real key. The actual secret never exists in the agent’s environment.
Inside the sandbox, the agent runs with bypassPermissions mode. No per-command approval. Complete freedom within the boundary. If something goes wrong, you run docker sandbox rm and start fresh.
This pattern extends beyond any single agent. Docker positions the shell sandbox as a general-purpose runtime for anything that runs on Linux and talks to AI APIs: Claude Code, Codex, GitHub Copilot, OpenCode, Kiro, and others. The pattern is always the same: create sandbox, install tools, configure credentials via proxy, run.
Layer 3: Activation-Probe Detection (Zenity)
Zenity’s approach is fundamentally different from Cursor and Docker. They are not constraining agent behavior. They are detecting malicious intent inside the model itself.
The system runs prompts through Llama-3.1-8B-Instruct, extracts internal layer activations, and feeds those into a lightweight logistic regression probe that scores maliciousness. The key insight is that language models internally represent whether content is harmful even when they do not refuse or flag it in their text output. Prompting the same model as a judge (what Zenity calls Llama-Judge) consistently underperformed the activation probe --- the model understands malicious content but struggles to articulate that understanding.
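A toy version of the probe makes the mechanism concrete. The real system extracts hidden states from an internal layer of Llama-3.1-8B-Instruct; here, synthetic 16-dimensional vectors stand in for activations, and the probe is a plain logistic regression trained by gradient descent.

```python
import numpy as np

# Toy activation probe: logistic regression over "hidden state" vectors.
# Synthetic clusters stand in for real model activations.
rng = np.random.default_rng(0)
benign = rng.normal(loc=0.0, scale=1.0, size=(200, 16))
malicious = rng.normal(loc=1.0, scale=1.0, size=(200, 16))  # shifted cluster
X = np.vstack([benign, malicious])
y = np.array([0] * 200 + [1] * 200)

# Gradient descent on the logistic loss (no external ML library needed).
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted maliciousness
    w -= 0.1 * (X.T @ (p - y) / len(y))       # gradient of the log loss
    b -= 0.1 * float(np.mean(p - y))

scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))
accuracy = float(np.mean((scores > 0.5) == y))
print(f"probe accuracy on training data: {accuracy:.2f}")
```

The appeal of this design is cost: once activations are in hand, the probe itself is a single dot product per request, cheap enough to run on every agent action.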
The results are striking, particularly on agentic threats. Prompt injections embedded in tool results --- not user inputs, but the data returned by tools --- are a category that existing guardrails (Prompt-Guard, Llama-Guard) cannot evaluate at all. They score N/A. Zenity’s probe achieves 99.2% recall on this category. On jailbreaks, they score 71.2% versus Llama-Guard’s 28.9%.
The research methodology is rigorous. Zenity uses Leave-One-Dataset-Out (LODO) evaluation, where entire datasets are withheld from training to test true out-of-distribution performance. They demonstrate that standard train/val/test splits produce inflated accuracy that collapses under real-world conditions. The code, benchmark datasets, and paper are all publicly available.
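The LODO loop itself is simple to state in code. This sketch uses placeholder dataset names and a trivial majority-class "model"; Zenity's published evaluation plugs real probe training and scoring into the same structure.

```python
# Minimal Leave-One-Dataset-Out evaluation: each named dataset is held
# out in turn, the model trains on the rest, and performance is measured
# on the dataset it never saw.

def lodo(datasets: dict, train_fn, eval_fn) -> dict:
    results = {}
    for held_out in datasets:
        train = [x for name, d in datasets.items() if name != held_out for x in d]
        model = train_fn(train)
        # Out-of-distribution score: the model never saw this dataset.
        results[held_out] = eval_fn(model, datasets[held_out])
    return results

# Trivial demo with placeholder labels and a majority-label "model".
data = {"jailbreaks": [1, 1, 0], "injections": [1, 0, 0], "benign": [0, 0, 1]}
train = lambda xs: round(sum(xs) / len(xs))              # majority label
evaluate = lambda m, d: sum(x == m for x in d) / len(d)  # accuracy
print(lodo(data, train, evaluate))
```

The contrast with a random train/test split is the point: a random split leaks each dataset's quirks into training, which is exactly the inflation LODO is designed to expose.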
The limitation is the 6.8% false positive rate on benign requests. In high-throughput agent systems, one in fifteen legitimate operations getting flagged creates operational friction. Zenity positions this as one layer in a cascaded detection system, not a standalone solution.
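A cascade is easy to picture as control flow. The thresholds and the judge below are illustrative assumptions, not Zenity's published configuration: the cheap probe settles the confident cases, and only the gray zone pays for a slower second opinion.

```python
# Sketch of cascaded detection: cheap probe first, expensive judge only
# for uncertain scores. Thresholds are illustrative.

def cascade(probe_score: float, judge) -> bool:
    """Return True if the request should be blocked."""
    if probe_score >= 0.9:
        return True      # confidently malicious: block immediately
    if probe_score <= 0.1:
        return False     # confidently benign: pass cheaply
    return judge()       # gray zone: invoke the expensive second look

# The judge (e.g. an LLM-as-judge call) runs only for the gray-zone request.
calls = []
expensive_judge = lambda: calls.append(1) or True
print(cascade(0.95, expensive_judge), cascade(0.05, expensive_judge), cascade(0.5, expensive_judge))
print("judge invocations:", len(calls))  # prints 1: only the 0.5 score escalated
```

In a cascade, the probe's 6.8% false positive rate stops being a blocking rate and becomes an escalation rate, which is a much cheaper failure mode.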
Layer 4: Checkpoint-Based Governance (Entire)
Entire does not prevent bad actions or detect malicious intent. It records reasoning.
Thomas Dohmke spent nearly four years as GitHub CEO and led Copilot to become the most widely used AI coding tool. His thesis is straightforward: the development toolchain --- version control, code review, CI/CD --- was designed for humans writing code. When AI agents are the primary code producers, the toolchain needs to be rebuilt.
Entire’s first product, Checkpoints, is an open-source CLI tool that records the instructions and reasoning behind AI-generated code. When an agent writes code, Checkpoints captures the prompt, the reasoning chain, and the context that led to each change. This context travels with the code, making AI-generated pull requests auditable in a way that commit messages alone cannot achieve.
Anyone who has reviewed an AI-generated pull request understands this problem. A diff shows what changed. It rarely shows why. When multiple agents produce code across a codebase, and no human understood the full reasoning, the review process breaks down. Checkpoints addresses this gap.
The company raised $60 million at a $300 million valuation --- what lead investor Felicis called the largest seed investment ever for a developer tools startup. The investor list includes Microsoft’s M12, Yahoo co-founder Jerry Yang, Y Combinator CEO Garry Tan, Datadog CEO Olivier Pomel, and developer community figures Gergely Orosz and Theo Browne. The team of 15, all remote, comes from GitHub and Atlassian. A broader platform launch is planned for later in 2026.
The Containment Taxonomy
These four approaches form a taxonomy, not a competition. Each addresses containment at a different level:
| Layer | Approach | What It Does | Provider |
|---|---|---|---|
| OS | Syscall + filesystem restriction | Prevents agents from accessing resources outside their boundary | Cursor |
| VM | Isolated microVM + credential proxy | Prevents agents from seeing secrets or escaping their environment | Docker |
| Model | Internal activation probes | Detects malicious intent before it becomes action | Zenity |
| Process | Reasoning audit trail | Records why changes were made for review and compliance | Entire |
The taxonomy suggests that mature agent infrastructure will layer multiple containment approaches. An agent might run inside a Docker microVM (Layer 2), with OS-level restrictions inside the VM (Layer 1), monitored by activation probes (Layer 3), and with all reasoning recorded by Checkpoints (Layer 4).
This is defense in depth --- the same principle that has governed infrastructure security for decades, now applied to AI agents.
What This Means for Enterprises
The practical implication is that the transition from “approve every action” to “define the boundary” is happening now. It is not theoretical. Four companies shipped it in one week.
For teams running AI agents in production, three questions become urgent:
First, what is your trust boundary? Today, most organizations define trust per-action: approve this command, allow this file write, permit this API call. The containment pattern moves this to per-environment: define the boundary once, let the agent operate freely inside it. This requires knowing what the boundary should be --- which files are sensitive, which APIs are dangerous, which network endpoints matter.
Second, which layer do you need? Not every workload requires every layer. Internal development agents might need OS-level sandboxing and checkpoint governance. Customer-facing agents might need VM isolation and activation-probe detection. The risk profile determines the stack.
Third, who owns the boundary? Containment is infrastructure. Somebody needs to define, maintain, and audit the boundaries. In Cursor’s model, it is a combination of workspace settings, admin configuration, and .cursorignore files. In Docker’s model, it is the sandbox configuration. These boundaries are governance artifacts as concrete as IAM policies, and they need the same operational rigor.
The Productivity Paradox
The counterintuitive finding in all of this is that containment makes agents more productive, not less. Cursor’s 40% reduction in stops is the clearest data point, but the principle is visible across all four approaches.
When agents know their boundaries, they do not waste cycles negotiating permissions. When credentials are managed by the environment, agents do not need complex authentication workflows. When reasoning is recorded automatically, agents do not need to generate explanatory commit messages. And when activation probes handle threat detection, the agent’s own safety mechanisms do not need to be as conservative.
Constraints enable velocity. This is not a new principle in software engineering --- we have known since the 1960s that structured programming (deliberate constraints on control flow) produces better software faster. But applying it to AI agents is new, and the data is starting to confirm what the principle predicts.
Looking Forward
Cursor’s team describes future interest in “sandbox-native agents trained on the constraints of their environment” --- agents that do not just tolerate their boundaries but are designed around them. This is the logical endpoint: agents that write programs optimized for their containment layer, rather than general-purpose agents shoved into a sandbox after the fact.
Entire’s Dohmke draws the analogy to manufacturing: “Just like when automotive companies replaced the traditional, craft-based production system with the moving assembly line, we must now reimagine the software development lifecycle for a world where machines are the primary producers of code.”
Both framings point in the same direction. The containment pattern is not a temporary compromise between autonomy and control. It is the architectural foundation for how production AI agents will operate. The companies that build their containment infrastructure now will be able to scale their agent deployments. The ones that rely on approval fatigue will not.
Four companies. Four layers. One week. The containment pattern is becoming the production standard.
Sources: Cursor Blog | Docker Blog | Zenity Labs (arxiv) | DevOps.com on Entire
Contact: contact@victorinollc.com | www.victorinollc.com