The Containment Stack Grew Up: Four Tiers Shipped in One Week

We already drew the diagram. In April, four agent-control surfaces shipped reference implementations in one week, and we mapped them as four floors of one building: compute, data, knowledge, identity. That diagram still holds. This week is different. This week, the same containment concept shipped at four distinct altitudes of the technical stack, from a 362KB WASM blob all the way up to a Rust gateway pushing 500,000 queries per second. The architecture did not get wider. It got taller.

The shift worth naming: containment is no longer a single layer you bolt on. It is a stack of nested boundaries, and the live question for any platform team is which altitude carries your blast radius. Draw it too low and a compromised process walks straight out your network egress. Draw it too high and you pay gateway latency on every harmless function call. Four vendors just published reference answers, each at a different floor.

Tier 1: The Language Runtime

The lowest tier is the one most teams forget exists. Before microVMs, before gateways, you can contain at the level of the interpreter itself.

Simon Willison’s MicroPython in a WASM sandbox is the cleanest demonstration this week. A 362KB MicroPython blob compiled to WebAssembly, driven by 78 lines of host C code. No filesystem. No network. CPU capped through wasmtime’s fuel mechanism, so an infinite loop dies on a budget rather than pinning a core. The agent writes Python, the Python runs, and the runtime physically cannot reach anything you did not hand it.

This is containment as a property of the execution environment, not a policy layered on top. The value is density and speed: you can spin up thousands of these, they start in microseconds, and the trust boundary is the WASM linear memory itself. The cost is capability. No filesystem means no pip install, no reading a CSV off disk, no real toolchain. This tier is for evaluating untrusted code snippets, not for running a coding agent that needs to build a project.
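To make the fuel idea concrete, here is a minimal sketch of instruction metering on a toy stack machine. It is not wasmtime's API and not the code from Willison's post; the instruction set, the `run` function, and the `OutOfFuel` exception are all hypothetical names chosen for illustration. The only point it demonstrates is the one above: every instruction spends budget, so an infinite loop ends when the budget does.

```python
class OutOfFuel(Exception):
    """Raised when the guest program exhausts its instruction budget."""


def run(program, fuel):
    """Execute a toy stack program, charging one unit of fuel per instruction.

    Hypothetical instruction set (an illustration, not WebAssembly):
      ("push", n)  push the integer n
      ("add",)     pop two values and push their sum
      ("jmp", i)   jump to instruction index i
      ("halt",)    stop and return the stack
    """
    stack = []
    pc = 0
    while pc < len(program):
        # The budget check happens before every instruction, including jumps.
        if fuel <= 0:
            raise OutOfFuel(f"budget exhausted at instruction {pc}")
        fuel -= 1
        op = program[pc]
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op[0] == "jmp":
            pc = op[1]
            continue
        elif op[0] == "halt":
            break
        else:
            raise ValueError(f"unknown instruction: {op!r}")
        pc += 1
    return stack


# A well-behaved program finishes inside its budget.
print(run([("push", 2), ("push", 3), ("add",), ("halt",)], fuel=10))  # [5]

# An infinite loop is stopped by the budget instead of pinning a core.
try:
    run([("jmp", 0)], fuel=1000)
except OutOfFuel as exc:
    print("stopped:", exc)
```

A real runtime meters compiled WebAssembly rather than a Python loop, but the contract the host relies on is the same: the guest gets a finite budget, and the host decides what happens when it runs out.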

Tier 2: The MicroVM

One floor up, you trade density for a real computer. LangSmith Sandboxes reached general availability on June 5, and the framing in LangChain’s announcement is the line every platform engineer should write on the wall: “a container boundary is not an isolation boundary.”

The product gives each agent a hardware-virtualized microVM. Sandboxes are creator-private by default, so one agent’s environment is not silently reachable by another. Credentials reach the workload through an auth-proxy that injects them at request time rather than baking long-lived secrets into the image. That last detail matters because it connects directly to the network and identity work we covered the week before: the sandbox is where compute isolation and credential discipline finally meet in one product.
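The request-time injection pattern can be sketched in a few lines. This is an assumption-laden illustration, not LangSmith's implementation: the `Request` and `AuthProxy` names, the per-host secret table, and the bearer-token format are all hypothetical. What it shows is the shape of the idea: the secret lives on the host side of the boundary and is attached only as the request leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """A simplified outbound HTTP request as a proxy might see it."""
    host: str
    headers: dict = field(default_factory=dict)


class AuthProxy:
    """Hypothetical host-side proxy. The sandboxed workload never holds secrets."""

    def __init__(self, secrets):
        # Maps destination host -> credential. Stored outside the sandbox image.
        self._secrets = dict(secrets)

    def forward(self, request):
        """Return a copy of the request with the host's credential attached."""
        # Strip any Authorization header the workload set itself, so a
        # compromised agent cannot choose which credential goes out.
        headers = {
            name: value
            for name, value in request.headers.items()
            if name.lower() != "authorization"
        }
        secret = self._secrets.get(request.host)
        if secret is not None:
            headers["Authorization"] = f"Bearer {secret}"
        return Request(host=request.host, headers=headers)


proxy = AuthProxy({"api.example.com": "token-from-host-vault"})
outbound = proxy.forward(Request(host="api.example.com", headers={"Accept": "application/json"}))
print(outbound.headers)
```

Because the credential is attached per request, rotating or revoking it is a host-side change; nothing inside the sandbox image has to be rebuilt.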

Why hardware virtualization and not just a tighter container? Because the threat model now includes the agent’s own supply chain. LangChain points at Shai-Hulud, the npm worm that backdoored 796 packages and over 25,000 repositories in November 2025. When an agent runs npm install, it is executing code from a tree it never reviewed. A shared kernel means that code can probe for an escape. A microVM means the worst case is a destroyed, disposable VM. You get a full Linux environment with a filesystem and a network, and you get a virtualization boundary around it. That combination is why this tier became the default reference for production coding agents.

Tier 3: The Network Egress

The first two tiers contain what the agent can run. The third contains what the data can do once the agent has it. This is the exfiltration tier, and it is where the week got interesting.

OpenAI shipped Lockdown Mode, documented in its Help Center. It disables live browsing, web image retrieval, deep research, and agent mode, while keeping Codex network access intact. The Help Center frames it precisely: it is “designed to help prevent the final stage of data exfiltration.” A prompt injection can corrupt an agent’s instructions, but injection alone does not leak your data. The leak happens when the corrupted agent renders an image from an attacker URL or browses to an endpoint that carries your secrets out in a query string. Lockdown Mode severs that last hop.
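The "last hop" is easy to picture as a check on outbound URLs. The sketch below is not how Lockdown Mode is implemented, which OpenAI does not describe at this level; it is a generic egress allowlist, and the `ALLOWED_HOSTS` set and `egress_allowed` function are hypothetical. It shows why blocking the fetch matters: the secret rides in the query string, so refusing the request is what stops the leak.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the agent may fetch from.
ALLOWED_HOSTS = frozenset({"docs.example.com", "api.internal.example"})


def egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname is lowercased and excludes any userinfo or port, so a URL like
    # https://docs.example.com@attacker.example/ resolves to attacker.example.
    host = parts.hostname
    return host is not None and host in allowed_hosts


# An injected instruction asks the agent to render an "image" that carries a secret.
leak = "https://attacker.example/pixel.png?d=API_KEY_VALUE"
print(egress_allowed(leak))
print(egress_allowed("https://docs.example.com/guide"))
```

An allowlist is the strict end of the design space. A toggle like Lockdown Mode removes whole capabilities instead, which is blunter but leaves less room for a filter to be bypassed.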

Here is the part worth sitting with. OpenAI shipped egress containment as a toggle that is off by default, and shipped it alongside guidance that prompt injection “is not currently a major risk.” Read those two facts together. The vendor built the brake, then told you the road is mostly straight. Both statements can be technically defensible and still leave the customer holding the decision. If injection is not a major risk, why build the containment? If it is enough of a risk to build the containment, why ship it off by default? The honest reading is that the threat is real, the default is a product choice about friction, and the governance burden lands on you to flip the switch for any agent that touches sensitive data.

Tier 4: The Gateway

The top tier does not contain a single agent. It governs the traffic of an entire fleet. Agentgateway, which earned AAIF Growth-stage approval on May 21, is the reference implementation, detailed in Solo.io’s design writeup.

The numbers establish that this is infrastructure, not a research demo: 500,000 queries per second at under 0.2ms P99 latency, more than 7 million downloads, written in Rust, native to MCP and A2A, configured through an xDS control plane borrowed straight from the service-mesh world. At this altitude, containment means policy: which agent may call which tool, which model, which downstream API, and under what rate. The boundary is not memory or a VM. It is the wire, and every request crosses it.

A gateway gives you one place to enforce, observe, and revoke across hundreds of agents at once. The cost is that it sees nothing about what happens inside a sandbox; it governs the edges, not the interior. That is the entire point. Tier 4 is not a replacement for the lower tiers. It is the floor you reach for when you stop containing one agent and start governing a population of them.
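Policy at the wire can be sketched as a lookup plus a counter. This is a toy model, not Agentgateway's configuration format or API: the `Gateway` class, the policy dictionary layout, and the fixed-window rate limit are assumptions made for illustration. It covers the three verbs above in miniature: enforce (allowlist and rate), observe (one place where every call is seen), and revoke.

```python
import time


class Gateway:
    """Toy policy point: per-agent tool allowlist plus a fixed-window rate limit."""

    def __init__(self, policy, clock=time.monotonic):
        # policy maps agent name -> {"tools": set, "max_calls": int, "window_s": float}
        self._policy = dict(policy)
        self._clock = clock
        self._windows = {}  # agent name -> (window_start, calls_in_window)

    def authorize(self, agent, tool):
        """Return True if this agent may call this tool right now."""
        rule = self._policy.get(agent)
        if rule is None or tool not in rule["tools"]:
            return False
        now = self._clock()
        start, count = self._windows.get(agent, (now, 0))
        if now - start >= rule["window_s"]:
            start, count = now, 0  # a new rate window begins
        if count >= rule["max_calls"]:
            self._windows[agent] = (start, count)
            return False
        self._windows[agent] = (start, count + 1)
        return True

    def revoke(self, agent):
        """Remove an agent's policy; every later call is denied."""
        self._policy.pop(agent, None)


gateway = Gateway({"coder": {"tools": {"git", "search"}, "max_calls": 2, "window_s": 1.0}})
print(gateway.authorize("coder", "git"))    # allowed tool, within the rate limit
print(gateway.authorize("coder", "shell"))  # tool not on the allowlist
gateway.revoke("coder")
print(gateway.authorize("coder", "git"))    # denied after revocation
```

Note what the sketch cannot see: whatever the tool does after the call is authorized. That is the interior the lower tiers are responsible for.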

Do This Now

Stop asking whether to contain your agents. Ask which tier owns your blast radius, and write the answer down per agent class.

Take your three most active agent workloads this week. For each, name the altitude where containment actually lives today. A coding agent running npm install against a shared host kernel is sitting at no real tier; it belongs at Tier 2. An agent that browses the web on behalf of a user with access to internal data is a Tier 3 problem, and if you are on OpenAI’s platform, Lockdown Mode is off until you turn it on. A fleet of more than ten agents calling shared tools with no central policy point is a Tier 4 problem wearing a Tier 1 costume.
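The audit above can be written down as a small checklist function. This is a sketch of the three rules in the preceding paragraph and nothing more; the `Workload` fields and the `containment_gaps` name are hypothetical, and a real inventory would carry far more attributes.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of one agent workload for the audit."""
    name: str
    installs_third_party_code: bool = False
    shares_host_kernel: bool = False
    browses_web: bool = False
    touches_internal_data: bool = False
    fleet_size: int = 1
    has_central_policy: bool = False


def containment_gaps(workload):
    """Return the tiers this workload needs but does not appear to have."""
    gaps = []
    # Unreviewed installs on a shared kernel: a container is not enough.
    if workload.installs_third_party_code and workload.shares_host_kernel:
        gaps.append("Tier 2: move to a microVM")
    # Web access plus internal data: the exfiltration path is open.
    if workload.browses_web and workload.touches_internal_data:
        gaps.append("Tier 3: restrict egress")
    # More than ten agents sharing tools with no central policy point.
    if workload.fleet_size > 10 and not workload.has_central_policy:
        gaps.append("Tier 4: add a gateway")
    return gaps


coding_agent = Workload("coder", installs_third_party_code=True, shares_host_kernel=True)
print(containment_gaps(coding_agent))  # ['Tier 2: move to a microVM']
```

A workload can return more than one gap. The tiers nest, so a browsing agent inside a large fleet needs both an egress decision and a gateway.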

The most expensive mistake is not choosing the wrong tier. It is assuming you have a tier when you have a container, because a container boundary is not an isolation boundary. Four vendors just proved that at four altitudes in one week. Pick yours on purpose.

This analysis synthesizes “Give Your AI Agent Its Own Computer” (LangChain, June 2026), “Designing Agentgateway” (Solo.io, June 2026), and “Running Python in a Sandbox with MicroPython and WASM” (Simon Willison, June 2026).

Victorino Group helps enterprises choose the right containment tier for their agent fleet.