Four Containment Surfaces, One Diagram: The Agent Stack Just Got Drawn

Thiago Victorino

In the week of April 21 to 27, 2026, four different agent-control surfaces shipped reference implementations within the same Tuesday-to-Friday window. Engineers’ Codex published a tour of AI sandboxing primitives. Arpit Bhayani argued databases were not designed for agent traffic and proposed a defensive pattern. Anthropic launched enterprise Memory for Claude Agents, with Netflix and Rakuten as early adopters. HashiCorp shipped Vault 2.0 with workload identity federation, SCIM, and SPIFFE.

None of them coordinated. The releases nonetheless line up as four floors of the same building.

If you have written about agent governance for any length of time, you already know what containment means at a single layer. We have written about the containment pattern at the OS level. We have written about agents skipping their own memory. We have written about why permission models work 40% of the time. What this week made unavoidable is that those are not three independent essays. They are three slices of one diagram. And the diagram now has four floors.

This is a 30-minute architectural review your platform team should run this week.

Floor 1: Compute

The compute floor decides what an agent’s process can touch when it executes code. The Engineers’ Codex tour catalogued the live primitives: gVisor, the user-space kernel that Anthropic uses for Claude on the web; Firecracker microVMs, which Vercel Sandbox boots in roughly 125 ms; Bubblewrap, which Anthropic uses for the Claude Code CLI on developer workstations; and Linux cgroups plus namespaces, the underlying machinery containers have used for more than a decade.

The trade-off across these primitives is the same one infrastructure teams have argued about since Docker shipped: speed versus isolation strength. Bubblewrap is fast and lightweight, but it shares the host kernel. gVisor intercepts system calls in a user-space kernel instead of passing them straight to the host. Firecracker adds a hardware virtualization boundary. The choice is not about which is best. It is about what an agent can do when it goes wrong, and how far the surface area of “wrong” is allowed to widen.

The pattern that the containment essay traced at the OS level applies here too. You move trust from per-action to per-environment. You stop asking the human to approve every command. You define the boundary, then let the agent operate.
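To make “define the boundary, then let the agent operate” concrete, here is a minimal sketch that turns a per-environment policy into a Bubblewrap command line. The `SandboxPolicy` class, the `/workspace` mount point, and the default read-only paths are assumptions for illustration, not anything the cited tour prescribes. The function only builds the argument list and executes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical per-environment boundary for one agent task."""
    workspace: str                       # the only writable host path
    read_only: tuple = ("/usr", "/etc")  # host paths visible but immutable
    allow_network: bool = False          # network stays off unless requested


def bwrap_argv(policy: SandboxPolicy, command: list) -> list:
    """Build a Bubblewrap argument list; nothing is executed here."""
    # Unshare every namespace and kill the sandbox if the parent dies.
    argv = ["bwrap", "--unshare-all", "--die-with-parent"]
    if policy.allow_network:
        argv.append("--share-net")  # re-share the network namespace
    for path in policy.read_only:
        argv += ["--ro-bind", path, path]
    argv += [
        "--bind", policy.workspace, "/workspace",  # writable task directory
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--chdir", "/workspace",
    ]
    # bwrap treats the first non-option argument as the command to run.
    return argv + list(command)
```

The approval happens once, when the policy is written. Every command the agent issues afterwards inherits the same boundary. A real invocation would likely need extra binds or symlinks for library paths such as `/lib` and `/lib64`, which vary by distribution.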

That gets you the first floor. It is not the building.

Floor 2: Data

Arpit Bhayani’s Databases Were Not Designed for This is the piece that makes the second floor concrete. His argument: databases were built for application traffic, where a small number of services issue queries with predictable shapes. Agents do not behave that way. They issue thousands of variant queries, often generated on the fly, often without the calling code being human-reviewed. Treating the database as a passive substrate is how you discover at 2am that an agent emitted a DELETE it had no business emitting.

The defensive-DB pattern Bhayani proposes is a stack of small, boring practices that compound:

  • Agent-specific roles, with the principle of least privilege applied to schema, table, and column.
  • Query-context tagging, where every agent query embeds the agent ID, the task ID, and the reasoning step as a SQL comment. The DBA can now answer “who issued this” without spelunking through application logs.
  • Soft deletes everywhere, with a deleted_by column that captures the agent identity and the reason. No hard deletes from agent code paths.
  • Idempotency keys on every write, so retries do not produce duplicates and so audit trails can collapse them cleanly.
  • Dedicated connection pools per agent class, so a runaway loop in one agent cannot exhaust the pool used by humans or other systems.
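
Query-context tagging is the easiest item on this list to prototype. The sketch below is one way it could look, not Bhayani’s implementation: `tag_query`, `parse_tag`, and the `agent=…,task=…,step=…` comment layout are hypothetical names and conventions chosen for illustration.

```python
import re

# Anything outside this character set is replaced, so a value can never
# close the comment early or smuggle in extra SQL.
_UNSAFE = re.compile(r"[^A-Za-z0-9_.:-]")


def tag_query(sql: str, agent_id: str, task_id: str, step: str) -> str:
    """Prefix a SQL statement with a comment carrying the agent context."""
    fields = {"agent": agent_id, "task": task_id, "step": step}
    tag = ",".join(f"{k}={_UNSAFE.sub('_', v)}" for k, v in fields.items())
    return f"/* {tag} */ {sql}"


def parse_tag(sql: str) -> dict:
    """Recover the context from a tagged statement; {} if it is untagged."""
    match = re.match(r"\s*/\*\s*(.*?)\s*\*/", sql, re.DOTALL)
    if not match:
        return {}
    pairs = (p.split("=", 1) for p in match.group(1).split(",") if "=" in p)
    return {key.strip(): value.strip() for key, value in pairs}
```

Whether the comment survives all the way to the server’s query log depends on the driver and any proxy in between, so check that on your own stack before relying on it.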

None of these are new database techniques. What is new is treating them as required scaffolding rather than nice-to-haves. Compute containment without data containment is a sandbox with a backdoor. The agent cannot escape the sandbox; its DELETE statement does not need to.
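
Soft deletes and idempotency keys can be sketched together in a few lines. This example uses an in-memory SQLite database so it runs anywhere. The `orders` table, its column names (including a separate `delete_reason` column), and both helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id              INTEGER PRIMARY KEY,
    item            TEXT NOT NULL,
    idempotency_key TEXT NOT NULL UNIQUE,
    deleted_at      TEXT,
    deleted_by      TEXT,
    delete_reason   TEXT
);
""")


def create_order(item: str, idempotency_key: str) -> int:
    """Insert once per key; a retry with the same key changes nothing."""
    conn.execute(
        "INSERT OR IGNORE INTO orders (item, idempotency_key) VALUES (?, ?)",
        (item, idempotency_key),
    )
    row = conn.execute(
        "SELECT id FROM orders WHERE idempotency_key = ?", (idempotency_key,)
    ).fetchone()
    return row[0]


def soft_delete(order_id: int, agent_id: str, reason: str) -> bool:
    """Mark a row deleted and record who did it and why; never DELETE."""
    cur = conn.execute(
        "UPDATE orders SET deleted_at = datetime('now'), "
        "deleted_by = ?, delete_reason = ? "
        "WHERE id = ? AND deleted_at IS NULL",
        (agent_id, reason, order_id),
    )
    return cur.rowcount == 1
```

A retried `create_order` returns the same row id instead of creating a duplicate, and a soft-deleted row stays in the table with the agent’s identity attached. In this arrangement the agent role would hold UPDATE on the table but not DELETE.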

Floor 3: Knowledge

The knowledge floor is what an agent remembers across sessions, and who governs what gets remembered. Anthropic’s launch of Memory for Claude Agents in enterprise is the floor’s first reference implementation that has the controls a platform team can actually point at.

The design choices matter:

  • Memory is filesystem-based, not a black-box vector store. You can list it. You can grep it. You can export it.
  • The system exposes a programmatic API, which means memory mutations can be policy-gated the same way IAM changes are.
  • Permissions are scoped per agent, per project, per user. Not a single shared blob.
  • The audit trail supports rollback and redaction. If an agent memorized a customer’s PII it should never have stored, you can prove it, prove who saw it, and remove it.
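
To show why these properties matter together, here is a toy file-backed memory store with scoping, an audit log, and redaction. It is not Anthropic’s API. `FileMemory`, its methods, the directory layout, and the `audit.jsonl` file are all hypothetical, and the sketch trusts its inputs (no path sanitization, no rollback).

```python
import json
import time
from pathlib import Path


class FileMemory:
    """Hypothetical memory store: one file per entry, scoped by path."""

    def __init__(self, root: Path):
        self.root = root  # must be an existing directory
        self.audit_log = root / "audit.jsonl"

    def _path(self, agent: str, project: str, user: str, key: str) -> Path:
        # Scope is encoded in the directory structure: agent/project/user.
        return self.root / agent / project / user / f"{key}.txt"

    def _audit(self, action: str, actor: str, path: Path) -> None:
        entry = {"ts": time.time(), "action": action, "actor": actor,
                 "path": str(path.relative_to(self.root))}
        with self.audit_log.open("a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")

    def write(self, agent, project, user, key, text, actor) -> None:
        path = self._path(agent, project, user, key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        self._audit("write", actor, path)

    def read(self, agent, project, user, key, actor) -> str:
        path = self._path(agent, project, user, key)
        self._audit("read", actor, path)  # reads are logged too
        return path.read_text(encoding="utf-8")

    def redact(self, agent, project, user, key, actor) -> None:
        path = self._path(agent, project, user, key)
        path.write_text("[REDACTED]", encoding="utf-8")
        self._audit("redact", actor, path)

    def history(self) -> list:
        lines = self.audit_log.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines]
```

Because every entry is a plain file, listing, grepping, and exporting need no special tooling. The audit log answers “who wrote it, who read it, and when was it removed.”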

Netflix and Rakuten are cited as early adopters. The relevant point is not that two big logos picked it up. The relevant point is that enterprise procurement now has a memory governance product to procure. The conversation about whether agent memory governance is real has ended.

We argued the convenience-versus-governance failure mode earlier this month: agents will skip the governed memory system if a flat file is cheaper. Memory as a first-class, exportable, audited surface is what closes that gap, but only if the platform makes the governed path the default path. Building a knowledge floor and then leaving a flat-file shortcut next to it just rebuilds the convenience bias one floor higher.

Floor 4: Identity

The fourth floor is the one most teams treat last and discover too late. Who is the agent, and how does the rest of your infrastructure know?

HashiCorp Vault 2.0, released this week under the IBM versioning model, makes identity the floor instead of the afterthought. The headline change is workload identity federation: an agent does not carry static credentials. It presents a workload identity, Vault verifies it against a federated trust source, and short-lived credentials are issued for the specific operation. SCIM is now standard for provisioning. SPIFFE is supported as the identity envelope. The platform commits to two-year support cycles, the kind of cadence enterprise security teams plan around.

Static credentials in agent code are the cybersecurity equivalent of leaving a key under the mat. Every team running agents in production has, somewhere, a service account whose secret rotates rarely if at all. Federation removes the secret. The agent is the identity; the credential is a transient artifact derived from that identity at the moment of use.
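
The shape of that exchange can be sketched without any Vault specifics. The `Broker` class below is hypothetical and is not Vault’s API. It also assumes the workload identity has already been verified; a real system would check a signed identity document cryptographically before trusting the subject string. The SPIFFE-style ID in the usage example is illustrative.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    token: str
    subject: str       # the workload identity the token was derived from
    scope: str         # the single operation class it is good for
    expires_at: float  # Unix timestamp


class Broker:
    """Hypothetical issuer of short-lived, scope-bound credentials."""

    def __init__(self, trusted: dict, ttl_seconds: float = 300.0):
        self.trusted = trusted  # maps identity -> set of allowed scopes
        self.ttl = ttl_seconds

    def issue(self, subject: str, scope: str,
              now: Optional[float] = None) -> Credential:
        now = time.time() if now is None else now
        if scope not in self.trusted.get(subject, set()):
            raise PermissionError(f"{subject} may not request {scope}")
        return Credential(secrets.token_urlsafe(32), subject, scope,
                          now + self.ttl)


def is_valid(cred: Credential, scope: str,
             now: Optional[float] = None) -> bool:
    """A credential is good only for its own scope and only until expiry."""
    now = time.time() if now is None else now
    return cred.scope == scope and now < cred.expires_at
```

For example, `Broker({"spiffe://example.org/agents/billing": {"db:read"}})` would issue a `db:read` token to that identity and raise `PermissionError` for `db:write`. There is no secret to rotate because nothing long-lived is ever handed to the agent.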

This matters because the other three floors are useless without it. Compute containment, data containment, and knowledge containment all assume “the agent” is a stable, identified entity. If the agent’s identity is a long-lived API token shared across three services, you do not have four floors. You have one floor with three ladders out of it.

Run This Architectural Review This Week

Block 30 minutes with the platform team. Walk the four floors:

Compute. Where do your agents execute code? Name the primitive. If the answer is “in the same container as the application,” you have a co-tenancy problem, not a compute floor. If the answer is “shell commands on a developer laptop with no sandbox,” the compute floor is the laptop’s owner.

Data. Pick one production database. List the privileges granted to the agent role (SHOW GRANTS in MySQL, or the equivalent catalog query in your engine). If it has DROP on anything, write that down. Look at the last 1,000 queries from that role. Are they tagged? Can you tell which task issued them? Can you find the reasoning that produced a query that went wrong? If the answers are no, the data floor is missing.
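
If you want to script the data check, a rough sketch follows. It assumes you have already exported the role’s grants as strings in the style MySQL prints and pulled a sample of recent queries. The `review` function and the `agent=`/`task=` tag convention are hypothetical.

```python
import re

# A tagged query starts with a /* ... */ comment holding the context.
LEADING_COMMENT = re.compile(r"^\s*/\*(.*?)\*/", re.DOTALL)
RISKY_GRANT = re.compile(r"\b(DROP|ALL PRIVILEGES)\b", re.IGNORECASE)


def is_tagged(query: str) -> bool:
    """True if the leading comment names both the agent and the task."""
    match = LEADING_COMMENT.match(query)
    if not match:
        return False
    body = match.group(1)
    return all(re.search(rf"\b{key}=[^,\s]+", body)
               for key in ("agent", "task"))


def review(grants: list, queries: list) -> dict:
    """Summarize risky grants and untagged queries for one agent role."""
    return {
        "risky_grants": [g for g in grants if RISKY_GRANT.search(g)],
        "untagged": sum(1 for q in queries if not is_tagged(q)),
        "total": len(queries),
    }
```

A non-empty `risky_grants` list or a high untagged count is the finding to write down. The script does not fix anything, it only tells you whether the floor exists.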

Knowledge. What does each agent remember between sessions, and who can read it? Can you export it? Can you redact it? If the answer is “whatever the framework’s default memory does,” the knowledge floor is delegated to a vendor and probably not audited.

Identity. Pull one agent’s credentials. Are they static? When were they last rotated? Who else has them? If the credential is a long-lived secret, the identity floor is the equivalent of “trust the network.”
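
The identity questions also reduce to a small inventory check. This sketch assumes you can assemble a list of credential records by hand or from your secrets manager. The record fields and the one-day threshold are assumptions, not a recommendation from any of the cited sources.

```python
from datetime import datetime, timedelta, timezone


def stale_credentials(inventory: list, now: datetime,
                      max_age: timedelta = timedelta(days=1)) -> list:
    """Return names of credentials that are static, old, or shared."""
    flagged = []
    for cred in inventory:
        too_old = now - cred["issued_at"] > max_age
        shared = len(cred["holders"]) > 1
        if cred["static"] or too_old or shared:
            flagged.append(cred["name"])
    return flagged


# Illustrative records; the names and holders are made up.
NOW = datetime(2026, 4, 27, tzinfo=timezone.utc)
INVENTORY = [
    {"name": "billing-agent-token", "static": True,
     "issued_at": NOW - timedelta(days=400),
     "holders": ["billing", "reporting", "etl"]},
    {"name": "support-agent-lease", "static": False,
     "issued_at": NOW - timedelta(minutes=5),
     "holders": ["support"]},
]
```

Running `stale_credentials(INVENTORY, NOW)` flags only the long-lived shared token. The short-lived lease held by one workload passes.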

You will likely find at least two floors are not built. That is normal. What is not acceptable is operating without knowing which floors are missing. The four releases this week are not a vendor pitch; they are an inventory list. The architecture has been drawn. What remains is to look at your own building and count how many floors are in place.

The teams that win the next two years of agent operations are not the ones with the most autonomous agents. They are the ones whose autonomous agents run inside a four-floor building.


This analysis synthesizes What Every Dev Should Know About AI Sandboxes (Engineers’ Codex, April 2026), Arpit Bhayani’s Databases Were Not Designed for This (April 2026), Anthropic Launches Memory in Claude Agents for Enterprise (TestingCatalog, April 2026), and HashiCorp Vault 2.0 Identity Federation (InfoQ, April 2026).

Victorino Group helps engineering organizations design agent containment architectures across compute, data, knowledge, and identity. Let’s talk.

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
