The Containment Stack Just Filled Out: Four Layers, One Week

Thiago Victorino

On May 20 and 21, four organizations shipped four very different things into the same problem space. Dropbox open-sourced Nova, an internal platform that wraps coding agents in workflow isolation. The CNCF announced Prempti, a Falco-derived policy layer that intercepts the actions an agent tries to take before they reach the host. Google released Agent Executor (the open-source ax runtime) plus Agent Substrate on Kubernetes, a distributed runtime for agents that need to survive restarts, branch their own trajectories, and scale to millions of registered instances. IBM, in its Think keynote the same week, made the executive case for “digital workers” as a managed labor class with badges, onboarding, and retirement.

Read in isolation, each release is a vendor announcement. Read together, in the order they appeared in your feed, they describe four floors of a building that was sketched out a month ago in the original containment stack essay and is now being filled in by separate companies who did not coordinate. The interesting story this week is not that they all shipped. It is that they shipped at different altitudes.

Layer 1: Workflow Isolation (Dropbox Nova)

Dropbox’s Nova post is the most ground-floor of the four releases. Nova is a platform for running coding agents inside Dropbox’s own engineering workflows, with three constraints that matter:

A five-iteration cap on every workflow. An agent that has not converged after five tries is not allowed a sixth; the workflow halts and a human takes over. The platform refuses to spend infinite tokens chasing a bad plan.

A deflaker that validates each candidate fix against 100+ CI runs before merging. Coding agents propose code constantly; the bottleneck is not generation, it is verifying that the proposal does not introduce a flake. Nova treats the deflaker as a first-class workflow component, not an afterthought.

Hermetic per-commit snapshots. Each agent run gets a frozen view of the repo at a known commit, so reruns are reproducible and concurrent agents do not see each other’s half-finished work.

What Dropbox shipped is not a sandbox in the operating-system sense. It is a sandbox in the workflow sense: the agent runs against a bounded version of the repository, with a bounded number of attempts, gated by a deterministic validator. The trust boundary is the workflow itself. This is the layer closest to the agent’s actual job, and it is where most teams accidentally have nothing.
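
The three constraints compose into a very small control loop. What follows is a minimal sketch of that shape, not Nova's code: `Snapshot`, `propose`, `run_ci`, and both constants are hypothetical names chosen for illustration, and the run count simply mirrors the "100+" figure above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ITERATIONS = 5   # the iteration cap described above
DEFLAKE_RUNS = 100   # assumption: stands in for "100+ CI runs"


@dataclass(frozen=True)
class Snapshot:
    """Frozen view of the repo at one commit (hypothetical type)."""
    commit: str


def is_stable(run_ci: Callable[[Snapshot, str], bool],
              snapshot: Snapshot, patch: str,
              runs: int = DEFLAKE_RUNS) -> bool:
    """Accept a patch only if every repeated CI run passes."""
    return all(run_ci(snapshot, patch) for _ in range(runs))


def run_workflow(propose: Callable[[Snapshot, int], str],
                 run_ci: Callable[[Snapshot, str], bool],
                 snapshot: Snapshot) -> Optional[str]:
    """Return a validated patch, or None to signal a human handoff."""
    for attempt in range(1, MAX_ITERATIONS + 1):
        # Every attempt sees the same frozen snapshot.
        patch = propose(snapshot, attempt)
        if is_stable(run_ci, snapshot, patch):
            return patch
    return None  # cap reached: halt and escalate, never a sixth try
```

The point of the sketch is that the agent never decides when to stop. The loop does, and the only way out with a patch is through the validator.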

Layer 2: Action Interception (CNCF Prempti)

One floor up from workflow isolation sits action interception. The CNCF’s Prempti announcement is the reference implementation that did not exist last month. Prempti is built on the Falco runtime security project and watches for the things coding agents try to do that they should not be doing: reading SSH keys, exfiltrating AWS credentials, modifying MCP server configurations to escalate their own permissions, injecting commands into git hooks.

The design decision worth naming: Prempti is pre-execution, not post-hoc audit. A logged violation is useful for forensics, but it does not stop the laptop’s SSH key from leaving the building. Falco’s kernel-level instrumentation lets Prempti block the syscall before it completes. It supports Claude Code today on Linux, macOS, and Windows, with Codex on the roadmap.

This layer answers a question the workflow layer cannot: “What is the agent actually trying to do on the host?” Nova’s five-iteration cap does not protect you if iteration three quietly reads ~/.ssh/id_rsa and POSTs it to a Discord webhook. The workflow layer trusts the workflow’s intent. The action layer trusts nothing and inspects every reach into the system.
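
To make the pre-execution idea concrete, here is a toy sketch in user-space Python. It is not Prempti's or Falco's rule syntax and it does not touch the kernel; the deny globs, `Action`, `check`, and `guarded_read` are all hypothetical. What it does show is the ordering that matters: the decision is made and recorded before the read happens, and a denied action never reaches the reader.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable

# Hypothetical deny rules for illustration only.
DENY_GLOBS = {
    "read": ("*/.ssh/*", "*/.aws/credentials"),
    "write": ("*/.git/hooks/*",),
}


@dataclass(frozen=True)
class Action:
    agent_id: str
    kind: str   # "read" or "write"
    path: str


class Blocked(Exception):
    """Raised when policy denies an action before it executes."""


# Per-action attribution: (agent_id, kind, path, verdict).
audit_log: list[tuple[str, str, str, str]] = []


def check(action: Action) -> None:
    """Record a verdict for the action and raise if it is denied."""
    globs = DENY_GLOBS.get(action.kind, ())
    denied = any(fnmatchcase(action.path, g) for g in globs)
    verdict = "deny" if denied else "allow"
    audit_log.append((action.agent_id, action.kind, action.path, verdict))
    if denied:
        raise Blocked(f"{action.agent_id}: {action.kind} {action.path}")


def guarded_read(action: Action, reader: Callable[[str], str]) -> str:
    """Run the policy check first; only then perform the read."""
    check(action)
    return reader(action.path)
```

A post-hoc audit would produce the same log entries, but only after `reader` had already returned the key.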

Prempti also produces the telemetry that the next two layers depend on. Without per-action attribution, you cannot tell which agent did what, and the upper floors lose their ability to make decisions about specific agent instances.

Layer 3: Runtime Durability (Google Agent Executor)

Two floors up, you hit the question Google chose to answer this week: how do agents survive at scale? The Agent Executor announcement and the corresponding google/ax repository define a runtime, not a sandbox. The primitives are durable execution (an agent can crash and resume mid-trajectory), secure sandboxes per agent process, trajectory branching (the agent can fork its own reasoning and discard the worse branch), and Agent Substrate, a Kubernetes-backed registry designed for millions of registered agents.

The runtime is A2A-protocol compatible, which means agents written for it can interoperate with other A2A endpoints, including the agent-to-agent ecosystem we covered in the Cloud Next notes. The deliberate choice is to make the runtime, not the framework, the thing that scales. The agent’s job graph is the unit of execution; the framework that produced it is interchangeable.

Durability is the floor people skip because everything works fine until it does not. An agent halfway through a 40-step trajectory gets evicted by a Kubernetes node failure. Without durable execution, the agent restarts from step one, burns the tokens again, possibly takes a different path, and quietly drifts. With durable execution, it picks up at step 23 with the same context and continues. The difference is invisible on a dashboard until you count the wasted compute and the inconsistent outcomes.
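
The mechanism is easier to see in miniature. The sketch below is generic checkpoint-and-resume, not the Agent Executor API: `run_durable`, the JSON checkpoint file, and the step signature are assumptions made for illustration. Each completed step is persisted before the next one starts, so a restart resumes at the first unfinished step with the saved state.

```python
import json
import os
from typing import Callable

Step = Callable[[dict], dict]  # takes the current state, returns the new state


def load_checkpoint(path: str) -> tuple[int, dict]:
    """Return (index of next step to run, saved state)."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        data = json.load(f)
    return data["next_step"], data["state"]


def save_checkpoint(path: str, next_step: int, state: dict) -> None:
    """Write to a temp file, then swap it in atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_step": next_step, "state": state}, f)
    os.replace(tmp, path)


def run_durable(steps: list[Step], path: str) -> dict:
    """Run steps in order, checkpointing after each completed step."""
    next_step, state = load_checkpoint(path)
    for i in range(next_step, len(steps)):
        state = steps[i](state)
        save_checkpoint(path, i + 1, state)
    return state
```

If the process dies during step 23 of 40, calling `run_durable` again with the same path re-runs only step 23 onward. The step that was interrupted does run a second time, so steps with side effects need to be idempotent.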

Agent Executor sits above the action layer because it assumes the host is already protected. It is the layer where an agent becomes a long-running, observable, restartable workload, the same way services became long-running observable workloads a decade ago.

Layer 4: Lifecycle Management (IBM Digital Workers)

The top floor is the one IBM staked out at Think this week. The keynote from Mohamad Ali (SVP, IBM Consulting, speaking under CEO Arvind Krishna's mandate) framed agents not as code but as workers with a lifecycle: hired, onboarded, badged, audited, retired. The Pearson partnership produces skill badges that gate which agents are allowed to take which jobs. Providence Health cut nurse recruitment cycles by 12 days using a digital worker pool. Applying the model internally, IBM says it decomposed 490 consulting workflows; it claims $4.5B in productivity savings and credits the model with a 20 percentage-point profit lift in consulting between 2024 and 2025.

Strip the keynote framing and the operational claim is this: at enterprise scale, agents are not workloads, they are headcount. The questions HR has always asked about employees apply to agents at this layer. Who do they report to? What are they certified to do? What do you do when they go wrong? What is the offboarding procedure that removes their access cleanly? IBM’s bet is that organizations operating at hundreds-to-thousands of agents need an HR-shaped layer above the runtime, not another runtime.

This is the layer that does not fit any of the lower three. Nova manages workflows. Prempti manages actions. Agent Executor manages processes. None of them answer: “Should this specific agent be allowed to take this specific job today?” That is a lifecycle question. The badge, the role, the retirement, the audit trail of who hired this agent and why, all of it sits above the runtime and below the business decision.
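
Reduced to code, that lifecycle question is a lookup against a record, not a runtime check. The following is a hypothetical sketch, not IBM's or Pearson's data model: `AgentRecord`, its fields, and `may_take` are invented here to show what "badged" and "retired" would mean as a gate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AgentRecord:
    """Hypothetical lifecycle record for one agent."""
    agent_id: str
    owner: str                        # who "hired" the agent
    badges: frozenset[str]            # what it is certified to do
    retired_on: Optional[date] = None


def may_take(agent: AgentRecord, required_badges: set[str],
             today: date) -> bool:
    """Allow a job only for active agents holding every required badge."""
    if agent.retired_on is not None and today >= agent.retired_on:
        return False
    return required_badges <= agent.badges
```

Note that nothing in the record describes how the agent runs. That is the separation this layer is arguing for.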

Why the Stack Diagram Matters More Than Any Single Vendor

You do not need to pick Dropbox, CNCF, Google, or IBM. You need to pick a layer to be honest about. If you are running coding agents and your only control is the prompt, you are missing layer 1. If you have workflow isolation but no action interception, an iteration cap will not save you from an exfiltration. If your agents are restarting from scratch every time a node dies, you have no layer 3. If you are running more than fifty agents in production and you cannot answer “who certified this one to touch billing data,” you have no layer 4.

The four vendors this week did not coordinate. They did, however, expose the layers cleanly enough that you can audit your own stack against the diagram. This is what governance as product looks like when the products arrive in the same week and slot into different floors. It is also why the convergence story from earlier this month was undercounting: it called the trend correctly but underestimated how fast the layers would differentiate.

The layer most teams will be tempted to buy first is layer 4 (it has executive narrative, board-friendly metrics, and ROI claims). The layer most teams actually need first is layer 2 (an agent without action interception is a credential exfiltration waiting to happen). The layer most teams already have an answer for is layer 3 (Kubernetes was already there; you just need a runtime that knows how to use it). The layer most teams underestimate is layer 1 (because the workflow constraints feel like a productivity tax until the first time an agent burns 200 iterations on a wrong plan).

Do This Now

Take 45 minutes with your platform lead this week. Draw the four layers on a whiteboard. For each layer, write the vendor or system that owns it in your stack, or write “none” if nobody owns it. Then count the “none”s. That number is your honest containment debt.

Then pick the lowest-numbered “none” and assign an owner. Not a project. An owner. Layer order matters because the upper floors assume the lower ones exist. Buying layer 4 lifecycle management before you have layer 2 action interception is hiring an HR director for a building with no front door.
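
If you would rather keep the whiteboard result in a repo, the audit fits in a few lines. This is only a sketch of the exercise above; the function name and the dictionary shape are assumptions, and an owner of `None` or the string "none" both count as a missing floor.

```python
from typing import Optional

LAYERS = {
    1: "workflow isolation",
    2: "action interception",
    3: "runtime durability",
    4: "lifecycle management",
}


def containment_debt(
    owners: dict[int, Optional[str]],
) -> tuple[int, Optional[int]]:
    """Return (number of unowned layers, lowest unowned layer or None)."""
    missing = sorted(
        layer for layer in LAYERS
        if owners.get(layer) is None
        or owners[layer].strip().lower() == "none"
    )
    return len(missing), (missing[0] if missing else None)
```

For example, `containment_debt({1: "internal CI gates", 2: "none", 3: "Kubernetes", 4: None})` reports two missing floors and points at layer 2 as the one to staff first.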

The diagram is now drawn by people who do not work for you and who shipped in the same week. The hard part is the audit you run inside your own building. You will find at least one missing floor. That is the work for next quarter.


This analysis synthesizes "Introducing Nova, Dropbox’s Internal Platform for Coding Agents" (Dropbox Engineering, May 2026), "Introducing Prempti: Policy and Visibility for AI Coding Agents" (CNCF, May 2026), "Introducing Agent Executor, Google’s Distributed Agent Runtime" (Google Cloud, May 2026), and "Managing Digital Worker Lifecycle" (SiliconANGLE / IBM, May 2026).

Victorino Group helps teams choose containment layers that fit their actual workflow risk, not vendor marketing. Let’s talk.

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
