
The Containment Convergence: Why NVIDIA, Docker, and OpenAI All Shipped Agent Sandboxes This Week

Thiago Victorino

In February, we documented four companies shipping containment infrastructure in one week. Cursor built OS-level sandboxing. Docker launched shell sandboxes. Zenity published activation probes. Entire raised $60 million for checkpoint-based governance.

Three weeks later, it happened again. Bigger.

On March 11, NVIDIA open-sourced OpenShell, a four-layer policy-governed runtime for AI agents. The same day, Docker published its agent team orchestration architecture with microVM isolation. OpenAI shipped the Responses API computer environment with egress proxies and domain-scoped credential injection. On March 16, NVIDIA unveiled NemoClaw at GTC, turning the viral OpenClaw framework into an enterprise-grade platform with built-in security. And OpenAI published a deep technical argument for why Codex Security uses semantic trust boundaries instead of traditional static analysis.

Five launches. One week. No coordination.

The containment pattern we identified in February is no longer emerging. It is converging on a shared architecture.

The Shared Architecture

What makes this week different from February is not the number of launches. It is the similarity of the designs. Three organizations, with different business models and different customer bases, independently arrived at the same containment stack:

Declarative YAML governance. NVIDIA OpenShell uses YAML policy files that control four layers: filesystem, network, process, and inference. Static policies lock at sandbox creation. Dynamic policies (network, inference) support hot-reload via openshell policy set without container restart. Docker Agent uses YAML configuration files that define agent roles, models, sub-agent hierarchies, and toolset permissions. The governance specification has become a YAML file.
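The policy-as-data idea is easy to sketch. The snippet below models a four-layer policy as a plain data structure with two toy checks; the keys (allow_paths, route_to, and so on) are invented for illustration and are not OpenShell's actual schema:

```python
# Hypothetical four-layer containment policy, modeled on the pattern
# described above. Field names are illustrative, not OpenShell's format.
POLICY = {
    "filesystem": {"allow_paths": ["/workspace"]},
    "process":    {"deny_syscalls": ["ptrace", "mount"]},
    "network":    {"allow_domains": ["api.github.com"]},
    "inference":  {"route_to": "https://llm-gateway.internal"},
}

def fs_allowed(policy: dict, path: str) -> bool:
    """True if `path` falls under an allowed filesystem prefix."""
    return any(path.startswith(p) for p in policy["filesystem"]["allow_paths"])

def net_allowed(policy: dict, domain: str) -> bool:
    """True if outbound traffic to `domain` is permitted."""
    return domain in policy["network"]["allow_domains"]
```

The point of the declarative form is exactly what the paragraph describes: the policy is data, so it can live in version control and be diffed and reviewed like code.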

Egress proxy as the trust boundary. OpenShell’s policy engine intercepts all outbound connections and performs one of three actions: allow, route for inference, or deny with logging. OpenAI’s Responses API routes all outbound network requests through a sidecar egress proxy that enforces allowlists and access controls. Docker Sandboxes provides network isolation with configurable allow/deny lists. The pattern is identical across all three: no direct outbound access. Everything goes through a policy layer.
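The three-way egress decision can be sketched in a few lines. This is a simplified illustration of the allow / route / deny pattern, with invented allowlists; none of the three vendors' actual proxy code is shown here:

```python
import logging
from urllib.parse import urlparse

log = logging.getLogger("egress")

# Illustrative allowlists; a real system would load these from policy files.
ALLOWED_HOSTS = {"api.github.com"}
INFERENCE_HOSTS = {"api.openai.com"}   # rerouted to a managed backend

def egress_decision(url: str) -> str:
    """Classify an outbound request: 'allow', 'route', or 'deny'."""
    host = urlparse(url).hostname or ""
    if host in ALLOWED_HOSTS:
        return "allow"
    if host in INFERENCE_HOSTS:
        return "route"                 # send via the managed inference backend
    log.warning("denied outbound request to %s", host)
    return "deny"
```

Everything outbound passes through one function like this; there is no code path that reaches the network without a decision being recorded.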

Credential isolation from the agent environment. OpenShell injects API keys as environment variables at runtime; they never exist on the sandbox filesystem. OpenAI uses domain-scoped secret injection at egress --- the model and container only see placeholders, while raw secret values stay outside model-visible context and are applied only for approved destinations. Docker’s earlier architecture used network proxy credential injection with sentinel values. The agent never holds the real key.
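The sentinel-substitution mechanic looks roughly like this. The sketch assumes a placeholder token and a secrets map held outside the sandbox; both names are invented for illustration:

```python
# The agent's environment holds only a placeholder; the proxy swaps in
# the real key for approved destinations. Hypothetical sketch.
SECRETS = {"api.example.com": "sk-real-key"}   # lives outside the sandbox
SENTINEL = "{{API_KEY}}"

def inject_credentials(host: str, headers: dict) -> dict:
    """Replace sentinel values with real secrets, per-destination."""
    out = dict(headers)
    for name, value in out.items():
        if SENTINEL in str(value):
            real = SECRETS.get(host)
            if real is None:
                raise PermissionError(f"no credential approved for {host}")
            out[name] = str(value).replace(SENTINEL, real)
    return out
```

If the agent exfiltrates its own environment, all it leaks is the placeholder; the real key exists only inside the proxy, and only for hosts the policy approves.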

This is not a coincidence. It is convergent evolution. The same engineering constraints --- agents need network access, agents should not hold secrets, agents need bounded autonomy --- produce the same architectural solutions regardless of who is building them.

NVIDIA OpenShell: Policy as Infrastructure

OpenShell is the most architecturally explicit of the three. Its four-layer security model maps containment to a taxonomy:

| Layer | Protection | Mutability |
| --- | --- | --- |
| Filesystem | Restricts reads/writes to allowed paths | Locked at creation |
| Network | Blocks unauthorized outbound connections | Hot-reloadable |
| Process | Prevents privilege escalation, dangerous syscalls | Locked at creation |
| Inference | Reroutes model API calls to managed backends | Hot-reloadable |
The entire stack runs as K3s (lightweight Kubernetes) within a single Docker container. No separate cluster installation. The Gateway manages sandbox lifecycle and authentication boundaries. The Policy Engine applies constraints. The Privacy Router routes LLM calls to controlled backends while preserving sensitive context locally.

The distinction between static and dynamic policies is the most interesting design choice. Filesystem and process policies cannot change after sandbox creation --- an agent cannot negotiate its way to broader file access or privilege escalation mid-session. Network and inference policies can be updated in real time, allowing operators to adjust what an agent can reach without destroying and recreating the environment.
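The static/dynamic split amounts to a guard on the update path. A minimal sketch, assuming the four layer names from the article (the enforcement code itself is hypothetical):

```python
# Filesystem and process policies lock at creation; network and
# inference accept hot-reload updates. Illustrative sketch only.
STATIC_LAYERS = {"filesystem", "process"}
DYNAMIC_LAYERS = {"network", "inference"}

def apply_policy_update(policy: dict, layer: str, settings: dict) -> dict:
    """Return an updated policy, refusing changes to locked layers."""
    if layer in STATIC_LAYERS:
        raise PermissionError(f"layer '{layer}' is locked at sandbox creation")
    if layer not in DYNAMIC_LAYERS:
        raise KeyError(f"unknown policy layer '{layer}'")
    merged = {**policy.get(layer, {}), **settings}
    return {**policy, layer: merged}
```

An agent (or a confused operator) can widen the network allowlist mid-session, but no code path exists that widens filesystem or process access without recreating the sandbox.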

This maps to how real-world governance works. Some boundaries are constitutional --- they do not change during the session. Others are operational --- they adapt as the task evolves. OpenShell encodes that distinction directly into the policy engine.

Docker: From Sandboxes to Agent Teams

Docker’s contribution this week extends beyond isolation. The Docker Agent + Docker Sandboxes architecture addresses a problem the February launches did not: multi-agent coordination inside containment boundaries.

Docker Agent allows teams of specialized agents --- a product manager, a designer, an engineer, a QA specialist --- each with its own model, its own context, and its own toolset permissions. The root agent coordinates delegation. Each agent operates inside a Docker Sandbox running in a dedicated microVM (Docker Desktop 4.60+).

The security model is straightforward. Agents cannot access files outside the mounted workspace. Network isolation uses configurable allow/deny lists. The workspace mounts preserve absolute paths, so error messages and scripts with hardcoded paths work as expected. If an agent makes a mistake, the damage is contained within its microVM: run docker sandbox rm and start fresh.

The multi-agent dimension matters because enterprise deployments will not be single-agent. A workflow that involves research, code generation, testing, and deployment requires multiple agents with different capabilities and different trust levels. Docker’s architecture lets you define those trust levels per agent, not per environment.

OpenAI: The Agent Runtime

OpenAI’s Responses API computer environment is the most complete agent runtime of the three, but it follows the same containment principles.

The shell tool lets the model propose commands; the Responses API orchestrates execution in a hosted container. The model proposes multiple shell commands concurrently. The API multiplexes execution sessions. Output is bounded per command to control context budgets.
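The per-command output bound is a small but essential detail: without it, one verbose command can blow the context budget for the rest of the session. A plausible truncation scheme (the head-and-tail shape here is an assumption, not OpenAI's documented behavior) looks like:

```python
def bound_output(raw: bytes, max_chars: int = 4096) -> str:
    """Truncate a command's output to fit the agent's context budget.

    Keeps the head and tail, since errors usually surface at one end.
    Hypothetical sketch of the bounding idea described above.
    """
    text = raw.decode("utf-8", errors="replace")
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - max_chars
    return text[:half] + f"\n... [{omitted} chars truncated] ...\n" + text[-half:]
```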

The containment layer is in the networking. All outbound requests flow through a centralized egress proxy. Domain-scoped secret injection means the model never sees raw credentials --- only placeholders that the proxy swaps for real values at approved destinations. This is architecturally identical to Docker’s sentinel-value approach from February and to OpenShell’s credential injection via environment variables.

OpenAI also introduced Agent Skills --- reusable workflow packages with a SKILL.md file and supporting resources. Skills are versioned bundles loaded into the container before execution. The model discovers and executes them through shell commands. This is not containment per se, but it is the workflow layer that makes containment practical. Without reusable patterns, every agent session rediscovers the same workflows, and governance becomes impossible to standardize.
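Because skills are just files in the container, discovery reduces to a directory scan. A minimal sketch, assuming only the SKILL.md convention named above (the function and layout are illustrative):

```python
from pathlib import Path

def discover_skills(root: str) -> dict:
    """Map skill name -> SKILL.md path for every bundle under `root`.

    Assumes each skill lives in its own directory containing a
    SKILL.md manifest, per the convention described above.
    """
    skills = {}
    for manifest in Path(root).rglob("SKILL.md"):
        skills[manifest.parent.name] = manifest
    return skills
```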

NemoClaw: Enterprise OpenClaw

NVIDIA’s second launch of the week, NemoClaw, addresses the governance gap that made OpenClaw unsuitable for enterprises. OpenClaw, the open-source local agent framework that went viral in early 2026, was found to have an unsecured database that allowed anyone to impersonate any agent on the platform. Several large companies, including Meta, banned it from corporate machines.

NemoClaw is OpenClaw with enterprise security baked in. Jensen Huang’s framing at GTC was explicit: “Every company in the world today needs to have an OpenClaw strategy, an agentic systems strategy.” NVIDIA worked with OpenClaw’s creator Peter Steinberger to build NemoClaw as the enterprise-safe version.

The platform integrates with NVIDIA’s NeMo framework and NIM microservice layer. The backbone model is Nemotron 3 Nano --- 30 billion parameters, 1 million token context window, hybrid Mixture-of-Experts architecture. A ~100 billion parameter Super variant is expected soon.

NemoClaw is currently alpha. But the strategic pattern is clear: take a viral open-source agent framework, add governance infrastructure, and position it as the enterprise standard. This is the CUDA playbook applied to software --- give away the platform to create dependency on the hardware.

Codex Security: Trust Boundaries Beyond the Sandbox

The fifth piece of this week’s convergence is OpenAI’s explanation of why Codex Security does not start with a SAST report. This is not a containment product, but it reveals a containment philosophy that matters.

Traditional static analysis traces data from source to sink and checks whether a sanitizer exists along the path. Codex Security argues this is insufficient because “there’s a big difference between ‘the code calls a sanitizer’ and ‘the system is safe.’” The check might run before decoding. The validation might not survive the transformation chain. The invariant might not hold across context boundaries.
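A toy example makes the "check runs before decoding" failure concrete. This is my own illustration of the gap, not code from Codex Security: a path-traversal sanitizer is present on the path from source to sink, so a source-to-sink tracer is satisfied, yet the system is unsafe because decoding happens after the check:

```python
from urllib.parse import unquote

def is_safe_path_naive(requested: str) -> bool:
    # "The code calls a sanitizer": rejects literal traversal sequences.
    return ".." not in requested

def resolve(requested: str) -> str:
    # ...but the check runs BEFORE decoding, so "%2e%2e%2f" slips through.
    if not is_safe_path_naive(requested):
        raise PermissionError("traversal blocked")
    return unquote(requested)  # decoded AFTER the sanitizer ran
```

A literal "../etc/passwd" is blocked, but its percent-encoded form decodes to the same traversal after the sanitizer has already approved it. Source-to-sink analysis sees a sanitizer on the path and passes; the invariant it was supposed to guarantee does not survive the transformation chain.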

Codex Security instead starts from the repository’s architecture and trust boundaries. It writes micro-fuzzers for suspicious code slices. It uses z3-solver for constraint reasoning. It executes hypotheses in sandboxed environments to produce proof-of-concept exploits.

The relevance to containment is this: runtime sandboxing constrains what an agent can do. Semantic trust boundaries constrain what an agent should do. The first is necessary but insufficient. An agent running inside a perfect sandbox can still produce code that is secure at the perimeter and broken at the logic level. As we noted in a previous analysis, sandboxing prevents exfiltration but not architectural drift, logic errors, or invariant violations.

The containment stack is growing a new layer: not just “what can the agent access” but “does the agent’s output maintain the trust properties the system requires.”

The Updated Containment Taxonomy

In February, we proposed a four-layer taxonomy. This week’s data expands it:

| Layer | Approach | Providers (Feb) | Providers (Mar) |
| --- | --- | --- | --- |
| OS | Syscall + filesystem restriction | Cursor | OpenShell (filesystem + process layers) |
| VM/Container | Isolated runtime + credential proxy | Docker | Docker Sandboxes (microVM), OpenAI (hosted container), OpenShell (K3s container) |
| Network | Egress proxy + domain-scoped secrets | Docker (credential proxy) | OpenShell, OpenAI, Docker (all three) |
| Model | Internal activation probes | Zenity | --- |
| Process | Reasoning audit trail | Entire | --- |
| Policy | Declarative YAML governance | --- | OpenShell, Docker Agent, NemoClaw |
| Semantic | Trust boundary analysis | --- | Codex Security |

The February taxonomy had four layers. The March data adds three: network egress as a distinct layer (previously bundled with VM), declarative policy as infrastructure, and semantic trust analysis.

The convergence on the network egress layer is the most significant. When three independent organizations implement the same sidecar-proxy-with-credential-injection pattern in the same week, that is not a design choice. That is a design requirement. Egress control with secret isolation is becoming as fundamental to agent infrastructure as TLS is to web infrastructure.

What This Means for Enterprises

The February question was “what is your trust boundary?” The March question is more specific: have you implemented the three elements that every major platform now considers mandatory?

Declarative governance. Your agent containment policies should be YAML files in version control, reviewed like code, audited like IAM policies. If your governance is still per-action approval prompts, you are running a model that the entire industry has abandoned this month.

Egress proxy. Every outbound request from an agent environment should go through a policy layer that enforces allowlists, injects credentials at the domain level, and logs all traffic. If agents hold raw API keys or have unrestricted network access, your containment boundary has a hole exactly where every vendor this week put their primary defense.

Credential isolation. The agent should never see the real secret. Placeholders, sentinel values, environment-variable injection at runtime --- the specific mechanism varies, but the principle is now universal. The credential lives outside the agent’s environment and is applied only at approved destinations.

These are not aspirational recommendations. They are the architectural minimum that NVIDIA, Docker, and OpenAI shipped in production this week.

The Velocity of Convergence

In February, four companies shipping containment in one week was notable. In March, three companies independently shipping the same containment architecture in one week is a phase transition.

The containment pattern is not becoming a standard through committee or specification. It is becoming a standard through convergent implementation. When every major platform arrives at YAML governance, egress proxies, and credential isolation independently, the standard exists whether anyone has written it down or not.

For enterprises building agent infrastructure, the implication is clear: the architectural decisions are no longer debatable. The implementation details vary, but the containment stack has converged. Build to this architecture now, or spend next quarter retrofitting it.


Five launches. One week. One architecture. The containment convergence is not a trend. It is infrastructure becoming standard.

Sources: NVIDIA OpenShell | Docker Agent + Sandboxes | OpenAI Responses API | NVIDIA NemoClaw (TechCrunch) | Codex Security

Victorino Group helps organizations build agent containment infrastructure that matches the architecture the industry converged on this week. Let’s talk.
