Governed Agent Infrastructure at Scale: What KernelEvolve and AWS DevOps Agent Actually Prove
Two announcements landed in the same week. Meta published KernelEvolve, an AI agent that autonomously generates and optimizes hardware kernels across NVIDIA, AMD, MTIA, and CPU platforms. AWS announced DevOps Agent as generally available — the first autonomous SRE product from a major cloud provider.
Neither is a demo. Both run in production. And both embed governance so deeply into their architecture that removing it would break the system.
That is the pattern worth examining.
KernelEvolve: Evaluation as the Entire Architecture
KernelEvolve does not merely generate kernels. It searches for them. The distinction matters because it determines where governance lives.
A one-shot generation approach would produce a kernel, run it, and hope. KernelEvolve treats optimization as a search problem: generate candidates, profile them across multiple levels, feed diagnostic information back into the next generation cycle, and repeat until correctness and performance constraints are satisfied.
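The loop described above can be sketched in a few lines. This is a minimal toy illustration of the generate-gate-profile-feedback pattern, not Meta's implementation: the functions, the candidate dictionary schema, and the random stand-in for LLM generation are all hypothetical.

```python
import random

def passes_correctness(kernel):
    # Hypothetical gate: in a real system this would compare kernel
    # output against a reference implementation on sampled inputs.
    return kernel["correct"]

def profile(kernel):
    # Hypothetical profiler: returns a throughput score plus structured
    # diagnostics rather than a bare pass/fail bit.
    return kernel["score"], {"bottleneck": kernel["hotspot"]}

def generate_candidates(diagnostics, n=4):
    # Stand-in for the LLM generation step; in the real system the
    # diagnostics from the last round would shape the next prompt.
    return [
        {"correct": random.random() > 0.3,
         "score": random.random(),
         "hotspot": random.choice(["memory", "compute"])}
        for _ in range(n)
    ]

def optimize(target_score=0.9, max_iters=50):
    """Search loop: generate, gate on correctness, profile, feed back."""
    best, diagnostics = None, {}
    for _ in range(max_iters):
        for cand in generate_candidates(diagnostics):
            if not passes_correctness(cand):
                continue  # incorrect kernels never enter the ranking
            score, diagnostics = profile(cand)
            if best is None or score > best[0]:
                best = (score, cand)
        if best and best[0] >= target_score:
            break
    return best
```

The structural point is in the `continue`: correctness is a hard gate inside the loop, so an incorrect candidate is never even ranked, let alone shipped.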
The results validate the approach. On Meta’s Andromeda ads ranking model, KernelEvolve achieved a 60% inference throughput improvement on NVIDIA GPUs and a 25% training throughput improvement on Meta’s own MTIA silicon. It scored a 100% pass rate on KernelBench, a benchmark suite of 250 kernel optimization problems. And it maintained 100% correctness across 160 PyTorch ATen operators validated on three hardware platforms — 480 configurations total.
Development timelines compressed from weeks to hours. The paper was accepted at ISCA 2026, the premier computer architecture conference. Twelve Meta engineers — including Gang Liao, Yavuz Yetim, Chunqiang Tang, and Carole-Jean Wu — contributed to the work.
What makes this a governance story rather than a performance story is the multi-level profiling with diagnostic feedback loops. The agent does not just run code and check if it crashes. It instruments execution at multiple granularities, compares against baselines, identifies where performance degrades, and uses that structured diagnostic signal to guide the next iteration. The feedback loop is not bolted on. It is the mechanism through which the agent reasons.
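To make "structured diagnostic signal" concrete, here is a sketch of what a multi-level profiling record might look like. The field names, thresholds, and hint strings are entirely illustrative assumptions, not Meta's actual schema.

```python
from dataclasses import dataclass

@dataclass
class KernelDiagnostics:
    """Hypothetical multi-granularity profiling record for one kernel run."""
    kernel_latency_us: float    # coarse: whole-kernel wall time
    baseline_latency_us: float  # the same op via the stock implementation
    sm_occupancy: float         # finer: GPU scheduler utilization (0..1)
    dram_bw_util: float         # finer: memory bandwidth utilization (0..1)
    max_abs_error: float        # correctness vs. the reference output

    def speedup(self):
        return self.baseline_latency_us / self.kernel_latency_us

    def feedback(self, tolerance=1e-3):
        """Turn raw measurements into a structured hint for the next
        generation cycle, rather than a bare pass/fail bit."""
        if self.max_abs_error > tolerance:
            return "incorrect: reduce numerical error before optimizing"
        if self.dram_bw_util > 0.85:
            return "memory-bound: try tiling or vectorized loads"
        if self.sm_occupancy < 0.5:
            return "low occupancy: rebalance block/thread dimensions"
        return "compute-bound: consider instruction-level optimizations"
```

A record like this is what distinguishes a feedback loop the agent reasons on from a filter it merely passes through: the diagnosis says *why* a candidate fell short, not just *that* it did.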
This is what we described in evaluation-driven development taken to its logical extreme: evaluation is not a quality gate the agent passes through. It is the substrate the agent thinks on.
AWS DevOps Agent: Productized Autonomous SRE
AWS DevOps Agent occupies a different part of the stack but embeds the same principle. It handles autonomous incident resolution, proactive prevention, and SRE task automation. AWS reports mean-time-to-resolution (MTTR) reductions from hours to minutes.
The governance mechanisms are different in implementation but identical in philosophy. Custom agent skills define what the agent can do. Reporting integration provides visibility into what the agent did. CI/CD integration constrains where the agent operates. Multicloud and on-premises support means the governance boundary extends across environments rather than being limited to a single cloud.
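One way to picture skill definitions as an action-space boundary is a declarative registry checked *before* execution. The skill names, scopes, and approval flags below are hypothetical; AWS does not publish its skill format in these terms.

```python
# Hypothetical skill registry: the agent can only invoke actions that
# are declared here, with explicit scopes. Illustrative names only.
SKILLS = {
    "restart_service": {"scopes": ["staging", "prod"], "approval": False},
    "scale_out":       {"scopes": ["staging", "prod"], "approval": False},
    "rollback_deploy": {"scopes": ["staging"],         "approval": True},
}

def authorize(action, scope):
    """Gate called before, not after, the agent acts: an undeclared
    action or out-of-scope target is simply not in the action space."""
    skill = SKILLS.get(action)
    if skill is None or scope not in skill["scopes"]:
        return False, "denied: outside declared action space"
    if skill["approval"]:
        return False, "pending: requires human approval"
    return True, "allowed"
```

Note where the check sits: nothing audits a dangerous action after the fact, because an undeclared action never becomes executable in the first place.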
This is the first GA autonomous SRE product from a major cloud provider. “First GA” is the operative phrase. The infrastructure gap between agent demos and agent operations has been well documented. AWS closing that gap with a generally available product — not a preview, not a beta — signals that the market considers governed agent autonomy ready for production workloads.
The Pattern: Governance as Search Constraint, Not Post-Hoc Check
Both systems share a structural insight that separates them from the demo-stage agent work we covered in enterprise agent ops.
In demo-stage agents, governance is a filter applied after the agent acts. The agent generates output. A separate system checks the output. If the check fails, the output is rejected or retried.
In KernelEvolve and AWS DevOps Agent, governance is a constraint that shapes how the agent searches for solutions. KernelEvolve’s profiling feedback does not reject bad kernels after they are generated. It steers the generation process toward correct and performant candidates. AWS DevOps Agent’s skill definitions do not audit actions after execution. They define the action space the agent operates within.
The difference is not philosophical. It is architectural. Post-hoc governance scales linearly with output volume — every output needs checking. Governance-as-search-constraint scales with the definition of the constraint space, which changes infrequently.
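The scaling difference can be made concrete with a toy contrast. Both pipelines below enforce the same criterion (a stand-in for "correct"); they differ only in where the constraint lives.

```python
def is_valid(x):
    # Stand-in correctness criterion: multiples of 3 are "correct".
    return x % 3 == 0

# Post-hoc governance: generate freely, then filter. Checking cost
# grows with every output produced, valid or not.
def post_hoc(candidates):
    checked, kept = 0, []
    for x in candidates:
        checked += 1
        if is_valid(x):
            kept.append(x)
    return kept, checked

# Governance as search constraint: the generator only emits candidates
# inside the constraint space, so nothing invalid exists to filter.
def constrained(n):
    return [3 * i for i in range(n)]
```

In the post-hoc version, doubling output volume doubles the checking work. In the constrained version, the cost moved into defining `constrained` once, which is exactly the "constraint space changes infrequently" point.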
What This Means for Agent Infrastructure Decisions
Organizations building agent infrastructure face a design choice that these two systems make concrete.
Option one: build agents that act freely and govern them with monitoring and rollback. This is the pattern most teams default to because it is simpler to start. The agent does its thing; you watch and intervene when it goes wrong.
Option two: build agents whose reasoning process is structured by evaluation. The agent does not produce outputs and wait for approval. The agent’s search process is constrained by correctness criteria, profiling feedback, and defined action boundaries.
KernelEvolve demonstrates that option two works at hardware-level performance optimization — a domain where incorrect output is not just costly but dangerous (silent data corruption, hardware damage at scale). AWS DevOps Agent demonstrates that option two is commercially viable as a managed service across heterogeneous infrastructure.
The 100% correctness validation across 480 configurations is not a quality metric. It is a governance metric. It means the system’s evaluation pipeline is comprehensive enough that no incorrect kernel escaped to production across three hardware platforms and 160 operator types.
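The arithmetic behind that claim is simple: 160 operators times 3 platforms is 480 configurations, every one of which must pass. A validation sweep over that matrix might be structured like this; the operator names and the always-passing `validate` stub are placeholders, not the real harness.

```python
from itertools import product

# Sweep mirroring the published coverage: 160 ATen operators
# across 3 hardware platforms = 480 configurations.
PLATFORMS = ["nvidia", "amd", "mtia"]
OPERATORS = [f"aten_op_{i}" for i in range(160)]  # placeholder names

def validate(op, platform):
    # Stand-in for running the generated kernel against the reference
    # implementation for this operator on this platform.
    return True

def coverage_report():
    configs = list(product(OPERATORS, PLATFORMS))
    passed = sum(validate(op, hw) for op, hw in configs)
    return len(configs), passed
```

A governance metric, in this framing, is the demand that `passed == len(configs)` with no exceptions, rather than a high average score.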
The Evaluation Pipeline Is the Product
Strip the evaluation pipeline from KernelEvolve and you have a code generation agent that sometimes produces fast kernels and sometimes produces incorrect ones. Strip the skill definitions and reporting from AWS DevOps Agent and you have an automation tool that sometimes fixes incidents and sometimes creates them.
The governance layer is not overhead. It is the mechanism that makes autonomous operation viable. This is the thesis from evaluation-driven development confirmed by production systems at Meta and AWS scale: evaluation infrastructure is not a cost center. It is the product.
For teams planning agent infrastructure investments, the implication is direct. Budget for evaluation and governance infrastructure at least as much as for the agent capabilities themselves. The agent that generates candidates is table stakes. The evaluation pipeline that ensures only correct candidates ship is the competitive advantage.
This analysis draws on Meta Engineering’s KernelEvolve publication (April 2, 2026; ISCA 2026 accepted paper) and the AWS DevOps Agent GA announcement (March 31, 2026).
Victorino Group helps organizations design agent infrastructure where governance is architectural, not an afterthought. Let’s talk.
All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com. About The Thinking Wire →