Governance Belongs in the Backlog

Daniel Epstein, a Partner Tech Strategist at Microsoft, wrote something in May 2026 that most teams chasing better models will read too fast. The failures they keep hitting when agents build software at scale are not coming from the model. They are coming from a missing process. His sentence is blunt: “This is not a model problem; it is a process problem. Upgrading the model does not fix missing acceptance criteria.” Then comes the part that should stop a reader cold: “A more capable agent working against an ambiguous spec produces more sophisticated drift, not less.”

That second clause inverts the instinct most leaders are operating on. The plan everyone has is to wait for the next model, drop it in, and watch the quality problems dissolve. Epstein is saying the opposite happens. The smarter the executor, the more convincing its wrong answers become. Governance has to live somewhere the model upgrade cannot route around. He puts it in the backlog.

A Better Model Makes a Loose Spec Worse

Think about what a more capable agent actually does with ambiguity. It does not pause and ask. It fills the space, and it fills it with more competence than the last model had. A weaker agent given a vague story produces obvious garbage, the kind a reviewer catches in thirty seconds because it looks wrong. A stronger agent given the same vague story produces something that looks right, compiles clean, passes the tests it wrote for itself, and quietly violates an invariant nobody wrote down.

This is why “wait for the next model” is not a quality strategy. Capability raises the ceiling on output and the ceiling on plausible-looking drift at the same time. The thing keeping the two apart is the specification. Epstein is precise about this: “Agents operate against specifications, not open-ended prompts. Each story defines inputs, outputs, and invariants.” An agent is not a colleague who shares your context. It is an executor whose entire world is the artifact in front of it. If that artifact says what should happen but not what must never happen, the agent treats the silence as permission.

So the constraint is not the agent’s reasoning. It is the precision of the work item the agent reasons against. And precision is not a model property. It is a process property. You write it, you version it, you enforce it.

Governance Is a Property of the Backlog

Here is the move Epstein makes that the companion reframe of Agile for agents does not. He does not just say keep your contracts. He says where those contracts have to live: inside the story itself, as acceptance criteria, before any agent touches the work.

Most teams treat governance as a checkpoint. Build the thing, then run it past a reviewer who checks for architectural violations, security exposure, and broken invariants. Epstein’s line, aimed at exactly this habit, deserves to be printed and taped above the board: “If you are catching architectural violations during final review rather than during story execution, your governance is too late.”

Read that as a relocation order. A safety constraint that lives in a reviewer’s head, or in a wiki, or in a final-review checklist, is a constraint the agent never saw while it was working. The agent already drifted. The review only tells you how far. Move that same constraint into the acceptance criteria of the story, written as an invariant the agent must satisfy to call the work done, and it becomes something the agent executes against rather than something a human discovers after the fact. The story stops being a request. It becomes the governance surface.

This is concrete, not philosophical. Every card on the board carries its own gate. “Must not write to the legacy table.” “Must preserve the existing API contract for v1 callers.” “Must reject input that fails schema validation rather than coercing it.” Those are not review comments. They are entries in the story, present before the first line of code, visible to the executor the whole time.
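One way to picture a card that carries its own gate is to attach each acceptance criterion to the story as a named check that runs against the agent's proposed change. The sketch below is a minimal illustration under stated assumptions, not Epstein's implementation: the `Change`, `Invariant`, and `Story` types, their fields, and the `legacy_orders` table name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Change:
    """Hypothetical summary of what an agent's work touched."""
    tables_written: frozenset[str]
    v1_endpoints_modified: frozenset[str]
    coerces_invalid_input: bool


@dataclass(frozen=True)
class Invariant:
    """An acceptance criterion expressed as a check the change must pass."""
    text: str
    holds: Callable[[Change], bool]


@dataclass
class Story:
    title: str
    invariants: list[Invariant] = field(default_factory=list)

    def violations(self, change: Change) -> list[str]:
        """Return the text of every invariant the change breaks."""
        return [inv.text for inv in self.invariants if not inv.holds(change)]


# The three criteria quoted above, written into the story before any work starts.
story = Story(
    title="Add order export",  # illustrative title, not from the source
    invariants=[
        Invariant(
            "Must not write to the legacy table.",
            lambda c: "legacy_orders" not in c.tables_written,
        ),
        Invariant(
            "Must preserve the existing API contract for v1 callers.",
            lambda c: not c.v1_endpoints_modified,
        ),
        Invariant(
            "Must reject input that fails schema validation rather than coercing it.",
            lambda c: not c.coerces_invalid_input,
        ),
    ],
)

# A change that looks fine on the surface but writes to the legacy table.
drifted = Change(
    tables_written=frozenset({"orders", "legacy_orders"}),
    v1_endpoints_modified=frozenset(),
    coerces_invalid_input=False,
)

print(story.violations(drifted))
```

Here the drifted change fails exactly one criterion, and the story cannot be called done until `violations` returns an empty list. How a real pipeline would derive something like `Change` from a diff is left open; the point is only that the constraint travels with the card.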

CI Is Story One, Review Gates Sit Between Waves

Epstein backs the principle with a failure he lived through, and it is the most useful part of the piece because it is a mistake, not a theory. On his Minthe project, the CI pipeline was not the first story built. It came later. By the time the automated gate existed, quality issues had already accumulated across several waves of work, and features had to be reopened and rebuilt against the standard the pipeline finally enforced. The lesson he draws is unambiguous: validation infrastructure (CI/CD, linting, automated tests) should be the first story you implement, not the last.

The reasoning follows directly from the drift argument. If governance only works when it sits in front of the agent, then the machinery that enforces governance has to exist before the agent produces anything worth governing. Build the gate, then build through it. A team that ships six waves of features and then adds CI has six waves of unguarded output to re-audit. A team that ships CI as wave zero has a standard every later wave is measured against on arrival.

The same logic explains why the review gates sit between delivery waves instead of at the very end. Epstein’s workflow runs Plan, Issue, Implement, Review, Merge, Docs, with persistent agent context held in repository files like .github/copilot-instructions.md, CLAUDE.md, and STYLE.md, and a review checkpoint after each wave rather than one final inspection. Drift compounds. Suppose it enters in wave two. Catching it right after wave two costs one wave of rework. Catching it after wave six costs five, because every wave built on top of the drift has to be reopened. A gate between every wave keeps the blast radius to a single increment, and it keeps the persistent context files honest, because each review is a chance to tighten the rules the next wave inherits.
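The rework arithmetic can be made explicit with a small sketch. It assumes a deliberately simple model that is not from Epstein's piece: drift enters in one wave, every later wave builds on it, each affected wave costs one unit of rework, and the first review gate at or after the drifting wave catches it. The function name `waves_of_rework` is hypothetical.

```python
def waves_of_rework(drift_wave: int, gate_waves: list[int]) -> int:
    """Count the waves that must be reopened once drift is caught.

    drift_wave: the wave in which the drift was introduced.
    gate_waves: the waves after which a review gate runs.

    The count includes the wave that introduced the drift and every later
    wave up to and including the one the catching gate follows.
    """
    later_gates = [g for g in gate_waves if g >= drift_wave]
    if not later_gates:
        raise ValueError("no review gate runs after the drift is introduced")
    return min(later_gates) - drift_wave + 1


total_waves = 6
gate_every_wave = list(range(1, total_waves + 1))  # review after each wave
single_final_gate = [total_waves]                  # one inspection at the end

print(waves_of_rework(2, gate_every_wave))    # 1
print(waves_of_rework(2, single_final_gate))  # 5
```

Under this model a gate after every wave caps the cost at one wave no matter where the drift enters, while a single final gate makes the cost grow with how early the drift appeared.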

What This Means for Any Team Adopting Agents

The shape of the advice is general even though Epstein’s evidence is one engineer’s project. He is candid that this is a methodology piece, not a study. There are no benchmarks here, no percentages, no controlled comparison. It is the first of a planned series, grounded in his Minthe work and colleague practice. Treat it as a well-argued hypothesis from inside Microsoft, not as measured proof.

What survives that caveat is the structure, and the structure travels. Any team putting agents on real work inherits the same physics. Ambiguity gets filled by the executor. A more capable executor fills it more convincingly. Constraints discovered at review are constraints discovered too late. The remedy does not depend on which model you run or which vendor you buy. It depends on whether your work items carry their own invariants and whether your gate exists before your output does.

This generalizes past code, too. A marketing brief, a legal intake form, a data pipeline spec: each is a backlog item an agent executes against, and each can carry its acceptance criteria up front or surface its violations at review. The discipline is identical, and it is the same one we traced through the automation governance deficit and through the org inversion at teams that deleted their standups. The only question is whether you write the constraint into the card or pay to find it later.

Do This Now

Open your agent backlog and pick the next story you plan to assign. Before it goes out, add three lines to it: one invariant the work must preserve, one thing it must never do, and one concrete piece of evidence that proves both. Then check one fact about your setup, whether your CI gate runs before that story merges or after. If the gate runs after, you have just found your real first story, and it is not the feature you were about to build. It is the pipeline that should have shipped before any of them.
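The checklist above is small enough to automate. The following is a minimal sketch, assuming a backlog item can be represented with three free-text fields. The `Story` fields and the `missing_lines` and `real_first_story` helpers are hypothetical names for illustration, not part of any tracker's API.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    must_preserve: str = ""  # one invariant the work must preserve
    must_never: str = ""     # one thing the work must never do
    evidence: str = ""       # concrete evidence that proves both


def missing_lines(story: Story) -> list[str]:
    """Name the governance lines a story still lacks."""
    required = {
        "must_preserve": story.must_preserve,
        "must_never": story.must_never,
        "evidence": story.evidence,
    }
    return [name for name, value in required.items() if not value.strip()]


def real_first_story(story: Story, ci_gates_merge: bool) -> str:
    """If no CI gate runs before merge, the pipeline is the first story."""
    return story.title if ci_gates_merge else "Build the CI pipeline"


next_story = Story(
    title="Add CSV export",  # illustrative story, not from the source
    must_preserve="Existing JSON export output is unchanged",
    must_never="",           # still missing
    evidence="Snapshot test comparing JSON export before and after",
)

print(missing_lines(next_story))                           # ['must_never']
print(real_first_story(next_story, ci_gates_merge=False))  # Build the CI pipeline
```

A story goes out only when `missing_lines` returns an empty list, and the feature is the next story only when the CI gate already runs before merge.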

This analysis synthesizes Agentic-Agile: Why Agent Development Needs Agile (Not Just Prompts) (Daniel Epstein, Microsoft, May 2026).

Victorino Group helps teams move governance upstream into how AI work is specified and reviewed.