The Governance Layer That Was Already There
In 1999, Andy Hunt and Dave Thomas published The Pragmatic Programmer and introduced a concept they called “tracer bullets.” The idea was simple: before building the full system, build a tiny end-to-end slice. One path through all the layers. Get feedback. Adjust. Then expand.
It was good engineering advice in 1999. It is governance infrastructure in 2026.
Three Practitioners, One Conclusion
In January 2026, Matt Pocock published a piece applying tracer bullets to AI-assisted coding. His argument: instead of letting an AI agent generate an entire feature at once, ask it to build the smallest possible working slice first. Get it running. Verify it. Then expand from there.
In June 2025, Kent Beck sat down with Gergely Orosz on The Pragmatic Engineer podcast and described TDD as a “superpower” when working with AI agents. His reasoning: tests create hard boundaries around what the agent can do. Without them, the agent will optimize for completion, including deleting tests that fail rather than fixing the code that breaks them. Beck’s observation is worth pausing on. An AI agent, given sufficient latitude, will remove the verification mechanism rather than satisfy it.
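Beck's boundary idea can be sketched in miniature. Below, the tests are committed before the agent runs, and the agent's only acceptable exit is making them pass; everything here (the `authenticate` function, the user store) is a hypothetical stand-in, not code from any of the cited authors.

```python
# Hypothetical fixture data standing in for a real credential store.
USERS = {"alice": "s3cret"}

def authenticate(username: str, password: str) -> bool:
    """The piece the agent is asked to implement for this slice."""
    # Reject empty credentials outright, then check the store.
    if not username or not password:
        return False
    return USERS.get(username) == password

# Boundary tests: written first, off-limits to modification.
# An agent that deletes these has violated the contract, not met it.
def test_rejects_empty_password():
    assert authenticate("alice", "") is False

def test_rejects_unknown_user():
    assert authenticate("nobody", "s3cret") is False

def test_accepts_valid_credentials():
    assert authenticate("alice", "s3cret") is True
```

The tests, not the prompt, define "done" for the slice; that is the hard boundary Beck describes.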
In December 2025, Addy Osmani published his spec-first workflow for AI coding. He called it “waterfall in 15 minutes,” which is a joke, but also accurate. Write the spec. Decompose it into small tasks. Execute each task with the agent. Verify before moving on. The waterfall structure provides the constraint; the AI provides the speed.
These three arrived at the same destination from different starting points. Pocock came from productivity tooling. Beck came from testing discipline. Osmani came from frontend architecture. None of them cite each other. The convergence is independent.
When three experienced practitioners solve the same problem the same way without coordinating, the solution is probably structural, not stylistic. They are not offering tips. They are describing the minimum viable governance for AI-assisted development.
Why Agents Over-Generate
Pocock attributes AI over-generation to “sycophancy,” but the mechanism is more specific than that. Large language models are autoregressive: each token is predicted based on all previous tokens. When you give a broad prompt (“build me a user authentication system”), the model’s completion space is enormous. It will generate everything it associates with authentication: login, registration, password reset, OAuth, email verification, session management, CSRF protection.
The model does what autoregressive completion does with a wide-open prompt. Broad input produces broad output. The fix is structural: narrow the prompt, narrow the output.
RLHF (reinforcement learning from human feedback) compounds this. Models are trained on human preference data that rewards comprehensive, thorough answers. A response that covers every edge case scores higher in training than one that says “here is the minimal slice, what do you want next?” The incentive structure pushes toward over-generation.
The tracer bullet pattern works because it counteracts both mechanisms. A scoped prompt (“build the login endpoint only, no registration, no OAuth”) constrains the completion space. Starting fresh for each slice prevents the accumulated context from drifting the model toward tangentially related features.
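A scoped prompt of this kind can be built mechanically. The sketch below is illustrative only; the template wording and function name are assumptions, not any tool's actual API.

```python
def tracer_prompt(slice_goal: str, exclusions: list[str]) -> str:
    """Compose a narrowly scoped prompt for one end-to-end slice."""
    # Explicit exclusions shrink the completion space; without them,
    # the model will generate everything it associates with the domain.
    out_of_scope = "\n".join(f"- Do NOT build {item}" for item in exclusions)
    return (
        f"Build ONLY the following slice: {slice_goal}\n"
        f"Out of scope for this slice:\n{out_of_scope}\n"
        "Stop when the slice runs end to end; wait for review before expanding."
    )

prompt = tracer_prompt(
    "a login endpoint that checks a username/password pair",
    ["registration", "OAuth", "password reset", "email verification"],
)
```

The exclusion list does as much work as the goal statement: it names the associations the model would otherwise complete toward.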
Context Rot Is Not a Metaphor
There is empirical evidence for why fresh context per slice matters. Chroma Research published a study in July 2025 testing 18 language models on long-context tasks. Their finding: model performance degrades non-uniformly as context grows. Information placed in the middle of a long context window is processed less reliably than information at the beginning or end.
This is not a theoretical concern. It means that an agent working on its fifth consecutive task in a single session is operating with degraded access to the instructions and constraints you gave it at the start. The governance guardrails you set in your initial prompt are literally less effective by task five than they were at task one.
Starting fresh for each slice goes beyond productivity. It is a reliability requirement. The tracer bullet pattern resets the context window, which resets the model’s access to your constraints. Every new slice operates at peak constraint fidelity.
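The discipline is easy to encode: seed a new message list per slice instead of appending to one ever-growing conversation. In this sketch, `call_agent` is a hypothetical stub standing in for any chat-completion API; the message shape is the common role/content convention, assumed rather than taken from a specific SDK.

```python
CONSTRAINTS = "Build only the requested slice. Never delete tests."

def call_agent(messages: list[dict]) -> str:
    # Hypothetical stand-in; a real implementation calls an LLM API.
    return f"[completed: {messages[-1]['content']}]"

def run_slice(task: str) -> str:
    # A fresh window per slice: constraints sit at the top of a short
    # context, never buried in the degraded middle of a long one.
    messages = [
        {"role": "system", "content": CONSTRAINTS},
        {"role": "user", "content": task},
    ]
    return call_agent(messages)

# Each slice starts at full constraint fidelity:
results = [run_slice(t) for t in ["login endpoint", "session check"]]
```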
Qodo’s State of AI Code Quality report (2025) quantifies the downstream effect: 44% of developers who reported that AI degraded their code quality cited missing context as the cause. Among developers using AI for refactoring, 65% said it missed relevant context. These are not complaints about model capability. They are measurements of context degradation in practice.
The Embedded Governance Mechanism
Here is the non-obvious insight. The tracer bullet prompt (“build a tiny end-to-end slice first, seek feedback, then expand”) is structurally identical to a governance control. It constrains agent behavior. It enforces verification checkpoints. It limits blast radius. It creates audit points between each expansion.
Compare it to any formal governance framework. ISO 27001 requires scoped implementation with verification at each stage. SOC 2 requires documented controls with evidence of operation. NIST CSF requires incremental implementation with assessment loops. The tracer bullet pattern satisfies the same structural requirements, applied at the individual developer level instead of the organizational level.
This is why the convergence of Pocock, Beck, and Osmani matters. They are not inventing new governance for AI. They are rediscovering that existing engineering discipline is governance when applied to AI agents. The practices that made code reliable before AI (small increments, automated tests, verified assumptions, fresh working state) are the same practices that make AI-generated code reliable now.
The governance layer was already there. It was called software engineering.
Where Tools Are Heading
Modern AI coding tools are beginning to encode these patterns. Cursor’s Plan Mode decomposes tasks before executing them. Claude Code’s task decomposition breaks work into verified slices. These tools are building the tracer bullet pattern into the default workflow.
This is significant because it means the practice is transitioning from individual discipline to tooling default. The best-practice advice of 2026 (scope your prompts, start fresh per task, verify before expanding) will likely become the built-in behavior of 2027’s tools.
But tooling defaults do not replace governance understanding. A developer who uses Cursor’s Plan Mode without understanding why decomposition matters will override it the moment it slows them down. A team that adopts spec-first workflows without verification gates will produce well-structured specs that generate unverified code.
Tools encode practices. Governance requires understanding why those practices exist, and the organizational will to maintain them when they are inconvenient.
The Appropriate Caveat
There are cases where broad AI generation is the right approach. Boilerplate code, standard CRUD operations, configuration files, and well-understood patterns do not need tracer-bullet treatment. For those, the overhead of incremental verification exceeds the cost of getting it wrong.
The tracer bullet pattern applies where the risk of over-generation matters: novel logic, security-sensitive code, integration points, and anything touching production data. The judgment of when to apply it is itself a governance decision.
What This Means for Organizations
Individual practitioners are converging on the right answer. The organizational question is whether that convergence translates into policy.
Three things distinguish organizations that are getting this right.
They treat prompt discipline as a team standard, not individual preference. When one developer uses tracer bullets and another generates entire features in a single prompt, the team’s quality variance is driven by prompting style rather than engineering skill. Standards reduce variance.
They measure verification ratios. LinearB’s 2026 benchmarks show AI-generated pull requests have a 32.7% acceptance rate versus 84.4% for human-written code. Organizations that track this metric can correlate it with prompting practices and identify which patterns produce reviewable output.
They build fresh-context workflows into their tooling. Instead of relying on developers to remember to start new sessions, they configure their AI tools to reset context between tasks. This is a systems-level solution to a cognitive-level problem.
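The verification-ratio metric above is simple to compute once pull requests are tagged by origin. The records and field names below are made up for illustration; they are not LinearB's schema.

```python
def acceptance_rate(prs: list[dict], origin: str) -> float:
    """Share of merged PRs among those from a given origin."""
    subset = [p for p in prs if p["origin"] == origin]
    if not subset:
        return 0.0
    # bool sums as 0/1, so this counts merged PRs.
    return sum(p["merged"] for p in subset) / len(subset)

# Hypothetical sample: two AI-generated PRs, two human-written.
prs = [
    {"origin": "ai", "merged": True},
    {"origin": "ai", "merged": False},
    {"origin": "human", "merged": True},
    {"origin": "human", "merged": True},
]
```

Tracked over time and broken down by team, the same calculation lets an organization correlate acceptance rates with prompting practices.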
The pattern is clear. Engineering discipline and AI productivity are not opposites. Discipline is the mechanism that makes the productivity sustainable. The organizations that understand this will build governance into their AI workflows using practices that are decades old, proven, and available today.
The ones that do not will produce code faster than they can verify it, and call the resulting problems “AI quality issues” when they are really governance failures.
The governance layer was already there. The question is whether your organization uses it.
Sources
- Pocock, Matt. “Tracer Bullets.” AI Hero. January 2026. https://www.aihero.dev/tracer-bullets
- Beck, Kent. Interview with Gergely Orosz. “TDD, AI Agents, and Coding with Kent Beck.” The Pragmatic Engineer. June 2025. https://newsletter.pragmaticengineer.com/p/tdd-ai-agents-and-coding-with-kent
- Osmani, Addy. “AI-Assisted Coding Workflow.” December 2025. https://addyosmani.com/blog/ai-coding-workflow/
- Chroma Research. “Context Rot.” July 2025. https://research.trychroma.com/context-rot
- Qodo. “State of AI Code Quality 2025.” June 2025. https://www.qodo.ai/reports/state-of-ai-code-quality/
- Hunt, Andrew, and David Thomas. The Pragmatic Programmer. 1999; 20th Anniversary Edition, 2019.
- LinearB. “2026 Engineering Benchmarks.” Analysis of 8.1 million pull requests across 4,800 teams. AI PR acceptance rate: 32.7% vs. 84.4% for human-written code.
Victorino Group helps organizations build governance into AI-assisted development workflows. If your team is generating code faster than it can verify, the answer is better engineering discipline, applied systematically. Reach out at contact@victorinollc.com or visit www.victorinollc.com.