Three Voices, One Verification Gap: The Harness Is Cross-Discipline Now

Thiago Victorino

Three independent practitioners published in the same week, none coordinated, none citing each other, and none using the same vocabulary. Oskar Dudycz, an independent architect on Substack, called it the “harness.” Wei Zhang and Jessie Jie Xia, both at Thoughtworks, called it Structured-Prompt-Driven Development. Leah Tharin, a product writer, called it “direction over speed.” Different disciplines, different framings, one destination.

Each was pointing at the same shape: AI made the Act phase of work cheap, and the discipline that used to keep work honest — verification, observation, alignment between intent and output — has not yet caught up. The gap is now visible enough that three people from three corners of the practice independently named it within seven days.

We have been calling that gap the “harness” for some time. What changed this week is that the term is no longer ours alone.

Voice One: The Harness as OODA Acceleration

Oskar Dudycz’s “Vibing, Harness and OODA Loop” frames the problem in cybernetic terms. The OODA Loop — Observe, Orient, Decide, Act — was John Boyd’s model for adversarial decision-making under time pressure. Dudycz’s argument is that AI radically compresses the Act phase. Code is written in seconds, infrastructure provisioned in minutes, prototypes assembled before lunch. The other three phases were not designed for this cadence. Observe, Orient, and Decide remain at human speed. Act has detached and run ahead.

The harness, in his framing, is the mechanical infrastructure that drags Observe back up to Act’s new tempo. Concretely, that means diagnostic functions that hit service health endpoints automatically, automated setup and teardown scripts written with execa and native fetch, multiple verification checks running in parallel — HTTP readiness, service name resolution, metric availability — and reproducible cleanup paths so failed runs leave no residue. Docker log inspection on failure, rather than after-the-fact archaeology.
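To make that concrete, here is a minimal sketch of what such a harness script can look like in Node.js, using execa and native fetch as Dudycz does. The endpoint URLs, container name, and choice of checks are our illustrative assumptions, not details from his article.

```ts
// A minimal harness sketch: parallel diagnostics, log inspection on
// failure, reproducible cleanup. All URLs and names are placeholders.
import { execa } from "execa";

// One diagnostic per concern; each resolves with a reason or throws.
async function checkHttpReadiness(url: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`readiness ${url} returned ${res.status}`);
  return `ready: ${url}`;
}

async function checkContainerRunning(name: string): Promise<string> {
  // `docker inspect` exits non-zero if the container does not exist.
  const { stdout } = await execa("docker", [
    "inspect", "--format", "{{.State.Status}}", name,
  ]);
  if (stdout.trim() !== "running") throw new Error(`${name} is ${stdout.trim()}`);
  return `running: ${name}`;
}

async function dumpLogsOnFailure(name: string): Promise<void> {
  // Observe at failure time, not via after-the-fact archaeology.
  const { stdout } = await execa("docker", ["logs", "--tail", "50", name]);
  console.error(`--- last 50 log lines from ${name} ---\n${stdout}`);
}

// Run all checks in parallel; on any failure, inspect logs and tear down.
async function verify(): Promise<void> {
  const results = await Promise.allSettled([
    checkHttpReadiness("http://localhost:8080/health"),
    checkHttpReadiness("http://localhost:9090/-/ready"), // e.g. metrics
    checkContainerRunning("app"),
  ]);
  const failures = results.filter((r) => r.status === "rejected");
  if (failures.length > 0) {
    failures.forEach((f) => console.error((f as PromiseRejectedResult).reason));
    await dumpLogsOnFailure("app");
    // Reproducible cleanup so a failed run leaves no residue.
    await execa("docker", ["compose", "down", "--volumes"]);
    process.exit(1);
  }
  results.forEach((r) => console.log((r as PromiseFulfilledResult<string>).value));
}

verify();
```

The choice of Promise.allSettled over Promise.all is the point: every diagnostic reports, so a failed run tells you everything that was wrong, not just the first thing.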

The line that lands hardest is his description of one-shot AI infrastructure setup: “If you try to repeat it, you won’t know how to do it without doing Voodoo again.” That sentence captures the failure mode the harness exists to prevent. Without verification infrastructure, AI-assisted velocity produces results that work once and cannot be reproduced. The team gets faster at building things they cannot rebuild.

Dudycz is explicit about what the harness is not: “Harness is not magic, a new discipline, or the next buzzword.” It is testing and automation discipline applied as a speed enabler rather than a brake. The reframing is what makes it useful — these were always the right practices; what is new is the consequence of skipping them.

Voice Two: SPDD as First-Class Prompt Artifact

Structured-Prompt-Driven Development was published on martinfowler.com but authored by Wei Zhang and Jessie Jie Xia, both at Thoughtworks. That distinction matters because the framing comes from a delivery practice, not a personal essay — Zhang works on AI delivery and Xia is Thoughtworks’ Global CIO. Fowler hosts; Thoughtworks practitioners speak.

Their argument: prompts are first-class delivery artifacts. Version controlled. Reviewed. Reused. Improved. Treated with the same governance as code, not as throwaway scaffolding lost in chat history. The vehicle for that treatment is the REASONS Canvas — a seven-part structure covering Requirements, Entities, Approach, Structure, Operations, Norms, and Safeguards. Each part is named because each part is something a team needs to be able to point at when something goes wrong.
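Treating the canvas with the same governance as code suggests giving it a schema the toolchain can check. Here is one way that could look in TypeScript; the seven section names come from the article, while the field types and example contents are our assumptions, drawn loosely from the billing case study discussed below.

```ts
// A version-controlled prompt artifact modeled on the seven REASONS parts.
// Only the seven section names come from Zhang and Xia; the field shapes
// and contents are illustrative assumptions.
interface ReasonsCanvas {
  requirements: string[]; // what the change must accomplish
  entities: string[];     // domain objects the prompt may touch
  approach: string;       // intended implementation strategy
  structure: string;      // target code layout and module boundaries
  operations: string[];   // behaviors and workflows to implement
  norms: string[];        // team conventions the output must follow
  safeguards: string[];   // invariants, tests, and review gates
}

// Stored alongside the code and reviewed like code.
const billingCanvas: ReasonsCanvas = {
  requirements: ["support model-aware pricing", "support multi-plan logic"],
  entities: ["Plan", "UsageRecord", "Invoice"],
  approach: "extend the existing rating pipeline rather than fork it",
  structure: "new pricing strategies live under billing/pricing/",
  operations: ["rate usage per model", "aggregate charges across plans"],
  norms: ["no floating-point currency; use integer cents"],
  safeguards: ["golden-file tests for every pricing change"],
};
```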

The workflow is a set of slash commands: /spdd-analysis, /spdd-reasons-canvas, /spdd-prompt-update, /spdd-sync. The last is the one worth pausing on. SPDD requires bidirectional synchronization between the prompt and the code it produces — the document is updated when the implementation changes, and the implementation is regenerated when the document changes. The discipline they want to enforce is that “intent and implementation do not drift apart.” Drift between specification and code is the failure mode every documented system eventually develops; SPDD’s claim is that the AI loop, paradoxically, is what makes ongoing alignment cheap enough to maintain.
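Zhang and Xia do not publish the internals of /spdd-sync, but the drift check it implies is easy to sketch: record a fingerprint of the canvas next to the code generated from it, and fail the build when one changes without the other. Everything below, including the file paths and the fingerprint scheme, is our assumption, not their implementation.

```ts
// Hypothetical drift check, not the actual /spdd-sync implementation.
// Fails CI when the canvas changes without a matching update to the
// fingerprint committed alongside the generated code.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

const CANVAS_PATH = "prompts/billing-canvas.md";       // assumed layout
const FINGERPRINT_PATH = "src/billing/.canvas-sha256"; // assumed layout

const actual = createHash("sha256")
  .update(readFileSync(CANVAS_PATH))
  .digest("hex");
const recorded = readFileSync(FINGERPRINT_PATH, "utf8").trim();

if (actual !== recorded) {
  console.error(
    `Canvas drift: ${CANVAS_PATH} has changed since the implementation ` +
    `was last regenerated. Re-run the sync step and commit both files.`
  );
  process.exit(1);
}
console.log("Canvas and implementation fingerprints match.");
```

Run in CI, a check like this makes drift a build failure rather than a code-review hope, which is the cheap ongoing alignment the authors are claiming.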

The three skills they identify as core — abstraction-first thinking, alignment, and iterative review — are recognizably the skills of a senior engineer on a regulated product. The case study, a billing engine enhancement covering model-aware pricing and multi-plan logic, is not the kind of work where you ship a prototype and call it done. It is the kind of work where someone has to be able to explain the change three years later to an auditor.

Voice Three: Direction When the Productivity Proxy Breaks

Leah Tharin’s “Direction Over Speed” comes at the same problem from the product side. Her observation: AI broke the cover of fake-productivity-by-shipping. For two decades the industry’s working proxy for “is this team productive” was “are they shipping.” Prototypes are now cheap. Specs write themselves. Tickets close. Velocity charts climb. The proxy has lost its signal — shipping looks identical whether the team is building the right thing or the wrong thing.

Her recommendation is structural: smaller teams of four to five people, with the product manager closer to the work, especially on AI features, where the question “is this good” cannot be answered by a metric. There is no equivalent of conversion rate for “did the model behave the way we intended.” Someone has to see the output. Someone has to judge. The smaller the team, the closer the judge sits to the work being judged.

She is not writing about engineering harnesses or prompt artifacts. But the underlying claim is the same: the old proxies for quality stopped working when AI made output cheap, so something else has to carry the verification load. For her that something is the human judgment loop, kept tight by team size. For Dudycz it is automated diagnostics. For Zhang and Xia it is canonical artifacts. Different mechanisms, same compensating function.

Three Vocabularies, One Building

If you read these three pieces in the same week — which we did — the convergence is hard to miss. AI compressed Act. The other phases of work — observe, judge, align, verify — did not compress with it. The gap that opens when one phase runs ahead of the others is the verification gap. Each author is proposing a mechanism to close it.

Dudycz: automate Observe so it matches Act’s speed. Call that the harness.

Zhang and Xia: codify intent in artifacts that travel with the code, so Decide and Act stay synchronized. Call that SPDD.

Tharin: shrink the team so human judgment can keep up with output speed. Call that direction over speed.

Three vocabularies. One destination. The destination is verification infrastructure that scales with AI-induced velocity.

We have been writing about this destination from the engineering side: what the harness is and why it differs from agent execution; what an agent harness actually contains; how harness design changes for long-running applications; how structured reasoning becomes a governance surface. Those essays establish the term inside our practice. What this week added is independent confirmation that the term, or one of its synonyms, is now in active use by practitioners who never read our work.

What It Means for Buyers

When a concept gets named simultaneously by three independent practitioners using three different vocabularies, the concept is past its early-adopter phase. It is no longer a private metaphor for the people who saw the problem first. It is a recognized failure mode being addressed by multiple converging movements.

For buyers, the practical question shifts. It is no longer whether your team needs verification infrastructure to absorb AI velocity. The market has answered that. The practical question is which vocabulary your team will use to describe what they are building, because the choice of vocabulary is also the choice of which body of practice they will draw on.

If your team thinks in terms of OODA, Dudycz’s framing gives them a cybernetic story for why instrumentation matters and what to instrument first. If your team thinks in terms of artifacts and traceability, the Thoughtworks framing gives them a delivery-process story and a canvas to fill in. If your team thinks in terms of how product and engineering work together, Tharin’s framing gives them a structural reason to resist scaling team size as a response to AI productivity gains.

We use “harness” because it is the term our writing has built around. But the substance is portable. Whichever vocabulary lands in your organization, the work is the same: rebuild verification at the new speed, or watch reproducibility quietly leave the building.

The vocabulary will keep multiplying. The destination is the same.


This analysis synthesizes Vibing, Harness and OODA Loop (Oskar Dudycz, April 2026), Structured-Prompt-Driven Development (Wei Zhang and Jessie Jie Xia, Thoughtworks, April 2026), and Direction Over Speed (Leah Tharin, April 2026).

Victorino Group helps engineering organizations build the harness — verification infrastructure that lets AI-assisted teams move fast without losing reproducibility. Let’s talk.

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
