The Model Is Fungible. The System of Work Is Not.

a16z published a map of the AI startup terrain this month, and the geography is brutal. They borrowed the Wizard of Oz: there is a Yellow Brick Road, and there is the rest of Oz. The Yellow Brick Road is the well-paved path the frontier labs are walking, automating everything in their line of sight. Sit on that road, build a thin wrapper around a model call, and you will be paved over the next time GPT or Claude ships a feature. The rest of Oz is the defensible country: complex, regulated, operationally messy verticals where capability alone gets you nowhere.

The instinct is to read this as a warning about which markets to enter. It is actually a warning about what you build once you are there. The model you call is fungible. You can swap it next quarter. The system of work that wraps it, the part that makes its output trustworthy, compliant, deterministic, and operational, is not fungible at all. That system is the moat.

The Three Defensibility Tests

a16z offers three questions that separate a feature from a company. Run any AI product through them.

The first is the tools-and-steps test. Does the work require orchestrating many tools across many steps, or is it a single model call dressed up? A single call is a feature, and the lab that owns the model will eventually own the feature. Real defensibility starts when the product coordinates retrieval, validation, external systems, and human checkpoints into a sequence the model cannot perform on its own.

The second is the system-versus-tool test. Are you selling a tool the customer operates, or a system that operates on the customer’s behalf and carries the liability? Tools get commoditized. Systems that absorb operational risk get retained, because ripping them out means re-absorbing the risk.

The third is the P&L test, and it is the bluntest. Does the product touch a line on the customer’s income statement? Software that sits next to the work is a cost center forever negotiating its price down. Software that moves revenue or removes cost gets defended by the people whose numbers depend on it.

a16z’s portfolio data shows what passing these tests looks like in motion. 11x, an autonomous sales product, reports positive reply rates up fourfold in recent months and hundreds of millions of dollars in pipeline. That is not a model being clever. That is a system of work, instrumented and improving, sitting on a P&L line.

Governance Is the Load-Bearing Wall

a16z names four sources of vertical defensibility: data and learning flywheels, cross-vendor model routing, cost optimization, and governance as a control plane. The first three are real, but they are also the ones competitors expect and budget for. Governance is the one most teams treat as paperwork and discover, too late, is structural.

Governance as a control plane is not a compliance binder. It is the live layer that decides what the agent is allowed to do, records what it actually did, and gives a human the standing to sign off on the output. In a regulated vertical, the question a customer asks is never “is the model smart enough.” It is “can I defend this output to an auditor, a regulator, or a court.” Permissions plus audit, the boundary of allowed action plus the immutable record of taken action, is what turns a probabilistic generator into something an enterprise can put its name on.
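To make "permissions plus audit" concrete, here is a minimal sketch of such a layer. Everything in it is an assumption for illustration, not something a16z or any vendor describes: the ControlPlane class, the role and action names, and the choice of a hash-chained log as a stand-in for an immutable record. It shows only the two halves named above, the boundary of allowed action and the record of taken action; human sign-off is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical governance layer: a permission check plus a tamper-evident log."""

    allowed_actions: dict[str, set[str]]            # role -> actions that role may take
    log: list[dict] = field(default_factory=list)   # append-only record of every request

    def _append(self, entry: dict) -> None:
        # Chain each entry to the previous one so a later edit is detectable.
        prev_hash = self.log[-1]["hash"] if self.log else "genesis"
        body = json.dumps(entry, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        self.log.append({**entry, "prev": prev_hash, "hash": entry_hash})

    def request(self, role: str, action: str, payload: dict) -> bool:
        """Decide whether the action is allowed, and record the attempt either way."""
        permitted = action in self.allowed_actions.get(role, set())
        self._append(
            {"role": role, "action": action, "payload": payload, "permitted": permitted}
        )
        return permitted

    def verify(self) -> bool:
        """Recompute the hash chain; False means the record was altered."""
        prev_hash = "genesis"
        for record in self.log:
            entry = {k: v for k, v in record.items() if k not in ("prev", "hash")}
            body = json.dumps(entry, sort_keys=True)
            expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
            if record["prev"] != prev_hash or record["hash"] != expected:
                return False
            prev_hash = record["hash"]
        return True
```

Denied requests are logged as well as permitted ones. An auditor usually wants to see what the agent tried to do, not only what it was allowed to do.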

Strip the governance layer out and the other three moats leak. A learning flywheel with no audit trail is a liability accumulator. Cross-vendor routing with no permission model is four ways to do the wrong thing. Cost optimization on an output nobody can defend is cheap garbage. Governance is the wall the rest of the structure leans on.

OpenAI Built the Proof Against Itself

The cleanest evidence comes from the lab most able to skip the scaffolding. OpenAI partnered with Thrive Holdings to build a self-improving tax agent on Codex, and the write-up is an unintentional confession: owning a regulated vertical took far more than raw capability.

The headline figures are real and they are large. OpenAI reports the agent handled 7,000 returns this season. The share of returns reaching 75 percent field completion climbed from 25 percent at launch to 86 percent in six weeks. One senior accountant’s workload on a return dropped from 180 hours to 15. Throughput rose roughly 50 percent. Drafts reached up to 97 percent accuracy.

Now read how those numbers were earned. They did not fall out of a bigger model. They came from a forward-deployed loop: practitioners sitting with the agent, defining what good looked like, building evaluation gates that the output had to pass before a human would trust it. The rental-properties path alone took about six weeks plus heavy oversight to reach 90 percent precision and recall. And here is the governance principle made concrete: ambiguous cases are routed back to engineers rather than forced through the loop. The system knows the boundary of its own competence and refuses to cross it. That refusal is governance. It is the difference between an agent that produces 97 percent accurate drafts and one that produces confident nonsense the other 3 percent of the time with no flag raised.
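The gate-and-escalate pattern can be sketched in a few lines. This is not OpenAI's implementation, which the write-up does not expose. The Draft type, the confidence score, the route labels, and the default thresholds are all hypothetical; only the 90 percent precision and recall bar echoes a figure from the case.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    field_name: str
    value: str
    confidence: float  # hypothetical score in [0, 1]; how it is produced is out of scope


def route(draft: Draft, threshold: float = 0.9) -> str:
    """Send confident drafts to human review; escalate the rest instead of guessing."""
    if draft.confidence >= threshold:
        return "human_review"
    return "escalate_to_engineers"


def precision_recall(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Precision and recall of predicted items against a practitioner-labelled set."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


def passes_gate(predicted: set[str], expected: set[str], bar: float = 0.9) -> bool:
    """An evaluation gate: both metrics must clear the bar before output is trusted."""
    precision, recall = precision_recall(predicted, expected)
    return precision >= bar and recall >= bar
```

The escalation branch does no retrying and no guessing. A draft below the threshold leaves the loop with a flag on it, which is the refusal described above.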

OpenAI owns the model. They still had to build the system of work. If the frontier lab cannot win a regulated vertical on capability alone, the wrapper company calling that lab’s API certainly cannot, and certainly should not try.

What This Means for Where You Build

The strategic read is uncomfortable for anyone who built a horizontal app-layer business on the bet that being early to a good model was enough. Early is not a moat. The lab catches up by Tuesday. Pick a vertical complex enough that the scaffolding is the product, then build the scaffolding so deep that swapping the underlying model changes nothing the customer can feel.
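One way to picture "swapping the model changes nothing the customer can feel" is a workflow that depends only on a narrow interface and validates whatever comes back. The sketch below is illustrative only: VendorA and VendorB are stand-ins that return canned strings, not real SDK clients, and the normalization and validation rules are invented for the example.

```python
from typing import Callable, Protocol


class Model(Protocol):
    """The only thing the workflow assumes about a model."""

    def complete(self, prompt: str) -> str: ...


class VendorA:
    def complete(self, prompt: str) -> str:
        return "total: 100"   # canned response standing in for a real API call


class VendorB:
    def complete(self, prompt: str) -> str:
        return "TOTAL:100 "   # same answer, different surface formatting


def is_total(text: str) -> bool:
    """Hypothetical validation rule: 'total:' followed by digits."""
    return text.startswith("total:") and text.split(":", 1)[1].isdigit()


def run_workflow(model: Model, prompt: str, validate: Callable[[str], bool]) -> str:
    """Call the model, normalize its output, and refuse to return anything invalid."""
    raw = model.complete(prompt)
    normalized = raw.strip().lower().replace(" ", "")
    if not validate(normalized):
        raise ValueError("output failed validation; escalate instead of returning it")
    return normalized
```

Both stand-in vendors produce the same result from run_workflow, because the normalization and validation live in the workflow and not in the model.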

We have argued before that capability is becoming a commodity and orchestration is the moat. The a16z map sharpens where that moat is hardest to copy: regulated verticals, where the orchestration must also be defensible to a third party who was not in the room. We have also traced the governance deficit inside self-improving agents and the missing controls in vertical AI broadly. The Tax AI case is those essays made flesh: a self-improving agent in a regulated vertical, made safe by the exact scaffolding those pieces said was missing elsewhere.

Do This Now

Take your most important AI product and run the three tests honestly. Tools and steps: is it more than one model call? System versus tool: do you carry the customer’s operational risk, or just hand them a faster keyboard? P&L: name the income-statement line you move. If you pass all three, find your governance layer and ask the auditor question: can a customer defend this output to a regulator? If the answer is no, you do not have a moat. You have a head start, and head starts expire.
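For teams that prefer a checklist to a paragraph, the same exercise can be written down as a short function. This is a loose sketch: the field names and verdict strings are made up here, and the honest answers still have to come from people, not code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductAssessment:
    orchestrates_multiple_steps: bool     # tools-and-steps test
    carries_operational_risk: bool        # system-versus-tool test
    pnl_line: Optional[str]               # P&L test: the income-statement line moved
    output_defensible_to_regulator: bool  # the auditor question


def verdict(p: ProductAssessment) -> str:
    """Apply the three tests in order, then the auditor question."""
    if not p.orchestrates_multiple_steps:
        return "feature: a single model call"
    if not p.carries_operational_risk:
        return "tool: the customer still carries the risk"
    if not p.pnl_line:
        return "cost center: no income-statement line named"
    if not p.output_defensible_to_regulator:
        return "head start: passes the tests but has no governance layer"
    return "moat"
```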

Build the wall before you decorate the rooms.

This analysis synthesizes Avoiding Death on the Yellow Brick Road (a16z, May 2026) and Building Self-Improving Tax Agents with Codex (OpenAI, May 2026).

Victorino Group helps enterprises build the governance scaffolding that makes vertical AI defensible.