Governance You Cannot Trace Is Governance You Do Not Have

Only 12% of the pull requests that implement a security decision link back to the threat model that produced that decision. Dropbox measured this in its own codebase and published the number in June 2026. The other 88% of the time, a security requirement was agreed in a design review, and the code that was supposed to satisfy it shipped with no traceable connection to the agreement.

That is the state of governance at a company with a mature security function, a documented review process, and engineers who care. The review happened. The decision was made. The verification that the code honored the decision did not happen, because nobody could find the code that was supposed to honor it.

The number behind the number

The 12% is the headline, but the timing data underneath it explains why the link breaks.

Dropbox looked at when implementing pull requests actually open relative to the security review that governs them. Only 29% open within two weeks of the review. The median lands around five weeks out. More than half (54%) open more than a month after the review, and the tail stretches past eleven months. About 15% of the reviews were filed retroactively, written after the code already existed.
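To make those figures concrete, here is a minimal Python sketch of how they could be computed from (review date, pull-request open date) pairs. The input shape, the 30-day “month,” and counting a pull request that opened before its review as retroactive are all assumptions for illustration. The Dropbox write-up may define these buckets differently.

```python
from datetime import date
from statistics import median


def timing_profile(pairs):
    """Summarize how long implementing PRs open after the review that governs them.

    pairs: list of (review_date, pr_open_date) tuples, one per implementing PR.
    Assumptions (illustrative only): "a month" is 30 days, and a PR that opened
    before its review is treated as evidence of a retroactive review.
    """
    lags = [(pr_opened - reviewed).days for reviewed, pr_opened in pairs]
    n = len(lags)
    return {
        "within_two_weeks": sum(0 <= d <= 14 for d in lags) / n,
        "over_a_month": sum(d > 30 for d in lags) / n,
        "median_days": median(lags),
        "retroactive": sum(d < 0 for d in lags) / n,
    }


# Hypothetical data: four reviews and the PR that implemented each one.
sample = [
    (date(2025, 1, 6), date(2025, 1, 13)),   # 7 days later
    (date(2025, 2, 3), date(2025, 3, 10)),   # 35 days later
    (date(2025, 3, 3), date(2025, 4, 21)),   # 49 days later
    (date(2025, 5, 5), date(2025, 4, 28)),   # code came first: retroactive
]
print(timing_profile(sample))
```

Measuring lag per pull request, rather than per review, is what surfaces the long tail. A review with several implementing PRs contributes one lag for each of them.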

Read those numbers together and the failure mode is obvious. A design review produces a set of security requirements. Weeks pass. The engineer who writes the implementation is often not the person who sat in the review. By the time the code lands, the threat model is a document somebody wrote a month ago, in a different tool, with no mechanical tie to the diff under review. The reviewer approving the pull request is reviewing the code. They are not reviewing the code against the design, because the design is not in front of them.

This is the distinction that matters. “Review the code” and “review the implementation against the design” are different activities. The first checks whether the code is correct on its own terms. The second checks whether the code does what the organization decided it should do. Most review processes perform the first and quietly assume it covers the second. It does not.

Why the link disappears

The connection between a design decision and its implementation is fragile for a reason that has nothing to do with discipline. It is a tooling problem.

Design reviews live in one system: a doc, a wiki, a ticket. Code lives in another: the repository, the pull request, the diff. The two systems do not know about each other. The only thing connecting them is a human remembering to paste a link, and humans forget, especially five weeks later. Dropbox confirmed how weak the explicit link is when they tried to reconstruct the connections after the fact.

Here is the part worth sitting with. When Dropbox used semantic search across design docs and code to recover the missing links, they connected 80% of 150 design reviews to their implementing code. Of those recovered links, 69% could be found only through semantic search: they had no explicit reference at all. No pasted URL, no ticket number, no mention. The relationship existed in the work and was invisible in the metadata.
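The two kinds of link in that finding can be told apart mechanically. Below is a minimal sketch, assuming pull requests cite reviews by a hypothetical ticket ID such as `SEC-42`. It records an explicit link when the PR text cites the review and otherwise falls back to a similarity match. The fallback uses bag-of-words cosine similarity as a crude stand-in for the embedding-based semantic search Dropbox describes. The reference format and the 0.3 threshold are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical reference format: reviews are cited in PRs as ticket IDs like "SEC-42".
EXPLICIT_REF = re.compile(r"\bSEC-\d+\b")


def bag_of_words(text):
    """Lower-cased token counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_review(review_id, review_text, prs, threshold=0.3):
    """Return (pr_id, kind) pairs, where kind is "explicit" or "semantic"."""
    review_vec = bag_of_words(review_text)
    links = []
    for pr_id, pr_text in prs.items():
        if review_id in EXPLICIT_REF.findall(pr_text):
            links.append((pr_id, "explicit"))        # someone pasted the reference
        elif cosine(review_vec, bag_of_words(pr_text)) >= threshold:
            links.append((pr_id, "semantic"))        # recovered from content alone
    return links


review = "Encrypt share tokens at rest and rotate the signing key every 90 days."
prs = {
    "pr-1": "Fixes SEC-42: encrypt share tokens",
    "pr-2": "Encrypt share tokens at rest; add signing key rotation every 90 days",
    "pr-3": "Bump lint config",
}
print(link_review("SEC-42", review, prs))
```

In this toy run, `pr-2` is the kind of link the 69% figure describes: it implements the review but never names it.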

So the coverage problem is not that the work was sloppy. The work happened: implementing code existed for far more reviews than the explicit links suggested. What was missing was any traceable record connecting the two, which means there was no way to audit whether the design was honored without a human manually hunting for the code months later. Governance that depends on a human remembering to paste a link is governance that fails 88% of the time.

Governance as a product, not a checklist

The fix Dropbox built is the interesting part, and it generalizes well beyond security.

They wired design reviews and code together with MCP (the Model Context Protocol) and semantic search, so that an agent can take a security review and go find the pull requests that implement it, then check whether the implementation actually satisfies the requirements the review specified. The methodology was not a one-shot demo. They validated it against 79 verified design-to-code pairs and then ran it across 150 reviews over 18 months. The auto-linking is the substrate. The alignment check is the product.
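Validating a linker against hand-verified pairs usually comes down to precision (how many proposed links are real) and recall (how many real links were found). This piece does not say which metrics Dropbox reported for its 79 pairs, so the sketch below is a generic illustration with hypothetical IDs, not their evaluation.

```python
def evaluate_linker(predicted, verified):
    """Score predicted (review_id, pr_id) links against a hand-verified set."""
    true_positives = len(predicted & verified)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(verified) if verified else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and linker output.
verified = {("r1", "p1"), ("r2", "p2"), ("r3", "p3"), ("r4", "p4")}
predicted = {("r1", "p1"), ("r2", "p2"), ("r3", "p9")}
print(evaluate_linker(predicted, verified))
```

Both numbers matter for governance. A wrong link sends the alignment check to the wrong diff; a missed link leaves a decision unaudited.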

This reframes what a governance tool is for. The traditional model treats governance as a gate: a review happens, a box gets checked, the work proceeds. The box says the review occurred. It says nothing about whether the merged code matches what the review decided. That second question is the one that actually protects you, and it is the one no checklist answers.

Building governance as a product means the system continuously answers “does the implementation match the design” rather than “did a review happen.” The agent does the linking that humans forget. The agent reads the threat model and the diff together and flags where they diverge. The reviewer’s attention moves from rediscovering context to judging alignment. That is a better use of a senior engineer than archaeology.
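Dropbox’s alignment check is performed by an agent reading the review and the diff together. The sketch below swaps in something far cruder so that the shape of the output is visible. Each requirement is hand-encoded as a hypothetical regex rule over the lines a unified diff adds, and the check reports, per requirement, whether the diff diverges, is missing something, or tripped no rule. The `Requirement` type and the rules are illustrative assumptions, not Dropbox’s design.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirement:
    """One review requirement, hand-encoded as a hypothetical diff rule."""
    rid: str
    text: str
    forbidden: Optional[str] = None  # regex that must NOT appear in added lines
    required: Optional[str] = None   # regex that MUST appear in added lines


def added_lines(unified_diff):
    """Lines the diff adds, without the leading '+' (skips the '+++' file header)."""
    return [line[1:] for line in unified_diff.splitlines()
            if line.startswith("+") and not line.startswith("+++")]


def check_alignment(requirements, unified_diff):
    added = "\n".join(added_lines(unified_diff))
    report = {}
    for req in requirements:
        if req.forbidden and re.search(req.forbidden, added):
            report[req.rid] = "diverges"   # the diff does something the review ruled out
        elif req.required and not re.search(req.required, added):
            report[req.rid] = "missing"    # the diff never does what the review asked
        else:
            report[req.rid] = "no rule fired"
    return report


diff = """--- a/share.py
+++ b/share.py
@@ -1,2 +1,4 @@
+@require_auth
 def share(request):
+    resp = requests.get(url, verify=False)
     return resp
"""
reqs = [
    Requirement("R1", "Share endpoint requires auth", required=r"@require_auth"),
    Requirement("R2", "Never disable TLS verification", forbidden=r"verify\s*=\s*False"),
    Requirement("R3", "Audit-log every share", required=r"audit_log\("),
]
print(check_alignment(reqs, diff))
```

“No rule fired” is deliberately not labeled “honored.” A regex catches only what someone thought to encode, and that gap is exactly what an agent reading the threat model is meant to close.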

Audit coverage, not pass rate

The metric most security and governance functions report is a pass rate: what percentage of reviews were approved, what percentage of pull requests passed the check. Dropbox’s data exposes why that metric lies. A 100% pass rate on the reviews that happen tells you nothing if only 12% of implementing pull requests are traceable to a review at all. The pass rate measures the work you can see. Coverage measures the work you cannot.

Coverage is the harder number and the honest one. It asks: of all the security decisions we made, what fraction can we trace to the code that was supposed to implement them, and verify that it did? A team reporting 95% approval and 12% traceability is not a team with good governance and a paperwork problem. It is a team that does not know what 88% of its code does relative to its own security decisions.
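The gap between the two numbers is easy to state in code. Here is a minimal sketch, assuming each security decision carries three hypothetical flags (review approved, code traced, code verified to honor the decision). The toy population has every review approved and only 12 in 100 decisions traced and verified.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    decision_id: str
    review_approved: bool   # a review happened and was signed off
    traced_to_code: bool    # the implementing code was found
    code_honors_it: bool    # and was verified to match the decision


def pass_rate(decisions):
    """What most dashboards report: share of reviews that were approved."""
    return sum(d.review_approved for d in decisions) / len(decisions)


def coverage(decisions):
    """Share of decisions traced to code AND verified against it."""
    return sum(d.traced_to_code and d.code_honors_it for d in decisions) / len(decisions)


# Toy population: 100 approved reviews, only the first 12 traced and verified.
decisions = [Decision(f"d{i}", True, i < 12, i < 12) for i in range(100)]
print(f"pass rate {pass_rate(decisions):.0%}, coverage {coverage(decisions):.0%}")
```

Coverage requires both flags, because a traced-but-unverified decision is still one the organization cannot vouch for.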

If you run a governance, security, or platform function, this is the audit to run this quarter. Pull a representative sample of your design or security reviews from the last year. For each one, try to find the code that implements it and confirm the code honors the decision. Measure the percentage where you can do both. That number is your real coverage, and it will be lower than your pass rate. Then decide whether you want a human to keep doing that archaeology or whether you want to wire design to implementation so an agent does it on every merge. Dropbox already showed the second option works.
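A sampled audit produces an estimate, not a census, so it is worth reporting the uncertainty alongside the number. Below is a minimal sketch, assuming reviews are identified by a list of IDs. It draws a reproducible random sample and leaves a human to record, for each review, whether the code was both found and confirmed to honor it. It then reports coverage with a Wilson score interval, a standard interval for proportions that behaves well at small sample sizes. The sample size, seed, and counts are hypothetical.

```python
import math
import random


def sample_reviews(review_ids, k, seed=0):
    """Reproducible random sample, so the audit can be re-run and checked."""
    return random.Random(seed).sample(review_ids, k)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half_width), min(1.0, centre + half_width))


# Hypothetical audit: 40 of 150 reviews sampled; 6 traced to code AND confirmed honored.
audit_sample = sample_reviews([f"review-{i}" for i in range(150)], k=40)
low, high = wilson_interval(successes=6, n=len(audit_sample))
print(f"coverage {6 / len(audit_sample):.0%}, plausible range {low:.0%} to {high:.0%}")
```

With these hypothetical counts, the point estimate is 15% and the interval runs from roughly 7% to 29%. That is wide, but still far enough from a 95% pass rate to settle the question that matters.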

If you cannot trace a decision to the merged code, what you have is a record that a meeting occurred. Calling it governance is optimism.

This analysis synthesizes “How Dropbox uses MCP + Dash for design and code security review” (Dropbox Tech, June 2026).

Victorino Group helps teams make governance traceable from design review to merged code. Let’s talk.
