The Remediation Deficit: When AI Finds 1,596 Bugs and You Fix 97
Anthropic published a number that should reorganize how every security team plans its 2026 budget. As of May 22, 2026, the company had used its own models to disclose 1,596 vulnerabilities in open source software. It had patched 97.
Read those two figures next to each other. Detection ran roughly sixteen times ahead of remediation. That ratio is not a failure of effort. It is the new shape of the work, and it is the most honest first-party evidence we have that the bottleneck in security has moved.
The Bottleneck Inverted
For two decades, finding vulnerabilities was the hard part. You hired specialists, ran fuzzers for days, paid bug bounties, and waited. Scarcity lived on the discovery side. Patching was comparatively cheap once you knew where to look.
Frontier models flipped that economy. A model can now read a codebase it has never seen, build a threat model, and surface exploitable flaws at a rate no human team can match. Anthropic reports that thorough threat models produced findings that were “exploitable 90 percent of the time.” Discovery is no longer the constraint. It is close to free.
What did not get cheaper is everything that happens after a finding lands: confirming it is real, ranking it against the other 1,595, writing a fix that does not break production, testing that fix, and shipping it through whatever review your organization requires. That work still runs at human speed, because it still requires human judgment and human accountability. The 1,596 to 97 ratio is what it looks like when one side of a pipeline gets dramatically faster and the other side does not.
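The throughput mismatch can be sketched as a simple queue. This is a minimal illustration that assumes constant weekly discovery and remediation rates. The function `open_backlog` and the rates below are hypothetical and are not figures from Anthropic's report.

```python
def open_backlog(weeks: int, found_per_week: float, fixed_per_week: float,
                 start: float = 0.0) -> float:
    """Open findings after `weeks`, assuming constant weekly rates.

    Fixes can only drain findings that already exist, so the backlog
    never drops below zero.
    """
    backlog = start
    for _ in range(weeks):
        backlog = max(0.0, backlog + found_per_week - fixed_per_week)
    return backlog


# Hypothetical team: a model surfaces 100 findings a week, humans ship 6 fixes.
print(open_backlog(weeks=12, found_per_week=100, fixed_per_week=6))  # 1128.0
# The same team before model-driven discovery: 5 findings a week, 6 fixes.
print(open_backlog(weeks=12, found_per_week=5, fixed_per_week=6))    # 0.0
```

When discovery outpaces remediation, the backlog grows linearly with time regardless of how good each individual fix is. Only two things change the slope: raising remediation capacity, or lowering the volume that reaches humans.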
We have written before about verification debt, about the judgment bottleneck that speed creates, and about how output competence is decoupling from real competence. This is not a restatement of those arguments. It is the receipt. Anthropic’s own deployment produced the exact divergence those essays predicted, and it produced it in numbers.
The Flood Has a Cost on the Receiving End
A disclosure is not a gift. It is a unit of work handed to a maintainer who did not ask for it. When a model can generate findings at machine speed, the maintainers on the receiving end inherit a queue they cannot drain. We saw the early version of this dynamic when curl’s maintainer publicly pushed back against the flood of low-quality AI security reports. The 1,596 number is the same pressure, now measured.
The difference between a useful disclosure program and a denial-of-service attack on your maintainers is verification. An unverified finding costs the receiver time to triage, time to reproduce, and time to decide it was noise. Multiply that by a thousand and you have buried your most valuable people under a pile of maybes. Volume without verification is not security work. It is offloading your triage cost onto someone else.
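A back-of-the-envelope estimate makes the offloaded cost concrete. The sketch below assumes that every noisy finding costs the receiver a fixed average number of hours to triage, reproduce, and dismiss. The function `wasted_triage_hours` and its inputs are illustrative assumptions, not measured data.

```python
def wasted_triage_hours(findings: int, false_positive_rate: float,
                        hours_per_triage: float) -> float:
    """Receiver hours spent on findings that turn out to be noise."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    noise = findings * false_positive_rate  # reports that are not real bugs
    return noise * hours_per_triage


# Hypothetical: 1,000 unverified reports, 40% noise, 2 hours each to rule out.
print(wasted_triage_hours(1000, 0.40, 2.0))  # 800.0
```

That comes to twenty forty-hour weeks of maintainer time spent proving a negative. Every point of noise removed before disclosure comes straight off the receiver's bill.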
Adversarial Verification as a Measurable Control
The part of Anthropic’s report that deserves the most attention is not the headline ratio. It is the method they used to make the findings trustworthy. They ran an adversarial verification step, a second model tasked with attacking each finding rather than confirming it. That step roughly halved the false-positive rate relative to the initial discovery pass.
Halved. That is not a soft qualitative win. It is a control with a number attached, which means it is a control you can budget, test, and hold accountable.
This matters because it turns verification from a vague aspiration into an engineering surface. You can measure the false-positive rate before and after an adversarial pass. You can set a threshold below which a finding does not reach a human. You can A/B test the adversary itself. A verification step that produces a measurable reduction in noise is the kind of thing a security leader can defend in a budget meeting, because it converts model output into something with a known signal-to-noise ratio.
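One way to attach a number to this control is to run the adversary over a sample of findings that humans have already labeled, then compare the noise before and after the gate. The sketch below is a minimal illustration built on several assumptions. "False-positive rate" is read as the share of findings at a stage that are not real, which is strictly a false discovery rate. The adversary is assumed to emit a 0-to-1 survival score for each finding. The name `evaluate_gate`, the threshold, and the sample data are all hypothetical.

```python
from __future__ import annotations


def noise_rate(labels: list[bool]) -> float:
    """Share of findings that are not real (True = confirmed real)."""
    if not labels:
        return 0.0
    return sum(1 for real in labels if not real) / len(labels)


def evaluate_gate(labels: list[bool], scores: list[float],
                  threshold: float) -> dict[str, float]:
    """Noise before/after an adversarial gate, plus how many real bugs survive it.

    labels[i]: True if a human confirmed finding i is real.
    scores[i]: the adversary's 0..1 score that finding i survived its attack.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    passed = [real for real, score in zip(labels, scores) if score >= threshold]
    total_real = sum(labels)
    return {
        "noise_before": noise_rate(labels),
        "noise_after": noise_rate(passed),
        # An adversary that refutes everything scores zero noise, so also
        # track the share of real bugs that still reach a human.
        "real_kept": sum(passed) / total_real if total_real else 1.0,
    }


# Hypothetical labeled sample of eight findings, scored by two candidate adversaries.
labels   = [True, True, False, False, True, False, False, True]
scores_a = [0.90, 0.80, 0.20, 0.70, 0.60, 0.10, 0.30, 0.95]
scores_b = [0.90, 0.40, 0.10, 0.20, 0.60, 0.10, 0.20, 0.90]

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, evaluate_gate(labels, scores, threshold=0.5))
```

In this made-up sample, adversary A cuts noise from 0.5 to 0.2 and keeps every real bug. Adversary B reaches zero noise but drops one real bug. That tradeoff is exactly what an A/B comparison should surface before you settle on a threshold.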
The deeper point: the adversary is doing triage that would otherwise consume your scarce human attention. It does not replace the human at the end of the line. It protects that human’s time by clearing the obvious false positives before they arrive. The constraint was never the model’s ability to find bugs. The constraint is the finite supply of trustworthy human judgment, and an adversarial verifier is how you spend that judgment only on findings that survived an attack.
Humans Keep the Last Signature
None of this argues for removing people from the loop. It argues for putting them in the right place. The 97 patches that shipped did so because a human confirmed the finding mattered, approved the fix, and accepted responsibility for the change going to production. That signature is the point. Frontier models can compress everything up to that signature. They cannot hold the accountability the signature represents.
A verification pipeline that respects this looks like a funnel with widening machine capacity at the top and a deliberate human gate at the bottom. The model finds. The adversary culls. The ranking engine prioritizes by exploitability and blast radius. The human decides what gets fixed and signs the change. Each stage exists to make the human’s final decision cheaper and better informed, not to remove it.
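The funnel can be sketched as a short pipeline that ends in a review queue rather than a merge. This is a hypothetical illustration, not Anthropic's implementation. Several pieces are assumptions: the `Finding` fields, the `refuted` callback standing in for the adversarial verifier, the exploitability-times-blast-radius ranking, and the `human_capacity` cutoff.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    finding_id: str
    exploitability: float  # 0..1 estimate (hypothetical field)
    blast_radius: float    # 0..1 estimate (hypothetical field)


def review_queue(findings: Iterable[Finding],
                 refuted: Callable[[Finding], bool],
                 human_capacity: int) -> list[Finding]:
    """Model finds, adversary culls, ranking prioritizes, human decides.

    Returns the findings a human should review next. Nothing here approves
    or ships a fix; that decision and its signature stay with the reviewer.
    """
    # Adversary culls: drop anything the verifier managed to refute.
    survivors = [f for f in findings if not refuted(f)]
    # Ranking engine: an assumed priority of exploitability x blast radius.
    ranked = sorted(survivors,
                    key=lambda f: f.exploitability * f.blast_radius,
                    reverse=True)
    # Human gate: only as many findings as reviewers can actually judge.
    return ranked[:human_capacity]


found = [
    Finding("A", exploitability=0.9, blast_radius=0.8),
    Finding("B", exploitability=0.3, blast_radius=0.9),
    Finding("C", exploitability=0.95, blast_radius=0.9),
    Finding("D", exploitability=0.6, blast_radius=0.6),
]
refuted_ids = {"C"}  # pretend the adversary disproved C
queue = review_queue(found, lambda f: f.finding_id in refuted_ids, human_capacity=2)
print([f.finding_id for f in queue])  # ['A', 'D']
```

Capping the queue at reviewer capacity is a deliberate design choice. Findings below the cut stay in the backlog instead of landing on a human who cannot get to them.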
The 1,596 to 97 ratio is uncomfortable only if you expected machines to fix bugs. They do not fix bugs. They find them, and increasingly they help you decide which ones are real. Fixing remains a judgment, a tradeoff against finite engineering hours, and an act of accountability. That is exactly where humans belong.
Do This Now
Pull your own numbers. For the last quarter, count vulnerabilities surfaced by any automated tool against vulnerabilities actually patched. If the ratio looks anything like sixteen to one, you do not have a detection problem. You have a remediation deficit, and adding another scanner will make it worse.
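Pulling the numbers can be as simple as filtering a tracker export. The sketch assumes a hypothetical export in which each record carries a surfaced date, a flag for whether an automated tool found it, and a patched flag. It counts patches among the findings surfaced in the window, which is a cohort view and only one reasonable definition among several.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VulnRecord:
    surfaced_on: date
    found_by_tool: bool  # surfaced by an automated scanner or model
    patched: bool


def remediation_ratio(records: list[VulnRecord], start: date,
                      end: date) -> tuple[int, int, float]:
    """(surfaced, patched, surfaced-per-patch) for tool findings in [start, end)."""
    cohort = [r for r in records
              if r.found_by_tool and start <= r.surfaced_on < end]
    surfaced = len(cohort)
    patched = sum(r.patched for r in cohort)
    ratio = surfaced / patched if patched else float("inf")
    return surfaced, patched, ratio


# Hypothetical export for Q1 2026.
records = [
    VulnRecord(date(2026, 1, 10), True, True),
    VulnRecord(date(2026, 2, 3), True, False),
    VulnRecord(date(2026, 2, 20), True, False),
    VulnRecord(date(2026, 3, 15), True, False),
    VulnRecord(date(2026, 3, 30), False, True),   # found by a human, excluded
    VulnRecord(date(2025, 12, 20), True, True),   # outside the quarter
]
print(remediation_ratio(records, date(2026, 1, 1), date(2026, 4, 1)))  # (4, 1, 4.0)
```

For scale, the same arithmetic applied to Anthropic's published totals gives 1,596 / 97 ≈ 16.5 findings disclosed per patch.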
Then add one stage you probably skipped: an adversarial verification pass before any finding reaches a human. Measure the false-positive rate before and after. If you cannot cut it meaningfully, your discovery tooling is generating noise your people are paying to filter. If you can, you have just bought back the scarcest resource you have, which is trustworthy human attention, and you have a number to prove it.
Detection is solved. Verification and remediation are the work now. Build the pipeline that reflects that, and keep the last signature human.
This analysis synthesizes “Using LLMs to Secure Source Code” (Anthropic, May 2026).
Victorino Group helps teams build the verification and remediation pipeline that turns AI bug-finding into fixed code. Let’s talk.
All articles on The Thinking Wire are written with the assistance of Anthropic’s Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.