Thoughtworks Just Named the Coding-Agent Governance Pattern. Sensors. Read the CI Bill.
Two pieces landed in the same week of May 2026, written by people who do not appear to have read each other. Birgitta Boeckeler at Thoughtworks published “Maintainability Sensors for Coding Agents.” CloudBees published “AI Is Writing More Code. Your CI Pipeline Can’t Keep Up.” One named the architecture. The other quantified what happens when the architecture is missing.
Together they finish a sentence the industry has been mumbling for a year: coding agents do not produce quality by accident, and CI is not where you discover the absence of quality. Quality lives in a layer Boeckeler calls sensors. CI is what you pay when that layer is empty.
If you are running coding agents in production and you have not drawn this layer yet, the rest of your governance stack is decorative.
What Boeckeler Actually Named
The Thoughtworks piece is a case study, not a manifesto. Boeckeler walked through a real project: a TypeScript and Next.js analytics dashboard integrating four external APIs. The interesting move is not the project. It is the explicit inventory of feedback loops the team built so the agent would answer to something other than the developer’s patience.
Eight computational sensors ran during coding. Four more ran on a slower cadence. The CI pipeline replayed all of them on push, plus deeper validation. The sensors were not exotic tools. ESLint for style. Dependency-cruiser for module-coupling rules. Semgrep for security and pattern matching. Custom scripts to flag coupling violations that no off-the-shelf tool catches. Boeckeler cites Vlad Khononov’s Modularity work as the lineage for what counts as a coupling violation worth flagging.
The two examples she gives are worth memorizing because they are the kind of debt coding agents produce by default:
- A single new date-range parameter touched more than forty files, because the agent threaded it through every layer instead of consolidating at the boundary.
- Three routes ended up with duplicate response-shaping code, because the agent generated each one in isolation without noticing the others.
These are not bugs. They pass tests. They ship features. They are exactly the kind of structural decay that human reviewers catch in pull requests when they have time and miss when they do not. The sensor layer is what catches them when nobody is paying attention.
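A custom sensor for the first example can be very small. The sketch below is a hypothetical change-breadth check, not something taken from the Thoughtworks piece: it assumes you feed it the list of changed paths (for instance from `git diff --name-only`) and that the first two path segments approximate an architectural layer. The function name, the report shape, and the threshold are all illustrative.

```typescript
// Hypothetical sensor: flag a change that spreads across too many files.
interface BreadthReport {
  filesTouched: number;
  layersTouched: string[];
  exceeds: boolean;
}

function changeBreadth(changedPaths: string[], maxFiles: number): BreadthReport {
  // De-duplicate paths so a file listed twice is counted once.
  const unique = Array.from(new Set(changedPaths));
  // Assumption: the first two segments (e.g. "src/api") name the layer.
  const layers = Array.from(
    new Set(unique.map((p) => p.split("/").slice(0, 2).join("/"))),
  ).sort();
  return {
    filesTouched: unique.length,
    layersTouched: layers,
    exceeds: unique.length > maxFiles,
  };
}

// Example: a small diff checked against a deliberately low threshold.
const report = changeBreadth(
  ["src/api/range.ts", "src/ui/Chart.tsx", "src/api/range.ts"],
  1,
);
```

A check like this cannot say whether a wide change is wrong, only that it is wide. Its value is that the agent sees the signal while it is still editing, instead of a reviewer seeing it days later.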
The pattern Boeckeler named has three properties worth lifting:
- Automated. No human in the loop for the first response. The sensor fires, the agent reads the output, the agent corrects.
- Layered. Cheap sensors run constantly. Expensive sensors run on commit. Slowest sensors run in CI. Different cost, different cadence, same scoreboard.
- Authored. Some sensors are off-the-shelf. The valuable ones are custom, because they encode the architecture you actually care about, which is exactly the thing no vendor ships.
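The three properties can be sketched as a small feedback loop. This is a minimal illustration under stated assumptions, not Boeckeler’s implementation: the `Sensor` and `Finding` types, the tier names, and the `revise` callback (a stand-in for the coding agent) are all hypothetical.

```typescript
// Hypothetical types; none of these names come from the Thoughtworks piece.
type Tier = "on-edit" | "on-commit" | "ci";

interface Finding {
  sensor: string;
  file: string;
  message: string;
}

interface Sensor {
  name: string;
  tier: Tier;
  // Returns findings for a snapshot of file path -> contents; empty means clean.
  check: (source: Map<string, string>) => Finding[];
}

// Tiers ordered from cheapest to most expensive.
const TIER_ORDER: Tier[] = ["on-edit", "on-commit", "ci"];

// Layered: run every sensor at or below the requested tier, cheapest first.
function runSensors(sensors: Sensor[], upTo: Tier, source: Map<string, string>): Finding[] {
  const limit = TIER_ORDER.indexOf(upTo);
  return sensors
    .filter((s) => TIER_ORDER.indexOf(s.tier) <= limit)
    .sort((a, b) => TIER_ORDER.indexOf(a.tier) - TIER_ORDER.indexOf(b.tier))
    .flatMap((s) => s.check(source));
}

// Automated: sensors fire, the agent reads the findings, the agent revises.
// `revise` is a placeholder for the agent, not a real API.
function feedbackLoop(
  sensors: Sensor[],
  upTo: Tier,
  source: Map<string, string>,
  revise: (source: Map<string, string>, findings: Finding[]) => Map<string, string>,
  maxRounds = 3,
): { source: Map<string, string>; remaining: Finding[]; rounds: number } {
  let current = source;
  let findings = runSensors(sensors, upTo, current);
  let rounds = 0;
  while (findings.length > 0 && rounds < maxRounds) {
    current = revise(current, findings);
    findings = runSensors(sensors, upTo, current);
    rounds += 1;
  }
  return { source: current, remaining: findings, rounds };
}
```

The round limit is a design choice worth keeping in any real version: if the agent cannot clear a finding in a few attempts, the finding should reach a human rather than loop indefinitely. The "authored" property lives in the `check` functions, which is where custom rules would go.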
The word matters. We have written about review governance, self-improving agents, and budget approval workflows as separate threads. Sensors is the noun that ties them together. It is a Thoughtworks coinage and the lineage matters: the term comes from inside the firm that has shipped more enterprise refactoring projects than any other consultancy on Earth. This is not theory imported from somewhere; it is the firm’s working vocabulary for a problem it has been paid to solve at scale.
What CloudBees Quantified
CloudBees is a vendor selling Smart Tests, so read their numbers with the seller’s discount. Even discounted, the shape of the data lines up too cleanly with the sensor argument to ignore.
The CloudBees post reports that daily AI-coding-tool users ship about sixty-five percent more pull requests than non-users. About one-third of CI failures in their customer base are flaky: no underlying change, just retry until green. In one customer case they cite, regression test time fell by up to eighty percent and pre-commit time dropped from six hours to two. The headline number, on their own scenario math: an estimated quarter of a million dollars per year in CI compute waste, for a fifty-engineer team.
These numbers are vendor-attributed. The mechanism behind them is not. If your agents produce sixty-five percent more pull requests and your sensor layer is the CI pipeline, then CI is now the bottleneck, the cost center, and the de-facto quality wall. None of those three things is what CI was designed for.
The CloudBees framing, stripped of the product pitch: CI was the implicit governance layer when humans wrote the code. Humans pre-filtered before pushing. Coding agents do not. They push everything to CI and let the pipeline tell them what is wrong. The agent’s economics work; the pipeline’s do not.
The sensor layer fixes the economics. The agent gets feedback locally, on the cheapest sensor that catches the issue. CI runs the expensive verification on code that already passed the cheap ones. Pre-commit drops because the slow tests stop being the first line of defense.
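The economics can be made concrete with a back-of-the-envelope model. The sketch below does not reproduce the CloudBees scenario math, whose inputs are not given here. Every field name and every number in it is a hypothetical placeholder, and the model assumes that a run caught locally never reaches CI at all.

```typescript
// All inputs are hypothetical placeholders; plug in your own pipeline data.
interface CiProfile {
  runsPerDay: number; // pipeline executions per working day
  minutesPerRun: number; // average compute minutes per execution
  costPerMinute: number; // compute cost per pipeline minute, in dollars
  workingDaysPerYear: number;
}

// Annual compute cost of the pipeline as it runs today.
function annualCiCost(p: CiProfile): number {
  return p.runsPerDay * p.minutesPerRun * p.costPerMinute * p.workingDaysPerYear;
}

// Cost after local sensors absorb a share of the runs.
// `locallyCatchable` is the fraction of CI runs (0 to 1) that fail for a
// reason a local sensor would have reported before the push.
function costWithLocalSensors(p: CiProfile, locallyCatchable: number): number {
  if (locallyCatchable < 0 || locallyCatchable > 1) {
    throw new RangeError("locallyCatchable must be between 0 and 1");
  }
  return annualCiCost(p) * (1 - locallyCatchable);
}

// Illustrative inputs only.
const profile: CiProfile = {
  runsPerDay: 200,
  minutesPerRun: 30,
  costPerMinute: 0.05,
  workingDaysPerYear: 250,
};
const before = annualCiCost(profile);
const after = costWithLocalSensors(profile, 0.25);
```

With these made-up inputs the pipeline costs 75,000 dollars a year, and catching a quarter of runs locally brings that to 56,250. The point is the shape of the formula, not the figures: the saving scales with the share of failures a cheap sensor could have caught first.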
Two Pieces, One Argument
Read the Thoughtworks essay alone and the sensor layer sounds like a craft practice. Read the CloudBees post alone and the CI overrun sounds like a tooling problem the vendor will sell you out of. Read them together and the argument is sharper.
Sensors are the discipline. CI is the unpaid invoice when the discipline is absent.
The engineering implication is structural. If you are scaling coding agents and your only feedback machinery is the CI pipeline, you have outsourced your architecture review to a queue. The queue is slow, the queue is expensive, and the queue does not catch coupling violations because coupling violations pass the tests. The agent ships forty-file diffs and three duplicate route handlers and the pipeline says green. You discover the debt three months later when a feature change touches sixty files instead of six.
The leadership implication is financial. Quarter-of-a-million-a-year CI compute waste on a fifty-engineer team is a real number, and it is the visible portion of the bill. The invisible portion is the structural debt the pipeline did not catch because no sensor for it existed. That debt shows up on the velocity chart six months later as “the codebase got harder to change.” Nobody attributes it to the absence of a coupling sensor in February. The line item does not exist.
Sensor architecture is the line item that prevents the line item that does not exist.
What to Build, Concretely
Boeckeler’s project list is a working starter kit. You should expect to take three weeks to inventory and stand up the first cut.
Inventory the sensors you already run. Most teams have ESLint, Prettier, a type checker, unit tests, and integration tests. List them. Mark which run pre-commit, which run on push, which run in CI. You almost certainly do not have a coupling sensor. You almost certainly do not have a custom Semgrep rule for the architecture your team actually decided on three years ago.
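The inventory itself can live as data, which makes the gaps queryable. The following is a minimal sketch. The stage assignments are invented for illustration, and the `Concern` categories are an assumption about how a team might group its sensors.

```typescript
type Stage = "pre-commit" | "push" | "ci";
type Concern = "style" | "types" | "tests" | "coupling" | "architecture";

interface InventoryEntry {
  tool: string;
  concern: Concern;
  stages: Stage[];
}

// Hypothetical inventory for a typical TypeScript team.
const inventory: InventoryEntry[] = [
  { tool: "ESLint", concern: "style", stages: ["pre-commit", "ci"] },
  { tool: "Prettier", concern: "style", stages: ["pre-commit"] },
  { tool: "tsc --noEmit", concern: "types", stages: ["ci"] },
  { tool: "unit tests", concern: "tests", stages: ["push", "ci"] },
  { tool: "integration tests", concern: "tests", stages: ["ci"] },
];

const ALL_CONCERNS: Concern[] = ["style", "types", "tests", "coupling", "architecture"];

// Concerns that no sensor covers at any stage.
function uncoveredConcerns(entries: InventoryEntry[]): Concern[] {
  const covered = new Set(entries.map((e) => e.concern));
  return ALL_CONCERNS.filter((c) => !covered.has(c));
}

// Tools whose first feedback arrives only in CI: candidates to move earlier.
function ciOnlyTools(entries: InventoryEntry[]): string[] {
  return entries
    .filter((e) => e.stages.length === 1 && e.stages[0] === "ci")
    .map((e) => e.tool);
}

const gaps = uncoveredConcerns(inventory);
const lateFeedback = ciOnlyTools(inventory);
```

For this example inventory, `gaps` holds the coupling and architecture concerns, and `lateFeedback` holds the type check and the integration tests.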
Add the layer the agent will answer to first. A dependency-cruiser config that fails when a new file imports across an architectural boundary is a one-day project and catches the forty-file diff problem Boeckeler described. The agent will hit it and rewrite. You do not have to teach the agent the architecture; you have to give the agent a sensor that pings when the architecture is violated.
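A boundary rule of this kind might look like the sketch below. The object follows the `forbidden` rule shape of a dependency-cruiser configuration (`name`, `severity`, `from.path`, `to.path`, with paths written as regular expressions), but the layer paths `src/ui`, `src/services`, and `src/data` are hypothetical. The local `BoundaryRule` type is a simplification written for this example, not the library’s own type, and `violates` is only an illustration of what the rule is meant to match, not how dependency-cruiser evaluates it.

```typescript
// Simplified local type; the real configuration accepts many more fields.
interface BoundaryRule {
  name: string;
  severity: "error" | "warn" | "info";
  comment: string;
  from: { path: string }; // regular expression matched against the importing file
  to: { path: string }; // regular expression matched against the imported file
}

// Hypothetical layers: UI code may not reach into the data layer directly.
const config: { forbidden: BoundaryRule[] } = {
  forbidden: [
    {
      name: "no-ui-to-data-layer",
      severity: "error",
      comment: "UI code must go through src/services instead of importing src/data.",
      from: { path: "^src/ui" },
      to: { path: "^src/data" },
    },
  ],
};

// Illustration only: shows which import edges the rule is intended to flag.
function violates(rule: BoundaryRule, fromFile: string, toFile: string): boolean {
  return new RegExp(rule.from.path).test(fromFile) && new RegExp(rule.to.path).test(toFile);
}

const rule = config.forbidden[0];
const flagged = violates(rule, "src/ui/Chart.tsx", "src/data/db.ts");
const allowed = violates(rule, "src/services/range.ts", "src/data/db.ts");
```

In a real project the `config` object would be exported from the dependency-cruiser configuration file rather than evaluated by hand. The `comment` field is worth writing carefully, since it is the text the agent reads when the rule fires.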
Add a coupling sensor for your top three pain points. What three things does your senior engineer flag in every code review? Duplicate response shapes? Stringly-typed IDs that should be branded types? Direct database access from controllers? Write a Semgrep rule for each. Run it on commit. The sensors are now teaching the agent what your senior engineer would have said.
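For readers who have not met the branded-type pattern named above, here is a minimal sketch of what such a sensor would push the agent toward. The `Brand` helper and the `UserId` and `OrderId` names are illustrative assumptions, not taken from either source article.

```typescript
// A brand adds a compile-time tag to a primitive without changing its runtime value.
type Brand<T, B extends string> = T & { readonly __brand: B };

type UserId = Brand<string, "UserId">;
type OrderId = Brand<string, "OrderId">;

// Constructors are the only place a raw string becomes a branded ID.
function asUserId(raw: string): UserId {
  if (raw.length === 0) throw new Error("UserId must be non-empty");
  return raw as UserId;
}

function asOrderId(raw: string): OrderId {
  if (raw.length === 0) throw new Error("OrderId must be non-empty");
  return raw as OrderId;
}

// Accepts only an OrderId; a plain string or a UserId is rejected by the compiler.
function orderLabel(id: OrderId): string {
  return `order:${id}`;
}

const userId = asUserId("u_123");
const orderId = asOrderId("o_456");
const label = orderLabel(orderId);
// orderLabel(userId); // compile-time error: UserId is not assignable to OrderId
```

A pattern rule that flags function parameters named like IDs but typed as plain `string` would then steer the agent toward constructors such as these.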
Re-tier your CI. With local sensors firing, CI no longer needs to be the first wall. Move the cheapest sensors out of CI and into pre-commit. Cut the CI run by whatever percentage of its current duration was wasted catching things you can now catch locally. The CloudBees scenario suggests fifty percent is achievable. Even a quarter of that is real money.
Audit the agent’s feedback diet. What does your coding agent currently see when it makes a mistake? If the answer is “the test output if it remembers to run them,” that is the first thing to fix. The sensor outputs need to be readable by the agent as structured feedback, not buried in a terminal scroll.
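One way to make sensor output agent-readable is to normalize it into a compact, uniform shape. The sketch below assumes input in the structure produced by ESLint’s JSON formatter (an array of results, each with a `filePath` and a `messages` list carrying `ruleId`, `severity`, `message`, `line`, and `column`). The `AgentFinding` shape and the function name are hypothetical, and only the fields used here are typed.

```typescript
// Subset of the ESLint JSON formatter output that this sketch relies on.
interface EslintMessage {
  ruleId: string | null; // null when the message is not tied to a rule
  severity: 1 | 2; // 1 = warning, 2 = error
  message: string;
  line: number;
  column: number;
}

interface EslintResult {
  filePath: string;
  messages: EslintMessage[];
}

// Hypothetical shape handed to the agent: one flat record per problem.
interface AgentFinding {
  file: string;
  line: number;
  rule: string;
  level: "error" | "warning";
  message: string;
}

function toAgentFindings(results: EslintResult[]): AgentFinding[] {
  const findings = results.flatMap((r) =>
    r.messages.map((m): AgentFinding => ({
      file: r.filePath,
      line: m.line,
      rule: m.ruleId ?? "(no rule)",
      level: m.severity === 2 ? "error" : "warning",
      message: m.message,
    })),
  );
  // Errors first, so the agent addresses blocking problems before warnings.
  return findings.sort((a, b) => Number(b.level === "error") - Number(a.level === "error"));
}

// Example input, as it might arrive after JSON.parse of the linter output.
const raw = JSON.stringify([
  {
    filePath: "src/api/range.ts",
    messages: [
      { ruleId: "no-unused-vars", severity: 1, message: "'x' is defined but never used.", line: 3, column: 7 },
      { ruleId: "eqeqeq", severity: 2, message: "Expected '===' and instead saw '=='.", line: 9, column: 12 },
    ],
  },
]);
const findings = toAgentFindings(JSON.parse(raw) as EslintResult[]);
```

The same flat record can be produced from other sensors, so the agent sees one format regardless of which tool fired.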
Do This Now
Block four hours this week. Take the diagram of your current CI pipeline. Add one column to the left labeled “sensors that run before CI.” If the column is mostly empty, you have found the architecture work. Print the Boeckeler piece and read it with your platform lead. Print the CloudBees post and read it with whoever owns the CI budget. They are reading the same problem from opposite ends.
The teams that scale coding agents in 2026 will not be the teams with the most autonomous agents. They will be the teams whose agents answer to the most sensors before the pipeline has a chance to fail.
This analysis synthesizes “Maintainability Sensors for Coding Agents” (Thoughtworks, May 2026) and “AI Is Writing More Code. Your CI Pipeline Can’t Keep Up.” (CloudBees, May 2026).
Victorino Group helps engineering organizations design the sensor architecture and CI economics for governed AI development.
All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.