Two Ways to Measure AI Adoption, Both Broken

Thiago Victorino

Tell your workforce to use AI, then watch two things happen at once. Coworkers quietly mark the people who admit to using it as lazy. Leadership starts counting tokens. Both reactions surfaced in the same week, from opposite ends of the org chart, and both wreck the one thing AI governance actually needs: people willing to say what they are really doing.

Atlassian’s Teamwork Lab ran a controlled experiment. Gergely Orosz published a teardown of Meta’s engineering org. Read together, they map the two failure modes of measuring human and AI work. One comes from peers. One comes from the top. Neither produces a number you can trust.

The peer penalty: honesty rated as laziness

Atlassian’s Teamwork Lab held the work product constant. Same output, same quality, 961 participants. The only variable was whether the worker disclosed using AI to produce it. Disclosure carried a cost. Peers rated the AI-disclosing worker as 10 times lazier than the identical non-disclosing one, and 24 percentage points less likely to be recommended for high-visibility projects.

The output was identical. The judgment was not. What got punished was the admission.

This runs against the official message. A separate Atlassian pulse survey put AI use among US knowledge workers at 94 percent. Almost everyone is using it. Almost no one is rewarded for saying so out loud. Molly Sands, who heads the Teamwork Lab, framed the contradiction plainly: “Companies are telling the workforce to use AI, but employees are penalizing each other for being honest about it.”

Worth naming the source incentive here. Atlassian sells workplace collaboration software and has a commercial stake in any “AI at work” story. The experiment is original and the design is sound, but a vendor running it is not a neutral party. The finding still holds, because the mechanism is familiar from every other status game inside a company.

The mechanism is what matters. When honesty about a behavior gets you rated lazier and passed over for visible work, the behavior does not stop. It goes quiet. People keep using the tools and stop reporting it. The one bright spot in Atlassian’s data points the same direction: the laziness penalty nearly disappears in companies that actively celebrate AI use. Where leadership makes disclosure safe, the stigma collapses. Where it stays ambient, usage submerges. This is the shadow AI dynamic we covered in Shadow AI Is the Symptom, Not the Disease, now with a controlled number attached to the cause.

The top-down version: counting as control

Meta took the opposite road and arrived at the same broken place. Orosz’s reporting describes engineers generating 60.2 trillion AI tokens in 30 days, roughly 900 million dollars if you bought that volume at list pricing. The number is presented internally as proof of adoption. It proves consumption, which is a different thing.
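For a sense of scale, the two reported figures can be turned into an implied blended price per million tokens. This is a back-of-envelope sketch: the inputs are the reported, unaudited numbers from the paragraph above, and the function name is ours.

```python
# Figures as reported in the article; treat them as reported, not audited.
TOKENS_IN_30_DAYS = 60.2e12   # 60.2 trillion tokens
LIST_COST_USD = 900e6         # roughly 900 million dollars at list pricing


def implied_price_per_million(tokens: float, cost_usd: float) -> float:
    """Blended USD price per one million tokens implied by the two figures."""
    return cost_usd / tokens * 1_000_000


price = implied_price_per_million(TOKENS_IN_30_DAYS, LIST_COST_USD)
tokens_per_day = TOKENS_IN_30_DAYS / 30

print(f"Implied blended price: ${price:.2f} per million tokens")
print(f"Average volume: {tokens_per_day / 1e12:.2f} trillion tokens per day")
```

The result is a cost figure and a volume figure. Neither says anything about what the tokens produced, which is the point of the paragraph above.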

A token is an input, not an outcome. Counting tokens to measure engineering value is like counting keystrokes to measure writing, or counting meetings to measure decisions. The metric rewards volume, and people optimize for whatever you reward. Tie token counts to performance reviews, which Meta reportedly did, and you have not measured adoption. You have manufactured it.

The surveillance went further than counting. Per the reporting, Meta added mandatory keystroke and mouse tracking with no opt-out, and reassigned roughly 6,500 engineers, about one in every five or six, into a new Agent Data Optimization org. One engineer described the reassignment to Wired in stark terms: “It’s literally the gulag. You have zero purpose in life all of sudden.” (This is journalism that aggregates internal sources with Reuters and Wired reporting, not a measured study. Treat the figures as reported, not audited.)

Whatever the precise numbers, the design intent is legible. Measure activity, enforce it through monitoring, attach it to careers. The predictable result is the same as the peer penalty, reached from the other direction. People perform the metric. An engineer who knows tokens feed their review will burn tokens. The signal you collect tells you how hard people are gaming the signal, and nothing about whether the work got better.
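A toy illustration of why the token signal misleads. The three records below are invented for this sketch, not drawn from Meta or from any reporting, and the helper name is hypothetical. The same people are ranked once by tokens burned and once by work that shipped and held up.

```python
# Hypothetical toy data: (engineer, tokens_burned, changes_shipped_and_held_up)
records = [
    ("A", 2_000_000, 9),
    ("B", 40_000_000, 3),
    ("C", 9_000_000, 6),
]


def rank_by(rows, column):
    """Return names ordered from highest to lowest value in the given column."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ordered]


by_tokens = rank_by(records, 1)
by_outcomes = rank_by(records, 2)

print("Ranked by tokens:  ", by_tokens)
print("Ranked by outcomes:", by_outcomes)
```

In this made-up case the two rankings are exact opposites. Real data will rarely be that clean, but any gap between the two orderings is a gap a token-based review would reward.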

Same wreckage from both directions

Bottom-up stigma pushes real usage underground. Top-down surveillance pulls fake usage to the surface. Different mechanisms, identical damage: the number stops describing reality.

Governance needs the opposite. Every framework worth the name, from internal AI policy to ISO 42001, runs on an honest account of what people and systems are actually doing. You cannot govern shadow usage you cannot see, and you cannot govern theater you mistake for signal. Both measurement failures attack the same input. They corrupt the ground truth before any policy gets a chance to act on it.

The deeper error is shared too. Both approaches measure the wrong unit. Peer stigma judges the individual on optics: did they admit to using the tool? Meta judges the individual on activity: how many tokens did they burn? Optics and activity are both proxies, and both are easy to fake. We made the case in The Honesty Index that capability and trust diverge when you measure the wrong signal. Stigma and surveillance are two production examples of exactly that divergence.

What a working scoreboard measures instead

The unit that resists gaming is the team, and the thing worth measuring is the outcome. Not who disclosed. Not how many tokens. Whether the work shipped, held up, and improved over time, with humans and AI counted on the same board.

A team-level outcome scoreboard fixes both failures at once. It removes the individual optics game that drives peer stigma, because no one is scored on whether they confessed to using a tool. It removes the activity theater that drives surveillance, because token volume is an input the board ignores. What remains is the question both Meta and the anxious coworker were trying to answer and missed: is the work getting better, and is the mix of human and AI effort the reason? We laid out the measurement principle in Measuring AI in Software Development; stigma and surveillance are the two ways organizations are currently getting it wrong.
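A minimal sketch of what such a board could look like in code. Everything here is an assumption made for illustration: the WorkItem fields, the definition of "held up," and the sample rows are hypothetical, not a description of any existing tool. The record deliberately carries no author, no disclosure flag, and no token count.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """One unit of work. No person, no disclosure flag, no token count."""
    team: str
    quarter: str
    shipped: bool
    held_up: bool  # assumed meaning: no rollback or major rework after shipping


def team_scoreboard(items):
    """Aggregate shipped and held-up work per (team, quarter)."""
    counts = defaultdict(lambda: {"shipped": 0, "held_up": 0})
    for item in items:
        row = counts[(item.team, item.quarter)]  # registers the team even if nothing shipped
        if item.shipped:
            row["shipped"] += 1
            if item.held_up:
                row["held_up"] += 1
    board = {}
    for key, row in counts.items():
        rate = row["held_up"] / row["shipped"] if row["shipped"] else 0.0
        board[key] = {**row, "held_up_rate": rate}
    return board


# Hypothetical sample data.
items = [
    WorkItem("payments", "Q1", True, True),
    WorkItem("payments", "Q1", True, False),
    WorkItem("payments", "Q2", True, True),
    WorkItem("payments", "Q2", True, True),
    WorkItem("search", "Q2", False, False),
]

board = team_scoreboard(items)
for (team, quarter), row in sorted(board.items()):
    print(team, quarter, row)
```

Comparing a team's rows across quarters gives the "improved over time" reading. Because the record has no field for who or what produced the work, human and AI effort land on the same board by construction.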

This also dissolves the disclosure problem. When the team owns the outcome, disclosing AI use carries no penalty, because the board never asked the individual to confess in the first place. Transparency stops being a personal risk and becomes a shared operating fact. That is the condition governance was waiting for.

Do this now

Audit how your organization currently reads AI adoption. If the answer is a usage dashboard, a token count, or an informal sense of who is “leaning on AI too much,” you are measuring optics or activity, and you are training your people to hide or perform. Replace the individual-activity view with a team-outcome view: pick one team, define what shipped and held up over a quarter, and measure the unit of work rather than the person. Make disclosure costless before you ask anyone to be honest, because the data shows they will not be otherwise.


This analysis synthesizes New Research Shows Honesty About AI Use at Work Is Backfiring (Atlassian Teamwork Lab, June 2026) and Why Is Meta Destroying Its Engineering Organization? (The Pragmatic Engineer, June 2026).


All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
