AX: When Agents Take Over Doing, the Screen's Job Becomes Helping You Judge

Thiago Victorino

John Maeda has a phrase for what agents do to a task: “AX is about teleporting to the goal.”

You skip the steps. You skip the menus, the forms, the figuring-out of how. The agent files the expense report, drafts the contract clause, refactors the module, books the travel. The work that used to fill your screen happens somewhere you cannot see. What lands in front of you is a result, and one question: is this what I meant?

That question is the whole job now. And most of our interfaces were never built to answer it.

Two Designers, One Reframe

In June 2026, two senior designers reached the same conclusion from different directions. John Maeda, who has spent a career at the intersection of design and computation, published “What is AX?” and named the shift in Don Norman’s own terms. Pratik Joglekar, a senior product designer at HubSpot, wrote “Designing Uncertainty” in Smashing Magazine and arrived at the same place through probability.

Norman gave us two gulfs. The gulf of execution is the distance between what you want and figuring out how to make the system do it. The gulf of evaluation is the distance between what the system did and understanding whether it matched your intent. For forty years, design fought the first one. Buttons, affordances, onboarding flows, progressive disclosure, all of it aimed at making “how do I do this” easy.

Maeda’s argument is that agents close the gulf of execution almost entirely. The agent figures out how. The teleport happens. And the burden does not vanish; it slides across to the other gulf. Evaluation becomes the work. You are no longer asking “how do I do this.” You are asking “did the thing that just happened do what I needed, and can I trust it enough to ship it.”

Evaluation Can Be Fast, If the Surface Is Built for It

The reflexive worry is that judgment is slow. If a human has to check every agent output, the agent saves nothing.

Maeda offers a counterexample worth sitting with. Blind users, he notes, routinely process synthetic speech at two to three times the pace of ordinary conversation. The screen reader is a surface engineered, over decades, specifically for fast evaluation. It proves the burden is not fixed. A well-designed evaluation surface lets a person judge far faster than they could ever act.

That reframes the design brief. The screen’s job is no longer to walk you through a task. It is to compress a result down to the few signals you need to approve, reject, or correct it. A diff view does this for code. A redline does it for a contract. A confidence band does it for a forecast. Each strips the output to the dimension a human actually needs to judge. The rest is noise the agent already handled.
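As a minimal sketch of that compression, Python's standard `difflib` can reduce an agent's edit to the lines that changed. The `review_diff` helper, the file label, and the before/after snippets are made up for illustration; they are not taken from either essay.

```python
import difflib


def review_diff(before: str, after: str, label: str = "module.py") -> str:
    """Return a unified diff that shows only what the agent changed."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{label} (before)",
        tofile=f"{label} (agent)",
        n=1,  # one line of context keeps the reviewer's eye on the change
    )
    return "".join(diff)


# Hypothetical refactor produced by an agent.
before = (
    "def total(items):\n"
    "    s = 0\n"
    "    for i in items:\n"
    "        s += i\n"
    "    return s\n"
)
after = (
    "def total(items):\n"
    "    return sum(items)\n"
)

print(review_diff(before, after))
```

The reviewer sees four removed lines and one added line instead of two whole files, which is the kind of reduction the paragraph above describes.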

Most agent products today fail here. They show you the answer and a thumbs-up button. That is a surface optimized for the agent looking competent, not for the human deciding whether to trust it.

Optimize for Likelihood, Not Certainty

Joglekar reaches the same governance surface from the math. His line is precise: “Design decisions should be optimized for likelihood, not certainty.”

Deterministic software gave one output for one input, every time. You designed for that single path. Probabilistic systems give a distribution. The same prompt yields a strong answer, a mediocre one, and occasionally a confidently wrong one, and the interface has to hold all three. Designing for likelihood means the surface shows its own uncertainty. It flags the low-confidence case. It makes the moment of approval a real decision rather than a rubber stamp.
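One way to picture a surface that shows its own uncertainty is a small triage step in front of the approval screen. This is a sketch under loud assumptions: the `confidence` score, the two thresholds, and the route names are all hypothetical, and a real system would first need evidence that its scores are calibrated.

```python
from dataclasses import dataclass


@dataclass
class AgentOutput:
    text: str
    confidence: float  # hypothetical score in [0, 1]; assumed to be calibrated


def triage(
    output: AgentOutput,
    high_stakes: bool,
    review_below: float = 0.9,  # illustrative threshold, not a recommendation
    block_below: float = 0.5,   # illustrative threshold, not a recommendation
) -> str:
    """Decide how the interface should present an agent result."""
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if output.confidence < block_below:
        return "reject"        # too uncertain to show as an answer
    if high_stakes or output.confidence < review_below:
        return "needs_review"  # flag it and require a real human decision
    return "auto_approve"      # low stakes and high confidence


print(triage(AgentOutput("Refund approved", 0.95), high_stakes=False))
print(triage(AgentOutput("Refund approved", 0.95), high_stakes=True))
print(triage(AgentOutput("Refund approved", 0.30), high_stakes=False))
```

The design choice worth noticing is that high stakes override high confidence: a fluent answer in a high-stakes domain still lands in front of a person.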

This is why Joglekar treats human-in-the-loop as mandatory oversight in high-stakes domains, not as a courtesy. In healthcare and finance, an unreviewed probabilistic output is a liability waiting to land. His cautionary cases are the ones the industry already learned from. Air Canada’s chatbot invented a bereavement-fare refund policy, and a tribunal held the airline to it. Amazon’s experimental recruiting tool taught itself to downgrade resumes containing the word “women’s” and had to be scrapped. In both, the system executed fluently. No surface stood between the confident output and the consequence. The evaluation gulf was never designed, so no one crossed it.

AX Is the Design Language of Governance

Put the two essays together and the picture is sharp. Maeda names where the burden moves. Joglekar names what the surface must now carry. Both describe, without using the word, a governance layer.

Governance has been framed as an engineering concern: policy primitives, audit logs, access controls, the plumbing under the agent. That framing is incomplete. The place where a human actually grants or withholds trust is the interface. The approval click. The redline. The “send anyway” confirmation that does or does not appear before an agent emails a customer. Every one of those is a governance decision wearing a design costume.
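To make the point concrete, here is a rough sketch of an approval gate in which the interface decision and the audit record are the same event. Every name in it (`PendingAction`, `Decision`, `gate`, the `external` flag) is hypothetical, and the `ask_human` callback stands in for whatever confirmation screen a product actually shows.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class PendingAction:
    kind: str       # e.g. "email"
    summary: str    # what the reviewer sees before deciding
    external: bool  # True if the action reaches someone outside the team


@dataclass
class Decision:
    action: PendingAction
    approved: bool
    decided_by: str
    decided_at: datetime


def gate(
    action: PendingAction,
    ask_human: Callable[[PendingAction], bool],
    reviewer: str,
    log: list[Decision],
) -> bool:
    """Require a human decision for external actions and record every outcome."""
    if action.external:
        approved = ask_human(action)  # the "send anyway" moment
        decided_by = reviewer
    else:
        approved = True               # internal actions pass without a prompt
        decided_by = "policy:auto"
    log.append(Decision(action, approved, decided_by, datetime.now(timezone.utc)))
    return approved


audit_log: list[Decision] = []
email = PendingAction("email", "Reply to customer about refund", external=True)

# A stand-in reviewer who declines; a real UI would render the summary and wait.
sent = gate(email, ask_human=lambda a: False, reviewer="alice", log=audit_log)
print(sent, audit_log[0].decided_by)
```

Whether the prompt appears at all is set by one line of policy (`action.external`), which is why the confirmation dialog is a governance decision and not only a design detail.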

AX, agent experience, is the name for designing that layer on purpose. It is not a coat of paint on top of a chatbot. It is the discipline of building the surface where agent work gets judged: what to surface, what to suppress, where to demand a human decision, how to make that decision fast and honest. When the evaluation surface is good, trust is earned at the speed of a glance. When it is absent, you get Air Canada.

This connects to a shift we have tracked across functions. We have written that the bottleneck is moving from doing to judging, and that output is decoupling from competence, which forces a dedicated verification layer. AX is what that verification layer looks like when a designer, not an engineer, owns the pixels. It is the same move we argued turns designers into governance engineers, now stated as a design gulf rather than a job title.

What Maeda’s Caveat Forces

Both pieces are essays, not studies. Maeda is synthesizing a shift he sees; Joglekar is arguing from principle and a few public failures. Neither offers a controlled measurement of how much faster evaluation gets when you design for it. Treat the screen-reader figure as an existence proof, not a benchmark for your product. The claim that survives is structural: the burden moves to evaluation, and the surface decides whether that burden is bearable.

That is enough to act on, because the cost of ignoring it is already visible in the failure cases.

Do This Now

Pick one agent feature you ship today. Find the moment a human approves its output. Ask three questions. What single signal would let someone judge this result in two seconds, and are you showing it? When the agent is uncertain, does the surface say so, or does it present every answer with the same confident finish? When the stakes are high enough that a wrong output is a liability, is there a real decision point, or just a button that says done? If the answers are no, you have an evaluation surface that looks finished and governs nothing. Redesign that one screen first. The teleport already works. The judging is where you keep the trust.


This analysis synthesizes What is AX? (John Maeda, June 2026), Designing Uncertainty: How AI Supercharges Probabilistic Thinking (Smashing Magazine, June 2026).

Victorino Group helps teams design the evaluation layer where agent work is judged and approved.

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
