Your AI Decides Before It Thinks

You ask an LLM to reason through a problem. It produces a chain of thought: weighing options, considering tradeoffs, arriving at a conclusion. The reasoning looks deliberate. It looks like thinking.

According to new research from Esakkiraja et al., the conclusion was already encoded before the first reasoning token appeared. The chain of thought did not produce the decision. It justified one that was already made.

The Research

The paper “Therefore I Am. I Think” (April 2026, arXiv preprint) used linear probes to decode model activations at the moment before reasoning begins. The probes could predict with high confidence which tool the model would call, which action it would take. Not after deliberation. Before it.

The researchers went further. Using activation steering (injecting vectors into the model’s hidden states to flip the pre-encoded decision), they could change the model’s behavior in 7% to 79% of cases, depending on the task. The model would then generate reasoning that justified the new, steered decision.

This is the finding that matters: when the decision was flipped, the chain of thought flipped with it. The model produced what the researchers call “inflated deliberation,” generating more tokens, longer reasoning chains, to justify the externally imposed conclusion. The reasoning did not resist the flip. It rationalized it.

Why This Goes Beyond Interpretability

In March, we explored how Anthropic’s interpretability research revealed that Claude uses different computation than it describes. That work showed a mismatch between internal process and external explanation. The model computes one way and explains another.

This new research makes a stronger claim. The decision is not just explained differently than it was computed. The decision precedes the computation that is supposed to produce it. Chain of thought is not a flawed description of reasoning. It is a narrative constructed after the fact to accompany a pre-existing conclusion.

The distinction is important. A flawed description might still correlate with the actual process. A post-hoc rationalization has no obligation to.

What Inflated Deliberation Tells Us

When the researchers steered model activations to flip a decision, the chain of thought got longer. More hedging. More apparent weighing of alternatives. More tokens spent arriving at the same (now externally imposed) conclusion.

This pattern has a name in human psychology: motivated reasoning. People who hold a strong prior belief, when forced to consider alternatives, produce longer justifications for their original position. They do not reason more carefully. They reason more elaborately in the direction they were already going.

The LLM equivalent is structurally similar. The model “knows” its answer. When that answer is disrupted, it generates more reasoning to re-justify. The chain of thought is not a check on the decision. It is in service of it.

For governance, this creates a specific problem. Longer, more detailed chain-of-thought output looks like more careful reasoning to a human reviewer. An auditor reading a model’s reasoning trace might assign higher confidence to responses with elaborate deliberation. The research suggests the opposite conclusion: elaborate deliberation may signal that the model is working harder to justify a predetermined outcome.

The Governance Consequence

A growing number of AI oversight frameworks treat chain-of-thought inspection as an audit mechanism. The logic is intuitive: if the model shows its reasoning, we can verify whether the reasoning is sound. Several enterprise governance tools are built on this premise. Regulatory frameworks, including portions of the EU AI Act’s transparency requirements, implicitly assume that model explanations provide meaningful evidence of model decision-making.

The pre-encoding finding challenges this assumption at its foundation. If the decision is encoded before reasoning begins, then inspecting the reasoning inspects the rationalization. You are auditing the story the model tells about its decision, not the decision process itself.

As we documented in configuration-dependent safety research, the same model can behave radically differently based on prompt design. If decisions are pre-encoded in activation space, prompt sensitivity operates at a level that chain-of-thought monitoring cannot observe. The decision has already been made by the time the first reasoning token appears on screen.

And if sycophantic behavior is pre-decided rather than reasoned, then the model does not agree with you because your argument was persuasive. It agrees because agreement was encoded before it processed your argument.

What Monitoring Actually Needs

None of this means chain-of-thought is useless. It can still help human users understand model outputs. It can still serve as a pedagogical tool. What it cannot reliably serve as is evidence of how or why the model reached its conclusion.

Governance frameworks need to account for this limitation. Three practical shifts follow.

First, output verification must be independent of model self-explanation. The case for external validation (checking outputs against ground truth, running consistency tests, comparing across model runs) gets stronger when the model’s own account of its reasoning is unreliable. Verification infrastructure that relies on inspecting chain-of-thought is checking the wrong artifact.

Second, behavioral testing must take precedence over reasoning inspection. If decisions are pre-encoded, the only reliable way to evaluate model behavior is to observe what models do across a wide range of inputs. Not what they say they are doing. Not why they claim they did it. What they actually do, measured externally.

Third, activation-level monitoring deserves investment. The researchers demonstrated that linear probes can decode pre-reasoning decisions with high accuracy. This is a research technique today. It could become a governance technique. Monitoring at the activation level, before reasoning begins, offers a window into the actual decision process that chain-of-thought cannot provide.

The Caveat Section

This paper is an arXiv preprint, not peer-reviewed. It was submitted on April 1, 2026, which warrants a raised eyebrow. The author affiliations are not prominently listed.

The findings are also consistent with prior work. Anthropic’s interpretability research pointed in the same direction. The motivated reasoning parallel from cognitive science is well-established. And the observation that chain-of-thought is post-hoc construction rather than faithful process reporting has been accumulating evidence for over a year.

Treat this as a signal that strengthens an existing pattern, not as settled science. The directional conclusion (decisions before reasoning, chain-of-thought as rationalization) now has multiple independent lines of evidence. The specific magnitudes (7-79% steering success rates, probe confidence levels) need replication.

What This Means in Practice

The practical takeaway is straightforward. If your AI governance framework relies on reading chain-of-thought output to verify model reasoning, you are building on unreliable ground. The model may have decided before it reasoned. The reasoning may be a rationalization. The more elaborate the reasoning, the less you should trust it.

Build verification systems that check outputs, not explanations. Test behavior, not narratives. And if you want to understand what the model actually decided and when, look at the activations, not the tokens.

This analysis synthesizes Therefore I Am. I Think by Esakkiraja et al. (April 2026), with prior findings from Anthropic’s interpretability research (March 2026) and Mia Hopman’s scheming propensity research (March 2026).

Victorino Group helps enterprises build AI governance that verifies behavior, not just explanations. Let’s talk.