LLM Reasoning Happens Before the Words -- Not Because of Them
arXiv 2604.15726 argues that LLM reasoning occurs in latent transformer states, not in explicit Chain-of-Thought text. The paper tests three hypotheses quantitatively.

What If Chain-of-Thought Is Just the Transcript, Not the Thinking?
Here's the simple version. Asking an LLM to reason step by step (for example, with the prompt "Let's think step by step") tends to improve its answers -- that's the core insight behind Chain-of-Thought (CoT) prompting. But a new position paper argues that the actual reasoning doesn't happen in the text the model writes. It happens in the transformer's internal latent states, before any words are generated.
The CoT text might be a byproduct of reasoning, not the cause.
The Paper
Figure: Visualizing latent reasoning pathways inside the transformer.
arXiv 2604.15726, published April 2026. This is a position paper -- it doesn't propose a new model. Instead, it challenges the prevailing understanding of why CoT works by testing three hypotheses quantitatively.
What Wasn't Adding Up
CoT prompting became standard practice after Wei et al.'s 2022 Google Brain paper, which reported that step-by-step reasoning improves accuracy on arithmetic, commonsense, and symbolic reasoning benchmarks.
But some observations didn't fit the clean narrative:
- Shuffling CoT text randomly sometimes barely hurt performance
- Models occasionally wrote incorrect reasoning steps but still reached the right answer
- Probing internal representations revealed that answer-related information was encoded before CoT text generation started
These anomalies pointed to an uncomfortable question: does CoT cause reasoning, or does the model finish reasoning internally and then narrate what it already decided?
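The shuffling observation above can be framed as a simple ablation: score the model with the CoT intact, score it again with the steps reordered, and compare. The sketch below is a toy harness, not the paper's setup. The function names, the two-item dataset, and both "models" are hypothetical stand-ins invented for illustration; a real experiment would replace the stubs with calls to an actual LLM.

```python
import random


def shuffle_steps(cot_text, rng):
    """Return the same reasoning steps in a random order."""
    steps = [line for line in cot_text.splitlines() if line.strip()]
    rng.shuffle(steps)
    return "\n".join(steps)


def accuracy(examples, answer_fn, transform=lambda cot: cot):
    """Fraction answered correctly when each CoT is passed through `transform`."""
    correct = sum(
        answer_fn(ex["question"], transform(ex["cot"])) == ex["answer"]
        for ex in examples
    )
    return correct / len(examples)


def shuffle_gap(examples, answer_fn, seed=0):
    """Accuracy with intact CoT minus accuracy with shuffled CoT."""
    rng = random.Random(seed)
    intact = accuracy(examples, answer_fn)
    shuffled = accuracy(examples, answer_fn, lambda cot: shuffle_steps(cot, rng))
    return intact - shuffled


# Toy data, made up for this sketch.
EXAMPLES = [
    {"question": "3 + 4", "cot": "Start with 3.\nAdd 4.\nThe answer is 7.", "answer": "7"},
    {"question": "10 - 6", "cot": "Start with 10.\nSubtract 6.\nThe answer is 4.", "answer": "4"},
]


def order_blind_model(question, cot):
    """Hypothetical stub: ignores the CoT and evaluates the question directly."""
    a, op, b = question.split()
    return str(int(a) + int(b) if op == "+" else int(a) - int(b))


def last_step_model(question, cot):
    """Hypothetical stub: trusts whatever the final CoT line says."""
    return cot.splitlines()[-1].rstrip(".").split()[-1]


if __name__ == "__main__":
    print("order-blind gap:", shuffle_gap(EXAMPLES, order_blind_model))
    print("last-step gap:  ", shuffle_gap(EXAMPLES, last_step_model))
```

A gap near zero is what you would expect if the answer does not depend on the order of the written steps; a large gap points the other way. The stubs only show the two extremes.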
Three Hypotheses, Tested
| Hypothesis | Claim | Finding |
|---|---|---|
| H1: Latent Reasoning | Reasoning occurs in transformer latent states | Supported -- answer info exists in internal representations before CoT text |
| H2: Explicit CoT | CoT text directly causes reasoning | Weakly supported -- helps, but effect is often independent of text quality |
| H3: Serial Compute | CoT's value is providing extra computation steps | Partially supported -- more compute helps, but doesn't fully explain results |
The key finding is H1. Probing experiments on middle-layer activations showed that answer-relevant information was already encoded in latent states before the model started generating CoT text. The written reasoning looks more like a post-hoc explanation than the reasoning itself.
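To make "probing" concrete: a probe is a small classifier trained to read a property (here, the final answer) out of a model's hidden activations. The sketch below is a minimal illustration under loud assumptions. It uses synthetic vectors in place of real middle-layer activations, a binary answer, and a hand-rolled logistic-regression probe; none of the names or settings come from the paper.

```python
import math
import random


def make_activations(n, dim, signal, rng):
    """Synthetic stand-in for hidden states captured before any CoT token.

    `signal` controls how strongly the (hypothetical) final answer is
    encoded along one fixed direction; 0.0 means it is not encoded at all.
    """
    direction = [1.0 / math.sqrt(dim)] * dim
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        xs.append([rng.gauss(0.0, 1.0) + sign * signal * d for d in direction])
        ys.append(y)
    return xs, ys


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(xs, ys, epochs=30, lr=0.05):
    """Fit a logistic-regression probe with plain SGD."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, logit(w, b, x)))  # clamp for numerical safety
            grad = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def probe_accuracy(signal, seed=0, dim=8, n_train=400, n_test=200):
    """Train the probe on one split and report held-out accuracy."""
    rng = random.Random(seed)
    train_x, train_y = make_activations(n_train, dim, signal, rng)
    test_x, test_y = make_activations(n_test, dim, signal, rng)
    w, b = train_probe(train_x, train_y)
    hits = sum((logit(w, b, x) > 0) == (y == 1) for x, y in zip(test_x, test_y))
    return hits / n_test


if __name__ == "__main__":
    print("answer encoded:     ", probe_accuracy(signal=2.5))
    print("control (no signal):", probe_accuracy(signal=0.0))
```

The control run matters: a probe that scores well above chance only when the signal is present suggests the information is in the activations rather than being memorized by the probe. Even then, a successful probe shows that the answer is decodable, not that the model uses it.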
Why This Matters
Figure: How this paper reshapes our understanding of CoT.
If this paper is right, several things need rethinking.
First, OpenAI's o1/o3 reasoning models. These generate long CoT sequences as part of their reasoning process. But through this paper's lens, o1/o3's performance gains might come from additional computation steps (more tokens = more serial compute), not from the content of the reasoning text itself.
Second, Google's Gemini Thinking Mode. When Gemini shows you its "thinking process," is that the actual reasoning or an after-the-fact narration of reasoning that already happened internally?
Third, it connects to the 2025 "Thinking Without Words" research on abstract CoT. That work showed models can "think" using abstract tokens instead of natural language. This paper strengthens the theoretical foundation for that approach.
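The "more tokens = more serial compute" reading of o1/o3 can be put in back-of-the-envelope terms. In autoregressive decoding, each generated token takes one forward pass through every layer, and each pass depends on the token before it. The sketch below counts layer applications on that sequential chain. It is a simplified model, and the 32-layer and 200-token figures are hypothetical numbers chosen for illustration, not values from the paper or from any specific model.

```python
def sequential_depth(num_layers: int, cot_tokens: int) -> int:
    """Layer applications on the serial chain up to and including the answer token.

    Simplifying assumption: one full forward pass per generated token,
    plus one more pass to produce the answer token itself.
    """
    return num_layers * (cot_tokens + 1)


if __name__ == "__main__":
    direct = sequential_depth(num_layers=32, cot_tokens=0)
    with_cot = sequential_depth(num_layers=32, cot_tokens=200)
    print(f"direct answer: {direct} sequential layer steps")
    print(f"with 200 CoT tokens: {with_cot} sequential layer steps")
```

Note that the count depends only on how many tokens are generated, not on what they say, which is why the serial-compute hypothesis and the content of the reasoning text can come apart.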
Limitations
This is a position paper, and its scope has clear boundaries.
- Experiments cover specific models and benchmarks. Generalization to all LLMs is an open question.
- The paper shows that answer-relevant information exists in latent states. It does not show how the reasoning mechanism works.
- The paper isn't arguing that CoT is useless. CoT helps. The claim is that the reason it helps might be different from what we assumed.
CoT prompting won't change overnight. But if the paper's findings hold up, our understanding of why it works needs updating.