
LLM Reasoning Happens Before the Words -- Not Because of Them

arXiv 2604.15726 argues that LLM reasoning occurs in latent transformer states, not in explicit Chain-of-Thought text. Three hypotheses tested quantitatively.

[Figure: Latent reasoning vs. explicit CoT comparison diagram. Source: arXiv]

What If Chain-of-Thought Is Just the Transcript, Not the Thinking?

Here's the simple version. "Let's think step by step" makes LLMs perform better -- that's the core insight behind Chain-of-Thought (CoT) prompting. But a new position paper argues that the actual reasoning doesn't happen in the text the model writes. It happens in the transformer's internal latent states before any words are generated.

The CoT text might be a byproduct of reasoning, not the cause.

The Paper

[Figure: Visualizing latent reasoning pathways inside the transformer]

arXiv 2604.15726, published April 2026. This is a position paper -- it doesn't propose a new model. Instead, it challenges the prevailing understanding of why CoT works by testing three hypotheses quantitatively.

What Wasn't Adding Up

CoT prompting became standard practice after Wei et al.'s 2022 Google Brain paper. The empirical evidence was overwhelming: step-by-step reasoning improves accuracy.

But some observations didn't fit the clean narrative:

  • Shuffling CoT text randomly sometimes barely hurt performance
  • Models occasionally wrote incorrect reasoning steps but still reached the right answer
  • Probing internal representations revealed that answer-related information was encoded before CoT text generation started

These anomalies pointed to an uncomfortable question: does CoT cause reasoning, or does the model finish reasoning internally and then narrate what it already decided?
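The shuffling anomaly can be framed as a small ablation: score the same examples once with the CoT intact and once with its steps in random order, then compare. The sketch below is only an illustration of that setup, not the paper's procedure. `answer_fn` is a hypothetical callable standing in for a model call, steps are assumed to be one per line, and `toy_answer_fn` is a stub that ignores the CoT entirely so the example runs without a model.

```python
import random
from typing import Callable, List, Tuple

# (question, cot_text, gold_answer); this layout is an assumption for the sketch.
Example = Tuple[str, str, str]


def shuffle_cot_steps(cot_text: str, seed: int = 0) -> str:
    """Return the CoT with its non-empty lines (assumed: one step per line) reordered."""
    steps = [line for line in cot_text.splitlines() if line.strip()]
    random.Random(seed).shuffle(steps)
    return "\n".join(steps)


def accuracy(answer_fn: Callable[[str, str], str],
             examples: List[Example],
             transform: Callable[[str], str]) -> float:
    """Fraction of examples answered correctly after applying `transform` to the CoT."""
    correct = sum(answer_fn(q, transform(cot)) == gold for q, cot, gold in examples)
    return correct / len(examples)


def shuffle_gap(answer_fn: Callable[[str, str], str], examples: List[Example]) -> float:
    """Accuracy with intact CoT minus accuracy with shuffled CoT."""
    intact = accuracy(answer_fn, examples, lambda cot: cot)
    shuffled = accuracy(answer_fn, examples, shuffle_cot_steps)
    return intact - shuffled


def toy_answer_fn(question: str, cot: str) -> str:
    """Hypothetical stand-in for a model: it ignores the CoT and just adds the operands."""
    a, b = question.split("+")
    return str(int(a) + int(b))


examples = [
    ("2+3", "Start with 2.\nAdd 3.\nThe total is 5.", "5"),
    ("10+4", "Start with 10.\nAdd 4.\nThe total is 14.", "14"),
]
print(shuffle_gap(toy_answer_fn, examples))
```

A gap near zero on a real model would be consistent with the "narration" reading, while a large gap would suggest the text order matters. The stub always yields a gap of zero by construction, so it says nothing about actual LLMs.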

Three Hypotheses, Tested

Hypothesis           | Claim                                             | Finding
H1: Latent Reasoning | Reasoning occurs in transformer latent states     | Supported -- answer info exists in internal representations before CoT text
H2: Explicit CoT     | CoT text directly causes reasoning                | Weakly supported -- helps, but effect is often independent of text quality
H3: Serial Compute   | CoT's value is providing extra computation steps  | Partially supported -- more compute helps, but doesn't fully explain results

The key finding is H1. Probing experiments on middle-layer activations showed that answer-relevant information was already encoded in latent states before the model started generating CoT text. The written reasoning looks more like a post-hoc explanation than the reasoning itself.
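For readers unfamiliar with probing, the idea is to train a small classifier on frozen activations and check whether it can predict the answer. The sketch below is a minimal, pure-Python illustration on synthetic data, not the paper's experimental setup. The "activations" are random vectors, and `answer_direction` is a planted signal invented for this example. A real probe would read hidden states from a chosen layer at the position just before CoT generation begins.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_probe(X, y, steps=200, lr=0.5):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            z = max(-30.0, min(30.0, dot(w, x) + b))  # clamp to keep exp() stable
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * x[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, X, y):
    hits = sum((dot(w, x) + b > 0) == bool(label) for x, label in zip(X, y))
    return hits / len(X)


# Synthetic stand-in for "pre-CoT" hidden states (assumption: 8 dims, 300 examples).
rng = random.Random(0)
dim, n = 8, 300
hidden = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
answer_direction = [rng.gauss(0, 1) for _ in range(dim)]  # planted signal
labels = [1 if dot(h, answer_direction) > 0 else 0 for h in hidden]

X_train, y_train = hidden[:200], labels[:200]
X_test, y_test = hidden[200:], labels[200:]

w, b = train_linear_probe(X_train, y_train)
real_acc = probe_accuracy(w, b, X_test, y_test)

# Control: a probe trained on shuffled labels should fall back toward chance.
y_shuffled = y_train[:]
rng.shuffle(y_shuffled)
w_c, b_c = train_linear_probe(X_train, y_shuffled)
control_acc = probe_accuracy(w_c, b_c, X_test, y_test)

print(f"probe accuracy: {real_acc:.2f}, shuffled-label control: {control_acc:.2f}")
```

The shuffled-label control matters because a probe that scores well even on random labels is memorizing rather than reading out encoded information. High probe accuracy shows that the answer is linearly decodable. On its own it does not show that the model uses that information.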

Why This Matters

[Figure: How this paper reshapes our understanding of CoT]

If this paper is right, several things need rethinking.

First, OpenAI's o1/o3 reasoning models. These generate long CoT sequences as part of their reasoning process. But through this paper's lens, o1/o3's performance gains might come from additional computation steps (more tokens = more serial compute), not from the content of the reasoning text itself.
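One way to separate "more tokens" from "better text" is a length-matched control: replace the reasoning with meaningless filler of the same length and compare accuracy across conditions. The sketch below only builds the three prompt variants. It is an assumed experimental design for illustration, not something the paper is confirmed to run. Token counts use whitespace splitting, which differs from a real tokenizer, and the prompt template is hypothetical.

```python
def filler_control(cot_text: str, filler: str = "...") -> str:
    """Replace each whitespace-delimited token of the CoT with a filler token."""
    n_tokens = len(cot_text.split())
    return " ".join([filler] * n_tokens)


def build_conditions(question: str, cot_text: str) -> dict:
    """Return prompt variants that differ only in what fills the reasoning slot."""
    template = "Q: {q}\n{reasoning}\nA:"  # hypothetical prompt layout
    return {
        "no_cot": template.format(q=question, reasoning="").replace("\n\n", "\n"),
        "real_cot": template.format(q=question, reasoning=cot_text),
        "filler": template.format(q=question, reasoning=filler_control(cot_text)),
    }


conditions = build_conditions("What is 17 + 25?", "17 plus 20 is 37. 37 plus 5 is 42.")
for name, prompt in conditions.items():
    print(f"--- {name} ---\n{prompt}")
```

If the filler condition recovered most of the gain of the real CoT, that would favor the serial-compute reading. If it recovered little, the content of the text would seem to matter more.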

Second, Google's Gemini Thinking Mode. When Gemini shows you its "thinking process," is that the actual reasoning or an after-the-fact narration of reasoning that already happened internally?

Third, it connects to the 2025 "Thinking Without Words" research on abstract CoT. That work showed models can "think" using abstract tokens instead of natural language. This paper strengthens the theoretical foundation for that approach.

Limitations

This is a position paper, and its scope has clear boundaries.

  • Experiments cover specific models and benchmarks. Generalization to all LLMs is an open question.
  • The probing results show that answer information is present in latent states. They do not explain how the reasoning mechanism itself works.
  • This isn't arguing CoT is useless. CoT helps. The claim is that the reason it helps might be different from what we assumed.

CoT prompting won't change overnight. But if the paper's findings hold up, our understanding of why it works needs updating.

