GenericAgent — 100% Completion at 222k Tokens, Just 27.7% of Claude Code
arXiv 2604.17091: GenericAgent reaches 100% completion on Lifelong AgentBench with 222k input tokens, 27.7% of Claude Code's and 15.5% of OpenClaw's. A single principle (Context Information Density Maximization) unifies atomic tools, hierarchical memory, self-evolution, and context truncation.

27.7%
Same job, 27.7% of the tokens. GenericAgent reaches 100% completion on Lifelong AgentBench with 27.7% of Claude Code's input tokens and 15.5% of OpenClaw's. With token costs back in the spotlight in May, this is the strongest counterpunch yet to the bigger-is-better default.
In Plain English
Most agent designs have assumed that a bigger context means a better agent. GenericAgent inverts that: maximize the information density inside the context and you can do more with less. The headline claim is that a 30k-token context can be enough for a self-evolving agent.
Authors / Citation
Authors are the lsdefine GitHub maintainer group. arXiv ID 2604.17091, published April 21. Featured on Hugging Face Papers the same week and amplified by Mervin Praison's intro video.
Prior Limits
Two camps in self-evolving agent research: ① large context (100k+) with full history → better completion, ballooning costs; ② small context with external memory calls → cheaper but latency/consistency issues. Both treated context size as the primary variable; density was a side note.
Method
GenericAgent unifies four mechanisms under one principle, Context Information Density Maximization (CIDM); a rough code sketch of how they compose follows the list:
- Atomic tools (9): A small toolkit that gives the LLM local-system control with near-zero token overhead per call.
- Hierarchical on-demand memory: Don't keep all history in context — retrieve only what the current step needs.
- Self-evolution: Crystallize each successful execution path into a reusable SOP/code in a personal skill tree.
- Context truncation: After a step, push unneeded history into delegated sub-agents and refresh the main context.
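Below is a minimal sketch of how the four mechanisms might fit together in one loop, assuming a toy keyword retriever and character counts as a crude token proxy. Every class and function name here is an illustrative stand-in, not the paper's or the lsdefine repository's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AtomicTool:
    """One of the small set of low-overhead tools (e.g. read_file, run_shell)."""
    name: str
    fn: Callable[[str], str]


@dataclass
class MemoryStore:
    """Hierarchical on-demand memory: only retrieved slices ever enter the context."""
    episodes: list[str] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Stand-in for a real retriever: naive keyword-overlap scoring.
        scored = sorted(self.episodes,
                        key=lambda e: -len(set(query.split()) & set(e.split())))
        return scored[:k]


@dataclass
class SkillTree:
    """Self-evolution: successful execution paths crystallize into reusable SOPs."""
    skills: dict[str, list[str]] = field(default_factory=dict)

    def crystallize(self, task: str, steps: list[str]) -> None:
        self.skills[task] = steps


class GenericAgentSketch:
    def __init__(self, tools: list[AtomicTool], memory: MemoryStore,
                 skills: SkillTree, budget_chars: int = 30_000):
        self.tools = {t.name: t for t in tools}
        self.memory = memory
        self.skills = skills
        self.budget_chars = budget_chars      # crude stand-in for a ~30k-token budget
        self.context: list[str] = []          # the live, density-managed context

    def step(self, task: str, tool_name: str, arg: str) -> str:
        # 1. On-demand memory: pull only what this step needs into the context.
        self.context.extend(self.memory.retrieve(task))
        # 2. Atomic tool call; the call record itself adds almost no tokens.
        result = self.tools[tool_name].fn(arg)
        self.context.append(f"{tool_name}({arg}) -> {result}")
        # 3. Context truncation: once over budget, archive older history
        #    (conceptually, hand it to a delegated sub-agent) and keep a stub.
        if sum(len(c) for c in self.context) > self.budget_chars:
            self.memory.episodes.extend(self.context[:-2])
            self.context = [f"[archived {len(self.context) - 2} entries]"] + self.context[-2:]
        return result

    def finish(self, task: str) -> None:
        # 4. Self-evolution: record the successful path as a reusable skill.
        self.skills.crystallize(task, list(self.context))


# Tiny usage example with a fake tool.
agent = GenericAgentSketch(
    tools=[AtomicTool("echo", lambda s: s.upper())],
    memory=MemoryStore(episodes=["previous run: echo worked fine"]),
    skills=SkillTree(),
)
agent.step("demo task", "echo", "hello")
agent.finish("demo task")
print(agent.skills.skills["demo task"])
```

The point of the sketch is the control flow, not the components: memory enters the context only on retrieval, tool calls stay cheap, the live context is truncated aggressively, and successful trajectories are saved for reuse.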
Results
| Model | Completion on Lifelong AgentBench | Input tokens | Relative cost |
|---|---|---|---|
| GenericAgent | 100% | 222k | 1.0× (base) |
| Claude Code | 100% | 802k | 3.61× |
| OpenClaw | 100% | 1,432k | 6.45× |
| GPT-5.4 base agent | 87% | 540k | 2.43× |
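The headline percentages and the relative-cost column follow directly from the token counts; a quick sanity check (our own arithmetic, not code from the paper):

```python
# Verify the ratios implied by the table's input-token counts.
tokens = {"GenericAgent": 222_000, "Claude Code": 802_000,
          "OpenClaw": 1_432_000, "GPT-5.4 base agent": 540_000}

base = tokens["GenericAgent"]
for name, t in tokens.items():
    print(f"{name:20s} relative cost {t / base:.2f}x; "
          f"GenericAgent uses {base / t:.1%} of its tokens")
```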
Two takeaways. First, parity on completion at a much lower token spend. Second, the heavier implication: a self-evolving agent running in a roughly 30k-token context works, undermining the "bigger model + bigger context" default that has been industry-standard for three years.
Why It Matters
Industrial implication: token costs are back in the discourse. Anthropic's Opus 4.7 reportedly uses 27% more tokens than Opus 4.6 for the same prompt (per HN/Reddit measurements), while GenericAgent moves in the opposite direction, cutting roughly 72% of input tokens versus Claude Code on the same task. Theoretical implication: capability isn't governed by context size but by information density, and that reframes RAG, agent, and tool-use design.
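One illustrative way to state the principle formally (our framing, not a formula from the paper) is to define density as task-relevant information per context token and read CIDM as maximizing it under a token budget B:

```latex
\rho(c) \;=\; \frac{I(c;\ \text{task})}{|c|_{\text{tokens}}},
\qquad
\text{CIDM:}\quad \max_{c}\ \rho(c) \quad \text{s.t.} \quad |c|_{\text{tokens}} \le B
```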
Critiques
Yann LeCun (AMI Labs CEO): "Lifelong AgentBench is one benchmark. Long tail will say more." In other words, real-world long-tail tasks still need validation. The 9 atomic tools also need redesign per domain, and the paper offers only partial guidance on generalizing them.
Self-evolution security is another concern: without strong sandboxing, dangerous code could plausibly be crystallized into the skill tree, and the v1 paper says little about preventing that.
TL;DR
In a token-cost-sensitive era, the answer may not be "bigger context" but "denser context." GenericAgent is the first quantitative case for it.