This AI Rewrites Its Own Code — MiniMax M2.7's Self-Evolution Experiment
MiniMax M2.7 autonomously improved itself over 100+ iterations, scoring 56.22% on SWE-Pro — near Claude Opus 4.6 levels — at 1/50th the price.

56.22% — And Why That Number Matters
SWE-Pro is one of the hardest benchmarks in AI right now. It doesn't test whether a model can chat or write poetry — it tests whether a model can read real GitHub issues, understand codebases, and actually fix bugs. Think of it as a practical software engineering exam for AI.
A model just scored 56.22% on it. That's within striking distance of Claude Opus 4.6's best (57.3%) and on par with GPT-5.3 Codex.
The model is called M2.7. The company behind it is MiniMax — a Chinese AI startup most people outside China haven't heard of.
Here's the kicker: the way it reached that score was fundamentally different from anything we've seen before.
The Backstory — Who Is MiniMax?
MiniMax was founded in 2021 by Yan Junjie, a former VP at SenseTime, one of China's biggest computer vision companies. The startup initially focused on consumer AI products for the Chinese market — music generation, video creation, that sort of thing.
But as the LLM race heated up globally, MiniMax pivoted hard. Its M2.5 model, released in late 2025, showed surprisingly strong performance on coding and agent benchmarks. It was a signal that this company had bigger ambitions than consumer apps.
Then in early April 2026, M2.7 dropped. And the AI community took notice.
| Model | SWE-Pro | PinchBench | Price (input/output per 1M tokens) | Key Feature |
|---|---|---|---|---|
| MiniMax M2.7 | 56.22% | 86.2% | $0.30 / $1.20 | Self-evolving |
| Claude Opus 4.6 | 57.3% | 87.4% | $15 / $75 | Best all-around |
| GPT-5.3 Codex | 56.22% | 84.1% | $10 / $30 | Code-focused |
| GPT-5.4 | 57.7% | 85.3% | $12 / $60 | Latest frontier |
Look at those prices. M2.7's input tokens cost 1/50th of Claude Opus. Output tokens are 1/60th.
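Those ratios fall straight out of the table's headline per-1M-token prices. A quick sanity check (list prices only — a real bill depends on each workload's input/output token mix):

```python
# Headline per-1M-token prices from the comparison table above.
PRICES = {
    "MiniMax M2.7":    {"input": 0.30, "output": 1.20},
    "Claude Opus 4.6": {"input": 15.0, "output": 75.0},
}

def ratio(expensive: str, cheap: str, kind: str) -> float:
    """How many times more the expensive model costs per token of this kind."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]

print(ratio("Claude Opus 4.6", "MiniMax M2.7", "input"))   # 50.0
print(ratio("Claude Opus 4.6", "MiniMax M2.7", "output"))  # 62.5
```

So "1/50th" is exact for input tokens, and "1/60th" rounds the 62.5x output-token gap.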
How Self-Evolution Actually Works
The Loop That Changes Everything
Normally, training an AI model requires heavy human involvement. Researchers prepare data, write training code, analyze results, tune hyperparameters, and repeat this cycle hundreds of times.
M2.7 handled 30–50% of this process on its own.
Here's the specific loop it ran autonomously:
- Analyze failed task trajectories
- Diagnose root causes of failures
- Modify its scaffold code (the framework it operates within)
- Run evaluations with the modified version
- Compare results against the previous version
- Keep improvements, revert regressions
It ran this cycle over 100 times without human intervention. The model was debugging itself, rewriting its own operational code, and measuring whether those changes actually helped.
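The six-step cycle above amounts to a greedy optimization loop over the scaffold. Here is a minimal Python sketch of that shape — everything in it is a hypothetical illustration (the function names, the noisy stand-in evaluator, and the keep/revert policy are assumptions, not MiniMax's actual implementation):

```python
import random

def evaluate(scaffold: dict) -> float:
    """Stand-in for running the benchmark suite against one scaffold
    version and returning a pass rate. Hypothetical placeholder."""
    return scaffold["quality"] + random.gauss(0, 0.01)

def propose_patch(scaffold: dict, failures: list) -> dict:
    """Stand-in for the model diagnosing failed trajectories and
    rewriting its own scaffold code. Hypothetical placeholder."""
    patched = dict(scaffold)
    patched["quality"] += random.gauss(0.001, 0.005)  # a patch may help or hurt
    return patched

def self_evolve(scaffold: dict, iterations: int = 100) -> dict:
    """Greedy evolve loop: keep improvements, revert regressions."""
    best_score = evaluate(scaffold)
    for _ in range(iterations):
        failures = []                        # failed task trajectories (stub)
        candidate = propose_patch(scaffold, failures)
        score = evaluate(candidate)
        if score > best_score:               # improvement: keep the patch
            scaffold, best_score = candidate, score
        # else: regression, revert (drop the candidate, keep current scaffold)
    return scaffold

final = self_evolve({"quality": 0.50})
```

The key property the article describes is the `if score > best_score` gate: changes survive only when the evaluation says they helped, so regressions get rolled back automatically with no human in the loop.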
Speed as a Weapon
M2.7 generates roughly 100 tokens per second — noticeably faster than Claude Opus 4.6 or GPT-5 class models. In agentic workflows, where a single task chains many sequential model calls, that speed compounds: each step finishes sooner, end-to-end runs shrink, and you complete more tasks per hour of infrastructure — which lowers pipeline costs across the board.
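A back-of-envelope wall-clock estimate shows why generation speed matters operationally. Only the ~100 tokens/second figure comes from the text; the step count, tokens per step, and the slower comparison rate are illustrative assumptions:

```python
def task_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock generation time for one agent step
    (ignores network latency and prompt-processing time)."""
    return output_tokens / tokens_per_second

# A hypothetical 20-step agent task emitting ~800 output tokens per step.
steps, tokens_per_step = 20, 800
fast = steps * task_seconds(tokens_per_step, 100)  # ~100 tok/s (M2.7, per the text)
slow = steps * task_seconds(tokens_per_step, 50)   # assumed slower frontier model

print(f"{fast:.0f}s vs {slow:.0f}s")  # 160s vs 320s
```

Halving per-step generation time halves the end-to-end run, and in a pipeline running hundreds of such tasks concurrently, that difference shows up directly in throughput.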
An AI that rewrites its own code. Right now it's handling 30–50% of its development workflow. But the direction matters more than the percentage — we're watching the role of human researchers shift from "builder" to "supervisor."
Built for Agents, Not Chat
M2.7 isn't trying to be your friendly chatbot. It's designed as an agent backend — something you plug into third-party tools and harnesses to execute complex tasks autonomously. That's why its benchmark focus is on SWE-Pro (coding ability) and PinchBench (agent capability) rather than general conversation quality.
The Bigger Picture — China's AI Counteroffensive
The timing of M2.7 is no coincidence. The AI model landscape is shifting fast in April 2026.
In January, Zhipu AI's GLM-5.1 — an open-source 744B parameter MoE model released under the MIT license — scored 58.4% on SWE-Pro, topping the global leaderboard and beating both GPT-5.4 and Claude Opus 4.6 in coding benchmarks.
Now MiniMax joins the charge. Chinese AI startups aren't just playing catch-up anymore — they're leading in specific domains, particularly coding and agentic tasks.
The real story here is economics. If M2.7 delivers Claude-level performance at 1/50th the price for agent workloads, the calculus for companies deploying AI agents at scale changes completely. Monthly API costs that used to run in the millions could drop to tens of thousands.
What This Means for You
Two things worth watching.
First, agent economics are about to get disrupted. The biggest barrier to putting agent systems in production has been API costs. At M2.7 pricing levels, running hundreds of concurrent agents becomes financially viable for the first time.
Second, the self-evolution paradigm is just getting started. Today it's scaffold code modifications. Tomorrow it could be models managing their own training pipelines end-to-end. The human researcher's job description is evolving from "hands-on builder" to "strategic supervisor."
There are clear limitations, of course. M2.7 is specialized for coding and agent tasks — it falls well short of Claude or GPT for general conversation and creative writing. And "self-evolution" sounds grander than it currently is. True self-evolution — where a model modifies its own architecture — remains far off.
But the direction is unmistakable. The very early signs of AI building AI are here.
