GLM-5.1 Just Topped SWE-Bench Pro – And It's Fully Open Source
China's Z.ai released GLM-5.1, scoring 58.4 on SWE-Bench Pro to beat Claude Opus 4.6 (57.3) and GPT-5.4 (57.7). The 744B MoE model ships under MIT license.

An Open-Source Model Just Beat Every Closed Model at Coding
58.4. That's the SWE-Bench Pro score posted by GLM-5.1 on April 7, 2026 – the highest any model has ever achieved on one of the most practical coding benchmarks in the industry. The model that posted it wasn't built by OpenAI or Anthropic. It came from Z.ai, a Beijing-based company formerly known as Zhipu AI.
GLM-5.1 beat OpenAI's GPT-5.4 (57.7) and Anthropic's Claude Opus 4.6 (57.3) – and here's the kicker: it ships under the MIT license. Fully open source, free to download, free to use commercially. No restrictions.
It marks the first time an open-source model has taken the top spot on SWE-Bench Pro, the benchmark that asks AI to fix real bugs in real open-source projects.
Where Z.ai Came From
Z.ai started as Zhipu AI, a Tsinghua University spinoff founded in 2019. The company built its reputation on the GLM (General Language Model) series, initially focused on Chinese-language capabilities.
The turning point came in 2024 when GLM-4 approached GPT-4 level performance on global benchmarks. Investment poured in. Then on January 8, 2026, the company IPO'd on the Hong Kong Stock Exchange – becoming the first publicly traded foundation model company in the world.
| Metric | Value |
|---|---|
| IPO Date | January 8, 2026 (Hong Kong) |
| Capital Raised | HKD 4.35B (~$558M) |
| Market Cap | ~$31.3B |
| Founded | 2019 (Tsinghua University spinoff) |
| HQ | Beijing, China |
The IPO capital went straight into GLM-5 series development. Three months later, GLM-5.1 arrived.
Under the Hood – 744B Parameters, 40B Active
The MoE Architecture
GLM-5.1 runs on a Mixture-of-Experts (MoE) architecture with 744 billion total parameters but only 40 billion active at inference time. Think of it like having dozens of specialist sub-models inside one giant model – for each input, only the most relevant specialists activate.
This design gives you the knowledge of a 744B model at the compute cost of a 40B model.
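To make the routing idea concrete, here is a toy sketch of top-k MoE routing in NumPy. This is purely illustrative, not GLM-5.1's actual implementation: the dimensions, expert count, and `top_k=2` are arbitrary assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class MoELayer:
    """Toy Mixture-of-Experts layer: a router scores all experts,
    but only the top-k actually run for a given input."""

    def __init__(self, d_model=16, d_hidden=32, n_experts=8, top_k=2):
        self.top_k = top_k
        self.router = rng.standard_normal((d_model, n_experts)) * 0.1
        # Each expert is a small two-layer MLP.
        self.experts = [
            (rng.standard_normal((d_model, d_hidden)) * 0.1,
             rng.standard_normal((d_hidden, d_model)) * 0.1)
            for _ in range(n_experts)
        ]

    def forward(self, x):
        scores = softmax(x @ self.router)           # router scores every expert
        top = np.argsort(scores)[-self.top_k:]      # but only top-k are activated
        weights = scores[top] / scores[top].sum()   # renormalize their weights
        out = np.zeros_like(x)
        for w, i in zip(weights, top):
            w1, w2 = self.experts[i]
            out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU MLP expert
        return out, top

layer = MoELayer()
x = rng.standard_normal(16)
y, active = layer.forward(x)
print(f"active experts: {sorted(active.tolist())} of {len(layer.experts)}")
```

The key property is visible in the loop: all eight experts hold parameters, but only two contribute compute per input. Scale the same idea up and you get a 744B-parameter model whose per-token cost resembles a 40B dense model.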
| Spec | GLM-5.1 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| Total Parameters | 744B (MoE) | Undisclosed | Undisclosed |
| Active Parameters | 40B | Undisclosed | Undisclosed |
| Context Window | 200K tokens | 200K tokens | 1M tokens |
| Max Output Length | 131,072 tokens | Undisclosed | Undisclosed |
| SWE-Bench Pro | 58.4 | 57.3 | 57.7 |
| License | MIT (fully open) | Proprietary | Proprietary |
8-Hour Autonomous Coding
The standout feature is what Z.ai calls "agentic engineering." Give GLM-5.1 a coding task and it can work on it autonomously for up to eight hours. It plans, writes code, runs tests, identifies failures, and iterates – mimicking the full work cycle of a software engineer through an entire workday.
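The plan-write-test-iterate cycle can be sketched as a simple loop. In the sketch below, `propose_fix` is a stand-in for the model call – here it just walks a canned list of candidate patches so the loop is runnable; in a real agent it would be an LLM generating code from the test failures.

```python
# Canned candidate patches standing in for model outputs
# (first attempt is deliberately buggy, second is correct).
CANDIDATES = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
]

def propose_fix(attempt):
    """Stand-in for an LLM call that proposes a code patch."""
    return CANDIDATES[attempt]

def run_tests(source):
    """Load the candidate code and run the test suite against it."""
    ns = {}
    exec(source, ns)
    try:
        assert ns["add"](2, 3) == 5
        return True
    except AssertionError:
        return False

def agent_loop(max_iters=8):
    """Plan -> write -> test -> iterate until the tests pass."""
    for attempt in range(max_iters):
        patch = propose_fix(attempt)
        if run_tests(patch):
            return attempt + 1, patch   # tests pass: done
    return None, None                   # gave up after max_iters

iters, final = agent_loop()
print(f"tests passed after {iters} attempt(s)")
```

The real system layers planning, codebase navigation, and multi-hour persistence on top of this skeleton, but the core control flow – generate a patch, run the tests, feed failures back in – is the same.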
What SWE-Bench Pro Actually Tests
SWE-Bench Pro isn't a simple code completion test. It takes real GitHub issues from real open-source projects and asks the model to read the issue, navigate the codebase, modify multiple files, and make the tests pass. It's the closest thing we have to measuring how an AI would perform as an actual software engineer.
Until now, the top of that leaderboard was exclusively occupied by proprietary models. GLM-5.1 changed that.
The Bigger Picture – Open Source Is Closing the Gap Fast
April 2026 has been a landmark month for open-source AI. On April 2, Google released Gemma 4 under Apache 2.0 – four model sizes spanning from smartphones to workstations. Days later, Z.ai's GLM-5.1 took the SWE-Bench Pro crown under MIT license.
The momentum traces back to DeepSeek-V3's success in late 2025, which proved that open-weight models could compete at the frontier level. That shifted the Overton window for what open source could achieve, and the results are now cascading through 2026.
For startups and developers, this changes the calculus. The question is no longer "can open source match proprietary models?" It's "do I even need a proprietary API anymore?"
What This Means for You
Three practical takeaways from GLM-5.1's arrival.
First, the cost of high-end coding agents could drop significantly. Until now, the best coding AI required paid API access to Claude or GPT. With GLM-5.1 under MIT license, self-hosting becomes a viable path to top-tier coding performance at a fraction of the cost.
Second, companies with sensitive codebases have a new option. If sending code through external APIs makes your security team nervous, you can now run the best-performing coding model on your own infrastructure.
Third, competition just got healthier. A strong open-source challenger in coding AI will accelerate price cuts and performance improvements from proprietary providers too.
That said, topping SWE-Bench Pro doesn't mean GLM-5.1 is the best model at everything. For general conversation, creative work, and other tasks, Claude and GPT may still hold advantages. But in coding – the most commercially practical AI use case – open source just planted its flag at the summit.