GLM-5.1 Just Topped SWE-Bench Pro – And It's Fully Open Source
China's Z.ai released GLM-5.1, scoring 58.4 on SWE-Bench Pro to beat Claude Opus 4.6 (57.3) and GPT-5.4 (57.7). The 744B MoE model ships under MIT license.

An Open-Source Model Just Beat Every Closed Model at Coding
58.4. That's the SWE-Bench Pro score posted by GLM-5.1 on April 7, 2026 – the highest any model has ever achieved on one of the most practical coding benchmarks in the industry. The model that posted it wasn't built by OpenAI or Anthropic. It came from Z.ai, a Beijing-based company formerly known as Zhipu AI.
GLM-5.1 beat OpenAI's GPT-5.4 (57.7) and Anthropic's Claude Opus 4.6 (57.3) – and here's the kicker: it ships under the MIT license. Fully open source, free to download, free to use commercially. No restrictions.
It marks the first time an open-source model has taken the top spot on SWE-Bench Pro, the benchmark that asks AI to fix real bugs in real open-source projects.
Where Z.ai Came From
Z.ai started as Zhipu AI, a Tsinghua University spinoff founded in 2019. The company built its reputation on the GLM (General Language Model) series, initially focused on Chinese-language capabilities.
The turning point came in 2024 when GLM-4 approached GPT-4 level performance on global benchmarks. Investment poured in. Then on January 8, 2026, the company IPO'd on the Hong Kong Stock Exchange – becoming the first publicly traded foundation model company in the world.
| Metric | Value |
|---|---|
| IPO Date | January 8, 2026 (Hong Kong) |
| Capital Raised | HKD 4.35B (~$558M) |
| Market Cap | ~$31.3B |
| Founded | 2019 (Tsinghua University spinoff) |
| HQ | Beijing, China |
The IPO capital went straight into GLM-5 series development. Three months later, GLM-5.1 arrived.
Under the Hood – 744B Parameters, 40B Active
The MoE Architecture
GLM-5.1 runs on a Mixture-of-Experts (MoE) architecture with 744 billion total parameters but only 40 billion active at inference time. Think of it like having dozens of specialist sub-models inside one giant model – for each input, only the most relevant specialists activate.
This design gives you the knowledge of a 744B model at the compute cost of a 40B model.
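To make the routing idea concrete, here is a toy sketch of top-k MoE routing in NumPy. This is purely illustrative, not GLM-5.1's actual implementation: the dimensions, expert count, and `top_k=2` are arbitrary assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class MoELayer:
    """Toy Mixture-of-Experts layer: a router scores all experts,
    but only the top-k actually run for a given input."""

    def __init__(self, d_model=16, d_hidden=32, n_experts=8, top_k=2):
        self.top_k = top_k
        self.router = rng.standard_normal((d_model, n_experts)) * 0.1
        # Each expert is a small two-layer MLP.
        self.experts = [
            (rng.standard_normal((d_model, d_hidden)) * 0.1,
             rng.standard_normal((d_hidden, d_model)) * 0.1)
            for _ in range(n_experts)
        ]

    def forward(self, x):
        scores = softmax(x @ self.router)           # router scores every expert
        top = np.argsort(scores)[-self.top_k:]      # but only top-k are activated
        weights = scores[top] / scores[top].sum()   # renormalize their weights
        out = np.zeros_like(x)
        for w, i in zip(weights, top):
            w1, w2 = self.experts[i]
            out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU MLP expert
        return out, top

layer = MoELayer()
x = rng.standard_normal(16)
y, active = layer.forward(x)
print(f"active experts: {sorted(active.tolist())} of {len(layer.experts)}")
```

The key property is visible in the loop: all eight experts hold parameters, but only two contribute compute per input. Scale the same idea up and you get a 744B-parameter model whose per-token cost resembles a 40B dense model.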
| Spec | GLM-5.1 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| Total Parameters | 744B (MoE) | Undisclosed | Undisclosed |
| Active Parameters | 40B | Undisclosed | Undisclosed |
| Context Window | 200K tokens | 200K tokens | 1M tokens |
| Max Output Length | 131,072 tokens | Undisclosed | Undisclosed |
| SWE-Bench Pro | 58.4 | 57.3 | 57.7 |
| License | MIT (fully open) | Proprietary | Proprietary |
8-Hour Autonomous Coding
The standout feature is what Z.ai calls "agentic engineering." Give GLM-5.1 a coding task and it can work on it autonomously for up to eight hours. It plans, writes code, runs tests, identifies failures, and iterates – mimicking the full work cycle of a software engineer through an entire workday.
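The plan-write-test-iterate cycle can be sketched as a simple loop. In the sketch below, `propose_fix` is a stand-in for the model call – here it just walks a canned list of candidate patches so the loop is runnable; in a real agent it would be an LLM generating code from the test failures.

```python
# Canned candidate patches standing in for model outputs
# (first attempt is deliberately buggy, second is correct).
CANDIDATES = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
]

def propose_fix(attempt):
    """Stand-in for an LLM call that proposes a code patch."""
    return CANDIDATES[attempt]

def run_tests(source):
    """Load the candidate code and run the test suite against it."""
    ns = {}
    exec(source, ns)
    try:
        assert ns["add"](2, 3) == 5
        return True
    except AssertionError:
        return False

def agent_loop(max_iters=8):
    """Plan -> write -> test -> iterate until the tests pass."""
    for attempt in range(max_iters):
        patch = propose_fix(attempt)
        if run_tests(patch):
            return attempt + 1, patch   # tests pass: done
    return None, None                   # gave up after max_iters

iters, final = agent_loop()
print(f"tests passed after {iters} attempt(s)")
```

The real system layers planning, codebase navigation, and multi-hour persistence on top of this skeleton, but the core control flow – generate a patch, run the tests, feed failures back in – is the same.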
What SWE-Bench Pro Actually Tests
SWE-Bench Pro isn't a simple code completion test. It takes real GitHub issues from real open-source projects and asks the model to read the issue, navigate the codebase, modify multiple files, and make the tests pass. It's the closest thing we have to measuring how an AI would perform as an actual software engineer.
Until now, the top of that leaderboard was exclusively occupied by proprietary models. GLM-5.1 changed that.
The Bigger Picture – Open Source Is Closing the Gap Fast
April 2026 has been a landmark month for open-source AI. On April 2, Google released Gemma 4 under Apache 2.0 – four model sizes spanning from smartphones to workstations. Days later, Z.ai's GLM-5.1 took the SWE-Bench Pro crown under MIT license.
The momentum traces back to DeepSeek-V3's success in late 2025, which proved that open-weight models could compete at the frontier level. That shifted the Overton window for what open source could achieve, and the results are now cascading through 2026.
For startups and developers, this changes the calculus. The question is no longer "can open source match proprietary models?" It's "do I even need a proprietary API anymore?"
What This Means for You
Three practical takeaways from GLM-5.1's arrival.
First, the cost of high-end coding agents could drop significantly. Until now, the best coding AI required paid API access to Claude or GPT. With GLM-5.1 under MIT license, self-hosting becomes a viable path to top-tier coding performance at a fraction of the cost.
Second, companies with sensitive codebases have a new option. If sending code through external APIs makes your security team nervous, you can now run the best-performing coding model on your own infrastructure.
Third, competition just got healthier. A strong open-source challenger in coding AI will accelerate price cuts and performance improvements from proprietary providers too.
That said, topping SWE-Bench Pro doesn't mean GLM-5.1 is the best model at everything. For general conversation, creative work, and other tasks, Claude and GPT may still hold advantages. But in coding – the most commercially practical AI use case – open source just planted its flag at the summit.