
GLM-5.1 Just Topped SWE-Bench Pro – And It's Fully Open Source

China's Z.ai released GLM-5.1, scoring 58.4 on SWE-Bench Pro to beat Claude Opus 4.6 (57.3) and GPT-5.4 (57.7). The 744B MoE model ships under MIT license.


An Open-Source Model Just Beat Every Closed Model at Coding

58.4. That's the SWE-Bench Pro score GLM-5.1 posted on April 7, 2026 – the highest any model has ever achieved on one of the industry's most practical coding benchmarks. And the model that posted it wasn't built by OpenAI or Anthropic. It came from Z.ai, a Beijing-based company formerly known as Zhipu AI.

GLM-5.1 beat OpenAI's GPT-5.4 (57.7) and Anthropic's Claude Opus 4.6 (57.3) – and here's the kicker: it ships under the MIT license. Fully open source, free to download, free to use commercially. No restrictions.

This marks the first time an open-source model has taken the top spot on SWE-Bench Pro, the benchmark that asks AI to fix real bugs in real open-source projects.


Where Z.ai Came From

Z.ai started as Zhipu AI, a Tsinghua University spinoff founded in 2019. The company built its reputation on the GLM (General Language Model) series, initially focused on Chinese-language capabilities.

The turning point came in 2024 when GLM-4 approached GPT-4 level performance on global benchmarks. Investment poured in. Then on January 8, 2026, the company IPO'd on the Hong Kong Stock Exchange – becoming the first publicly traded foundation model company in the world.

| Metric | Value |
|---|---|
| IPO Date | January 8, 2026 (Hong Kong) |
| Capital Raised | HKD 4.35B (~$558M) |
| Market Cap | ~$31.3B |
| Founded | 2019 (Tsinghua University spinoff) |
| HQ | Beijing, China |

The IPO capital went straight into GLM-5 series development. Three months later, GLM-5.1 arrived.


Under the Hood – 744B Parameters, 40B Active

The MoE Architecture

GLM-5.1 runs on a Mixture-of-Experts (MoE) architecture with 744 billion total parameters but only 40 billion active at inference time. Think of it like having dozens of specialist sub-models inside one giant model – for each input, only the most relevant specialists activate.

This design gives you the knowledge of a 744B model at the compute cost of a 40B model.
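The routing idea behind that trade-off can be sketched in a few lines of NumPy. This is a toy illustration only – the expert count, dimensions, and gating function here are assumptions for demonstration, not GLM-5.1's actual design:

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route input x to the top-k experts by gate score.

    Only k experts actually run, so compute scales with k rather than
    with the total expert count -- the core MoE trade-off.
    """
    logits = x @ gate_w                      # one gating score per expert
    topk = np.argsort(logits)[-k:]           # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                 # softmax over only the selected experts
    # Weighted sum of the selected experts' outputs; the rest never execute.
    return sum(w * experts[i](x) for i, w in zip(topk, weights))

# Toy setup: 8 tiny "experts", each a linear map on a 4-dim input.
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(4, 4))) for _ in range(8)]
gate_w = rng.normal(size=(4, 8))
y = moe_forward(rng.normal(size=4), experts, gate_w, k=2)
```

With 8 experts and k=2, only a quarter of the expert parameters touch any given token – the same principle that lets a 744B-parameter model run at roughly 40B-parameter inference cost.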

| Spec | GLM-5.1 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| Total Parameters | 744B (MoE) | Undisclosed | Undisclosed |
| Active Parameters | 40B | Undisclosed | Undisclosed |
| Context Window | 200K tokens | 200K tokens | 1M tokens |
| Max Output Length | 131,072 tokens | Undisclosed | Undisclosed |
| SWE-Bench Pro | 58.4 | 57.3 | 57.7 |
| License | MIT (fully open) | Proprietary | Proprietary |

8-Hour Autonomous Coding

The standout feature is what Z.ai calls "agentic engineering." Give GLM-5.1 a coding task and it can work on it autonomously for up to eight hours. It plans, writes code, runs tests, identifies failures, and iterates – mimicking the full work cycle of a software engineer through an entire workday.
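That plan-write-test-iterate cycle can be sketched as a simple loop. This is a hedged sketch under stated assumptions – the function names and interfaces are illustrative stand-ins, not Z.ai's actual implementation:

```python
import time

def agentic_loop(task, generate_patch, apply_patch, run_tests, max_hours=8):
    """Toy plan-patch-test-iterate loop. `generate_patch` stands in for the
    model call and `run_tests` for the project's test suite -- both are
    assumptions for illustration, not Z.ai's real interface."""
    deadline = time.monotonic() + max_hours * 3600
    history = []                                  # failed-test logs fed back to the model
    while time.monotonic() < deadline:
        patch = generate_patch(task, history)     # model proposes a change
        apply_patch(patch)                        # write it into the repo
        ok, log = run_tests()                     # run the suite
        if ok:
            return "done", len(history) + 1       # success after N attempts
        history.append(log)                       # iterate on the failures
    return "timeout", len(history)

# Toy run: the "tests" fail twice, then pass on the third attempt.
attempts = iter([(False, "2 failed"), (False, "1 failed"), (True, "all passed")])
status, tries = agentic_loop(
    "fix the reported bug",
    generate_patch=lambda task, hist: f"patch v{len(hist) + 1}",
    apply_patch=lambda patch: None,
    run_tests=lambda: next(attempts),
)
```

The key design point is the feedback channel: each failing test log goes back into the model's context, so later attempts are conditioned on what already went wrong rather than starting from scratch.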

What SWE-Bench Pro Actually Tests

SWE-Bench Pro isn't a simple code completion test. It takes real GitHub issues from real open-source projects and asks the model to read the issue, navigate the codebase, modify multiple files, and make the tests pass. It's the closest thing we have to measuring how an AI would perform as an actual software engineer.

Until now, the top of that leaderboard was exclusively occupied by proprietary models. GLM-5.1 changed that.


The Bigger Picture – Open Source Is Closing the Gap Fast

April 2026 has been a landmark month for open-source AI. On April 2, Google released Gemma 4 under Apache 2.0 – four model sizes spanning from smartphones to workstations. Days later, Z.ai's GLM-5.1 took the SWE-Bench Pro crown under MIT license.

The momentum traces back to DeepSeek-V3's success in late 2025, which proved that open-weight models could compete at the frontier level. That shifted the Overton window for what open source could achieve, and the results are now cascading through 2026.

For startups and developers, this changes the calculus. The question is no longer "can open source match proprietary models?" It's "do I even need a proprietary API anymore?"


What This Means for You

Three practical takeaways from GLM-5.1's arrival.

First, the cost of high-end coding agents could drop significantly. Until now, the best coding AI required paid API access to Claude or GPT. With GLM-5.1 under MIT license, self-hosting becomes a viable path to top-tier coding performance at a fraction of the cost.

Second, companies with sensitive codebases have a new option. If sending code through external APIs makes your security team nervous, you can now run the best-performing coding model on your own infrastructure.

Third, competition just got healthier. A strong open-source challenger in coding AI will accelerate price cuts and performance improvements from proprietary providers too.

That said, topping SWE-Bench Pro doesn't mean GLM-5.1 is the best model at everything. For general conversation, creative work, and other tasks, Claude and GPT may still hold advantages. But in coding – the most commercially practical AI use case – open source just planted its flag at the summit.

