
The Week Open Source Caught Up: Gemma 4 and GLM-5.1

Google released Gemma 4 under Apache 2.0 and Z.ai released GLM-5.1 under MIT this week. GLM-5.1 hit 58.4 on SWE-Bench Pro, edging past Claude Opus 4.6 and GPT-5.4 – the first open-source model to lead a major coding benchmark.

[Image: Open source code and terminal, symbolizing the GLM-5.1 and Gemma 4 releases. Source: Unsplash / Markus Spiske]

58.4. That's the score that put open source on top of coding benchmarks for the first time.

On April 7, Z.ai (formerly Zhipu AI) released GLM-5.1 as an open-weights model. Its SWE-Bench Pro score: 58.4. Right below it: GPT-5.4 at 57.7, Claude Opus 4.6 at 57.3.

The margin is small. The meaning is huge. This is the first time a permissively-licensed open-source model has taken the top slot on a major coding benchmark.

Two days earlier, on April 5, Google released Gemma 4 under Apache 2.0 – no commercial restrictions, no user-count caps, no clickthrough license. Frontier-class performance shipped fully open, twice in one week.

Here's the deal: this isn't a one-off. It's the moment the long-running "open source is 12-18 months behind" narrative stopped being true.


To understand this, you need to know what SWE-Bench Pro is

SWE-Bench is a benchmark Princeton researchers released in 2023. You hand the model a real GitHub issue from a real repo, and ask it to produce a pull request that passes the test suite. It's about as close to "actually doing the job" as benchmarks get – not toy code completions, but real repository-scale debugging.

SWE-Bench Pro is the harder variant: bigger codebases, messier issues, longer context windows. It's the one the frontier labs compete on when they want to show real coding ability.
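The scoring model behind these numbers is simple to sketch. Here is a minimal, hypothetical version of the resolved-rate computation; the instance IDs and results are made up, and the real harness applies each patch inside a sandboxed container before running the tests:

```python
# Minimal sketch of a SWE-Bench-style scoring loop (hypothetical data,
# not the official harness). Each instance is a real GitHub issue; the
# model's patch counts as "resolved" only if the test suite passes.

def swe_bench_score(results: list[dict]) -> float:
    """Percent of instances where the model's patch made all tests pass."""
    resolved = sum(1 for r in results if r["tests_passed"])
    return round(100 * resolved / len(results), 1)

# Hypothetical run: 3 issues attempted, 2 patches pass the test suite.
run = [
    {"instance_id": "django__django-12345", "tests_passed": True},
    {"instance_id": "sympy__sympy-67890", "tests_passed": True},
    {"instance_id": "flask__flask-11111", "tests_passed": False},
]
print(swe_bench_score(run))  # → 66.7
```

The all-or-nothing pass criterion is what makes the benchmark hard: a patch that fixes the bug but breaks one unrelated test scores zero.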

Here's how the top score has moved over the last year.

| Date | Leader | SWE-Bench Pro | License |
|---------|------------------|------|------------|
| 2025-06 | GPT-5.0 | 38.5 | Closed |
| 2025-10 | Claude Opus 4.5 | 49.1 | Closed |
| 2026-01 | GPT-5.4 Thinking | 55.2 | Closed |
| 2026-03 | Claude Opus 4.6 | 57.3 | Closed |
| 2026-04 | GLM-5.1 | 58.4 | Open (MIT) |

Twenty points of improvement in 10 months, and the current record holder is free to download.

Anatomy of GLM-5.1

1) 754B parameters, MoE architecture

GLM-5.1 is a 754-billion parameter Mixture of Experts (MoE) model. Think of it like this: instead of activating the entire neural network for every query, MoE routes each input through a small subset of "expert" subnetworks. You get the knowledge of a giant model at the inference cost of a much smaller one.

Active parameters per forward pass are around 62B. Z.ai claims inference cost at about a third of GPT-5.4's and throughput roughly 2.1x higher, per benchmarks published on Hugging Face.
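The routing idea is easy to sketch. The toy NumPy example below uses made-up sizes (8 experts, top-2 routing, 16-dimensional vectors; nothing from GLM-5.1's actual config) to show why per-token compute scales with the number of experts selected, not the total:

```python
import numpy as np

# Toy sketch of MoE top-k routing (illustrative sizes, not GLM-5.1's
# real architecture). A gating network scores all experts per token,
# but only the top-k experts actually run.

rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16                                      # assumed toy sizes
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # expert weights
gate = rng.normal(size=(d, n_experts))                          # router weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate                      # score every expert
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only k of the n_experts weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=d))
print(y.shape)  # (16,)
```

In this toy setup only 2 of 8 experts fire per token; GLM-5.1's reported ratio is similar in spirit, with roughly 62B of 754B parameters active per forward pass.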

2) MIT license – the real story

This is where it gets interesting. GLM-5.1 ships under the MIT license, which is about as permissive as it gets. You can download it, modify it, deploy it, charge money for it, without asking anybody.

For comparison: Meta's Llama 4 uses the "Llama Community License," which blocks any service with more than 700M monthly active users. Google's earlier Gemma models had a custom "Gemma Terms of Use." Gemma 4 this week dropped all that and shipped as Apache 2.0. GLM-5.1 is one step more open.

| Model | Params | License | Commercial limits |
|-------------|------------|------------------|-------------------|
| Llama 4 | 500B MoE | Llama Community | 700M MAU cap |
| Gemma 4 | 135B dense | Apache 2.0 | None |
| GLM-5.1 | 754B MoE | MIT | None |
| DeepSeek V4 | 671B MoE | DeepSeek License | Restricted |

3) Gemma 4 – Google's "catch us if you can" card

The Gemma 4 release got less attention than GLM-5.1, but structurally it's just as important. Google formalized a dual strategy: Gemini for the frontier, Gemma for the open ecosystem.

Gemma 4 ships in four sizes: 2B, 9B, 27B, and 135B. The 27B variant is the sweet spot for a single high-end GPU. The 135B version reportedly matches GPT-5.0 on MATH Level 5 (a high school math reasoning benchmark), which would have been unthinkable for an open model a year ago.

Google has historically been conservative with open weights. This is the first Gemma release where "actually usable frontier" is a fair description.

The bigger picture: why now?

A year ago, the consensus was that open source trailed frontier labs by 12-18 months. That gap has effectively collapsed this week. Three reasons.

First, training recipes leaked. DeepSeek published detailed MoE training notes in late 2024. Meta published chunks of Llama 4 infrastructure. Those recipes flowed to Chinese, European, and academic labs, and suddenly "we can do this too" stopped being wishful thinking.

Second, compute costs dropped. The $100M price tag for a frontier training run in 2024 is now closer to $20M in 2026. That's the same efficiency trend Anthropic's $30B story captures from a different angle – frontier labs are spending less per FLOP, and open labs are catching the benefit.

Third, frontier labs priced themselves into a corner. As enterprise revenue scaled, pricing crept up. API costs for Claude Opus 4.6 and GPT-5.4 are now high enough that open-source alternatives have genuine economic pull, not just ideological appeal.

| Tier | Representative model | Strength | Weakness |
|-----------------|--------------------------|------------------------------|-----------------------|
| Closed frontier | GPT-5.4, Claude Opus 4.6 | Best average benchmarks | Price, lock-in |
| Open frontier | GLM-5.1, Gemma 4 135B | Competitive, fully permissive | Self-hosting overhead |
| Local optimum | Gemma 4 27B, Qwen 3 32B | Runs on one GPU | Capability ceiling |
| Edge | Phi-4, Gemma 4 2B | Mobile/embedded | Limited reasoning |

"Open frontier" became a real category this week. GLM-5.1 is the first model that competes head-to-head with GPT-5.4 while being free.

The question is no longer "how good is your model?" The question is "how well does your platform deliver it?"

So what actually changes?

For developers, the first thing that shifts is prototype economics. A side project that used to burn $300-500/month on Claude or GPT API calls can now run on GLM-5.1 or Gemma 4 for $20-50, or essentially free if you own GPU hardware. r/LocalLLaMA spent the week trading 4-bit quantized GLM-5.1 builds – reports suggest it fits on a single RTX 5090.
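A back-of-envelope check using only the parameter counts reported above shows why that's at least plausible: at 4-bit quantization each parameter takes half a byte, so the full 754B weights need roughly 377 GB, but the ~62B active parameters work out to about 31 GB, small enough that single-GPU setups with inactive experts offloaded to system RAM are conceivable. (This ignores KV cache, activations, and quantization overhead.)

```python
# Rough weight-footprint arithmetic for a quantized MoE model.
# Uses only the article's reported numbers: 754B total, 62B active.

def quantized_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate weight footprint in GB: each param costs bits/8 bytes."""
    return params_billion * 1e9 * bits / 8 / 1e9

total = quantized_gb(754)   # all experts, held in CPU RAM / on disk
active = quantized_gb(62)   # experts actually used per forward pass
print(total, active)        # → 377.0 31.0
```

The gap between those two numbers is the whole economic argument for MoE on consumer hardware.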

For startups, the strategic reset is real. "Claude API wrapper" as a business model just got riskier. The counter-move is owning domain-specific data and fine-tuning open models on top. Legal, medical, and financial verticals especially have room to ship models that outperform general-purpose frontier on their narrow task.

For enterprise IT, this is a vendor lock-in escape route. Until now, the performance gap forced companies to accept Claude or GPT even when data-residency rules or geopolitics made them nervous. That forced trade-off is gone. For regulated EU and Asian markets, GLM-5.1 and Gemma 4 rewrite the deployment options overnight.

The competition isn't done escalating, either. Meta is preparing its first open-source release under Alexandr Wang's leadership, with a delivery window the rumor mill places in late April. DeepSeek V5 is reportedly in late testing. The next open frontier drop could come within weeks.

The week, summarized.

The open source gap closed. The competition moved from "who has the best model" to "who ships it best."
