
The Week Open Source Caught Up: Gemma 4 and GLM-5.1

Google released Gemma 4 under Apache 2.0 and Z.ai released GLM-5.1 under MIT this week. GLM-5.1 hit 58.4 on SWE-Bench Pro, edging past Claude Opus 4.6 and GPT-5.4 – the first open-source model to lead a major coding benchmark.

[Image: Open source code and terminal, symbolizing the GLM-5.1 and Gemma 4 releases. Source: Unsplash / Markus Spiske]

58.4. That's the score that put open source on top of coding benchmarks for the first time.

On April 7, Z.ai (formerly Zhipu AI) released GLM-5.1 as an open-weights model. Its SWE-Bench Pro score: 58.4. Right below it: GPT-5.4 at 57.7, Claude Opus 4.6 at 57.3.

The margin is small. The meaning is huge. This is the first time a permissively-licensed open-source model has taken the top slot on a major coding benchmark.

Two days earlier, on April 5, Google released Gemma 4 under Apache 2.0 – no commercial restrictions, no user-count caps, no clickthrough license. Frontier-class performance shipped fully open, twice in one week.

Here's the deal: this isn't a one-off. It's the moment the long-running "open source is 12-18 months behind" narrative stopped being true.


To understand this, you need to know what SWE-Bench Pro is

SWE-Bench is a benchmark Princeton researchers released in 2023. You hand the model a real GitHub issue from a real repo, and ask it to produce a pull request that passes the test suite. It's about as close to "actually doing the job" as benchmarks get – not toy code completions, but real repository-scale debugging.

SWE-Bench Pro is the harder variant: bigger codebases, messier issues, longer context windows. It's the one the frontier labs compete on when they want to show real coding ability.
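The scoring model behind these numbers is simple to sketch. Here is a minimal, hypothetical version of the resolved-rate computation; the instance IDs and results are made up, and the real harness applies each patch inside a sandboxed container before running the tests:

```python
# Minimal sketch of a SWE-Bench-style scoring loop (hypothetical data,
# not the official harness). Each instance is a real GitHub issue; the
# model's patch counts as "resolved" only if the test suite passes.

def swe_bench_score(results: list[dict]) -> float:
    """Percent of instances where the model's patch made all tests pass."""
    resolved = sum(1 for r in results if r["tests_passed"])
    return round(100 * resolved / len(results), 1)

# Hypothetical run: 3 issues attempted, 2 patches pass the test suite.
run = [
    {"instance_id": "django__django-12345", "tests_passed": True},
    {"instance_id": "sympy__sympy-67890", "tests_passed": True},
    {"instance_id": "flask__flask-11111", "tests_passed": False},
]
print(swe_bench_score(run))  # → 66.7
```

The all-or-nothing pass criterion is what makes the benchmark hard: a patch that fixes the bug but breaks one unrelated test scores zero.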

Here's how the top score has moved over the last year.

| Date | Leader | SWE-Bench Pro | License |
|---------|------------------|------|------------|
| 2025-06 | GPT-5.0 | 38.5 | Closed |
| 2025-10 | Claude Opus 4.5 | 49.1 | Closed |
| 2026-01 | GPT-5.4 Thinking | 55.2 | Closed |
| 2026-03 | Claude Opus 4.6 | 57.3 | Closed |
| 2026-04 | GLM-5.1 | 58.4 | Open (MIT) |

Twenty points of improvement in 10 months, and the current record holder is free to download.

Anatomy of GLM-5.1

1) 754B parameters, MoE architecture

GLM-5.1 is a 754-billion parameter Mixture of Experts (MoE) model. Think of it like this: instead of activating the entire neural network for every query, MoE routes each input through a small subset of "expert" subnetworks. You get the knowledge of a giant model at the inference cost of a much smaller one.

Active parameters per forward pass are around 62B. Z.ai claims inference cost at about a third of GPT-5.4's and throughput roughly 2.1x higher, per benchmarks published on Hugging Face.
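The routing idea is easy to sketch. The toy NumPy example below uses made-up sizes (8 experts, top-2 routing, 16-dimensional vectors; nothing from GLM-5.1's actual config) to show why per-token compute scales with the number of experts selected, not the total:

```python
import numpy as np

# Toy sketch of MoE top-k routing (illustrative sizes, not GLM-5.1's
# real architecture). A gating network scores all experts per token,
# but only the top-k experts actually run.

rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16                                      # assumed toy sizes
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # expert weights
gate = rng.normal(size=(d, n_experts))                          # router weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate                      # score every expert
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only k of the n_experts weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=d))
print(y.shape)  # (16,)
```

In this toy setup only 2 of 8 experts fire per token; GLM-5.1's reported ratio is similar in spirit, with roughly 62B of 754B parameters active per forward pass.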

2) MIT license – the real story

This is where it gets interesting. GLM-5.1 ships under the MIT license, which is about as permissive as it gets. You can download it, modify it, deploy it, charge money for it, without asking anybody.

For comparison: Meta's Llama 4 uses the "Llama Community License," which blocks any service with more than 700M monthly active users. Google's earlier Gemma models had a custom "Gemma Terms of Use." Gemma 4 this week dropped all that and shipped as Apache 2.0. GLM-5.1 is one step more open.

| Model | Params | License | Commercial limits |
|-------------|------------|------------------|-------------------|
| Llama 4 | 500B MoE | Llama Community | 700M MAU cap |
| Gemma 4 | 135B dense | Apache 2.0 | None |
| GLM-5.1 | 754B MoE | MIT | None |
| DeepSeek V4 | 671B MoE | DeepSeek License | Restricted |

3) Gemma 4 – Google's "catch us if you can" card

The Gemma 4 release got less attention than GLM-5.1, but structurally it's just as important. Google formalized a dual strategy: Gemini for the frontier, Gemma for the open ecosystem.

Gemma 4 ships in four sizes: 2B, 9B, 27B, and 135B. The 27B variant is the sweet spot for a single high-end GPU. The 135B version reportedly matches GPT-5.0 on MATH Level 5 (a high school math reasoning benchmark), which would have been unthinkable for an open model a year ago.

Google has historically been conservative with open weights. This is the first Gemma release where "actually usable frontier" is a fair description.

The bigger picture: why now?

A year ago, the consensus was that open source trailed frontier labs by 12-18 months. That gap has effectively collapsed this week. Three reasons.

First, training recipes leaked. DeepSeek published detailed MoE training notes in late 2024. Meta published chunks of Llama 4 infrastructure. Those recipes flowed to Chinese, European, and academic labs, and suddenly "we can do this too" stopped being wishful thinking.

Second, compute costs dropped. The $100M price tag for a frontier training run in 2024 is now closer to $20M in 2026. That's the same efficiency trend Anthropic's $30B story captures from a different angle – frontier labs are spending less per FLOP, and open labs are catching the benefit.

Third, frontier labs priced themselves into a corner. As enterprise revenue scaled, pricing crept up. API costs for Claude Opus 4.6 and GPT-5.4 are now high enough that open-source alternatives have genuine economic pull, not just ideological appeal.

| Tier | Representative model | Strength | Weakness |
|-----------------|--------------------------|------------------------------|-----------------------|
| Closed frontier | GPT-5.4, Claude Opus 4.6 | Best average benchmarks | Price, lock-in |
| Open frontier | GLM-5.1, Gemma 4 135B | Competitive, fully permissive | Self-hosting overhead |
| Local optimum | Gemma 4 27B, Qwen 3 32B | Runs on one GPU | Capability ceiling |
| Edge | Phi-4, Gemma 4 2B | Mobile/embedded | Limited reasoning |

"Open frontier" became a real category this week. GLM-5.1 is the first model that competes head-to-head with GPT-5.4 while being free.

The question is no longer "how good is your model?" The question is "how well does your platform deliver it?"

So what actually changes?

For developers, the first thing that shifts is prototype economics. A side project that used to burn $300-500/month on Claude or GPT API calls can now run on GLM-5.1 or Gemma 4 for $20-50, or essentially free if you own GPU hardware. r/LocalLLaMA spent the week trading 4-bit quantized GLM-5.1 builds – reports suggest it fits on a single RTX 5090.
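A back-of-envelope check using only the parameter counts reported above shows why that's at least plausible: at 4-bit quantization each parameter takes half a byte, so the full 754B weights need roughly 377 GB, but the ~62B active parameters work out to about 31 GB, small enough that single-GPU setups with inactive experts offloaded to system RAM are conceivable. (This ignores KV cache, activations, and quantization overhead.)

```python
# Rough weight-footprint arithmetic for a quantized MoE model.
# Uses only the article's reported numbers: 754B total, 62B active.

def quantized_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate weight footprint in GB: each param costs bits/8 bytes."""
    return params_billion * 1e9 * bits / 8 / 1e9

total = quantized_gb(754)   # all experts, held in CPU RAM / on disk
active = quantized_gb(62)   # experts actually used per forward pass
print(total, active)        # → 377.0 31.0
```

The gap between those two numbers is the whole economic argument for MoE on consumer hardware.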

For startups, the strategic reset is real. "Claude API wrapper" as a business model just got riskier. The counter-move is owning domain-specific data and fine-tuning open models on top. Legal, medical, and financial verticals especially have room to ship models that outperform general-purpose frontier on their narrow task.

For enterprise IT, this is a vendor lock-in escape route. Until now, the performance gap forced companies to accept Claude or GPT even when data-residency rules or geopolitics made them nervous. That forced trade-off is gone. For regulated EU and Asian markets, GLM-5.1 and Gemma 4 rewrite the deployment options overnight.

The competition isn't done escalating, either. Meta is preparing its first open-source release under Alexandr Wang's leadership, with a delivery window the rumor mill places in late April. DeepSeek V5 is reportedly in late testing. The next open frontier drop could come within weeks.

The week, summarized.

The open source gap closed. The competition moved from "who has the best model" to "who ships it best."
