Gemini 3.1 Ultra ships 2M tokens — and runs code in the loop
Google announced Gemini 3.1 Ultra with a 2M-token multimodal context and a built-in code execution sandbox. Same price as the prior Pro tier.

2,000,000
Two million tokens. That's the new ceiling on Gemini 3.1 Ultra, and it holds across text, image, audio, and video in the same context. The bigger surprise wasn't the number — it was the second card next to it: a built-in code execution sandbox so the model can write, run, and test code inline.
This lands one day after OpenAI's GPT-5.4 announcement (1M tokens, multi-step autonomy). Google answered with double the context and inline execution.
The players — DeepMind and Google Cloud
Google DeepMind, led by Demis Hassabis, has operated as a single research-and-product org since the 2023 Brain/DeepMind merger. Gemini 1.5 introduced 1M tokens; 2.5 hardened multimodal alignment; 3.1 doubles to 2M and adds the sandbox.
Vertex AI is the revenue channel. Vertex competes head-on with AWS Bedrock and Azure OpenAI; with full-codebase analysis now feasible in a single call, Vertex picks up a clear differentiator.
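As a sanity check on "full-codebase analysis in a single call": a common rule of thumb is roughly 4 characters per token, so 2M tokens is on the order of 8 MB of source. A sketch of that estimate (the chars-per-token ratio is a heuristic, not an official tokenizer, and the 200K reserve for prompt and output is our assumption):

```python
import os

CHARS_PER_TOKEN = 4          # rough heuristic; varies by tokenizer and language
CONTEXT_TOKENS = 2_000_000   # Gemini 3.1 Ultra window per the announcement

def estimate_tokens(root: str, exts=(".py", ".ts", ".go", ".md")) -> int:
    """Walk a repo and estimate its token count from file sizes."""
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                total_chars += os.path.getsize(os.path.join(dirpath, name))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str, reserve: int = 200_000) -> bool:
    """Leave headroom for the prompt, tool output, and the model's reply."""
    return estimate_tokens(root) <= CONTEXT_TOKENS - reserve
```

By this estimate, most mid-size monorepos fit; only the largest codebases would still need retrieval or chunking.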
[IMG#1]
What's new
| Spec | Gemini 3.1 Ultra | Gemini 2.5 Pro | GPT-5.4 | Claude 4.5 Opus |
|---|---|---|---|---|
| Context | 2,000,000 | 1,000,000 | 1,000,000 | 500,000 |
| Multimodal | text/image/audio/video | text/image/audio/video | text/image | text/image |
| Code execution | Built-in sandbox | External tools | Code Interpreter | External tools |
| Input price ($/1M) | $1.25 | $1.25 | $5.00 | $15.00 |
| Output price ($/1M) | $5.00 | $5.00 | $15.00 | $75.00 |
Pricing is the loudest signal. Google held Pro pricing while doubling context. Where OpenAI and Anthropic have been raising frontier prices, Google froze them.
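To make the table concrete, a small cost calculator using the list prices above (real bills differ with caching, batching, and tier changes; the 400K-token repo workload is an illustrative assumption, chosen to fit inside Claude's 500K window):

```python
# ($ per 1M input tokens, $ per 1M output tokens), from the table above
PRICES = {
    "gemini-3.1-ultra": (1.25, 5.00),
    "gpt-5.4":          (5.00, 15.00),
    "claude-4.5-opus":  (15.00, 75.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost of a single call, in dollars."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One repo-analysis call: 400K tokens in, 8K tokens out
for model in PRICES:
    print(f"{model}: ${call_cost(model, 400_000, 8_000):.2f}")
```

At these rates the same call costs $0.54 on Gemini, $2.12 on GPT-5.4, and $6.60 on Claude 4.5 Opus, which is the roughly 12x gap driving the migration argument below.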
The code sandbox is the real story
Marketed as the Code Execution Tool. Two key properties: (1) the model writes code and immediately runs it inside a gVisor-isolated environment, with results returned into the same context. (2) 2M tokens can hold code, output, and data simultaneously — meaning "analyze codebase → patch → test → draft PR" can run inside one conversation.
OpenAI's Code Interpreter demonstrated this pattern earlier, but its smaller context limited it to small projects. Google removed that ceiling.
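The announcement doesn't document the sandbox API, but the loop it describes is easy to sketch locally: the model emits code, a harness executes it in isolation, and the output is fed back into the same context. A minimal local stand-in (a bare subprocess instead of gVisor, so this is NOT safe for untrusted code; it only illustrates the data flow):

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: int = 10) -> str:
    """Execute model-written code in a subprocess and capture its output.
    The real sandbox is gVisor-isolated; a subprocess has no such isolation."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr

def agent_step(context: list[str], model_code: str) -> list[str]:
    """One write-run-observe turn: append the code and its output to context."""
    output = run_sandboxed(model_code)
    return context + [f"```python\n{model_code}\n```", f"[tool output]\n{output}"]
```

Each turn grows the context with both the code and its result, which is exactly where a 2M-token window matters: the "analyze → patch → test" transcript stays resident instead of being summarized away.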
Who wins
Google — Vertex AI for the first time leads on both price and capability simultaneously, a slot GPT-5 held last year. AdSense, Workspace, and Cloud's AI-adjacent revenue look set to accelerate next quarter.
Developers — full-monorepo single-call analysis. Cursor, Cline, and Aider are likely to make Gemini their default and keep Claude/GPT as fallbacks.
Anthropic — short-term pressure. Claude 4.5 Opus is $15/M input at 500K context; Gemini is $1.25/M at 2M. Long-context coding workloads are likely to migrate.
[IMG#2]
Past context races
Round 1 (2023): Anthropic's Claude 100K, OpenAI's GPT-4 Turbo 128K. Round 2 (2024): Gemini 1.5 1M, Anthropic 200K, OpenAI 128K → 256K. Round 3 (now): Google 2M without a price hike.
The pattern is clear — context length isn't a durable moat; "double at the same price" is.
Counter-moves
OpenAI's response is GPT-5.4: differentiate on multi-step autonomy (OSWorld-V 75%), not context length.
Anthropic with Claude Sonnet 4.6 emphasizes coding/tool-use accuracy over raw context length.
Meta's rumored Llama 5 line will likely answer with "open weights + 1M tokens" — competing on self-hosting, not price.
Stakes
- Wins: Google — Vertex revenue, agent-coding default, cloud AI share.
- Wins: Developers — flat price + double context = practical large-repo analysis.
- Loses: Anthropic — short-term loss in long-context coding workloads, partly offset by its stewardship of the MCP standard.
- Watching: OpenAI — does GPT-5.5 match 2M and at what price?
- Watching: Cloud Big 3 — AWS/Azure may answer with their own price moves.
Skeptical view
Simon Willison: "Marketing 2M tokens vs. actual recall accuracy at the tail end are different metrics — needs long-context retrieval benchmarks before declaring victory."
Yann LeCun (Meta Chief AI Scientist): "Reasoning, not token length, defines the next leap."
What changes for you
For builders — Vertex pricing makes Gemini 3.1 the default for new coding-agent stacks; keep Claude/GPT as fallbacks. The Long Context cookbook is the starting point.
For founders — domains where long context is essential (law, medicine, software) can win on Vertex alone in the short term. Build a multi-LLM abstraction layer from day one in case pricing changes.
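A day-one abstraction layer can be as thin as one interface plus a registry; everything here (provider names, the `complete` signature) is illustrative, not any vendor's SDK:

```python
from typing import Protocol

class LLMProvider(Protocol):
    """The one surface the rest of the codebase is allowed to touch."""
    def complete(self, prompt: str) -> str: ...

class GeminiProvider:
    def complete(self, prompt: str) -> str:
        # A real implementation would call Vertex AI here.
        raise NotImplementedError

_PROVIDERS: dict[str, LLMProvider] = {}

def register(name: str, provider: LLMProvider) -> None:
    _PROVIDERS[name] = provider

def complete(prompt: str, provider: str = "gemini") -> str:
    """Route through the registry so swapping vendors is a config change."""
    return _PROVIDERS[provider].complete(prompt)
```

With this in place, a pricing change means editing the default provider name, not refactoring every call site.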
For investors — watch GOOG Q2: Cloud growth rate is the leading indicator.
For end users — Google AI Studio and the Gemini app expose parts of the 2M experience for free. Drop a long PDF or video to test it.
3-Line Summary
- Gemini 3.1 Ultra ships with 2M-token context and a built-in code execution sandbox.
- Pricing matches the prior Pro tier — Google froze frontier pricing.
- Counter-launch cycles from OpenAI and Anthropic are shortening; coding-agent defaults are in flux.
References
- Google DeepMind — Gemini 3.1 announcement
- AI Studio — Long Context Cookbook
- TechCrunch — Gemini 3.1 hands-on
- Bloomberg — Google AI strategy
- Simon Willison — Long Context notes