Gemini 3.1 Ultra ships 2M tokens — and runs code in the loop
Google announced Gemini 3.1 Ultra with a 2M-token multimodal context and a built-in code execution sandbox. Same price as the prior Pro tier.

2,000,000
Two million tokens. That's the new ceiling on Gemini 3.1 Ultra, and it holds across text, image, audio, and video in the same context. The bigger surprise wasn't the number — it was the second card next to it: a built-in code execution sandbox so the model can write, run, and test code inline.
This lands one day after OpenAI's GPT-5.4 announcement (1M tokens, multi-step autonomy). Google answered with double the context and inline execution.
The players — DeepMind and Google Cloud
Google DeepMind, led by Demis Hassabis, has operated as a single research-and-product org since the 2023 Brain/DeepMind merger. Gemini 1.5 introduced 1M tokens; 2.5 hardened multimodal alignment; 3.1 doubles to 2M and adds the sandbox.
Vertex AI is the revenue channel. Vertex competes head-on with AWS Bedrock and Azure OpenAI; with full-codebase analysis now feasible in a single call, Vertex picks up a clear differentiator.
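As a sanity check on "full-codebase analysis in a single call": a common rule of thumb is roughly 4 characters per token, so 2M tokens is on the order of 8 MB of source. A sketch of that estimate (the chars-per-token ratio is a heuristic, not an official tokenizer, and the 200K reserve for prompt and output is our assumption):

```python
import os

CHARS_PER_TOKEN = 4          # rough heuristic; varies by tokenizer and language
CONTEXT_TOKENS = 2_000_000   # Gemini 3.1 Ultra window per the announcement

def estimate_tokens(root: str, exts=(".py", ".ts", ".go", ".md")) -> int:
    """Walk a repo and estimate its token count from file sizes."""
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                total_chars += os.path.getsize(os.path.join(dirpath, name))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str, reserve: int = 200_000) -> bool:
    """Leave headroom for the prompt, tool output, and the model's reply."""
    return estimate_tokens(root) <= CONTEXT_TOKENS - reserve
```

By this estimate, most mid-size monorepos fit; only the largest codebases would still need retrieval or chunking.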
[IMG#1]
What's new
| Spec | Gemini 3.1 Ultra | Gemini 2.5 Pro | GPT-5.4 | Claude 4.5 Opus |
|---|---|---|---|---|
| Context | 2,000,000 | 1,000,000 | 1,000,000 | 500,000 |
| Multimodal | text/image/audio/video | text/image/audio/video | text/image | text/image |
| Code execution | Built-in sandbox | External tools | Code Interpreter | External tools |
| Input price ($/1M) | $1.25 | $1.25 | $5.00 | $15.00 |
| Output price ($/1M) | $5.00 | $5.00 | $15.00 | $75.00 |
Pricing is the loudest signal. Google held Pro pricing while doubling context. Where OpenAI and Anthropic have been raising frontier prices, Google froze them.
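To make the table concrete, a small cost calculator using the list prices above (real bills differ with caching, batching, and tier changes; the 400K-token repo workload is an illustrative assumption, chosen to fit inside Claude's 500K window):

```python
# ($ per 1M input tokens, $ per 1M output tokens), from the table above
PRICES = {
    "gemini-3.1-ultra": (1.25, 5.00),
    "gpt-5.4":          (5.00, 15.00),
    "claude-4.5-opus":  (15.00, 75.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost of a single call, in dollars."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One repo-analysis call: 400K tokens in, 8K tokens out
for model in PRICES:
    print(f"{model}: ${call_cost(model, 400_000, 8_000):.2f}")
```

At these rates the same call costs $0.54 on Gemini, $2.12 on GPT-5.4, and $6.60 on Claude 4.5 Opus, which is the roughly 12x gap driving the migration argument below.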
The code sandbox is the real story
Marketed as the Code Execution Tool. Two key properties: (1) the model writes code and immediately runs it inside a gVisor-isolated environment, with results returned into the same context. (2) 2M tokens can hold code, output, and data simultaneously — meaning "analyze codebase → patch → test → draft PR" can run inside one conversation.
OpenAI's Code Interpreter demonstrated this pattern earlier, but its smaller context limited it to small projects. Google removed that ceiling.
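The announcement doesn't document the sandbox API, but the loop it describes is easy to sketch locally: the model emits code, a harness executes it in isolation, and the output is fed back into the same context. A minimal local stand-in (a bare subprocess instead of gVisor, so this is NOT safe for untrusted code; it only illustrates the data flow):

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: int = 10) -> str:
    """Execute model-written code in a subprocess and capture its output.
    The real sandbox is gVisor-isolated; a subprocess has no such isolation."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr

def agent_step(context: list[str], model_code: str) -> list[str]:
    """One write-run-observe turn: append the code and its output to context."""
    output = run_sandboxed(model_code)
    return context + [f"```python\n{model_code}\n```", f"[tool output]\n{output}"]
```

Each turn grows the context with both the code and its result, which is exactly where a 2M-token window matters: the "analyze → patch → test" transcript stays resident instead of being summarized away.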
Who wins
Google — Vertex AI for the first time leads on both price and capability simultaneously, a slot GPT-5 held last year. AdSense, Workspace, and Cloud's AI-adjacent revenue look set to accelerate next quarter.
Developers — full-monorepo single-call analysis. Cursor, Cline, and Aider are likely to make Gemini their default and keep Claude/GPT as fallbacks.
Anthropic — short-term pressure. Claude 4.5 Opus is $15/M input at 500K context; Gemini is $1.25/M at 2M. Long-context coding workloads are likely to migrate.
[IMG#2]
Past context races
Round 1 (2023): Anthropic's Claude 100K, OpenAI's GPT-4 Turbo 128K. Round 2 (2024): Gemini 1.5 1M, Anthropic 200K, OpenAI 128K → 256K. Round 3 (now): Google 2M without a price hike.
The pattern is clear — context length isn't a durable moat; "double at the same price" is.
Counter-moves
OpenAI's response is GPT-5.4: differentiate on multi-step autonomy (OSWorld-V 75%), not context length.
Anthropic with Claude Sonnet 4.6 emphasizes coding/tool-use accuracy over raw context length.
Meta's rumored Llama 5 line will likely answer with "open weights + 1M tokens" — competing on self-hosting, not price.
Stakes
- Wins: Google — Vertex revenue, agent-coding default, cloud AI share.
- Wins: Developers — flat price + double context = practical large-repo analysis.
- Loses: Anthropic — short-term loss in long-context coding workloads, partly offset by its stewardship of the MCP standard.
- Watching: OpenAI — does GPT-5.5 match 2M and at what price?
- Watching: Cloud Big 3 — AWS/Azure may answer with their own price moves.
Skeptical view
Simon Willison: "Marketing 2M tokens vs. actual recall accuracy at the tail end are different metrics — needs long-context retrieval benchmarks before declaring victory."
Yann LeCun (Meta Chief AI Scientist): "Reasoning, not token length, defines the next leap."
What changes for you
For builders — Vertex pricing makes Gemini 3.1 the default for new coding-agent stacks; keep Claude/GPT as fallbacks. The Long Context cookbook is the starting point.
For founders — domains where long context is essential (law, medicine, software) can win on Vertex alone in the short term. Build a multi-LLM abstraction layer from day one in case pricing changes.
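A day-one abstraction layer can be as thin as one interface plus a registry; everything here (provider names, the `complete` signature) is illustrative, not any vendor's SDK:

```python
from typing import Protocol

class LLMProvider(Protocol):
    """The one surface the rest of the codebase is allowed to touch."""
    def complete(self, prompt: str) -> str: ...

class GeminiProvider:
    def complete(self, prompt: str) -> str:
        # A real implementation would call Vertex AI here.
        raise NotImplementedError

_PROVIDERS: dict[str, LLMProvider] = {}

def register(name: str, provider: LLMProvider) -> None:
    _PROVIDERS[name] = provider

def complete(prompt: str, provider: str = "gemini") -> str:
    """Route through the registry so swapping vendors is a config change."""
    return _PROVIDERS[provider].complete(prompt)
```

With this in place, a pricing change means editing the default provider name, not refactoring every call site.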
For investors — watch GOOG Q2: Cloud growth rate is the leading indicator.
For end users — Google AI Studio and the Gemini app expose parts of the 2M experience for free. Drop a long PDF or video to test it.
3-Line Summary
- Gemini 3.1 Ultra ships with 2M-token context and a built-in code execution sandbox.
- Pricing matches the prior Pro tier — Google froze frontier pricing.
- Counter-launch cycles from OpenAI and Anthropic are shortening; coding-agent defaults are in flux.
References
- Google DeepMind — Gemini 3.1 announcement
- AI Studio — Long Context Cookbook
- TechCrunch — Gemini 3.1 hands-on
- Bloomberg — Google AI strategy
- Simon Willison — Long Context notes