spoonai

GPT-5.4 Deep Dive — The First General-Purpose Model That Actually Uses Your Computer

OpenAI released GPT-5.4 with 1M token context, native Computer Use achieving 75% on OSWorld (surpassing humans), and a full model family. Complete specs, benchmarks, and competitive analysis.

GPT-5.4 Computer Use demonstration
Image: OpenAI

From "AI That Answers" to "AI That Acts"

On March 5, 2026, OpenAI released GPT-5.4, the first general-purpose AI model with native Computer Use capabilities. Not a separate agent product. Not an experimental beta. Computer Use is built into the model itself, available through both the API and ChatGPT.

GPT-5.4 scores 75.0% on OSWorld-Verified (above the 72.4% human-expert baseline), has a 1-million-token context window (roughly 8x the previous generation's 128K), and cuts per-claim factual errors by 33% while using 33% fewer reasoning tokens. In short, AI has crossed the line from "tool that helps you think" to "agent that does things for you."

Background: The Road to Computer Use

| Timeline | Model/Product | What Happened |
| --- | --- | --- |
| October 2024 | Anthropic Claude Computer Use | First Computer Use concept demo (beta) |
| January 2025 | OpenAI Operator | Separate agent product for Computer Use |
| March 2025 | Google Project Mariner | Chrome browser automation agent |
| March 2026 | OpenAI GPT-5.4 | First general-purpose model with native Computer Use |

The key difference: all previous Computer Use implementations were either separate agent products or experimental betas. GPT-5.4 bakes it into the core model, available as a standard feature across API and ChatGPT.

Core Specs

| Metric | GPT-5.4 | GPT-5.2 | Improvement / Notes |
| --- | --- | --- | --- |
| Context Window | 1M tokens (922K in + 128K out) | 128K | ~8x |
| OSWorld-Verified | 75.0% | 47.3% | +27.7pp (human: 72.4%) |
| WebArena-Verified | 67.3% | | Browser automation |
| GDPVal | 83.0% | | Human expert level |
| Per-claim error rate | -33% | baseline | |
| Reasoning token usage | -33% | baseline | |

OSWorld-Verified tests whether AI can navigate real desktop environments — opening browsers, filling forms, managing files, switching between applications. Human experts score 72.4%. GPT-5.4 scores 75.0%, crossing the human threshold for the first time.

GDPVal measures performance on economically valuable tasks such as emails, spreadsheets, report drafting, and data cleaning. At 83.0%, GPT-5.4 performs at roughly human-expert level on this kind of work.

The Model Family

GPT-5.4 ships as a family, not a single model:

| Model | Key Feature | Target | API Pricing (per 1M tokens) |
| --- | --- | --- | --- |
| GPT-5.4 Thinking | Reasoning-first, shows plan before solving | ChatGPT Plus/Team/Pro | Included |
| GPT-5.4 Pro | High-performance + Computer Use | Pro/Enterprise | Included |
| GPT-5.4 (API) | Full capability | Developers | $3 input / $15 output |
| GPT-5.4 mini | Fast coding/reasoning, 2x+ speed | High-volume API | $0.40 input / $1.60 output |
| GPT-5.4 nano | Ultra-lightweight, edge devices | Mobile/embedded | $0.10 input / $0.40 output |

The nano pricing is remarkable: $0.10 per million input tokens — cheaper than GPT-3.5 was, with performance exceeding GPT-4 on many benchmarks. Computer Use is no longer a premium-only feature.
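
To make the price gap concrete, here is a minimal sketch that estimates the cost of one request on each API tier, using the per-1M-token prices listed in the table above. The model identifier strings and the example workload (200,000 input tokens, 20,000 output tokens) are assumptions chosen for illustration, not official API names or a typical Computer Use session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Prices copied from the model family table. The dictionary keys are
# hypothetical labels for this example, not confirmed API model names.
PRICES = {
    "gpt-5.4": Price(3.00, 15.00),
    "gpt-5.4-mini": Price(0.40, 1.60),
    "gpt-5.4-nano": Price(0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    price = PRICES[model]
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200K tokens in, 20K tokens out.
    for model in PRICES:
        print(f"{model}: ${estimate_cost(model, 200_000, 20_000):.3f}")
```

For this assumed workload the full model comes to about $0.90, mini to about $0.11, and nano to about $0.03, which is roughly a 32x spread between the top and bottom tiers.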

Tool Search and Financial Plugins

GPT-5.4 introduces Tool Search — the model autonomously discovers and selects the right tool from a large set of APIs, plugins, and functions for the current task. Previously, developers had to pre-specify which tools were available.
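
The article does not describe how Tool Search works internally, so the following is only a toy illustration of the general idea: instead of handing the model a fixed tool list, a retrieval step picks candidate tools from a larger registry based on the task description. The registry, the tool names, and the keyword-overlap scoring are all assumptions made for this sketch and are not OpenAI's mechanism or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# Hypothetical registry. A real deployment might hold hundreds of entries.
REGISTRY = [
    Tool("create_pivot_table", "Build a pivot table from spreadsheet rows"),
    Tool("send_email", "Send an email message to a recipient"),
    Tool("plot_chart", "Generate a chart from spreadsheet data"),
    Tool("resize_image", "Resize an image file to given dimensions"),
]

STOPWORDS = {"a", "an", "the", "to", "from", "this", "of"}


def _words(text: str) -> set[str]:
    """Lowercase the text and return its non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def search_tools(task: str, registry: list[Tool], top_k: int = 2) -> list[Tool]:
    """Rank tools by keyword overlap with the task and return the best matches."""
    task_words = _words(task)
    scored = []
    for tool in registry:
        tool_words = _words(tool.name.replace("_", " ") + " " + tool.description)
        overlap = len(task_words & tool_words)
        if overlap > 0:
            scored.append((overlap, tool))
    # Stable sort: tools with equal scores keep their registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]
```

Calling `search_tools("Build a pivot table from this spreadsheet", REGISTRY)` returns `create_pivot_table` first and `plot_chart` second. A production system would more plausibly use embeddings or model-driven selection than keyword overlap, but the shape of the interface (task in, short candidate list out) is the point here.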

According to VentureBeat, GPT-5.4 also ships with native financial plugins for Microsoft Excel and Google Sheets — analyzing financial data, generating charts, and building pivot tables from natural language instructions.

Competitive Landscape

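| Benchmark | GPT-5.4 | Claude 4.6 Opus | Gemini 3.1 Pro |
| --- | --- | --- | --- |
| OSWorld-Verified | 75.0% | not given | not given |
| BrowseComp | 82.7 | 84.0 | 85.9 |
| GDPVal | 83.0% | not given | not given |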

Anthropic pioneered Computer Use with Claude in October 2024. Claude 4.6 Opus scores 84.0 on BrowseComp (vs. GPT-5.4's 82.7), excelling at pixel-level screen manipulation. But GPT-5.4 leads on OSWorld's broader desktop automation tasks.

Google focuses on browser automation through Project Mariner. Gemini 3.1 Pro tops BrowseComp at 85.9 but lags behind in general-purpose desktop Computer Use. Google's advantage is deep Workspace integration.

The Computer Use market is still early. The winning model will be the one that can reliably handle real-world business tasks — not just benchmark scenarios.

OpenAI's Current Position

GPT-5.4 arrives at a pivotal moment for OpenAI. The company's annualized revenue has surpassed $25 billion — the fastest revenue scaling in software history. For comparison, Google took 5 years and Facebook took 7 years to reach similar milestones. OpenAI achieved it in roughly 3.5 years from ChatGPT's November 2022 launch. Reports suggest OpenAI is considering an IPO by late 2026. The company remains unprofitable, but the revenue trajectory is unprecedented.

What This Means for Developers

Agent architectures are changing. The combination of 1M token context and Tool Search shifts the paradigm from "pre-define your tools" to "let the model find what it needs."

RPA is being disrupted. As Computer Use matures, significant portions of the traditional RPA (Robotic Process Automation) market — valued at $13B+ — will shift to AI agents that understand context rather than following rigid scripts.

Security becomes critical. AI controlling actual computer interfaces means prompt injection and misaligned instructions can have real-world consequences. Sandboxing and permission systems will be essential.
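
As one way to picture the permission systems mentioned above, the sketch below reviews each action an agent proposes before it is executed: low-risk actions pass, sensitive ones require human confirmation, and anything unrecognized is denied by default. The action kinds, the domain allowlist, and the `review` function are hypothetical names invented for this example and do not correspond to any real Computer Use API.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # pause and ask a human
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    kind: str         # e.g. "click", "open_url", "delete_file"
    target: str = ""  # element label, URL, or file path


# Hypothetical policy for illustration only.
SAFE_KINDS = {"click", "scroll", "screenshot", "type"}
CONFIRM_KINDS = {"submit_form", "delete_file"}
ALLOWED_DOMAINS = {"intranet.example.com"}


def review(action: Action) -> Decision:
    """Decide whether a proposed agent action may run."""
    if action.kind in SAFE_KINDS:
        return Decision.ALLOW
    if action.kind == "open_url":
        # Navigation is allowed only to explicitly allowlisted hosts.
        host = urlparse(action.target).hostname or ""
        return Decision.ALLOW if host in ALLOWED_DOMAINS else Decision.DENY
    if action.kind in CONFIRM_KINDS:
        return Decision.CONFIRM
    # Default-deny: unknown action kinds never run.
    return Decision.DENY
```

The default-deny branch matters most: if a prompt injection persuades the model to propose an action type the policy has never seen, the gate blocks it rather than guessing. Running the agent inside a sandboxed VM or container would be a complementary layer, not a replacement for this check.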

Costs keep falling. The mini and nano variants signal that Computer Use is democratizing rapidly. What cost hundreds of dollars per task in 2025 may cost pennies in 2027.
