
OpenAI Put a Terminal in Its API – From Model Company to Agent Platform

OpenAI's Responses API now includes a shell tool, hosted containers, Skills, and Context Compaction: agent infrastructure that maintains accuracy across 5-million-token sessions.

OpenAI Responses API
Source: OpenAI

The Terminal Just Arrived

OpenAI quietly handed over something important. They extended the Responses API with direct shell terminal access. Now when you call the API, the model can type grep, fire off curl requests, process data with awk, and directly control the computer. This is different.

To understand why this matters, think back to what the API was before. Text in, text out. Function calling pushed it further, but ultimately the developer had to decide which tools the model could use. The model had guardrails. Its decision-making power was bounded.

Now it's not.

The model has hands.

From API to Agent Platform

This upgrade is more than a feature release. It's a signal that OpenAI is changing its entire business direction.

Think about OpenAI's original business model. Sell models. You pay per API call. But look at what the industry actually wants now. Not models. Agents. Autonomous systems that move without constant human guidance.

OpenAI is making a clear move here. They're not staying a "model company." They're becoming something else entirely.

The Four Pillars

Let's see what actually got added and why each one matters:

Feature | What It Does | The Shift
Shell Tool | Command-line access (grep, curl, awk, pipes) | Model controls the computer directly
Hosted Containers | Debian 12 with Python 3.11, Node.js 22, Java 17, Go 1.23, Ruby 3.1 | Code execution and data processing in one place
Context Compaction | 5-million-token sessions without accuracy loss | Agents that run for hours or days
Agent Skills | Standardized, reusable agent capabilities | Building a real agent ecosystem

Each one is important. Together they're transformative.

Shell Tool: The Computer Finally Has Hands

Up until now, AI models could only make text. Function calling helped, but you still had to pre-define which tools they could touch. The model followed paths you carved for it. Limited autonomy.
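To make the contrast concrete: with classic function calling, the developer enumerates every tool up front in a schema. The shape below follows OpenAI's published function-calling format; the `search_logs` tool itself is a hypothetical example invented for illustration.

```python
# A single pre-defined tool in the familiar function-calling schema.
# "search_logs" is hypothetical; the schema shape is the standard one.
search_tool = {
    "type": "function",
    "function": {
        "name": "search_logs",
        "description": "Search log files for a pattern",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Text to find"},
            },
            "required": ["pattern"],
        },
    },
}

# The model can only ever choose from this fixed menu. It cannot
# discover grep, curl, or awk on its own, the way shell access allows.
tools = [search_tool]
```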

Now the model gets shell access. This means:

  • Explore the file system
  • Run commands (search with grep, call APIs with curl)
  • Pipe commands together (cat data.txt | grep "pattern" | wc -l)
  • Analyze results and decide what to do next
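The piped example above is easy to reproduce locally, which makes the new capability concrete. This sketch uses Python's subprocess module, with a throwaway temp file standing in for data.txt:

```python
import os
import subprocess
import tempfile

# A throwaway file standing in for data.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("pattern here\nno match here\nanother pattern line\n")
    path = f.name

# Equivalent of: cat data.txt | grep "pattern" | wc -l
result = subprocess.run(
    f'grep "pattern" {path} | wc -l',
    shell=True,
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # → 2 (two lines contain "pattern")
os.unlink(path)
```

The point isn't the pipeline itself; it's that the model can now compose one of these on the fly, inspect the count, and decide its next step.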

The model can adapt to the actual environment. It can think "What tool do I need for this?" instead of picking from a fixed menu.

Look at Triple Whale's case study. They ran a 5-million-token session with 150 tool calls. The accuracy didn't drop. This happened because of Context Compaction – which we'll get to next.

Before, as tokens stacked up, you'd lose information or see errors compound. Now an agent can run for days and still be accurate. That's not minor.

Context Compaction: Remember for Longer

Context Compaction isn't just truncation. It's different.

Most APIs have a context window. Hit the limit, you truncate or compress. It's like saying "forget what we said before, just remember recent stuff." The problem: important information disappears.

OpenAI's Context Compaction doesn't summarize old context; it compresses it, packing the history into a denser representation without losing precision.
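OpenAI hasn't published how compaction works internally, so the sketch below is only a naive illustration of the difference between the two strategies: truncation drops old turns outright, while compaction rewrites them into a denser digest that still preserves key facts.

```python
def truncate(history: list[str], budget: int) -> list[str]:
    """Keep only the most recent turns that fit the character budget."""
    kept, used = [], 0
    for turn in reversed(history):
        if used + len(turn) > budget:
            break  # everything older is lost entirely
        kept.append(turn)
        used += len(turn)
    return list(reversed(kept))

def compact(history: list[str], budget: int) -> list[str]:
    """Compress old turns into a dense digest; keep recent turns verbatim."""
    recent = truncate(history, budget // 2)
    older = history[: len(history) - len(recent)]
    # Stand-in for a real compression step (e.g. a model-written digest):
    digest = " | ".join(t.split(".")[0] for t in older)
    return [f"[compacted] {digest}"] + recent

history = ["fact one. details", "fact two. details", "recent turn. details"]
print(truncate(history, 20))  # old facts gone
print(compact(history, 40))   # old facts survive in compressed form
```

With truncation, "fact one" and "fact two" vanish; with compaction, they survive in compressed form, which is what lets a session stretch toward millions of tokens without the agent forgetting its own history.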

What does that enable?

  • Multi-day complex data analysis workflows
  • Multi-step problem solving without context reset
  • Long-running projects where history actually matters

GPT-5.4 carries a 1-million-token context window and scored 75% on the OSWorld-V benchmark. That last number is the real story. It means the model can autonomously handle real computer tasks. Not toy problems. Actual work.

Hosted Containers: A Sandbox That Actually Works

Here's the problem with giving models computer access: security. You can't just let them loose on a server.

OpenAI's answer is hosted containers. The container_auto feature spins up a Debian 12 environment with:

  • Python 3.11
  • Node.js 22
  • Java 17
  • Go 1.23
  • Ruby 3.1

The model can write and execute code in basically any language. The key is isolation. Even if a model tries to do something destructive, it stays confined to its container. One user's model can't touch another's infrastructure.
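OpenAI's hosted containers provide real isolation; nothing you run locally reproduces that. But the confinement idea (a scratch working directory, a stripped environment, a hard timeout) can be sketched with a subprocess. This is explicitly not container-grade sandboxing, just an illustration of the principle:

```python
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Run untrusted Python in a separate process with a time limit.

    NOT container-grade isolation like OpenAI's hosted Debian
    environments; it only illustrates the confinement idea: a scratch
    working directory, a stripped environment, and a hard timeout.
    """
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            cwd=scratch,        # confined working directory
            env={},             # no inherited environment variables
            capture_output=True,
            text=True,
            timeout=timeout,    # runaway code gets killed
        )
    return result.stdout

print(run_sandboxed("print(2 + 2)"))  # → 4
```

A hosted container does all of this at the OS level and more (its own filesystem, network policy, resource limits), which is why one user's model can't touch another's infrastructure.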

What becomes possible?

  • Data analysis: Run Python scripts directly
  • Web scraping: Deploy Node.js crawlers without infrastructure
  • System automation: Complex shell pipelines for operational tasks
  • API integration: Connect to external services without middleman code

The model went from "I can suggest code" to "I can execute code."

Agent Skills: The Foundation of an Ecosystem

OpenAI's standardization of "Skills" is a long-game move.

Today, every company builds agents differently. Your approach, their approach, everyone reinventing. Nothing gets reused. You build the same wheel repeatedly.

When OpenAI standardizes Skills, they're saying: agents can inherit the capabilities of previous agents. If one agent learned "send email," the next one doesn't rebuild it. It just inherits that skill.

This is more revolutionary than it sounds. The shift is real:

Old way: Each company develops its own agent in isolation.
New way: Skills become modular, composable, shareable.

Think about it as building blocks. Specialized mini-agents handling specific tasks. Larger agents composing them together.
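OpenAI hasn't published the Skills format in detail, so the sketch below only illustrates the composition idea the article describes: small named capabilities registered once, then reused by any agent. Every name here (the `skill` decorator, `summarize`, `send_email`, `run_agent`) is hypothetical.

```python
from typing import Callable, Dict

SKILLS: Dict[str, Callable[..., str]] = {}

def skill(name: str):
    """Register a reusable capability under a stable name."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        SKILLS[name] = fn
        return fn
    return decorator

@skill("summarize")
def summarize(text: str) -> str:
    return text.split(".")[0] + "."

@skill("send_email")
def send_email(to: str, body: str) -> str:
    return f"queued email to {to}: {body}"

def run_agent(task: list[tuple[str, dict]]) -> list[str]:
    """A larger agent composes existing skills instead of rebuilding them."""
    return [SKILLS[name](**kwargs) for name, kwargs in task]

out = run_agent([
    ("summarize", {"text": "Q3 revenue grew. Details follow."}),
    ("send_email", {"to": "team@example.com", "body": "Q3 revenue grew."}),
])
```

The second agent never reimplements "send email"; it inherits the registered skill, which is exactly the reuse the standardization is meant to unlock.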

When OpenAI has 900M weekly active users, 50M+ subscribers, and $2B monthly revenue behind a standard, that standard becomes the industry standard. Full stop.

What Actually Changed?

Let's zoom out and compare the two strategies.

Old Strategy: Sell Models

  • Revenue: Per-API-call
  • OpenAI's role: Make the best model
  • Developer decides how to use it
  • OpenAI doesn't control the workflow

New Strategy: Sell Agent Platform

  • Revenue: Hosted execution, context, infrastructure
  • OpenAI's role: Provide complete infrastructure
  • Skills, containers, compaction – all included
  • OpenAI controls the ecosystem

The first is easier. Build a good model, ship an API, collect money.

The second is harder. You have to build infrastructure, set standards, create an ecosystem that others want to build within. OpenAI is doing that now.

Why Now? The Timing Question

This isn't random. Three things converged:

First, models got good enough. GPT-5.4 scoring 75% on OSWorld-V means the model is no longer the bottleneck. The model is powerful. Now infrastructure is the moat.

Second, competition got serious. Claude, Gemini, Llama – everyone releasing models. A good model alone doesn't win anymore. You need more.

Third, developers changed what they want. They stopped asking "can I use this model?" They started asking "what can I build with this platform?"

What Actually Changes For People

Theory is fine. What's different in practice?

For developers: Building agents becomes simpler. Shell access, containers, long-running sessions – you don't manage any of that. OpenAI handles it. You focus on the logic.

For enterprises: The scope of automation expands dramatically. Until now, AI could analyze. Now AI can execute. Code review, data pipelines, system monitoring – these become candidates for autonomous agents.

For OpenAI: Higher-value sales. Selling per-API-call has a ceiling. Selling "hosted agent execution" per unit-of-work has no ceiling. With 900M weekly actives and 50M+ subscribers, this is a path to exponential growth.

The Risks Nobody's Talking About

Not everything here is upside.

Security concerns are real. Models having direct computer access is powerful and dangerous. Containers help, but they're not bulletproof. A compromised model could theoretically cause damage within its sandbox. OpenAI's betting isolation is enough, but that's an assumption.

Cost is another one. Context Compaction lets you run long sessions, but 5-million-token sessions won't be cheap. Not every developer can afford that. Early pricing could lock out the smaller players.

And then there's the reliability question. We still don't fully trust models doing autonomous actions. A model could execute something unintended. Make unexpected choices. Introduce bugs nobody anticipated. For many organizations, trusting an agent to run unsupervised is still a big ask.

The Bigger Picture: Platform Wars Have Started

What OpenAI did here is simple on the surface. Profound underneath.

Model company to platform company. Per-call sales to infrastructure sales. Developer tools to agent ecosystem.

If this works, OpenAI isn't just "a company with good models." They become the foundation layer. Like AWS for cloud. Every AI-powered automation runs on OpenAI's infrastructure. That's the endgame.

Competitors will follow. Google, Anthropic, Meta – they'll all build similar infrastructure. But the first mover usually owns the standard. If OpenAI defined what Skills look like, that definition likely sticks industry-wide.

The Real Significance

Step back. This isn't about new features. This is a reshape of the AI industry's economics.

When the model was the product, differentiation was straightforward: make better models. Now the infrastructure is the product. Context handling, execution environment, skill composition – these become the competitive advantages.

OpenAI is signaling they're betting on infrastructure dominance. The model will keep improving (it always does). But winning won't be about having the best GPT next year. It'll be about having the platform everyone builds on.

That's a much harder position to dislodge. And much more valuable.

