AWS's AI chip backlog hit $225B — and that's why Marvell doubled in 2026
Per May 11 reporting, Amazon's AI chip backlog is $225B. Custom silicon — Graviton, Trainium2/3, and Inferentia — is winning both internal AWS workloads and external customers. Design partner Marvell has doubled in 2026 since the December 2024 five-year extension. Trainium2 saw the fastest ramp in AWS history; Trainium3 is essentially sold out by mid-2026.

$225B — the single number turning AWS into a chip business
Here's the deal. The Motley Fool's May 11 read on Amazon AWS pegs the AI chip backlog at $225B. Two reasons that's a big number. First, it isn't AWS cloud revenue backlog — it's the slice of cloud backlog tied to AI accelerators and custom silicon. Second, the comparable category one year ago was $80B. The number nearly tripled in twelve months.
The custom silicon lineup: (1) Graviton — Arm-based general-purpose CPU, now in its fifth generation, (2) Trainium2/3 — AI training accelerators, (3) Inferentia — inference accelerator. Together they account for ~35% of AWS's total compute instances. Five years ago that share was near zero — the fastest in-house silicon transition by any hyperscaler. More importantly, Trainium2 and 3 power Anthropic's Project Rainier (announced 2024), a single 1GW cluster that carries more than half of Anthropic's Claude training infrastructure.
The direct beneficiary is design partner Marvell Technology. After the December 2024 five-year design partnership extension with AWS, Marvell — down ~10% in 2025 — has doubled so far in 2026. Trainium-series ASIC design IP and next-generation optical interconnects are Marvell's next growth engine. In the first four months of 2026, Marvell's data center segment revenue jumped +95% YoY, and backlog is at an all-time high.
Each player — Amazon, AWS, Marvell, Anthropic, Nvidia
Amazon (AWS). Q1 2026 revenue near the high $30Bs (annualizing $150B+), operating margin around 38%. AI workloads are ~25% of that, and roughly half of those workloads run on custom silicon. So AWS already runs about half its AI infrastructure on its own chips. CEO Andy Jassy on the Q1 call: 2026 AI capex is $130B, of which $40B is custom silicon. Trainium3 already has committed orders for 500K-class chip volumes from key customers.
Trainium series. Trainium2 was announced at re:Invent 2024, mass production from Q1 2025. AWS describes it as "the fastest ramp in AWS history" — driven by (1) Anthropic's 400K Trainium2 commitment via Project Rainier, (2) AWS's own ML workloads (Alexa, Bedrock, SageMaker) migrating fast, (3) reports that part of Apple Foundation Model training runs on Trainium2. Trainium3 was announced in December 2024 with H2 2026 mass shipments — and H2 inventory was effectively pre-committed at the announcement.
Marvell Technology. AWS's core ASIC design partner for Trainium. Supplies parts of Trainium chip design IP, optical transceivers, and HBM controllers. The December 2024 five-year extension is effectively a co-engineering commitment to the next-gen AWS chip roadmap. Q1 2026 data center revenue: $2.5B, an all-time high (+95% YoY). Market cap is in the $130B range as of late April — doubling in months.
Anthropic. AWS's largest external Trainium customer. Project Rainier (announced 2024) is a 1GW-scale Trainium cluster on a single site for next-gen Claude training. Even with the May 2026 Anthropic-SpaceX Colossus 1 compute deal, Trainium dependence persists. A single model maker now accounts for ~30% of a single chip company's roadmap — an unusual structure.
Nvidia. Direct competitor. AWS remains one of Nvidia's biggest hyperscaler GPU customers, but with Trainium share rising fast, AWS has shifted to a "Nvidia + Trainium" dual-sourcing posture. From Nvidia's view, AWS is one of its biggest single customers — and one of its biggest threats. Trainium is the first hyperscaler in-house chip materially eating Nvidia GPU share.
Substance — backlog composition, capex, Marvell upside
$225B backlog composition. Simplified: (1) Anthropic multi-year commitment ~$80B, (2) Apple AI infrastructure ~$40B (Apple Foundation Model training), (3) enterprise customer aggregate ~$50B (Pfizer, Roche, JPMorgan, etc.), (4) government and defense commitment ~$30B, (5) other ~$25B. These are committed forward revenue lines — recognized over 3–5 years.
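As a sanity check, the five components above do sum to the headline figure. A quick back-of-envelope (using the article's own estimates, which are not disclosed AWS line items) also shows what the 3–5 year recognition window implies per year:

```python
# Back-of-envelope check on the backlog breakdown above.
# All figures are the article's estimates (in $B), not disclosed AWS data.
components = {
    "Anthropic multi-year commitment": 80,
    "Apple AI infrastructure": 40,
    "Enterprise aggregate": 50,
    "Government and defense": 30,
    "Other": 25,
}
total = sum(components.values())
print(f"Total backlog: ${total}B")  # $225B

# Recognized over 3-5 years, the implied annual revenue contribution:
for years in (3, 5):
    print(f"Over {years} years: ~${total / years:.0f}B/yr")
```

Spread over the stated recognition window, that is roughly $45–75B of annual revenue already committed.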
Capex. AWS guided 2026 capex above $130B. Of that, ~$40B is custom silicon (chip manufacturing + data center integration). Vs. Microsoft Azure's $100B and Google Cloud's $80B for the year, Amazon is #1 in absolute size. With cumulative capex above $400B over five years, AWS's hyperscaler infrastructure lead looks structurally locked.
| Item | Value |
|---|---|
| AWS AI chip backlog (May 11, 2026) | $225B |
| Same category 1 year ago | $80B |
| 2026 AWS total capex | $130B+ |
| Custom silicon capex | $40B |
| Marvell 2026 YTD stock | +100% |
| Marvell data center YoY | +95% |
Trainium pricing and margin. Trainium2 internal AWS BOM is estimated at $4,000–$5,000 per chip, rented as instances at $20–$25 per hour. Comparable Nvidia H200 rents at $35–$40 per hour — Trainium is ~35–40% cheaper. That gap is what's pulling Anthropic and Apple in.
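Taking the midpoints of the hourly rates quoted above (a rough sketch; actual instance pricing varies by region and commitment term), the discount works out to the stated range:

```python
# Midpoint comparison of the hourly rates cited above (illustrative only).
trainium2_rate = (20 + 25) / 2   # $/hr, article's Trainium2 range
h200_rate = (35 + 40) / 2        # $/hr, article's H200 range
discount = 1 - trainium2_rate / h200_rate
print(f"Trainium2 vs H200: {discount:.0%} cheaper")  # 40% cheaper
```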
How Marvell wins. Marvell collects (1) Trainium ASIC design license fees plus per-chip royalty, (2) optical transceivers — interconnect across Trainium clusters, (3) HBM controller IP. Of Marvell's 2026 data center guide of $10B, roughly half is AWS-related. So a single customer drives ~20% of Marvell's total revenue — deep lock-in.
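The concentration implied by those figures (using the article's own guide numbers; the 50% AWS share of the data center segment is the article's estimate):

```python
# Customer concentration implied by the article's figures.
total_guide = 25.0        # $B, Marvell 2026 total revenue guide
dc_guide = 10.0           # $B, data center segment guide
aws_share_of_dc = 0.5     # "roughly half is AWS-related"

aws_revenue = dc_guide * aws_share_of_dc
concentration = aws_revenue / total_guide
print(f"AWS-related revenue: ${aws_revenue:.0f}B "
      f"({concentration:.0%} of total)")  # $5B (20% of total)
```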
Who gets what
Amazon's win. First, capex efficiency. Custom silicon means AWS captures Nvidia's margin (60–70%) for itself. At the same compute unit cost, that translates to 30–40% lower instance prices. Second, lock-in — once Anthropic trains on Trainium, weights and infra are tied to AWS. Third, leverage — "we'll just expand custom silicon" becomes a real card in Nvidia negotiations.
Anthropic — gain and dependence. Gain: compute ~35–40% cheaper than equivalent Nvidia GPUs, improving Anthropic's overall training cost structure by roughly 30%. Dependence: AWS becomes Anthropic's largest infrastructure and capital partner. Effectively the "AWS-Anthropic" stack becomes one technology fabric — Anthropic's biggest structural risk over the next 18–24 months.
Marvell's win. The most direct. Of the $25B 2026 revenue guide, the data center segment's $10B is the growth core. Given AWS's $225B backlog, future design wins are essentially booked for the next 3–5 years. The market is already pricing in a $130B → $200B market-cap path.
Nvidia — pain and hedge. Pain: clear AWS share erosion. But AWS remains a top GPU customer, and the Rubin launch could win share back. Nvidia's hedge is scaling its own DGX Cloud — bypassing hyperscalers and selling direct.
Apple's win. Putting some Foundation Model training on AWS Trainium reduces Apple's own GPU cluster capex. Independent of the AI Extensions feature in iOS 27 (which calls Claude/Gemini), Apple's own training appears to dual-source between AWS and Google.
Other hyperscalers' loss. As AWS custom silicon share rises, Microsoft Azure and Google Cloud face pressure to accelerate Azure Maia and Google TPU. Microsoft's Maia ramp is behind schedule; Google's TPU v6/v7 progresses well but external customer expansion (beyond Anthropic, Salesforce) is slow.
Past parallels — wins and losses
Win: Apple Silicon transition (2020–2024). Apple migrated from Intel to M-series over four years. Result: ~40% lower CPU BOM cost in laptops, +5pp margin improvement, 30–50% performance gain. Trainium economics map to this. The difference: Apple did consumer; AWS is doing data center — bigger scale, bigger margin lift.
Win: Google TPU evolution (2016–present). Google launched TPU v1 in 2016 and is now at v7. Using TPU internally cut Nvidia GPU capex significantly; from 2024 Google began external TPU sales (Anthropic for part of its primary training, plus Salesforce and Adobe). AWS Trainium follows the same playbook with a larger external customer base from the start.
Loss: Microsoft custom ARM server chips (2017–2019). Microsoft tried Cavium ThunderX-based ARM servers, hit efficiency issues, paused. Eventually shipped Cobalt 100 in 2024. AWS rapidly iterated Graviton through 5 generations while Microsoft stalled — the biggest gap in hyperscaler chip strategy.
Loss: HP Itanium bet (2001–2017). HP-Intel co-developed Itanium that ultimately died. Cause: failure to win external ISV ecosystem. AWS Trainium's risk: PyTorch/CUDA compatibility — Neuron SDK supports PyTorch but porting from CUDA still has friction. If unresolved, mid-tier customer expansion stalls beyond anchors like Anthropic.
Competitor counter-plays
Nvidia. Rubin (H2 2026) + DGX Cloud expansion. If Rubin delivers 3–4x performance over H100/H200, unit-cost efficiency closes the gap with Trainium. DGX Cloud lets Nvidia bypass hyperscalers and sell direct.
Microsoft Azure. Accelerate the Azure Maia v2 ramp. Pulling Anthropic away from Trainium is implausible given Microsoft's 49% OpenAI stake, so Microsoft is instead positioning Maia as the core OpenAI training chip. Maia v3 ramps in 2027.
Google Cloud. TPU v7 + Vertex AI bundles for external customer expansion. Anthropic's GTC mention of "TPU + Trainium multi-sourcing" is a positive Google signal. A new multi-year Salesforce deal is reportedly booked.
AMD. MI400 series ramps alongside Nvidia Rubin. AWS already deploys some AMD GPUs and AMD is becoming a hyperscaler standard option. Not directly competitive with Trainium, but enables "Nvidia + Trainium + AMD" triple-sourcing.
Tenstorrent / Cerebras / Groq. Startup side. Given AWS backlog scale, some workloads may spill over. Groq's inference speed and Cerebras's wafer-scale chips can take niches Trainium doesn't address.
So what changes — by persona
ML engineer. Learning AWS Neuron (Trainium SDK) is the safest bet for the next 2–3 years. PyTorch compatibility exists, but advanced optimizations need Trainium-specific code. Single-CUDA betting is over — welcome to "CUDA + Neuron + TPU XLA" multi-backend.
Startup founder. Using Trainium instances on AWS Bedrock saves 30–40% on training cost. But you don't run pure PyTorch — Neuron compilation adds a 1–2 month migration. Worth evaluating for any model house with 24+ month training horizons.
Investor. Hyperscaler-chip-design-partner equities (Marvell, Broadcom, Astera Labs) are direct beneficiaries. Marvell is already up +100% with valuation tension, but the $225B AWS backlog leaves more upside. Nvidia near-term impact is limited; long-term ceiling depends on how much hyperscaler in-house silicon takes.
Cloud customer (enterprise). Calling Claude through AWS Bedrock means you're running on Trainium. Trainium's compute efficiency should feed through to pricing: token prices likely decline 10–20% over the next 12 months. AWS-Anthropic becomes the most price-competitive option.
Regulator. Hyperscaler vertical integration is accelerating. AWS-Anthropic combines (1) compute + model + data in one company, (2) raises entry barriers for other model labs. The US FTC and EU Commission are already examining the Microsoft-OpenAI relationship; AWS-Anthropic likely enters the same scope.
References
- The Motley Fool: Amazon's AI Chip Backlog Stands at a Massive $225 Billion
- The Motley Fool: Amazon's $20 Billion Chip Business Raises a Big Question for Investors
- Marvell Technology: Q4 FY2026 Earnings Press Release
- AWS: Announcing Trainium3 and the next generation of AWS AI infrastructure
- Anthropic: Project Rainier — building the world's largest AI supercomputer with AWS