Meta Unveils 4 Generations of MTIA Custom Chips — Building an Nvidia-Free Inference Stack
Meta announced MTIA 300/400/450/500 simultaneously. RISC-V architecture, Broadcom partnership, 6-month release cadence. Full specs, strategy, and competitive analysis.

Tens of Thousands of Chips Already Running in Production
Meta just announced four generations of its custom AI chip, MTIA (Meta Training and Inference Accelerator), simultaneously. Most chip companies take 2–3 years per generation. Meta is shipping four within two years. And MTIA 300 is already deployed in production — powering content recommendations across Facebook and Instagram right now.
This isn't just a chip announcement. It's Meta declaring that it's structurally reducing its Nvidia dependency. Meta currently owns one of the world's largest Nvidia GPU fleets (600,000+ H100/B200 units), and MTIA is the strategy to move AI inference workloads onto cheaper, custom silicon.
Why Big Tech Builds Custom Chips
Every major tech company is now designing custom AI chips: Google (TPU), Amazon (Trainium/Inferentia), Microsoft (Maia), Apple (M-series), and Meta (MTIA). The economics are straightforward:
| Motivation | Detail |
|---|---|
| Cost | Nvidia GPUs cost $30K–$40K each. Custom chips deliver equivalent inference at 1/3–1/5 the cost |
| Supply security | Nvidia GPU shortages have persisted for 2+ years |
| Workload optimization | Purpose-built chips are far more power-efficient for specific tasks |
| Strategic autonomy | Depending on a single supplier for core infrastructure is risky |
For Meta specifically, inference cost is the killer. Meta's AI services (Llama-based assistant, content recommendations, ad targeting) run tens of billions of inference calls daily. At that scale, even small per-inference savings translate to billions of dollars annually.
MTIA Generations — Full Spec Comparison
| Chip | Process | FP8 FLOPS | HBM Bandwidth | Partner | Status | Timeline |
|---|---|---|---|---|---|---|
| MTIA 300 | 5nm | Baseline | Baseline | Broadcom | In production | 2025 |
| MTIA 400 | 5nm+ | 400% over 300 | Same as 300 | Broadcom | Production-ready | 2026 H1 |
| MTIA 450 | 3nm | Improved over 400 | 2x over 300 | Broadcom | Taped out | 2026 H2 |
| MTIA 500 | 3nm+ | Improved over 450 | Improved over 450 | Broadcom | In design | 2027 H1 |
The 300→400 jump is the headline: quadrupled FP8 FLOPS means 4x inference throughput at the same power budget. Tom's Hardware called it "directly targeting Nvidia's mid-range inference market." The 450 doubles HBM bandwidth, enabling on-chip serving of Llama 3 70B-class models.
Meta announced a 6-month release cadence — a "tick-tock" strategy reminiscent of Intel's CPU playbook. Maintaining this pace in semiconductor design is extraordinarily ambitious and relies on deep integration with Broadcom as the ASIC design partner.
RISC-V Architecture
MTIA uses the RISC-V open-source instruction set architecture rather than ARM or x86. This gives Meta freedom to add custom AI-specific instructions without licensing fees — significant when manufacturing hundreds of thousands of chips.
Meta's Inference Scale
| Metric | Value |
|---|---|
| Daily active users | ~3.2 billion (Facebook + Instagram + WhatsApp) |
| Daily inference calls | Tens of billions |
| GPU fleet | 600K+ Nvidia (H100/B200) |
| Annual AI infra spend | ~$60B+ (2026) |
| MTIA in production | Tens of thousands (MTIA 300, expanding) |
At 10 billion inference calls/day at $0.001 each, that's $10M/day or $3.65B/year. Halving that cost with MTIA saves $1.8B+ annually. This is why Meta is developing four chip generations simultaneously.
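The back-of-envelope math above can be checked with a short script. The per-call cost and call volume are the article's illustrative assumptions, not Meta-published figures:

```python
# Back-of-envelope inference cost model. The call volume and
# per-call cost are illustrative assumptions from the article,
# not Meta-published numbers.
calls_per_day = 10_000_000_000   # ~10 billion inference calls/day
cost_per_call = 0.001            # $0.001 per call (assumed)

daily_cost = calls_per_day * cost_per_call     # $10M/day
annual_cost = daily_cost * 365                 # $3.65B/year
savings_if_halved = annual_cost * 0.5          # ~$1.8B/year saved

print(f"daily:  ${daily_cost / 1e6:,.0f}M")
print(f"annual: ${annual_cost / 1e9:.2f}B")
print(f"savings at 50% cost reduction: ${savings_if_halved / 1e9:.2f}B")
```

Even at these rounded figures, a 50% cost reduction clears $1.8B per year, which is why the table's 1/3–1/5 cost claim matters so much at Meta's scale.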
MTIA 300 currently handles: content recommendations (News Feed, Explore, Reels), ad targeting (real-time bidding), content moderation (harmful content detection), and portions of Meta AI assistant inference. Advertising drives 95%+ of Meta's revenue — moving ad targeting to MTIA directly improves operating margins.
Competitive Landscape: Big Tech Custom Silicon
| Company | Chip | Primary Use | Current Gen | Edge |
|---|---|---|---|---|
| Google | TPU v6 (Trillium) | Training + Inference | v6 | 10+ years experience, JAX |
| Amazon | Trainium2 / Inferentia3 | Training / Inference | Gen 3 | AWS customer base |
| Microsoft | Maia 100 | Inference | Gen 1 | Azure + OpenAI integration |
| Meta | MTIA 300–500 | Inference (priority) | 300 in production | Llama-optimized, internal only |
Meta's differentiator: MTIA is never sold externally. It's used exclusively for Meta's own services, which means chip design can be 100% optimized for Meta's specific workloads.
The Nvidia Relationship
Meta isn't abandoning Nvidia. The split is strategic: training (Llama 4 and Llama 5 pre-training) stays on Nvidia GPUs, while inference (serving billions of users) moves to MTIA. Training needs general-purpose high performance; inference benefits from workload-specific optimization. CNBC reported that Meta emphasizes the "complementary relationship" between MTIA and Nvidia hardware.
Why It Matters
Meta's simultaneous four-generation MTIA reveal signals that custom silicon is now a core competency for competing in AI — not just software (models), but hardware (chips). The 6-month cadence, RISC-V architecture, Broadcom partnership, and production deployment of MTIA 300 demonstrate serious execution capability.
For the broader industry, this accelerates the shift away from Nvidia's dominance of AI compute. As more workloads move to custom silicon, the AI inference chip market fragments and Nvidia's pricing power erodes. For developers, the diversifying chip landscape makes hardware-portable code, written through frameworks like PyTorch and ONNX, increasingly important.
