
FuriosaAI's RNGD Goes Live With 4,000 Units

Korean AI chip startup FuriosaAI put 4,000 units of its second-gen RNGD NPU into commercial deployment — the first time Korean-designed AI silicon runs real production inference at scale.

5 min read · Korean semiconductor news · April 3, 2026
FuriosaAI RNGD (Source: FuriosaAI)

4,000 units

FuriosaAI secured an initial 4,000-unit shipment of its second-generation NPU, the RNGD ("Renegade"), and has moved it into commercial deployment. For Korean-designed AI silicon, this is the first time a fabless startup's chip has run real serving infrastructure rather than pilot workloads.

Here's the deal

FuriosaAI was founded in Seoul in 2017 by June Paik, a former chip architect at Samsung and AMD. Its first product, Warboy, targeted vision inference; last year, the company pivoted to LLM inference with the second-generation RNGD.

The name "Renegade" is not accidental. The company framed the chip internally as "The Renegade Bet" — a direct challenge to Nvidia's CUDA ecosystem. Furiosa raised a $115M Series C in 2023, and RNGD is being produced on a 4nm/5nm process through a mix of TSMC and Samsung Foundry.

Timeline

  • 2017: FuriosaAI founded
  • 2021: First-gen Warboy launches (vision inference)
  • 2023: $115M Series C; RNGD announced
  • 2024: RNGD engineering samples distributed
  • Q3 2025: LG AI Research Exaone serving tests
  • Q4 2025: RNGD enters mass production
  • Q2 2026: 4,000 units enter commercial deployment

Korea has a dozen AI chip startups at various stages of maturity, but only a handful have reached real deployment. Rebellions, backed by KT and Saudi capital, is preparing datacenter rollouts; FuriosaAI is the first to ship its silicon into paying production.

The breakdown

Where RNGD sits technically

RNGD is inference-only, not a training chip. The primary target is serving LLMs in the 70B to 120B parameter range. Public specs are still partial, but the rough shape:

| Metric | RNGD | Reference: H100 SXM |
| --- | --- | --- |
| Process | TSMC 5nm (some reports say Samsung 4nm) | TSMC 4nm |
| Peak compute | ~256 TFLOPS BF16 | ~989 TFLOPS BF16 |
| Memory | 48GB HBM3, ~1.5TB/s | 80GB HBM3, ~3.35TB/s |
| Power | ~150W | ~700W |
| Target models | 70B LLM on one card; 120B on two | 70B LLM on one card |
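The spec figures invite a sanity check. LLM decoding is typically memory-bandwidth-bound, so 48GB and ~1.5TB/s put a rough ceiling on what one card can do with a 70B model. The sketch below assumes roughly 4-bit weight quantization; that is our inference (70B parameters at 16 bits would be ~140GB, which cannot fit in 48GB), not a confirmed RNGD detail.

```python
# Rough, bandwidth-bound decode estimate for a 70B model on one RNGD card.
# All numbers come from the (partial) public specs above; the 4-bit weight
# format is an assumption, not a confirmed RNGD detail.

def decode_upper_bound(params_b: float, bits_per_weight: int,
                       mem_gb: float, bw_tb_s: float):
    """Return (weight_gb, fits_in_memory, max_tokens_per_s).

    Decoding one token streams every weight once, so peak memory
    bandwidth caps single-stream throughput at bandwidth / weight bytes.
    """
    weight_gb = params_b * bits_per_weight / 8   # GB of weights
    fits = weight_gb < mem_gb                    # must also leave room for KV cache
    tokens_s = (bw_tb_s * 1000) / weight_gb      # TB/s -> GB/s
    return weight_gb, fits, tokens_s

# Assumed RNGD figures from the spec table: 48 GB HBM3 at ~1.5 TB/s.
w, fits, tps = decode_upper_bound(params_b=70, bits_per_weight=4,
                                  mem_gb=48, bw_tb_s=1.5)
print(f"{w:.0f} GB weights, fits={fits}, <= {tps:.0f} tok/s per stream")
```

This is an upper bound for a single request stream; batching trades per-stream latency for aggregate throughput, and real numbers depend on the KV-cache footprint and quantization quality.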

On peak compute, RNGD sits at roughly a quarter of an H100. The bet isn't on peak — it's on performance per watt. Serving a 70B model at 150W changes the datacenter TCO calculation in a way that matters when power accounts for 30–40% of GPU infrastructure operating cost. For specific workloads, "half the speed at a fifth of the power" wins on total cost.
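The performance-per-watt argument can be made concrete with the spec-table numbers. The electricity price and round-the-clock utilization below are illustrative assumptions, not Furiosa figures:

```python
# Back-of-envelope efficiency and energy-cost comparison using the
# spec-table numbers. $0.10/kWh and 24/7 utilization are assumptions.

def perf_per_watt(tflops: float, watts: float) -> float:
    """Peak BF16 throughput per watt of board power."""
    return tflops / watts

def annual_energy_cost(watts: float, usd_per_kwh: float = 0.10) -> float:
    """Cost of running one card around the clock for a year."""
    return watts / 1000 * 24 * 365 * usd_per_kwh

rngd = perf_per_watt(256, 150)   # RNGD: ~256 TFLOPS BF16 at ~150W
h100 = perf_per_watt(989, 700)   # H100 SXM: ~989 TFLOPS BF16 at ~700W
print(f"RNGD {rngd:.2f} vs H100 {h100:.2f} TFLOPS/W")
print(f"Energy: RNGD ${annual_energy_cost(150):.0f}/yr "
      f"vs H100 ${annual_energy_cost(700):.0f}/yr per card")
```

Peak-TFLOPS-per-watt is a crude proxy — delivered tokens per joule on a real serving workload is what actually decides TCO — but it shows the shape of the bet.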

Who buys this

The domestic customer list is easy to guess.

  • Naver Cloud / NCP: HyperCLOVA X serving, with domestic stack policy tailwinds
  • Kakao: Kanana LLM plus in-app agent features
  • LG AI Research: Exaone 3 and 4 serving, existing partnership
  • Korean telcos: KT, SKT, LG U+ each running their own LLM projects

Internationally, Japan, the Middle East, and parts of Europe have been looking for non-Nvidia options. With US-China decoupling accelerating, "third-country" silicon suppliers have real strategic value, and Furiosa is positioned as one of a small group that fits that niche.

Software stack — the real challenge

The hardware is actually the easy part. FuriosaAI has been building its own SDK, PyTorch and Hugging Face conversion toolchains, and a serving engine that targets vLLM and TensorRT-LLM compatibility.

CUDA is 90% of Nvidia's moat.

Every AI chip startup of the last five years has heard this sentence. Benchmarks win you a pitch meeting. What kills most attempts is the moment a developer asks "does my existing PyTorch code run here with one line changed?" and the answer is "well, mostly." RNGD's commercial outcome will depend less on the chip spec and more on how friction-free Furiosa can make that first-hour developer experience.

The bigger picture

Several threads meet here.

First, the accelerating competition in inference-only silicon. Huawei's 950PR (our coverage), Google's Ironwood TPU, Meta's upcoming MTIA 450/500, and AWS Inferentia are all hunting the same growth market. RNGD is one of the few startup-scale entrants in that field — most inference silicon now comes from hyperscalers or national champions.

Second, Korea's AI semiconductor strategy. The government updated its national "K-Semiconductor" plan in 2023 to prioritize AI chips and has been running NPU cluster pilots from 2024 through 2026. Mobilint, FuriosaAI, and Rebellions are the main beneficiaries of that policy, and RNGD moving into commercial service reads as "graduated from the lab."

Third, the ongoing global story of reducing Nvidia dependency. Meta's recent layoffs and AI restructuring (our coverage) were partly a rebalancing toward in-house MTIA silicon. Every hyperscaler is now running a "Nvidia plus our own chip" hybrid. Korea needs startups to occupy that second slot domestically, and Furiosa is the first to fill it.

What actually changes

Short term, most developers won't touch RNGD directly. Instead, expect it to show up as an "LLM inference instance" option on Naver Cloud, LG cloud, and telco clouds. If Furiosa prices aggressively, HyperCLOVA X and Exaone 4 API costs could fall in the coming quarters.

Longer term, the bigger shift is ecosystem building. Once Korean AI startups have a reason to target RNGD — cost, sovereignty, or policy — community tooling, documentation, and open-source support start to accumulate. That's the actual turning point. Selling chips is hard; building an ecosystem on top of them is much harder and much more valuable.

Furiosa founder June Paik said it clearly in a recent interview:

Building AI chips in Korea isn't about beating Nvidia. It's about making sure there's one more option that runs without Nvidia.

Four thousand units is the first moment where "one more option" becomes real. The next six months of production inference will be the actual test — not the benchmark, but whether real workloads run without a dozen escape hatches to Nvidia.

