Huawei Ascend 950PR — 2.8x H20 FP4, and ByteDance + Alibaba Are Already Stockpiling It

1.56 PFLOPS FP4, 112GB HiBL 1.0 HBM, $6,900 per card. Made on SMIC N+2, targeting 750K units shipped in 2026. What US sanctions accidentally built.

Monday, April 13, 2026 · 8 min read

Huawei Ascend 950PR AI inference chip hero image — Source: Huawei

$6,900 per card, and Huawei claims 2.8x an NVIDIA H20

Unveiled at the China Partner Conference on March 20, the Ascend 950PR has quickly become the biggest AI chip story of Q2 2026. Single-card 1.56 PFLOPS FP4, 1 PFLOPS FP8, 112 GB memory. The DDR version is priced at about 50,000 yuan ($6,900), the HBM version at 70,000 yuan ($9,600). Against NVIDIA's H20, the other chip in its class, Huawei claims 2.8x the FP4 compute.

One more number. On March 27, Reuters reported ByteDance and Alibaba had placed bulk orders. ByteDance's total commitment to Huawei silicon reportedly reaches $5.6B.

SMIC is producing it on the N+2 process (roughly equivalent to 7nm), with a 750,000 unit target for 2026. If that ships, China's domestic AI inference market is effectively running without NVIDIA.

What this actually is — three differentiators

The Ascend 950PR is "the Chinese-made AI inference chip you can use when you can't buy an NVIDIA H20." Three things define the positioning.

First, FP4-native architecture. The 950PR is the first Chinese AI accelerator to support FP4 inference at scale. FP4 sits one precision step below the FP8/BF16 formats the H20 is optimized for, and it has become the preferred serving format for recent model releases (DeepSeek V3/V4, Qwen3, GLM-4) because it halves weight memory relative to FP8 with little accuracy loss. Huawei timed this chip for that shift.

Second, HiBL 1.0: Huawei's own HBM. HBM from SK Hynix, Samsung, and Micron is increasingly restricted for delivery to China. Huawei responded with HiBL (High-Bandwidth Low-power) 1.0, an in-house HBM shipping at 112 GB capacity and 1.4 TB/s bandwidth. The NVIDIA H20 runs at 4.0 TB/s, so bandwidth is still a weakness, but raw capacity (112 GB) exceeds the H20's 96 GB.

Third, CUDA-compatible CANN Next. Huawei's CANN Next SDK mirrors CUDA's thread block, warp, and kernel launch semantics. About 80% of standard PyTorch inference code runs with config changes only, no rewrites. This software portability is the main reason ByteDance and Alibaba moved fast.
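The memory claims in the three points above can be checked with simple arithmetic. The sketch below is a rough estimate, not a benchmark: it counts weights only (no KV cache, activations, or quantization scale factors), and the decode bound assumes a dense model at batch size 1 where every generated token streams all weights from memory once. The 72B size comes from the Qwen3 range mentioned in this article; the function names are illustrative.

```python
# Back-of-envelope estimates under the assumptions stated above.
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "FP4": 4}

# Capacity (GB) and bandwidth (TB/s) as quoted in the article.
CARDS = {
    "Ascend 950PR": {"mem_gb": 112, "bw_tb_s": 1.4},
    "H20": {"mem_gb": 96, "bw_tb_s": 4.0},
}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * 1e9 * BITS_PER_PARAM[fmt] / 8 / 1e9


def decode_tokens_per_s(bw_tb_s: float, weights_gb: float) -> float:
    """Upper bound on batch-1 decode speed when memory bandwidth is the limit."""
    return bw_tb_s * 1e12 / (weights_gb * 1e9)


for fmt in BITS_PER_PARAM:
    gb = weight_gb(72, fmt)
    for name, card in CARDS.items():
        if gb > card["mem_gb"]:
            print(f"72B @ {fmt}: {gb:.0f} GB does not fit on one {name}")
        else:
            bound = decode_tokens_per_s(card["bw_tb_s"], gb)
            print(f"72B @ {fmt}: {gb:.0f} GB on {name}, <= {bound:.0f} tok/s")
```

Under these assumptions the capacity advantage decides which models fit on one card, while the bandwidth gap caps single-stream decode speed at roughly a third of the H20's. Real deployments batch requests, which shifts the balance back toward compute.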

Huawei headquarters in Bantian, Shenzhen, center of the Ascend 950PR design and production ecosystem. Source: commons.wikimedia.org · CC BY-SA 3.0

Core specs — lined up against NVIDIA H20 and H100

The competitor Huawei actually targets is the H20 — the China-spec, performance-reduced NVIDIA chip. The wins and losses are specific.

Metric	Huawei Ascend 950PR	NVIDIA H20	NVIDIA H100
FP4 compute	1.56 PFLOPS	~0.56 PFLOPS	N/A
FP8 compute	1 PFLOPS	~0.3 PFLOPS	3.96 PFLOPS
Memory capacity	112 GB HiBL 1.0	96 GB HBM3	80 GB HBM3
Memory bandwidth	1.4 TB/s	4.0 TB/s	3.35 TB/s
Interconnect	LingQu 2.0 TB/s	NVLink 900 GB/s	NVLink 900 GB/s
TDP	600 W	400 W	700 W
Process	SMIC N+2 (7nm class)	TSMC 4N (5nm class)	TSMC 4N (5nm class)
Price per card	$6,900–$9,600	~$12,000	~$30,000
Legal to sell in China?	✓	✗ (further restricted)	✗

Huawei wins on FP4, memory capacity, interconnect, and price. It loses on bandwidth and process density. But inside China, there's no legal way to buy H20 or H100 at volume — so the relevant question shifts from "does it win" to "can you actually get it."
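The headline ratios follow directly from the table. The sketch below recomputes them from the article's figures; it uses the lower (DDR) price for the 950PR and the approximate H20 list price, so the cost-per-PFLOPS numbers are indicative only.

```python
# Figures copied from the comparison table; prices in USD.
specs = {
    "Ascend 950PR": {"fp4_pflops": 1.56, "mem_gb": 112, "bw_tb_s": 1.4, "price_usd": 6900},
    "H20": {"fp4_pflops": 0.56, "mem_gb": 96, "bw_tb_s": 4.0, "price_usd": 12000},
}


def ratio(metric: str) -> float:
    """Ascend 950PR value divided by H20 value for one metric."""
    return specs["Ascend 950PR"][metric] / specs["H20"][metric]


def usd_per_fp4_pflops(card: str) -> float:
    """Card price divided by its FP4 throughput."""
    return specs[card]["price_usd"] / specs[card]["fp4_pflops"]


for metric in ("fp4_pflops", "mem_gb", "bw_tb_s", "price_usd"):
    print(f"{metric}: 950PR / H20 = {ratio(metric):.2f}x")

for card in specs:
    print(f"{card}: ${usd_per_fp4_pflops(card):,.0f} per FP4 PFLOPS")
```

The FP4 ratio rounds to the 2.8x that Huawei claims, and the bandwidth ratio of 0.35x is the main number on the other side of the ledger.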

Feature breakdown

Atlas 350 card and LingQu fabric

The 950PR chip ships on the Atlas 350 accelerator card. 600W TDP — lower than H100's 700W, higher than H20's 400W. Data centers can plan around an H100-class power envelope. For scale-out, Huawei built LingQu, an in-house interconnect at 2.0 TB/s. Nominally that beats NVLink's 900 GB/s, but the NVSwitch-scale fabric for 256-GPU rack-level networking is still something Huawei hasn't fully matched.
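For capacity planning, the TDP figures translate into efficiency and rack-density estimates. This is a minimal sketch using the article's numbers; the 40 kW rack budget is a hypothetical value chosen for illustration, and host CPUs, networking, and cooling overhead are ignored.

```python
# TDP (W) and FP4 throughput (PFLOPS) as quoted in the article.
TDP_W = {"Ascend 950PR": 600, "H20": 400, "H100": 700}
FP4_PFLOPS = {"Ascend 950PR": 1.56, "H20": 0.56}


def fp4_tflops_per_watt(card: str) -> float:
    """Peak FP4 TFLOPS per watt of TDP."""
    return FP4_PFLOPS[card] * 1000 / TDP_W[card]


def cards_per_rack(card: str, rack_budget_w: int) -> int:
    """How many cards fit in a power budget, counting accelerator TDP only."""
    return rack_budget_w // TDP_W[card]


def fleet_power_mw(cards: int, card: str = "Ascend 950PR") -> float:
    """Total accelerator TDP of a fleet, in megawatts."""
    return cards * TDP_W[card] / 1e6


RACK_BUDGET_W = 40_000  # hypothetical rack budget, not from the article

for card in FP4_PFLOPS:
    print(f"{card}: {fp4_tflops_per_watt(card):.1f} FP4 TFLOPS/W, "
          f"{cards_per_rack(card, RACK_BUDGET_W)} cards per 40 kW rack")

print(f"750,000 cards: {fleet_power_mw(750_000):.0f} MW of accelerator TDP")
```

On these figures the 950PR draws more power per card than the H20 but delivers more peak FP4 throughput per watt.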

CANN Next and CUDA portability

The software stack is what matters for engineers. CANN Next exposes thread block, warp, and kernel launch primitives that map closely onto CUDA. PyTorch, vLLM, and TensorRT-LLM backend plugins are rolling out fast. Huawei publicly pushes MindSpore, but ByteDance's production benchmarks are reportedly running on PyTorch. The "80% portable" figure translates to "the other 20% is CUDA-specific kernel code that has to be rewritten" — and engineers note that 20% is where 80% of LLM throughput lives.
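What "config changes only" tends to mean in practice is that the device string becomes a setting rather than a hard-coded "cuda". The sketch below is an assumption-laden illustration: it relies on `torch_npu`, the existing PyTorch adapter for Ascend, which registers a `torch.npu` namespace on import. Whether CANN Next ships through the same package is not confirmed by this article, and `pick_device` is a hypothetical helper name.

```python
def pick_device(preferred=("npu", "cuda")) -> str:
    """Return the first available backend name, falling back to 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to select

    for name in preferred:
        if name == "npu":
            try:
                # Ascend adapter (assumed here); importing it registers torch.npu.
                import torch_npu  # noqa: F401
            except Exception:
                continue
        backend = getattr(torch, name, None)
        try:
            if backend is not None and backend.is_available():
                return name
        except Exception:
            continue
    return "cpu"


device = pick_device()
print(f"selected device: {device}")
# Model code then stays device-agnostic, e.g. model.to(device), batch.to(device)
```

Code written this way covers the portable share of an inference stack. Hand-written CUDA kernels, the part the engineers quoted above worry about, are not helped by it.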

AI datacenter server racks, the kind of infrastructure where ByteDance and Alibaba will deploy the Ascend 950PR at scale. Source: commons.wikimedia.org · CC BY 2.0

Pricing + launch timeline

Item	Date / condition
Official unveil	2026-03-20, China Partner Conference
Mass production starts	April 2026 ("next month" per reports)
Volume shipments	2H 2026
2026 unit target	750,000 cards
DDR version price	50,000 yuan ($6,900)
HBM version price	70,000 yuan ($9,600)
Engineering samples	Jan 2026: delivered to ByteDance and Alibaba
Roadmap	950DT → 951 → 960 → 970 (sequential release)

Reuters reports ByteDance and Alibaba received engineering samples in January 2026 and ran production-grade inference benchmarks. The March unveil was a formal launch of a product already validated at customer sites.
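The unit target and the reported order size can be put on the same scale. The sketch below uses only figures quoted in this article; treating the whole $5.6B as card purchases at list price is a simplifying assumption, since the commitment is described as covering Huawei silicon in general.

```python
UNIT_TARGET_2026 = 750_000
PRICE_USD = {"DDR": 6_900, "HBM": 9_600}
BYTEDANCE_COMMITMENT_USD = 5_600_000_000


def fleet_value_usd(units: int, price: int) -> int:
    """List-price value of a number of cards."""
    return units * price


def cards_for_budget(budget_usd: int, price: int) -> int:
    """Whole cards a budget buys at a given list price."""
    return budget_usd // price


for version, price in PRICE_USD.items():
    value = fleet_value_usd(UNIT_TARGET_2026, price)
    cards = cards_for_budget(BYTEDANCE_COMMITMENT_USD, price)
    share = cards / UNIT_TARGET_2026
    print(f"{version}: 2026 target worth ${value / 1e9:.2f}B; "
          f"$5.6B buys {cards:,} cards ({share:.0%} of the target)")
```

At the DDR price the commitment alone would exceed the entire 2026 unit target, which suggests it spans more than one year or more than one product.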

Who this is for

ByteDance, Alibaba, Tencent, Baidu, and other Chinese hyperscalers: They are the target. US export controls block large-scale H100/H200/B200 buys, and Huawei is the only credible domestic alternative at scale. ByteDance's $5.6B commitment means TikTok and Douyin's recommendation models and the Doubao LLM will increasingly run on Huawei silicon.

Mid-size Chinese AI startups: With H20 grey-market prices running $25,000-$35,000, a $6,900 alternative is real. The chip is optimized for exactly the kind of FP4-friendly models Chinese startups deploy — DeepSeek R1/V4, Qwen3 32B-72B.

Developers in the US, Europe, and Korea: You can't buy one directly. Indirect access only, via rented instances on Alibaba Cloud, Tencent Cloud, or Huawei Cloud. But as a benchmark reference, the 950PR's public spec sheet is the most detailed look in years at how close China's 7nm fabrication has gotten.

Competitive response and market position

NVIDIA had not commented publicly as of publication, but leaked internal memos suggest an H20 successor called B20 is being fast-tracked for the China market. B20 would scale H20's performance down further to stay below US export control thresholds.

AMD, Broadcom, and other Western chipmakers have effectively ceded China. AMD MI300X is export-restricted. Broadcom is focused on Google, Meta, and other US hyperscalers.

China's AI chip market isn't "when can we buy NVIDIA again." It's "how fast can we internalize Huawei."

Other Chinese AI chip startups — Cambricon, Hygon, Biren — don't have Huawei's scale or software ecosystem. Huawei is settling into the de facto standard position for Chinese AI infrastructure.

The bigger picture — sanctions are building a separate ecosystem

The US Commerce Department's Bureau of Industry and Security (BIS) has been restricting China exports in phases since 2022: H100 first, then H800/A800, then H20 follow-ons. Japan and the Netherlands have also restricted exports of advanced chipmaking equipment, including ASML's EUV tools. After four years of these measures, China has accelerated into full-stack self-sufficiency.

SMIC's N+2 process (7nm-class) showed it could yield a complex SoC in volume when the Kirin 9000S shipped in the Mate 60 Pro in 2023. The Ascend 950PR indicates the same process can scale to mass-production AI accelerators roughly two and a half years later. Over the same window TSMC moved from 3nm to 2nm, so the node gap actually widened. But the 950PR's existence shows node density isn't decisive for defending the Chinese domestic market.

Geopolitically, the dual system is now visible. NVIDIA and AMD set the standard in the West. Huawei sets it in China. The two ecosystems are splitting at every layer — software stack (CUDA vs CANN), memory (HBM3e vs HiBL), interconnect (NVLink vs LingQu). The longer this divergence runs, the more expensive it becomes to reunify.

So what actually changes

NVIDIA shareholders and US policymakers: The thesis "sanctions slow China down" is fraying. Whether the B20, the reported H20 successor, can defend NVIDIA's China market share is the next big question. If Commerce restricts B20 too, the irony is that it would hand Huawei a clean monopoly, the opposite of the intended outcome.

AI companies outside China: You can't use Huawei directly, but Chinese companies running LLM infrastructure far more cheaply creates competitive pressure. Expect a repeat of the early-2025 DeepSeek moment, when R1 matched OpenAI o1 at a fraction of the training cost. Accelerated Chinese open-weight releases translate directly into global price pressure.

Western engineers in practice: Chinese models on Hugging Face — Qwen3, DeepSeek V4, GLM-4 — are increasingly high-quality. The fact that they're trained and served on Huawei silicon raises governance questions. Enterprise RAG and fine-tuning pipelines that use these weights need a longer risk checklist, not a shorter one.

Korean semiconductor industry: SK Hynix and Samsung HBM exports to China are already restricted. Huawei's move to in-house HiBL means China is a permanently lost HBM customer. HBM demand from NVIDIA, AMD, and Google's TPUs remains, so short-term impact is contained. Mid-term, "Chinese AI demand = inaccessible" has to become the planning assumption.

Frequently Asked Questions

When was this article published?

This article was published on 2026-04-13 by spoonai.

What is the original source of this article?

The original source is CNBC (Reuters) (https://www.cnbc.com/2026/03/27/bytedance-alibaba-planning-to-order-huaweis-new-ai-chip-reuters.html).
