DeepSeek V4 Ditches Nvidia, Will Run Entirely on Huawei Chips

China's biggest tech firms ordered hundreds of thousands of Huawei Ascend 950PR chips to run DeepSeek's next model.

Liza Chan, AI & Emerging Tech Correspondent
April 4, 2026

DeepSeek's upcoming V4 model will run exclusively on Huawei-designed chips, according to a report from The Information published Friday. Alibaba, ByteDance, and Tencent have placed bulk orders for hundreds of thousands of Huawei's Ascend 950PR accelerators ahead of the launch, which is expected within weeks. Neither DeepSeek nor Huawei responded to requests for comment.

The Hangzhou-based lab spent months working with Huawei and chip designer Cambricon Technologies to rewrite portions of V4's code for compatibility with Chinese-made hardware. Nvidia was shut out of the process entirely and received no early access to the model for optimization, a break from the usual practice of letting chipmakers tune for a major release ahead of launch. Reuters had reported this exclusion earlier in the year.

The chip that makes it possible

Huawei's Ascend 950PR debuted at the company's China Partner Conference on March 20, packaged inside the Atlas 350 accelerator card. On paper, it looks competitive: 1.56 petaflops of FP4 compute, 112GB of proprietary HBM, and what Huawei claims is 2.87 times the single-card performance of Nvidia's H20. Zhang Dixuan, head of Huawei's Ascend computing business, provided those figures at the conference.

But the H20 is Nvidia's restricted, China-compliant chip, not its best hardware. Comparing against it is a bit like bragging about outrunning someone with a limp. The 950PR still trails the H200, according to The Decoder, and the previous-generation Ascend 910C delivered roughly 60% of the H100's inference performance by DeepSeek's own internal assessment.

There's also the power draw. The Atlas 350 pulls 600 watts, about 1.5 times the H20's consumption. Data center operators will need to weigh throughput gains against rack-level power budgets, a tradeoff that doesn't show up in Huawei's marketing slides.
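To make that tradeoff concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes Huawei's 2.87x claim and the 1.5x power ratio describe comparable workloads, which has not been independently verified. The 40 kW rack budget is a hypothetical number picked for illustration, and the calculation ignores cooling, networking, and host overhead.

```python
# Figures quoted in the article (Huawei's own claims, not independent measurements).
CLAIMED_PERF_VS_H20 = 2.87   # single-card performance relative to the H20
ATLAS_350_WATTS = 600        # Atlas 350 power draw
POWER_VS_H20 = 1.5           # Atlas 350 draw relative to the H20

# Hypothetical rack power budget, chosen only for illustration.
RACK_BUDGET_WATTS = 40_000


def cards_per_rack(budget_watts: float, card_watts: float) -> int:
    """Whole cards that fit in a power budget (accelerator power only)."""
    return int(budget_watts // card_watts)


# H20 draw implied by "600 watts, about 1.5 times the H20's consumption".
implied_h20_watts = ATLAS_350_WATTS / POWER_VS_H20

# Claimed performance per watt, relative to the H20.
perf_per_watt_vs_h20 = CLAIMED_PERF_VS_H20 / POWER_VS_H20

# Under a fixed power budget, fewer Atlas 350 cards fit in the rack.
atlas_cards = cards_per_rack(RACK_BUDGET_WATTS, ATLAS_350_WATTS)
h20_cards = cards_per_rack(RACK_BUDGET_WATTS, implied_h20_watts)

# Rack-level throughput relative to a rack of H20s (H20 = 1.0 per card).
rack_throughput_vs_h20 = (atlas_cards * CLAIMED_PERF_VS_H20) / (h20_cards * 1.0)

print(f"Implied H20 power draw: {implied_h20_watts:.0f} W")
print(f"Claimed perf per watt vs H20: {perf_per_watt_vs_h20:.2f}x")
print(f"Cards per rack: Atlas 350 = {atlas_cards}, H20 = {h20_cards}")
print(f"Rack throughput vs H20 rack: {rack_throughput_vs_h20:.2f}x")
```

If the claimed numbers hold, the per-card advantage of 2.87x shrinks to roughly 1.9x once power is the limiting factor, which is still a gain but a smaller one than the headline figure suggests.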

Why this matters more than V3 did

DeepSeek's previous models, V3 and R1, triggered a massive selloff in January 2025, wiping nearly $589 billion from Nvidia's market cap in a single day. Those models, though, were trained on Nvidia's own H800 chips. The irony wasn't lost on anyone: China's biggest AI challenge to Silicon Valley ran on American hardware.

V4 removes that irony. A trillion-parameter model running entirely on domestic silicon would be the clearest evidence yet that U.S. export controls are driving Chinese self-sufficiency rather than preventing it. V4 reportedly uses a mixture-of-experts architecture with roughly one trillion total parameters but activates only about 37 billion of them per token, which keeps inference costs manageable despite the model's size.
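The arithmetic behind that claim is simple enough to sketch. The snippet below uses the reported parameter counts and a common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. That rule is an approximation, not a figure from DeepSeek. The `top_k_experts` function is a toy illustration of how a mixture-of-experts router picks a few experts per token. It is not V4's actual routing code, and the scores passed to it are made up.

```python
# Reported (unconfirmed) V4 figures from the article.
TOTAL_PARAMS = 1e12      # roughly one trillion total parameters
ACTIVE_PARAMS = 37e9     # about 37 billion activated per token

# Rule-of-thumb assumption: ~2 FLOPs per active parameter per token (forward pass).
FLOPS_PER_PARAM = 2

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
sparse_flops_per_token = FLOPS_PER_PARAM * ACTIVE_PARAMS
dense_flops_per_token = FLOPS_PER_PARAM * TOTAL_PARAMS
compute_saving = dense_flops_per_token / sparse_flops_per_token


def top_k_experts(scores: list[float], k: int) -> list[int]:
    """Toy router: return the indices of the k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical router scores for one token across four experts.
chosen = top_k_experts([0.10, 0.70, 0.05, 0.15], k=2)

print(f"Active fraction per token: {active_fraction:.1%}")
print(f"Approx. compute vs an equally large dense model: 1/{compute_saving:.0f}")
print(f"Experts chosen for this token: {chosen}")
```

Under these assumptions, each token touches under 4% of the model's weights, so per-token compute is about 27 times lower than for a dense model of the same size. All one trillion parameters still have to sit in memory, which is one reason memory capacity per card matters for serving a model like this.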

Two additional V4 variants are also in development, each optimized for different tasks and built for Chinese chips, according to The Information's sources.

So does the hardware actually work?

This is the question nobody can answer yet. The bulk orders from Alibaba, ByteDance, and Tencent suggest confidence, or at least a willingness to bet on domestic supply chains after years of uncertainty around Nvidia export licenses. The surge in demand has already pushed Ascend chip prices up by 20%, per The Decoder's reporting.

But confidence and performance are different things. Huawei's real breakthrough with the 950PR may be less about raw compute and more about software. The chip's updated CANN framework now offers better compatibility with Nvidia's CUDA programming model, according to Wccftech, which lowers migration costs for developers who've spent years building on Nvidia's stack. ByteDance and Alibaba have reportedly started placing orders partly because of this improved compatibility.

Running a frontier model is different from running benchmarks on a trade show floor. DeepSeek's engineers rewrote code to match the Ascend's Da Vinci core architecture, which suggests the port wasn't trivial. Whether V4 performs comparably to what it could do on Nvidia's best hardware remains an open question, and one DeepSeek is unlikely to answer directly.

What comes next

Huawei plans to produce around 600,000 Ascend 910C chips in 2026, roughly double the previous year's output, with total Ascend die production reaching up to 1.6 million units, Bloomberg reported last year. The Ascend 950DT, targeting training workloads, is scheduled for the fourth quarter of 2026.

V4's launch will be closely watched for benchmark results against Western frontier models. Internal claims put it above 80% on SWE-bench Verified and 90% on HumanEval, though no independent evaluation exists. If those numbers hold up on Huawei hardware at production scale, the argument that export controls are containing China's AI capabilities gets a lot harder to make.
