AI Chips

NVIDIA GTC 2026: Jensen Huang Sees $1 Trillion in AI Chip Orders as Vera Rubin Ships

NVIDIA doubled its demand forecast and launched a full-stack AI platform. The numbers are staggering, and so are the questions.

Oliver Senti, Senior AI Editor
March 17, 2026 · 6 min read
Jensen Huang presenting on stage at the SAP Center during NVIDIA GTC 2026 keynote in San Jose, California

Jensen Huang took the stage at the SAP Center in San Jose on Monday and did what he does best: made a trillion dollars sound inevitable. At NVIDIA's annual GTC developer conference, the CEO said he now sees at least $1 trillion in purchase orders for Blackwell and Vera Rubin systems through 2027, double the $500 billion figure he quoted just six months ago.

The stock briefly spiked about 4.8% on the number before settling back to a 1.65% gain. Traders apparently did what traders do: ran the math against consensus estimates and decided the headline was louder than the upside. But the figure itself is worth sitting with. A trillion dollars in confirmed demand for AI compute infrastructure from a single vendor. Even adjusted for Huang's well-documented talent for showmanship, that is a lot of GPUs.

The Rubin GPU, by the numbers

The centerpiece of GTC 2026 is the Vera Rubin NVL72, a rack-scale system pairing 72 Rubin GPUs with 36 Vera CPUs connected through NVLink 6. NVIDIA first showed the platform at CES in January, but GTC filled in the technical picture. Each Rubin GPU packs 336 billion transistors on TSMC's 3nm process, carries 288 GB of HBM4 memory, and delivers 22 TB/s of memory bandwidth per chip. That bandwidth number, nearly tripling Blackwell's 8 TB/s, is where the real story is.

NVIDIA claims 50 petaflops of FP4 inference and 35 petaflops of FP4 training per GPU, which the company frames as 5x and 3.5x improvements over Blackwell respectively. Those gains rely heavily on the NVFP4 data format, a new precision level that squeezes more throughput out of each operation. Whether 4-bit precision holds up across the range of models enterprises actually care about is a question NVIDIA is less eager to answer in detail. The rack-level numbers are even larger: 3.6 exaflops of FP4 inference, 20.7 TB of HBM4 capacity, and 260 TB/s of aggregate NVLink 6 bandwidth. Huang claimed that last figure exceeds the total bandwidth of the internet, which is the kind of comparison that sounds impressive until you realize nobody can actually verify it.
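The rack-level figures do follow as straight multiples of the per-GPU specs, which is easy to check. A quick back-of-envelope in Python, using only the numbers NVIDIA quoted:

```python
# Sanity-check NVIDIA's rack-level claims against its per-GPU specs.
GPUS_PER_RACK = 72            # Vera Rubin NVL72

fp4_inference_pflops = 50     # per GPU, NVIDIA's claim
hbm4_capacity_gb = 288        # per GPU

rack_inference_eflops = GPUS_PER_RACK * fp4_inference_pflops / 1000
rack_hbm4_tb = GPUS_PER_RACK * hbm4_capacity_gb / 1000

print(f"Rack FP4 inference: {rack_inference_eflops:.1f} exaflops")  # 3.6
print(f"Rack HBM4 capacity: {rack_hbm4_tb:.1f} TB")                 # 20.7
# The 260 TB/s aggregate NVLink 6 figure is not derivable from the
# per-GPU specs above, so it can't be checked the same way.
```

The inference and capacity figures check out exactly; the NVLink number is the one you have to take on faith, internet comparisons included.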

NVIDIA says it is also shipping Vera Rubin samples to cloud partners including AWS, Google Cloud, Microsoft, and Oracle, with volume availability in the second half of 2026.

Memory is the bottleneck now

Here's the thing Huang is actually saying, underneath all the product names and acronyms: the industry figured out how to train models. The hard part now is running them cheaply. Inference, not training, is where the money is, and inference is memory-bound. The jump to HBM4, with its wider interface and roughly 2.8x bandwidth increase over HBM3e, is Rubin's most consequential technical change. Everything else, the transistor count, the new tensor cores, the NVLink upgrades, exists to take advantage of that memory bandwidth.
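To see why bandwidth dominates, consider a rough ceiling on single-stream decode speed: generating each token requires streaming the full weight set from memory at least once. A back-of-envelope sketch, where the 70B-parameter model and the single-GPU, batch-1 setup are illustrative assumptions rather than NVIDIA figures:

```python
# Memory-bandwidth ceiling on batch-1 decode speed.
# Each decoded token must read every weight from HBM at least once, so
# tokens/sec <= bandwidth / model_bytes (ignoring KV cache and activations).

params = 70e9                            # hypothetical 70B-parameter model
bytes_per_param = 0.5                    # 4-bit (FP4) weights
model_bytes = params * bytes_per_param   # 35 GB

for name, bw_tbps in [("Blackwell HBM3e", 8), ("Rubin HBM4", 22)]:
    ceiling = bw_tbps * 1e12 / model_bytes
    print(f"{name}: <= {ceiling:,.0f} tokens/sec (batch 1, theoretical)")
```

Compute barely enters that equation, which is why the bandwidth jump matters more than the headline petaflops.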

This framing also explains the Groq 3 LPU announcement. NVIDIA unveiled its first chip from Groq, the inference-focused startup it acquired for $20 billion in December. The Groq 3 LPU is an SRAM-packed accelerator designed to sit alongside Vera Rubin racks and boost token generation rates. Huang said a Groq LPX rack, holding 256 LPUs, can increase tokens-per-watt by 35x. That's NVIDIA's own claim against its own hardware, so interpret accordingly. But the strategic logic is clear: Huang is worried about losing inference workloads to specialized chips, and buying Groq was cheaper than losing the market.

The roadmap goes deep

NVIDIA also previewed Kyber, a prototype rack architecture with 144 GPUs arranged vertically for higher density. Kyber will ship inside Vera Rubin Ultra in the second half of 2027. And the next GPU architecture after Rubin, codenamed Feynman, is slated for 2028. Three generations mapped out. NVIDIA is essentially telling customers: commit now, and we will keep the performance curve steep enough to justify the capital.

NemoClaw and the OpenClaw question

Roughly two hours into the keynote, Huang pivoted to software with the kind of conviction he usually reserves for silicon. The subject was NemoClaw, an open-source enterprise AI agent platform built on top of OpenClaw, the wildly popular agentic AI tool created by Austrian developer Peter Steinberger in January.

Huang called OpenClaw "the most popular open-source project in the history of humanity," which is a bold claim even by his standards (Linux might have something to say about it). But NemoClaw isn't really about OpenClaw's popularity. It is about OpenClaw's security problems. Enterprises want agents that can act autonomously, but they also don't want those agents rifling through sensitive files and phoning home. NemoClaw wraps OpenClaw in a security layer called OpenShell, adds a privacy router for cloud-hosted LLM calls, and bundles NVIDIA's own Nemotron models for local inference.
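NVIDIA hasn't published NemoClaw's interface beyond the keynote description, but the privacy-router idea is straightforward to sketch: sensitive requests stay on local Nemotron inference, and only the rest may go to a cloud-hosted LLM. A minimal illustration in Python; every name here (route_request, is_sensitive, both endpoints) is hypothetical, not NemoClaw's actual API:

```python
import re

# Illustrative privacy-router sketch: keep sensitive prompts on local
# inference, let the rest use a cloud-hosted LLM.
# All names, patterns, and endpoints below are hypothetical.

SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # SSN-like numbers
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY"),   # key material
    re.compile(r"(?i)\b(password|api[_-]?key)\b"),  # credentials
]

LOCAL_ENDPOINT = "http://localhost:8000/v1"    # e.g. a local Nemotron server
CLOUD_ENDPOINT = "https://api.example.com/v1"  # placeholder cloud LLM

def is_sensitive(prompt: str) -> bool:
    return any(p.search(prompt) for p in SENSITIVE_PATTERNS)

def route_request(prompt: str) -> str:
    """Return the endpoint an agent's LLM call should be sent to."""
    return LOCAL_ENDPOINT if is_sensitive(prompt) else CLOUD_ENDPOINT

print(route_request("Summarize our Q3 roadmap"))           # -> cloud
print(route_request("Rotate the api_key in prod config"))  # -> local
```

A real deployment would need more than regexes, but the division of labor is the point: anything sensitive stays on the local models, and nothing phones home by default.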

The platform is hardware-agnostic, which is interesting. NVIDIA doesn't require its own GPUs. But it does integrate with NeMo, NVIDIA's broader AI software suite, and if you're running NemoClaw agents on NVIDIA hardware, the performance advantages are presumably non-trivial. It's the CUDA playbook applied to agents: give away the software, sell the silicon.

NemoClaw launched as an alpha. "Expect rough edges," NVIDIA's developer docs state, which is at least honest.

What gamers get

NVIDIA unveiled DLSS 5, and Huang called it "the GPT moment for graphics." The technology uses a neural rendering model to add photorealistic lighting and material detail to game frames in real time, which sounds like a big deal until you learn the GTC demo required two RTX 5090s running in parallel: one rendering the game, the other running the DLSS 5 model. NVIDIA says the shipping version will work on a single GPU. Fall 2026 is the target. Bethesda, Capcom, Ubisoft, and Warner Bros. Games are signed on.

I'm not sure what to make of the early screenshots. Some look stunning. Others have a slightly uncanny sharpness that suggests the neural model is overriding the game's art direction rather than enhancing it. NVIDIA says developers will get controls for intensity, color grading, and masking. They'll need them.

And then there's the PC chip

The N1X got a mention. It is an ARM-based SoC developed with MediaTek, featuring 20 custom ARM cores and an integrated GPU that NVIDIA claims matches the RTX 5070. Target market: Windows laptops and workstations for local AI inference. Dell and Lenovo reportedly have at least eight laptop models in development. But NVIDIA has been teasing the N1X for over a year now, and a DigiTimes report noted the chip has already been delayed at least once due to silicon modifications. Specs on paper are encouraging. Shipping hardware is another thing entirely.

GTC 2026 runs through March 19, with over 1,000 sessions across ten venues in downtown San Jose. The Q&A session with financial analysts is Tuesday, which is where the $1 trillion number will get pressure-tested. NVIDIA also has an interview scheduled with CNBC, so expect Huang to keep selling. He's good at it.

Tags: NVIDIA, GTC 2026, Vera Rubin, Jensen Huang, AI chips, DLSS 5, NemoClaw, GPU, artificial intelligence, HBM4
Oliver Senti
Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


