Every time NVIDIA holds a launch event, the news fills up with a pile of names: B200, GB200, GB300, NVL72, Rubin. They all sound impressive, but are they different things, or just different ways of saying the same thing?

This piece lays out the whole Blackwell generation in one place. First we separate the three levels (GPU, superchip, and rack-scale system), then compare the specs, look at where it is shipping, and cover the next-gen Rubin and the role of Taiwan’s contract manufacturers. This is the deep-dive version of Gate 1, “AI chips,” in The AI Hardware Supply Chain, End to End.


First, Get It Straight: GPU, Superchip, and Rack Are Three Levels

What trips people up most in the news is mixing three different levels together. Let’s pull them apart:

  • B200 / B300: a single GPU, the most basic compute chip.
  • GB200 / GB300: one Grace CPU paired with two GPUs, bound into a single “superchip.”
  • GB200 / GB300 NVL72: 36 superchips (36 Grace CPUs and 72 GPUs) connected by NVLink into a full rack that operates as one supercomputer.

So when you see GB300 NVL72, it means a full rack of 72 GPUs; when you see B300, it means the single GPU inside. Hold onto this small-to-large ladder and the numbers later won’t get confusing.
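The ladder above is just multiplication: a rack holds 36 superchips, each with one Grace CPU and two GPUs. A minimal sketch that models the three levels, using only the counts given in the text; the class and field names are hypothetical and exist purely for illustration:

```python
from dataclasses import dataclass

# Hypothetical model of the GPU -> superchip -> rack ladder.
# Counts (1 Grace + 2 GPUs per superchip, 36 superchips per NVL72 rack) come from the text.

@dataclass(frozen=True)
class Superchip:
    name: str          # e.g. "GB300"
    gpu_name: str      # the single GPU inside, e.g. "B300"
    grace_cpus: int = 1
    gpus: int = 2

@dataclass(frozen=True)
class Rack:
    name: str
    superchip: Superchip
    superchips: int = 36

    @property
    def total_gpus(self) -> int:
        return self.superchips * self.superchip.gpus

    @property
    def total_cpus(self) -> int:
        return self.superchips * self.superchip.grace_cpus

gb300 = Superchip(name="GB300", gpu_name="B300")
rack = Rack(name="GB300 NVL72", superchip=gb300)
print(rack.name, rack.total_gpus, "GPUs,", rack.total_cpus, "Grace CPUs")
```

The "72" in NVL72 is exactly `total_gpus`; swapping in a GB200 superchip gives the same 72/36 split.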


What Is Blackwell, and Why Is It So Powerful?

Blackwell is the AI chip architecture NVIDIA launched in 2024 and shipped in volume from 2025 into 2026, succeeding the previous Hopper generation (H100/H200).

Its key design choice is stitching two large compute dies into a single GPU. Lithography caps how large a single die can be (the reticle limit), so NVIDIA joins two dies with a 10 TB/s chip-to-chip link (NV-HBI) into a GPU that “looks like one” to software. The package holds about 208 billion transistors in total and is manufactured on TSMC’s 4NP process.

For compute, Blackwell leans on an ultra-low-precision 4-bit floating-point format called NVFP4. Each number takes fewer bits, so the chip can move and process more of them per second at the cost of some precision, and AI inference throughput takes a large jump as a result. In a sentence: Blackwell is a flagship AI GPU that “works like two in one.”
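To make “fewer bits per number” concrete, here is a minimal sketch of block-scaled 4-bit quantization in the style NVIDIA has described for NVFP4: each value is stored as a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit), and every block of 16 values shares one scale factor. Simplifications, all assumptions of this sketch rather than NVIDIA’s implementation: the scale is kept as a Python float instead of FP8 (E4M3), there is no second-level per-tensor scale, and ties round toward the smaller magnitude.

```python
# Representable magnitudes of a 4-bit E2M1 float (sign handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # one shared scale per 16-element block

def quantize_block(block):
    """Return (E2M1 values, shared scale) for one block.
    Simplification: scale stays a Python float; real NVFP4 stores it as FP8 (E4M3)."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude onto E2M1's max, 6.0
    codes = []
    for x in block:
        # Round to the nearest representable magnitude; min() keeps the first
        # (smaller) candidate on ties.
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(mag if x >= 0 else -mag)
    return codes, scale

def dequantize_block(codes, scale):
    return [c * scale for c in codes]

def quantize(values):
    """Split a vector into 16-element blocks and quantize each one."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]

codes, scale = quantize_block([1.0, 0.7, -0.25, 0.0])
print(codes, dequantize_block(codes, scale))
```

In the usage line, 0.7 lands on code 4.0 and comes back as about 0.667: that rounding error is the precision being traded away for throughput. The per-block scale keeps the error relative to each block’s own range instead of the whole tensor’s.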


Core-Data Snapshot

Below we put the Blackwell generation’s key specs side by side. First, three terms: HBM is the high-bandwidth memory stacked right next to the GPU; PFLOPS means quadrillions (10^15) of floating-point operations per second; and CoWoS is TSMC’s advanced packaging that binds the GPU and HBM into one package. Numbers follow NVIDIA’s published figures.

Product | Level | Key specs | Status
B200 | GPU | 192GB HBM3E, 8 TB/s bandwidth, NVFP4 about 10 PFLOPS, power about 1,200W | Shipping
B300 (Blackwell Ultra) | GPU | 288GB HBM3E, 8 TB/s bandwidth, NVFP4 about 15 PFLOPS, power about 1,400W | Shipping
GB200 NVL72 | Rack (72 B200 + 36 Grace) | NVFP4 about 720 PFLOPS, 13.4TB HBM3E, rack power on the order of 120kW | Shipping
GB300 NVL72 | Rack (72 B300 + 36 Grace) | NVFP4 about 1,080 PFLOPS, 20TB HBM3E, rack power on the order of 120kW | Deploying
Vera Rubin NVL72 | Next-gen rack | Per Rubin GPU: 288GB HBM4, 22 TB/s; rack NVFP4 inference about 3,600 PFLOPS | H2 2026 (preliminary specs)

(The NVFP4 figures for Blackwell here are dense values; the sparse values NVIDIA’s marketing often cites are roughly double. Rack power varies with the power-delivery and cooling configuration; the Rubin rack figure is an inference measure and can’t be compared directly with Blackwell’s dense values.)
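The rack rows in the table follow from the GPU rows: multiply the per-GPU dense NVFP4 figure by 72, and double it for the sparse number marketing usually quotes. A minimal sketch of that arithmetic, using only the table’s figures and the text’s “roughly double” rule of thumb (the helper names are hypothetical):

```python
GPUS_PER_RACK = 72

# Per-GPU dense NVFP4 figures from the table, in PFLOPS.
PER_GPU_DENSE = {"B200": 10, "B300": 15}

def rack_dense(per_gpu_pflops, gpus=GPUS_PER_RACK):
    """Rack dense throughput = per-GPU dense throughput x number of GPUs."""
    return per_gpu_pflops * gpus

def sparse_estimate(dense_pflops):
    """The text says sparse marketing figures are roughly double the dense ones."""
    return 2 * dense_pflops

for gpu, dense in PER_GPU_DENSE.items():
    rack = rack_dense(dense)
    print(f"{gpu} x72: dense ~{rack} PFLOPS, sparse ~{sparse_estimate(rack)} PFLOPS")
```

This is why a “1,440 PFLOPS” headline and the table’s 720 PFLOPS can describe the same GB200 NVL72: one is sparse, the other dense. The Rubin rack figure is an inference measure and does not fit this dense-vs-sparse conversion.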


From B200 to B300: Same Architecture, Pushed Up a Notch

Launched in 2025 and moving into commercial deployment from the second half of that year, the B300, officially branded Blackwell Ultra, is the upgraded version of the same architecture.

The two most noticeable upgrades: memory grows from 192GB to 288GB of HBM3E, up 50%, so larger models fit; dense low-precision (NVFP4) compute also rises about 50%. The cost is power, which climbs from about 1,200 watts to about 1,400 watts. At rack scale, the GB300 NVL72 therefore raises total memory from 13.4TB to 20TB, making it better suited to inference on very large models. For cloud providers this is an incremental upgrade on the same production line: the specs step up a notch without redesigning the whole architecture.
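Because compute grows about 50% while power grows only about 17%, the B300 also does more work per watt. A minimal sketch that computes the percentage changes and a rough PFLOPS-per-kW figure from the approximate per-GPU numbers in the table (GPU power only, not whole-system power; the helper names are hypothetical):

```python
# Approximate per-GPU figures from the table.
B200 = {"hbm_gb": 192, "nvfp4_dense_pflops": 10, "power_w": 1200}
B300 = {"hbm_gb": 288, "nvfp4_dense_pflops": 15, "power_w": 1400}

def pct_change(old, new):
    return (new - old) / old * 100

def pflops_per_kw(spec):
    """Dense NVFP4 PFLOPS per kilowatt of GPU power (rough efficiency proxy)."""
    return spec["nvfp4_dense_pflops"] / (spec["power_w"] / 1000)

for key in B200:
    print(f"{key}: {pct_change(B200[key], B300[key]):+.1f}%")

gain = pct_change(pflops_per_kw(B200), pflops_per_kw(B300))
print(f"PFLOPS/kW: {pflops_per_kw(B200):.2f} -> {pflops_per_kw(B300):.2f} ({gain:+.1f}%)")
```

On these numbers memory and compute rise 50%, power about 16.7%, and dense throughput per kilowatt roughly 29%.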


Where Is It Shipping?

Blackwell is not just a spec on a slide; it is already running in production.

NVIDIA officially labels Blackwell as in “full production,” with both HGX B200 and B300 shipping. The rack-scale GB300 NVL72 has landed too: cloud provider CoreWeave was first to deploy it commercially in mid-2025, and Microsoft Azure went further in October 2025, standing up a production-grade cluster of thousands of GB300 GPUs for OpenAI. With deployments already this large, Blackwell and Blackwell Ultra remain the mainstay of the 2026 AI compute build-out.

The capacity bottlenecks are the familiar ones: how many Blackwell GPUs can ship is limited by TSMC’s CoWoS advanced packaging capacity and by HBM memory supply, the two gates broken down in the earlier standalone pieces.
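Two gates in series means output is set by whichever runs out first. A toy sketch of that logic; every input here (the capacities and the stacks-per-GPU count) is a hypothetical parameter for illustration, not reported supply data:

```python
def shippable_gpus(cowos_gpu_capacity, hbm_stacks_available, stacks_per_gpu):
    """Toy two-gate model: units shipped are capped by the tighter of
    packaging capacity and HBM supply. All inputs are hypothetical."""
    hbm_limited = hbm_stacks_available // stacks_per_gpu
    units = min(cowos_gpu_capacity, hbm_limited)
    bottleneck = "CoWoS" if cowos_gpu_capacity <= hbm_limited else "HBM"
    return units, bottleneck

# Hypothetical inputs, for illustration only.
print(shippable_gpus(cowos_gpu_capacity=1_000_000,
                     hbm_stacks_available=6_400_000,
                     stacks_per_gpu=8))
```

The design point is the `min`: adding packaging capacity does nothing while HBM is the binding constraint, and vice versa, which is why both gates get tracked separately.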


The Next-Gen Rubin: Announced, but Don’t Rush to Say It Replaces Blackwell

NVIDIA has formally announced its next-gen platform, Vera Rubin, and the Rubin chip has already entered full production. The rack-scale Vera Rubin NVL72 combines 72 Rubin GPUs with 36 Vera CPUs; each Rubin GPU moves to next-gen HBM4 memory (288GB, 22 TB/s of bandwidth), and the rack’s NVFP4 inference compute reaches about 3,600 PFLOPS, another big jump over Blackwell.
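A quick way to read the preliminary Rubin numbers is per GPU. This sketch divides the rack figure by 72 and compares memory with the B300, using only figures from the table. Per the table note, the Rubin compute figure is an inference measure, so the sketch deliberately does not divide it by Blackwell’s dense figures:

```python
RACK_GPUS = 72

# Preliminary Vera Rubin figures and B300 figures, all from the table.
RUBIN = {"rack_nvfp4_inference_pflops": 3600, "hbm_gb": 288, "hbm_bw_tbs": 22}
B300 = {"hbm_gb": 288, "hbm_bw_tbs": 8}

per_gpu_inference = RUBIN["rack_nvfp4_inference_pflops"] / RACK_GPUS
bw_ratio = RUBIN["hbm_bw_tbs"] / B300["hbm_bw_tbs"]
capacity_ratio = RUBIN["hbm_gb"] / B300["hbm_gb"]

print(f"Rubin per GPU: ~{per_gpu_inference:.0f} PFLOPS NVFP4 inference (not comparable to dense)")
print(f"HBM bandwidth vs B300: {bw_ratio:.2f}x, capacity: {capacity_ratio:.2f}x")
```

On these figures the memory gain is almost all bandwidth (22 vs 8 TB/s, 2.75x) rather than capacity (288GB in both), which is one reason HBM4 supply carries so much weight for Rubin’s schedule.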

But tap the brakes here. NVIDIA’s product page still marks the specs as “preliminary, subject to change,” and the official target is deployment by cloud providers such as AWS, Google Cloud, and Microsoft in the second half of 2026. Research firms also estimate that Blackwell will still dominate NVIDIA’s high-end GPU shipments in 2026 (its share rising from roughly 60% toward 70%), while Rubin still carries supply-chain tuning and schedule risk, with HBM4 validation and supply as the single most critical variable. So the pragmatic view is: 2026 is Blackwell’s home turf, and Rubin is the next baton in the queue, not an immediate handover.


Taiwan’s Role at This Gate

NVIDIA designs the chips and TSMC manufactures them; the next job, assembling rack-scale systems and making them mass-producible and shippable, falls mainly to Taiwanese server makers.

Foxconn has publicly shown a complete Vera Rubin NVL72 system. Supply-chain reports name Quanta, Wistron and Wiwynn, Inventec, and other Taiwanese ODM/EMS players as contract manufacturers of GB200/GB300 rack-scale systems, and note that NVIDIA, racing to lock up capacity, has reserved part of certain Taiwanese makers’ server-plant space through 2026. In other words, Taiwan does more than make the wafers and packaging; it is also a key global base for assembling and shipping entire racks of AI supercomputers. This is only an industry map and makes no investment judgment on any individual stock.


Key Takeaways for This Gate

After looking at Blackwell, first remember that small-to-large ladder: B200/B300 are GPUs, GB200/GB300 are superchips, NVL72 is a system of 72 GPUs in a full rack.

Technically, Blackwell uses “two dies stitched into one” plus NVFP4 low-precision compute to push throughput high; the B300 (Blackwell Ultra) then raises both memory and dense compute by about 50%. The mainstay in 2026 is Blackwell, already shipping in volume and deployed at scale in the cloud. Rubin is the announced next baton, targeting a debut in the second half of 2026, but its specs are still preliminary, with HBM4 and the rack-scale supply chain as the biggest variables.

To learn about the HBM that feeds data to these GPUs and the CoWoS that binds the chips together, see What Is HBM and What Is CoWoS; to see how all eight gates of the chain string together, head back to the supply-chain overview.