Every time you read AI news, you hear that “some company just bought tens of thousands more GPUs.” Why GPUs, and not the more impressive-sounding CPU? What’s the magic in this chip that has AI companies the world over scrambling to get it?
This piece explains the GPU: first what it is and how it differs from a CPU, then why AI can’t do without it, who leads the market, and how the generations from H100 to GB300 differ. It is the entry-level take on Gate 1, “AI chips,” in The AI Hardware Supply Chain, End to End.
What Is a GPU?
GPU stands for Graphics Processing Unit. It was originally designed for gaming and 3D graphics, and its defining trait is a large number of simple, small compute cores that can handle tens of thousands of similar calculations at once.
That “strength in numbers” trait later turned out to be a great fit for AI, because training a neural network is, at its core, a matter of breaking a problem into countless tiny matrix operations and computing them side by side. The GPU was practically made for this kind of workload.
Here’s an analogy: a CPU is like a professor of advanced math, focused on solving one hard problem at a time; a GPU is like a thousand schoolchildren who only know arithmetic, each handed a small piece of the work. Because they all compute at once, the whole job finishes far faster. And that is exactly the workload AI brings: hundreds of millions of simple problems arriving at once.
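To make the analogy concrete, here is a minimal Python sketch of a matrix multiplication split into one small, independent job per output cell. The function names (`dot`, `matmul_as_tasks`) and the `workers` parameter are made up for illustration, and the thread pool only stands in for a GPU's many cores: Python threads will not actually speed this up. The point is that no job needs the result of any other job.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """One 'schoolchild' task: multiply pairwise, then add up."""
    return sum(x * y for x, y in zip(row, col))


def matmul_as_tasks(a, b, workers=4):
    """Multiply matrices a and b by handing out one dot product per output cell."""
    cols = list(zip(*b))  # columns of b
    # Every output cell (i, j) is its own small job; no job depends on another.
    cells = [(i, j) for i in range(len(a)) for j in range(len(cols))]
    out = [[0] * len(cols) for _ in a]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `cells`.
        values = pool.map(lambda ij: dot(a[ij[0]], cols[ij[1]]), cells)
        for (i, j), value in zip(cells, values):
            out[i][j] = value
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul_as_tasks(a, b)
print(product)  # [[19, 22], [43, 50]]
```

A 2x2 product yields only four jobs. The matrices inside a real model yield millions of them, which is what keeps thousands of small cores busy at the same time.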
Core-Data Snapshot
The few numbers below help you grasp the scale of this GPU gate. The market-share figures are mostly estimates from research or policy bodies, so read them as order-of-magnitude indicators rather than exact values.
| Topic | Data | Timing / Nature |
|---|---|---|
| NVIDIA AI GPU market share | About 80%-plus (cloud in-house ASICs counted separately; varies by measure) | 2025-2026, research / policy-body estimate |
| NVIDIA data-center annual revenue | About $193.7B for FY2026 | NVIDIA official earnings |
| Flagship GPU memory: H200 / B300 | H200 about 141GB HBM3e; B300 about 288GB HBM3e | 2024-2025, official specs |
| GB300 NVL72 rack | 72 Blackwell Ultra GPUs, 20TB of GPU memory | 2026, official specs |
| Next-gen Vera Rubin | Planned for partner supply from 2H 2026 | Official roadmap / forward-looking |
How a GPU Differs from a CPU
The biggest difference between the two lies in “the number of cores and how the work is divided.”
A CPU has few cores, but each one is powerful and all-around capable, paired with plenty of cache and branch control. It excels at step-by-step work that requires decisions along the way: operating systems, databases, program logic. A GPU is the reverse: it packs the chip with many smaller cores and optimizes for the total throughput of “doing many things at once,” which suits matrix math, image processing, scientific simulation, and deep learning.
Inside an AI server, these two chips actually divide the labor: the CPU schedules data and manages the program and the network, while the GPU runs the most demanding model computations. So the two are partners, each playing its own role.
Why AI Can’t Do Without the GPU
The key is one word: parallel.
Training and inference for AI models are, at bottom, a vast number of matrix and vector operations happening at the same time. Two kinds of cores inside a GPU matter especially: one is the CUDA Core, a large supply of general-purpose small compute units that handle ordinary parallel work; the other is the Tensor Core, purpose-built to speed up the matrix multiplication AI uses most, with support for low-precision formats like FP16, FP8, and FP4 (trading fewer bits for faster computation). Most AI compute is spent on this kind of math.
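As a rough illustration of what "trading fewer bits" means, the sketch below computes how much memory a model's weights would occupy in each format and shows the rounding that appears when a value is stored in 16 bits. The 70-billion-parameter model is hypothetical, FP32 is included only as a baseline (it is not one of the formats listed above), and the helper names are made up for this example.

```python
import struct

# Bits stored per value. FP32 is a baseline added for comparison only.
BITS_PER_VALUE = {"FP32": 32, "FP16": 16, "FP8": 8, "FP4": 4}


def weight_gigabytes(num_params, fmt):
    """Memory needed for num_params weights in the given format, in GB (10^9 bytes)."""
    return num_params * BITS_PER_VALUE[fmt] / 8 / 1e9


def round_trip_fp16(x):
    """Store x as a 16-bit float and read it back, exposing the rounding."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


PARAMS = 70_000_000_000  # hypothetical 70-billion-parameter model
footprint = {fmt: weight_gigabytes(PARAMS, fmt) for fmt in BITS_PER_VALUE}
for fmt, gb in footprint.items():
    print(f"{fmt}: {gb:.0f} GB")

approx = round_trip_fp16(0.1)
print(f"0.1 stored as FP16 reads back as {approx!r}")
```

Halving the bits halves the bytes to store and move, at the cost of a small rounding error per value. Whether a given model tolerates that error at FP8 or FP4 is an empirical question this sketch does not answer.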
Alongside these cores, you also need high-speed HBM (High Bandwidth Memory) to feed model weights and data fast enough that the compute cores aren’t sitting idle waiting on data. To learn more about this memory, see the HBM gate.
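A back-of-the-envelope sketch shows why bandwidth matters: the cores cannot work faster than the weights arrive. Every number below is an assumption for illustration and none comes from this article: a hypothetical 70 GB of weights, an assumed 4.8 TB/s for an HBM-class memory system, and an assumed 100 GB/s for a conventional one.

```python
def seconds_to_stream(weight_bytes, bandwidth_bytes_per_s):
    """Lower bound on the time to read every weight once from memory."""
    return weight_bytes / bandwidth_bytes_per_s


WEIGHT_BYTES = 70e9             # hypothetical: 70 GB of model weights
HBM_BANDWIDTH = 4.8e12          # assumed HBM-class bandwidth, 4.8 TB/s
CONVENTIONAL_BANDWIDTH = 100e9  # assumed conventional memory, 100 GB/s

hbm_seconds = seconds_to_stream(WEIGHT_BYTES, HBM_BANDWIDTH)
conventional_seconds = seconds_to_stream(WEIGHT_BYTES, CONVENTIONAL_BANDWIDTH)

print(f"HBM-class:    {hbm_seconds * 1e3:.1f} ms per full pass over the weights")
print(f"Conventional: {conventional_seconds * 1e3:.1f} ms per full pass over the weights")
```

Under these assumptions, one pass over the weights takes roughly 15 ms from the HBM-class memory and 700 ms from the conventional one. For that extra time the compute cores would have nothing to do.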
The AI GPU Market: Who Supplies It
This gate is dominated by NVIDIA.
By research- and policy-body estimates, NVIDIA holds about 80%-plus of the AI GPU market; if you fold the cloud giants’ in-house ASICs into a broader “AI accelerators” category, the figure shifts with the measure. Its moat isn’t only the chip but also the CUDA software ecosystem: much of the code developers have written runs on it, and moving a whole stack to another vendor is costly. AMD chases with its Instinct MI series (such as the MI350 with 288GB HBM3E); cloud giants like Google, AWS, and Microsoft take the in-house ASIC route, cutting costs and differentiating within their own clouds (you can read about that thread in the ASIC gate).
A reminder: share and revenue shift with new products, earnings, and how things are measured. What’s described here is the industry landscape, not a verdict on any single stock.
A Tour of the Mainstream AI GPU Generations
Let’s lay out the workhorse generations of recent years:
| Generation / Product | Positioning | Memory |
|---|---|---|
| H100 (Hopper) | Previous-gen training / inference workhorse | 80-94GB HBM |
| H200 (Hopper) | Hopper memory upgrade | About 141GB HBM3e |
| B200 (Blackwell) | New-gen workhorse | About 180-192GB HBM3e (by SKU) |
| B300 (Blackwell Ultra) | Memory pushed higher | About 288GB HBM3e |
| GB200 / GB300 | Grace CPU + Blackwell GPU Superchip / platform | GB300 NVL72 rack: 72 GPUs, 20TB memory |
| Vera Rubin | Next-gen roadmap (from 2H 2026) | HBM4 |
Two trends are worth grasping. First, memory keeps getting bigger and bandwidth keeps climbing. Second, the unit of design is shifting from “a single chip” to “a rack-scale system,” in which dozens of GPUs are lashed together with high-speed interconnect into one supercomputer. To see the details of the flagship generation, read on to the Blackwell gate.
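Both trends can be checked with simple arithmetic on the figures in the table above. The sketch below multiplies out the rack memory and compares per-GPU memory across generations. It assumes decimal units (1 TB = 1,000 GB) and, where the table gives a range, picks the 80GB H100 and the 192GB B200 as representative values.

```python
# Per-GPU memory in GB, taken from the generations table.
# H100 and B200 are listed as ranges; one value per product is an assumption.
MEMORY_GB = {"H100": 80, "H200": 141, "B200": 192, "B300": 288}

GPUS_PER_RACK = 72  # GB300 NVL72

rack_gb = GPUS_PER_RACK * MEMORY_GB["B300"]
rack_tb = rack_gb / 1000  # decimal terabytes

# Memory growth relative to the 80GB H100.
growth = {name: gb / MEMORY_GB["H100"] for name, gb in MEMORY_GB.items()}

print(f"Rack total: {rack_gb} GB (about {rack_tb:.1f} TB)")
for name, factor in growth.items():
    print(f"{name}: {factor:.2f}x the H100's memory")
```

The rack total comes to 20,736 GB, which lines up with the roughly 20TB quoted for the GB300 NVL72, and a single B300 carries 3.6 times the memory of an 80GB H100.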
Taiwan’s Role in This Gate
A GPU’s architecture isn’t designed in Taiwan (NVIDIA designs it in the United States), but its physical manufacturing relies heavily on Taiwan. NVIDIA’s Blackwell is produced on TSMC’s custom 4NP process, and it then takes advanced packaging like CoWoS to bind the GPU and HBM together, before Taiwanese firms (Foxconn, Quanta, Wistron, and others) assemble it into rack-scale AI servers. In other words, on a GPU’s journey from a wafer to a usable system, most of the road runs through Taiwan.
Key Takeaways for This Gate
After looking at the GPU, remember its core strength first: parallel computing. A large number of small cores firing at once is exactly the fit for AI’s appetite for “a massive number of simple calculations done together,” and that’s why AI can’t do without it.
This gate is led by NVIDIA, whose moat is the chip plus the CUDA software ecosystem; AMD and the cloud giants’ in-house ASICs chase from behind. And behind each of these GPUs stand TSMC’s process, advanced packaging, and Taiwan’s server assembly. Understanding the GPU is, in effect, your ticket to understanding the entire AI hardware supply chain.
To see how memory feeds the data, see HBM; to see the flagship generation specs, see Blackwell; to see the cloud giants’ in-house chips, see ASIC; to look back at all eight gates of the chain, head back to the supply-chain overview.