What Is a GPU? Why It's the Workhorse of AI Compute, and How It Differs from a CPU

A plain-English guide to the GPU: why does training AI take a pile of GPUs instead of the more general-purpose CPU? We break down how a GPU differs from a CPU, what CUDA and Tensor Cores actually do, why NVIDIA holds more than 80% of the market, and how the AI GPU generations from H100 to GB300 stack up.

5/27 · Penna

[Figure: GPU illustration showing a large AI accelerator chip under warm light, ringed by dense compute-core texture]

TL;DR

A GPU is a chip built to do “lots of similar calculations at the same time.” The heart of AI training and inference is massive matrix math, which fits a GPU's parallel processing almost perfectly, which is why most AI workloads run on GPUs. A CPU is like a professor solving one hard problem at a time; a GPU is like a thousand students all doing arithmetic together. With its GPUs plus the CUDA software ecosystem, NVIDIA holds roughly 80%-plus of the AI GPU market, while AMD and the cloud giants' in-house chips chase from behind.
This guide is for beginners and long-term watchers who want to understand what a GPU does, why AI needs it, and how it differs from a CPU.
A GPU's gift is parallel computing, which is exactly what AI is hungry for. NVIDIA leads this AI GPU gate, and its moat isn't only the chip — it's the CUDA ecosystem that locks in developers. Understanding the GPU is the starting point for understanding the entire AI hardware supply chain. This article covers industry knowledge only and does not constitute investment advice.

Every time you read AI news, you hear that “some company just bought tens of thousands more GPUs.” Why GPUs, and not the more impressive-sounding CPU? What’s the magic in this chip that has AI companies the world over scrambling to get it?

This piece spells the GPU out. First what it is and how it differs from a CPU, then why AI can’t do without it, who leads the market, and how the generations from H100 to GB300 differ. This is the entry-level take on Gate 1, “AI chips,” in The AI Hardware Supply Chain, End to End.

What Is a GPU?

GPU stands for Graphics Processing Unit. It was originally designed for gaming and 3D graphics, and its defining trait is a large number of simple, small compute cores that can handle tens of thousands of similar calculations at once.

That “strength in numbers” trait later turned out to be a great fit for AI, because training a neural network is, at its core, a matter of breaking the problem into countless tiny matrix operations and computing them together. The GPU is practically tailor-made for this kind of workload.

Here’s an analogy: a CPU is like a professor of advanced math, focused on solving one hard problem at a time; a GPU is like a thousand schoolchildren who only know arithmetic, each handed a small piece of the work, and together they finish far faster. That is exactly the kind of workload AI presents: hundreds of millions of simple problems arriving all at once.
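To make the analogy concrete, here is a minimal sketch in plain Python (not real GPU code). It treats every cell of a matrix product as its own small task and shows that the tasks can be done in any order with the same result. That independence is what lets a GPU hand the cells to many cores at once. The function names and the tiny matrices are made up for illustration.

```python
import random

def matmul_cell(a, b, i, j):
    """One small task: a single output cell, using only row i of a and column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))

def matmul_in_order(a, b, order):
    """Fill the output matrix by visiting its cells in the given order."""
    rows, cols = len(a), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i, j in order:
        out[i][j] = matmul_cell(a, b, i, j)
    return out

a = [[1, 2], [3, 4], [5, 6]]     # 3x2 matrix
b = [[7, 8, 9], [10, 11, 12]]    # 2x3 matrix

# Every (row, column) pair of the 3x3 result is a separate task.
cells = [(i, j) for i in range(len(a)) for j in range(len(b[0]))]
shuffled = cells[:]
random.Random(0).shuffle(shuffled)   # a fixed seed keeps the run repeatable

sequential = matmul_in_order(a, b, cells)
any_order = matmul_in_order(a, b, shuffled)

print(sequential)
print("same result in any order:", sequential == any_order)
```

This sketch still runs one cell at a time, so it demonstrates independence rather than speed. On a GPU, the same per-cell work would be spread across thousands of cores.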

Key Data at a Glance

The few numbers below help you grasp the scale of this GPU gate. The market-share figures are mostly estimates from research or policy bodies, so read them for order of magnitude.

Topic	Data	Timing / Nature
NVIDIA AI GPU market share	About 80%-plus (cloud in-house ASICs counted separately; varies by measure)	2025-2026, research / policy-body estimate
NVIDIA data-center annual revenue	About $193.7B for FY2026	NVIDIA official earnings
Flagship GPU memory: H200 / B300	H200 about 141GB HBM3e; B300 about 288GB HBM3e	2024-2025, official specs
GB300 NVL72 rack	72 Blackwell Ultra GPUs, 20TB of GPU memory	2026, official specs
Next-gen Vera Rubin	Planned for partner supply from 2H 2026	Official roadmap / forward-looking

How a GPU Differs from a CPU

The biggest difference between the two lies in “the number of cores and how the work is divided.”

A CPU has few cores, but each one is powerful and all-around capable, paired with plenty of cache and branch control; it excels at step-by-step work that requires judging as it goes — operating systems, databases, program logic. A GPU is the reverse: it packs the chip full of many smaller cores, chasing the total throughput of “doing many things at once,” which suits matrix math, image processing, scientific simulation, and deep learning.

Inside an AI server, these two chips actually divide the labor: the CPU schedules data and manages the program and the network, while the GPU runs the most demanding model computations. So the two are partners, each playing its own role.

Why AI Can’t Do Without the GPU

The key is one word: parallel.

Training and inference for AI models are, at bottom, a vast number of matrix and vector operations happening at the same time. Two kinds of cores inside a GPU matter especially: one is the CUDA Core, a large supply of general-purpose small compute units that handle ordinary parallel work; the other is the Tensor Core, purpose-built to speed up the matrix multiplication AI uses most, with support for low-precision formats like FP16, FP8, and FP4 (using fewer bits per number, which gives up some precision in exchange for faster computation and a smaller memory footprint). Most AI compute is spent on this kind of math.
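The precision trade-off can be seen with a small sketch using Python's standard `struct` module, which can round a number to 16-bit (FP16) and 32-bit (FP32) floating point. The standard library has no FP8 or FP4 type, so this only illustrates the direction of the trade-off, not what a Tensor Core does internally. The helper names are illustrative.

```python
import struct

def round_to_fp16(x):
    """Round a Python float to the nearest half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_fp32(x):
    """Round a Python float to the nearest single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

value = 0.1
fp32 = round_to_fp32(value)
fp16 = round_to_fp16(value)

bytes_fp32 = struct.calcsize("<f")   # storage per number in FP32
bytes_fp16 = struct.calcsize("<e")   # storage per number in FP16

print(f"FP32: {bytes_fp32} bytes, stored as {fp32!r}, error {abs(fp32 - value):.2e}")
print(f"FP16: {bytes_fp16} bytes, stored as {fp16!r}, error {abs(fp16 - value):.2e}")
```

FP16 takes half the storage of FP32 but lands further from the original value. Neural networks often tolerate that loss, which is why lower-precision formats are attractive for AI.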

Alongside it, you also need high-speed HBM (High Bandwidth Memory) to feed model weights and data fast enough, so the compute cores aren’t sitting idle waiting on data. To learn more about this memory, see the HBM gate.
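For a rough sense of why memory capacity matters, the sketch below estimates how much space a model's weights take at different precisions and compares that with the H200 and B300 capacities quoted in this article. The 70-billion-parameter model is a hypothetical example, not one named in the article. The estimate counts weights only (no activations, caches, or optimizer state) and assumes 1 GB means 10^9 bytes.

```python
# Bits per stored number, as implied by the format names.
BITS_PER_VALUE = {"FP16": 16, "FP8": 8, "FP4": 4}

# Per-GPU memory capacities as quoted in this article, in GB.
GPU_MEMORY_GB = {"H200": 141, "B300": 288}

# Hypothetical model size, chosen only for illustration.
HYPOTHETICAL_PARAMS = 70_000_000_000

def weight_footprint_gb(num_params, bits):
    """Weights-only footprint in GB, assuming 1 GB = 10**9 bytes."""
    return num_params * bits / 8 / 1e9

footprints = {
    fmt: weight_footprint_gb(HYPOTHETICAL_PARAMS, bits)
    for fmt, bits in BITS_PER_VALUE.items()
}

for fmt, gb in footprints.items():
    fits_on = [gpu for gpu, capacity in GPU_MEMORY_GB.items() if gb <= capacity]
    print(f"{fmt}: about {gb:.0f} GB of weights; fits within: {fits_on}")
```

Under these assumptions, halving the bits per number halves the weight footprint. That is one reason low-precision formats and larger HBM capacity tend to be discussed together.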

The AI GPU Market: Who Supplies It

This gate is dominated by NVIDIA.

By research- and policy-body estimates, NVIDIA holds about 80%-plus of the AI GPU market; if you fold the cloud giants’ in-house ASICs into “AI accelerators,” the measure shifts. Its moat isn’t only in the chip. It’s also in the CUDA software ecosystem: much of the code developers have written targets it, and lifting the whole stack to another vendor is costly. AMD chases with its Instinct MI series (such as the MI350 with 288GB HBM3e); cloud giants like Google, AWS, and Microsoft take the in-house ASIC route, cutting costs and differentiating within their own clouds (you can read about that thread in the ASIC gate).

A reminder: share and revenue shift with new products, earnings, and how things are measured. What’s described here is the industry landscape, not a verdict on any single stock.

A Tour of the Mainstream AI GPU Generations

Let’s lay out the workhorse generations of recent years:

Generation / Product	Positioning	Memory
H100 (Hopper)	Previous-gen training / inference workhorse	80-94GB HBM
H200 (Hopper)	Hopper memory upgrade	About 141GB HBM3e
B200 (Blackwell)	New-gen workhorse	About 180-192GB HBM3e (by SKU)
B300 (Blackwell Ultra)	Memory pushed higher	About 288GB HBM3e
GB200 / GB300	Grace CPU + Blackwell GPU Superchip / platform	GB300 NVL72 rack: 72 GPUs, 20TB memory
Vera Rubin	Next-gen roadmap (from 2H 2026)	HBM4

Two trends are worth grasping: first, memory keeps getting bigger and bandwidth keeps climbing; second, the unit of compute is shifting from “a single chip” to “a rack-scale system” that lashes dozens of GPUs together with high-speed interconnect into one supercomputer. To see the details of the flagship generation, read on to the Blackwell gate.
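Both trends can be checked with simple arithmetic on the figures in the table above. The sketch below takes the per-GPU capacities as quoted, using the low end of the H100 range, and multiplies out the rack. It assumes 1 TB means 1,000 GB. The article does not say how the official 20TB rack figure is rounded or counted.

```python
# Per-GPU memory in GB, taken from the table in this article.
H100_GB = 80      # low end of the quoted 80-94GB range
H200_GB = 141
B300_GB = 288

# Rack-scale figures quoted for GB300 NVL72.
GPUS_PER_RACK = 72
QUOTED_RACK_MEMORY_TB = 20

# Trend 1: per-GPU memory growth across generations.
growth_h100_to_b300 = B300_GB / H100_GB
growth_h200_to_b300 = B300_GB / H200_GB

# Trend 2: memory pooled at rack scale (assuming 1 TB = 1,000 GB).
rack_total_gb = GPUS_PER_RACK * B300_GB
rack_total_tb = rack_total_gb / 1000

print(f"B300 vs H100: {growth_h100_to_b300:.1f}x the memory per GPU")
print(f"B300 vs H200: {growth_h200_to_b300:.1f}x the memory per GPU")
print(f"72 x {B300_GB} GB = {rack_total_gb} GB, about {rack_total_tb:.1f} TB")
print(f"Quoted rack figure: {QUOTED_RACK_MEMORY_TB} TB")
```

The straight multiplication comes out slightly above the quoted 20TB. The two are the same order of magnitude, and the gap may come from rounding or from how usable capacity is counted.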

Taiwan’s Role in This Gate

A GPU’s architecture isn’t designed in Taiwan (NVIDIA designs it in the United States), but its physical manufacturing relies heavily on Taiwan. NVIDIA’s Blackwell is produced on TSMC’s custom 4NP process, and it then takes advanced packaging like CoWoS to bind the GPU and HBM together, before Taiwanese firms (Foxconn, Quanta, Wistron, and others) assemble it into rack-scale AI servers. In other words, on a GPU’s journey from a wafer to a usable system, most of the road runs through Taiwan.

Key Takeaways for This Gate

After looking at the GPU, first remember its gift: parallel computing. A large number of small cores firing at once is exactly the fit for AI’s appetite for “a massive number of simple calculations done together” — that’s why AI can’t do without it.

This gate is led by NVIDIA, whose moat is the chip plus the CUDA software ecosystem; AMD and the cloud giants’ in-house ASICs chase from behind. And behind every GPU stand TSMC’s process, advanced packaging, and Taiwan’s server assembly. Understanding the GPU is, in effect, your ticket to understanding the entire AI hardware supply chain.

To see how memory feeds the data, see HBM; to see the flagship generation specs, see Blackwell; to see the cloud giants’ in-house chips, see ASIC; to look back at all eight gates of the chain, head back to the supply-chain overview.

FAQ

What is a GPU? How is it different from a CPU?

A GPU (Graphics Processing Unit) was originally designed to render game graphics; its defining trait is a large number of small compute cores that can handle tens of thousands of similar calculations at once. A CPU (Central Processing Unit) has fewer cores but each is powerful, and it excels at step-by-step work that requires judgment. An analogy: a CPU is like a professor doing advanced math, solving one problem at a time; a GPU is like a thousand schoolchildren who only know arithmetic — strength in numbers. AI’s computations happen to be lots of simple calculations done at once, so a GPU is far faster at them.

Why does AI have to use a GPU?

Because at its core, AI training and inference are hundreds of millions of matrix and vector operations happening at the same time. That “massive, similar, and parallelizable” nature is exactly a GPU’s strength. The Tensor Cores inside a GPU are purpose-built to speed up the matrix math common in AI, including in low-precision formats such as FP16, FP8, and FP4. You can run it on a CPU too, but it would typically be far slower, often by orders of magnitude for large models.

What is CUDA, and why can't people switch away from NVIDIA?

CUDA is NVIDIA’s full software development platform: a programming model with its own language extensions, a compiler, acceleration libraries (such as cuDNN and NCCL), and deep integration with frameworks like PyTorch and TensorFlow. Much of the AI code people have already written targets CUDA, and moving to another vendor’s chip means porting and re-optimizing, which is expensive. This software ecosystem is the moat that makes NVIDIA harder to replace than its chips alone would.

What's the difference between AI GPUs like the H100, B200, and GB300?

In short, it comes down to generation and specs. The H100 and H200 belong to the Hopper generation, with the H200 carrying 141GB of HBM3e; the B200 and B300 belong to the newer Blackwell generation, with the B300 carrying 288GB of HBM3e. The GB200 and GB300 fuse a Grace CPU with a Blackwell GPU into a Superchip, then assemble those into systems — for example, the GB300 NVL72 packs 72 GPUs into a single rack. Newer generations bring more memory, higher bandwidth, and tighter rack-level integration.

Are all GPUs made by NVIDIA?

On the design side it’s mostly NVIDIA (roughly 80%-plus of the AI GPU market, with estimates ranging from 80% to 90% depending on how you count), with AMD’s Instinct MI series chasing, and cloud giants like Google, AWS, and Microsoft building their own ASICs (such as the TPU and Trainium) alongside them. On the manufacturing side, almost all of it relies on TSMC’s advanced processes, then runs through advanced packaging like CoWoS and Taiwan’s server assembly before it becomes a usable system.

Disclaimer and disclosures

This article is for general information and education only. It is not investment, legal, tax, or professional advice. Markets and regulations may change at any time, and the information reflects conditions at the time of writing.

Penchan is not a registered securities investment adviser. Any securities, digital assets, or financial products mentioned are covered for informational purposes only and are not buy or sell recommendations. Make your own decisions and accept your own risk.

Some or all of this article involved AI (Penna) assistance. The exact share varies by article. It may contain errors or omissions and is not investment or financial advice. Please verify against original sources.

The author may hold some assets mentioned in this article. Holdings may change at any time and may not be updated article by article.

See this site's Legal Notice and Disclosures and Privacy Policy.