What Is a TPU? Why Google's In-House AI Chip Dares to Square Off Against NVIDIA

The TPU is Google's in-house, purpose-built AI chip, powering everything from Gemini to Google Search. This plain-English guide covers what a TPU is, how it differs from a GPU, how the generations evolved up to the seventh-gen Ironwood, why even Anthropic uses TPUs at scale to run Claude, and what roles Broadcom and Taiwan's firms play.

5/28 · Penna

[Illustration: Google TPU, a custom chip built for AI]

TL;DR

A TPU (Tensor Processing Unit) is Google's in-house custom chip (an ASIC) built specifically for AI computation; it co-designs compute, memory, interconnect, and software in pursuit of cost and energy efficiency. Google has used TPUs internally since 2015. The latest version in production is the seventh-gen Ironwood; the eighth-gen 8t/8i was unveiled in April 2026 and is not yet commercially available. Google's own Gemini and Search rely on TPUs, and even Anthropic uses them at scale to run Claude.
This guide is for beginners and long-term watchers who want to understand what a TPU is, how it differs from a GPU, whether it will replace NVIDIA, and why Anthropic uses it too.
The TPU is Google's vertically integrated AI-chip route: tailored to its own workloads, built around cost and energy efficiency. Its share inside Google's own AI servers is already high (research firms estimate close to 80% for 2026), but across the global market, NVIDIA's GPUs remain the mainstay thanks to the CUDA ecosystem and general-purpose versatility. TPU and GPU coexist within Google Cloud; for the short term it's a division of labor, not a replacement. The Taiwan-firm lists that get cited are mostly analyst speculation, and this article does not constitute investment advice.

The earlier ASIC gate noted that, besides buying from NVIDIA, the cloud giants are all building their own AI chips, and the longest-running of these efforts is Google’s TPU. This piece explains the TPU in detail.

It first covers what a TPU is and how it differs from a GPU, then how the generations evolved, whether it will replace NVIDIA, why even Anthropic uses it at scale, and what Broadcom and Taiwan’s firms contribute. This is the deep dive on Google TPU for Gate 1, “AI chips,” of The AI Hardware Supply Chain, End to End.

What Is a TPU

TPU is short for Tensor Processing Unit, Google’s in-house custom chip built specifically for machine learning — a kind of ASIC (a chip tailored to one specific task).

Its design philosophy differs from a GPU’s. A GPU is a more general-purpose parallel-computing chip that can handle a wide range of workloads and has a mature software ecosystem (CUDA). A TPU instead co-designs matrix computation, HBM memory, chip-to-chip interconnect, the compiler, and the framework as one system, specializing in AI training and inference and aiming for better cost, power, latency, and scaling. An analogy: a GPU is like a Swiss army knife that can cut almost anything, while a TPU is like a chef’s knife Google custom-made for its own handful of dishes.
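
To make “matrix computation” concrete, here is a minimal pure-Python sketch of the operation that TPU hardware is built to accelerate: a matrix multiply, which reduces to many multiply-accumulate steps. The function name `matmul_with_mac_count` and the toy matrices are illustrative assumptions, not part of any Google software; real workloads go through a framework and compiler rather than Python loops.

```python
def matmul_with_mac_count(a, b):
    """Multiply a (m x k) by b (k x n) and count multiply-accumulate steps.

    Hypothetical helper for illustration only; not a TPU or framework API.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    out = [[0] * n for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate
                macs += 1
            out[i][j] = acc
    return out, macs


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product, macs = matmul_with_mac_count(a, b)
print(product)  # [[19, 22], [43, 50]]
print(macs)     # 8, i.e. m * n * k
```

The count grows as m × n × k, which is why a chip that dedicates its silicon to this one pattern can pay off for workloads dominated by it.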

One clarification: Google does not avoid NVIDIA. Google Cloud has explicitly said it will work closely with NVIDIA and offer next-generation NVIDIA instances, and TPU and GPU run as two parallel product lines within Google Cloud.

Core-Data Snapshot

Below we capture the TPU’s key milestones. Specs are from Google’s official sources; shares are research-firm estimates.

Topic	Data	Source and nature
Latest generation in production	Seventh-gen Ironwood (TPU7x)	Google Cloud documentation (2026-05)
Ironwood specs	A single superpod links up to 9,216 chips, each with 192 GiB of HBM	Google official
Eighth generation	TPU 8t (training) / 8i (inference) unveiled 2026-04, not yet commercial	Google official announcement
TPU share in Google’s AI servers	Estimated about 78% for 2026 (the only cloud provider shipping more ASICs than GPUs)	TrendForce estimate
Anthropic adoption	Up to a million chips and over 1GW of capacity in 2026; about 3.5GW via Broadcom from 2027	Anthropic announcement / Broadcom SEC filing

A Brief History of the Generations: From v1 to Ironwood

The TPU isn’t new. Google has used the first-gen TPU inside its own data centers since 2015, early on running services like RankBrain, Street View, and AlphaGo. It then evolved steadily: the second, third, and fourth generations moved gradually from inference toward large-scale training; the fifth generation split into the cost-saving v5e and the performance-focused v5p; the sixth generation is called Trillium (v6e).

The latest generation currently in production and available on Google Cloud is the seventh-gen Ironwood (TPU7x): a single superpod can link up to 9,216 chips, each with 192 GiB of HBM, and it targets large-model pretraining and inference. Google unveiled specs for the eighth-gen TPU 8t (training) and 8i (inference) in April 2026, pitching higher performance per dollar and per watt. For now, though, only intent registration is open and no general-availability (GA) documentation has appeared, so treat the eighth generation as “unveiled but not yet commercial,” not as something you can already rent.
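
As a quick sanity check on scale, the two Ironwood figures quoted above can be multiplied to get the raw HBM total across one full superpod. This is a back-of-the-envelope sketch: the constant and function names are illustrative, and the result is a simple sum of per-chip capacity, not a statement about how much memory a single job can actually use.

```python
CHIPS_PER_SUPERPOD = 9_216  # maximum chips per superpod, as quoted in the article
HBM_PER_CHIP_GIB = 192      # HBM per chip in GiB, as quoted in the article


def superpod_hbm_gib(chips=CHIPS_PER_SUPERPOD, hbm_gib=HBM_PER_CHIP_GIB):
    """Raw aggregate HBM in GiB (hypothetical helper, plain multiplication)."""
    return chips * hbm_gib


total_gib = superpod_hbm_gib()
total_tib = total_gib / 1024
total_pib = total_tib / 1024
print(f"{total_gib:,} GiB = {total_tib:,.0f} TiB = {total_pib:.4f} PiB")
# 1,769,472 GiB = 1,728 TiB = 1.6875 PiB
```

In other words, a fully populated superpod carries roughly 1.7 PiB of HBM in aggregate.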

TPU vs GPU: Will It Replace NVIDIA

This is the most frequently asked question, and the answer has to be read at two scales: “Google’s own” and “the whole market.”

The TPU’s strength lies in vertical integration: the needs of Google’s own models (such as Gemini) can directly shape the chip’s design, and together with its emphasis on cost and energy efficiency plus ultra-large-scale interconnect, it has a real edge on Google’s own workloads. Research firm TrendForce estimates that in 2026 the TPU’s share of Google’s own AI-server shipments approaches 80%, and Google is the only major cloud provider where “in-house ASIC shipments exceed GPUs.”

But that’s Google’s own internal mix, not global market share. Across the whole market, NVIDIA’s moat is still in place: the CUDA software ecosystem, mature developer tooling, and cross-cloud, cross-framework versatility all keep GPUs the mainstay, and Google Cloud itself keeps selling NVIDIA instances. The TPU also still carries migration costs for external developers: even Google still labels the TPU’s native PyTorch support as “preview.” So the pragmatic view is that the TPU is ramping quickly within Google itself and at some AI labs, but in the short term it is far from replacing NVIDIA. The relationship looks more like coexistence and a division of labor.

Who Uses TPUs

First, Google itself. From early search ranking and Street View to today’s Gemini, a large share of Google’s own AI products run on TPUs. External users can rent them through Google Cloud (Cloud TPU VM, GKE, Vertex AI).

The most closely watched external customer is Anthropic. In October 2025 it announced an expanded adoption of Google Cloud TPUs, scaling up to a million chips and bringing over 1GW of capacity in 2026; in April 2026 it signed further agreements with Google and Broadcom to obtain roughly 3.5GW of next-generation TPU compute through Broadcom starting in 2027. Note that Anthropic runs a diversified strategy, using AWS Trainium and NVIDIA GPUs at the same time, with Amazon still its primary cloud and training partner. Using multiple vendors and spreading bets is exactly the norm for today’s large AI companies. Separately, private-equity giant Blackstone also announced a joint venture with Google in May 2026 to build a TPU cloud in the United States, offering another route to TPU access beyond Google Cloud.

Who Helps Google Build TPUs, and Taiwan’s Role

Google controls the TPU’s architecture and software, but the chip’s detailed design, manufacturing, and packaging need partners.

The most clearly documented partner is Broadcom. In an April 2026 SEC filing, Broadcom confirmed it had signed a long-term agreement with Google to do custom design and supply for Google’s future TPU generations, and to provide networking and other components for the next-generation AI rack, with the collaboration running through 2031. On manufacturing, the market widely links the TPU’s advanced process and packaging to TSMC, but Google has not officially disclosed the specific process node, so this part remains supply-chain and media speculation. There have also been media reports that MediaTek is involved in the design of Google’s next-generation inference TPU, but these likewise lack official confirmation and can only be treated as “market reporting.”

As for the “TPU stocks” people often discuss in Taiwan’s market, analysts and commentators name a slate of supply-chain firms (design services, testing, packaging-and-test, test interfaces, substrates, thermal and power, and more). One point must be made especially clear: these lists mostly come from brokerages’ supply-chain speculation about who might benefit, not from announcements by Google or Broadcom. Being named does not mean a firm has landed orders, nor does it indicate how much the firm benefits. This article only describes industry roles; it does not compile beneficiary stocks, does not rank individual names, and does not constitute investment advice.

Key Takeaways for This Gate

The TPU is Google’s in-house, purpose-built AI chip. It co-designs compute, memory, interconnect, and software, following a vertically integrated route built around cost and energy efficiency. The latest in production is the seventh-gen Ironwood, with the eighth-gen 8t/8i unveiled but not yet commercial.

Inside Google’s own AI servers, the TPU’s share is already high (research firms estimate close to 80% for 2026), but globally, NVIDIA remains the mainstay through the CUDA ecosystem and general-purpose versatility, and the two coexist within Google Cloud. Even Anthropic uses TPUs at scale to run Claude, with Broadcom doing the custom design behind it. The Taiwan-firm lists that get cited are mostly analyst speculation: use them to understand industry roles, not as a stock-picking list.

To see which category Google TPU falls into, go back and read What Is an ASIC; to see the GPU-mainstay side, read What Is a GPU and Blackwell; to see how these chips get fed data, read HBM; to head back to the whole chain, return to the supply-chain overview.

FAQ

What is a TPU? How is it different from a GPU?

A TPU (Tensor Processing Unit) is Google’s in-house, purpose-built AI chip, a kind of ASIC (a chip tailored to one specific task). It co-designs matrix computation, HBM memory, interconnect, and software, specializing in AI training and inference. A GPU is more general-purpose, with a mature software ecosystem (CUDA). Put simply, a TPU is the chef’s knife Google custom-made for its own workloads, while a GPU is a Swiss army knife that can cut almost anything.

Why does Google build its own TPU?

Because Google’s internal AI usage is so large that an in-house chip pays off better on cost, power, latency, and scaling, and it lets the chip integrate deeply with Google’s own models (such as Gemini) and Google Cloud services for differentiation. The TPU has been used inside Google’s data centers since 2015, early on for RankBrain, Street View, AlphaGo, and more.

What's the latest TPU generation?

As of May 2026, the latest generation in production is the seventh-gen Ironwood (TPU7x), where a single superpod can string 9,216 chips together, each with 192 GiB of HBM. The eighth-gen TPU 8t (training) and 8i (inference) were unveiled in April 2026 with published specs, but officially only intent registration is open for now, with no general-availability (GA) documentation seen yet — it’s in an unveiled-but-not-yet-commercial stage.

Will the TPU replace NVIDIA?

Not in the short term. Inside Google’s own AI servers, the TPU’s share is already high (research firm TrendForce estimates close to 80% for 2026, and Google is the only one among major cloud providers shipping more ASICs than GPUs). But that’s Google’s own internal mix, not global market share. Across the whole market, NVIDIA remains the mainstay through the CUDA software ecosystem and general-purpose versatility, and even Google Cloud keeps offering NVIDIA instances. The more pragmatic framing is coexistence and a division of labor, not replacement.

Why does even Anthropic use TPUs?

Anthropic runs a multi-vendor, diversified compute strategy. In October 2025 it announced an expanded adoption of Google Cloud TPUs, scaling up to a million chips and bringing over 1GW of capacity in 2026; in April 2026 it signed further agreements with Google and Broadcom to obtain roughly 3.5GW of next-generation TPU compute through Broadcom starting in 2027. But Anthropic also uses AWS Trainium and NVIDIA GPUs at the same time, with Amazon still its primary cloud and training partner.

Disclaimer and disclosures

This article is for general information and education only. It is not investment, legal, tax, or professional advice. Markets and regulations may change at any time, and the information reflects conditions at the time of writing.

Penchan is not a registered securities investment adviser. Any securities, digital assets, or financial products mentioned are covered for informational purposes only and are not buy or sell recommendations. Make your own decisions and accept your own risk.

Some or all of this article involved AI (Penna) assistance. The exact share varies by article. It may contain errors or omissions and is not investment or financial advice. Please verify against original sources.

The author may hold some assets mentioned in this article. Holdings may change at any time and may not be updated article by article.
