In-House TPUs: Just How Deep Is Google's Chip-Cost Moat?

Of all the AI giants, Google leans on Nvidia the least—thanks to the TPU chips it designs itself. This piece unpacks how the TPU drives its costs down, how deep the moat really runs, and the one global supply chain it still hasn't escaped.

5/31 · Penna


TL;DR

The TPU is Google's own in-house AI accelerator, and it makes the company the least Nvidia-dependent of the big players. Designing its own chips drives down the cost of compute for large-scale internal workloads, but the TPU still relies on TSMC for manufacturing and still needs HBM and advanced packaging, so it has never broken free of the global supply chain.
This article is for readers who want to know why Google barely needs to buy Nvidia, how much the TPU actually saves it, and whether that edge is something it can count on.
The TPU's savings show up mostly in Google's own large-scale workloads; for general-purpose, on-demand inference, the Nvidia ecosystem still holds an edge today. The moat is real, but its water source is still that same global supply chain running from TSMC to HBM.

In this AI race, most companies have to queue up to buy GPUs from Nvidia. Google is the exception, holding a card no one else has: an AI chip it designs itself, the TPU.

That card makes it the least Nvidia-dependent of all the big AI players. While rivals hand large budgets to Nvidia and still have to wait on supply, Google runs much of its training and inference on its own TPUs. If you want the full picture of Google as a company first, start with What kind of company is Google.

This piece tackles one question in three parts: by making its own chips, how much does Google actually save, how deep is the moat, and which hurdles can’t it clear?

The one-line take. The TPU really does save money, but what it saves is Google’s own large-scale compute; it hasn’t let Google break free of that global supply chain running from TSMC to HBM.

The TPU is Google’s own in-house AI chip

TPU is short for Tensor Processing Unit, a chip Google purpose-built to run its own AI workloads—a specialized ASIC. It isn’t sold at retail; it mainly serves Google internally and is also rented out to enterprises through Google Cloud.

Google has been making TPUs for a decade, stacking one generation on top of the next. The latest one out is the seventh-generation Ironwood, which Google positions as “the first TPU for the age of inference,” with another step up in performance per watt over the previous Trillium generation. After that comes the eighth generation, where Google did something it had never done before: it split training and inference into two different chip designs, each optimized on its own.

We won’t dig into the principles behind the chip here; if you want to know how the TPU differs from the GPU and what an ASIC is, see two pieces in our supply-chain series: the AI accelerator chip breakdown and the dedicated TPU piece.

Where exactly does making your own chips save money?

The most direct evidence comes from Google’s own earnings. On its 2025 earnings call, Google said it had cut Gemini’s per-unit serving cost by roughly 78% that year, crediting model optimization plus efficiency and utilization gains from its own hardware.

The savings logic has two layers. One is cutting out the supplier’s margin: by designing its own chips, Google doesn’t have to buy GPUs at Nvidia’s selling prices. The other is the bespoke fit: the TPU is designed around Google’s own workloads, so the hardware is matched more closely to the work it actually runs. Some analysts estimate that on certain large language model training workloads, the latest-generation TPU can run at a total cost of ownership about 40% lower than a comparable Nvidia chip.

Here we should add an honest caveat, so readers don’t picture the edge as bigger than it is. That 40% is an analyst estimate for “optimized, internal use cases,” and it varies by workload. On general-purpose, on-demand cloud inference, independent benchmarks show the Nvidia ecosystem still holds a clear lead today. In other words, where the TPU saves the most is the kind of work Google itself runs, meaning very high-volume jobs that can be scheduled flexibly, not every type of compute it sells to others.
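To make those percentages concrete, here is a minimal sketch of the arithmetic in Python. It only converts a quoted cost reduction into "how much more work the same budget buys." It assumes the two figures cited above (roughly 78% and about 40%) are taken at face value, and the function and variable names are our own illustration, not anything Google or the analysts published.

```python
def remaining_cost_fraction(reduction_pct: float) -> float:
    """Share of the original unit cost that is left after a percentage cut."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    return 1 - reduction_pct / 100


def work_per_dollar_multiplier(reduction_pct: float) -> float:
    """How many times more work the same budget buys after the cut."""
    return 1 / remaining_cost_fraction(reduction_pct)


# Figures quoted in the article, taken at face value (both are approximate).
QUOTED_CUTS = {
    "Gemini per-unit serving cost (Google, 2025)": 78,
    "TPU vs. comparable Nvidia chip, TCO (analyst estimate)": 40,
}

if __name__ == "__main__":
    for label, pct in QUOTED_CUTS.items():
        left = remaining_cost_fraction(pct)
        gain = work_per_dollar_multiplier(pct)
        print(f"{label}: {left:.0%} of the old cost, {gain:.2f}x work per dollar")
```

Read this way, a 78% cut means each unit of serving costs about 22% of what it did, or roughly 4.5 times as much serving for the same spend; a 40% lower total cost of ownership works out to about 1.67 times the compute per dollar. The second figure inherits every caveat above: it is an estimate for optimized, internal use cases and does not carry over to general-purpose, on-demand inference.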

Behind the end-to-end story sits a very long supply chain

“In-house chips” makes it sound like Google does everything itself; in reality, it relies on a whole row of partners.

The chip’s architecture is led by Google, but the design work is shared among several companies: the latest-generation training chip is co-designed with Broadcom, under an agreement that runs through 2031; the inference chip is handled by Taiwan’s MediaTek; and Google is also in talks with Marvell as a third design partner. Fabrication goes to TSMC on its advanced process nodes; the eighth-generation TPU is targeting 2nm, with mass production expected by the end of 2027.

The memory side is even more delicate. The TPU needs large amounts of HBM (high-bandwidth memory), and according to industry reports, Samsung has recently supplied a relatively high share of the HBM for Google’s TPUs. That remains an industry rumor, though, not an official figure from any of the three memory makers, and Google has never disclosed the exact allocation. Demand for advanced packaging (CoWoS) has also surged, and it is likewise constrained by TSMC’s capacity.

Putting it all together, Google does have more control than rivals who can only buy GPUs externally, but it hasn’t truly broken free of the global supply chain. TSMC’s advanced process, the supply of HBM, the advanced-packaging bottleneck—these are all the links the whole industry is scrambling for, and all within the scope of U.S. export controls. For how the entire chain works, see The end-to-end AI hardware supply chain.

Even its rivals are renting its chips

The TPU’s strength is clearest from a slightly paradoxical angle: Google’s rivals are using its chips too.

Anthropic signed Google’s largest-ever TPU purchase agreement, on the order of a million units, and market rumor has it that Meta is in talks with Google over a large TPU deployment. Anthropic is the most intriguing case: Google pours heavy investment into it on one hand while renting compute to it on the other, a complex triangle we unpack in Why Google invests in its rival Anthropic.

That rivals are willing to hand something as critical as training their next-generation models over to the TPU is, in itself, an endorsement of its cost-performance.

Chips need power: where does the power come from?

Chips are only half the story; the other half is power. AI compute at this scale draws a staggering amount of electricity, so energy has become another front for Google.

Its bets run along two paths. One is nuclear: Google signed an agreement with Kairos Power to bring up to 500 MW of small modular reactor (SMR) capacity online between 2030 and 2035, with the first unit targeted for 2030. The other is renewables: it signed a 1 GW solar power purchase agreement in Texas, signed deals for more than another 1 GW across several U.S. power markets, and in early 2026 spent $4.75 billion to acquire Intersect Power, a clean-energy and data-center infrastructure company. All of it is aimed at locking down power sources first, so expansion doesn’t stall for lack of electricity.

This energy thread will grow alongside the data centers; it’s an increasingly critical piece that’s easy to overlook when you look at Google’s AI strategy.

The parts not yet laid bare

Plenty of numbers on this topic are stuck at the “industry reports” stage, and we’ll flag them honestly here:

The exact share of each HBM supplier: the claim that Samsung’s share is relatively high recently comes from industry chatter and Korean media; none of the three memory makers have come forward to confirm it, and the exact allocation is undisclosed.
The TPU’s share of TSMC’s capacity: there are estimates based on MediaTek’s orders, but how much of TSMC’s advanced process and packaging capacity Google itself takes up has never been disclosed officially.
Ironwood’s process node: the industry points to TSMC 3nm, but Google hasn’t announced it officially.
The internal procurement ratio between TPU and Nvidia: Alphabet’s earnings don’t break out spending on in-house TPUs versus externally purchased GPUs.

None of these has an official answer yet, so any single, precise-looking number deserves a question mark.

Penna’s take

Step back for a wider view of the TPU card, and its value isn’t in being “the cheapest” but in having “the most control.”

While others scramble for Nvidia’s supply and agonize over purchase prices, Google holds a chip line it can schedule on its own. In an era where compute is national power, that’s some serious backbone. But this moat has clear boundaries: where it saves the most is its own large-scale compute, not every unit of compute it sells externally; and its water source is still that global supply chain running from TSMC and HBM to advanced packaging. Google is closer to “chip self-sufficiency” than anyone, yet neither it nor anyone else has truly achieved it.

Further reading: What kind of company is Google, The end-to-end AI hardware supply chain, Why Google invests in its rival Anthropic.

FAQ

What is a TPU?

TPU stands for Tensor Processing Unit—a chip Google designs itself, purpose-built to run AI workloads, a type of ASIC tailored to a specific task. It isn’t sold at retail; it mainly powers Google’s own services, and it’s also rented out to enterprise customers through Google Cloud. For the technical principles, see our breakdown of AI accelerator ASICs.

Does an in-house TPU really save money?

On Google’s own large-scale workloads, the savings are clear. In its 2025 earnings, Google said it had cut Gemini’s per-unit serving cost by roughly 78% that year. Some analysts estimate that on certain training workloads, the latest-generation TPU can run at a total cost of ownership about 40% lower than a comparable Nvidia chip. But that applies to optimized, internal use cases; for general-purpose, on-demand inference, the Nvidia ecosystem still holds an edge.

Does Google use Nvidia at all?

Yes. Google is the least Nvidia-dependent of the big players, running a large share of its own compute on TPUs, but it doesn’t avoid Nvidia entirely: Google Cloud offers customers both TPUs and Nvidia GPUs, so Nvidia is still part of its supply mix.

Does Google build the TPU itself?

The chip architecture is led by Google, but it has partners for design collaboration and manufacturing. The latest-generation training chip is co-designed with Broadcom, the inference chip is handled by MediaTek, and the actual manufacturing goes to TSMC’s advanced process nodes. So it’s a combination of “designed by Google, built with partners, fabbed by TSMC.”

Is this cost advantage something Google can rely on?

It has a real foundation, but with two caveats. First, where it saves the most is Google’s own high-volume internal use, which isn’t the same as being the cheapest for external, on-demand cloud. Second, the TPU still depends on TSMC capacity, HBM memory, and advanced packaging—all bottlenecks the whole world is fighting over, and all within the scope of export controls.

Disclaimer and disclosures

This article is for general information and education only. It is not investment, legal, tax, or professional advice. Markets and regulations may change at any time, and the information reflects conditions at the time of writing.

Penchan is not a registered securities investment adviser. Any securities, digital assets, or financial products mentioned are covered for informational purposes only and are not buy or sell recommendations. Make your own decisions and accept your own risk.

Some or all of this article involved AI (Penna) assistance. The exact share varies by article. It may contain errors or omissions and is not investment or financial advice. Please verify against original sources.

The author may hold some assets mentioned in this article. Holdings may change at any time and may not be updated article by article.
