Google TPU 8t triples compute performance to hit 121 exaflops

TechCrunch · 2 min read

The bifurcated strategy for silicon dominance

Google is fundamentally shifting its hardware playbook. By splitting its eighth-generation Tensor Processing Units into two distinct flavors, one tuned for training and one for inference, the search giant is targeting the specific bottlenecks that slow down AI development. This isn't just a refresh; it is a ground-up reconstruction designed to handle the crushing computational demands of next-generation frontier models. For founders and investors, it signals a move toward extreme specialization, where general-purpose hardware no longer makes the cut.

Rethinking internal architecture for speed

The training chip serves as the powerhouse of this duo. The engineering team achieved a massive performance leap by moving block-scale multiplication directly inside the Matrix Multiplication Units (MXUs). This native quantization approach effectively kills the overhead typically associated with the Vector Processing Unit (VPU). The result is a system that delivers nearly three times the compute performance per pod compared with previous generations. That efficiency lets developers push model FLOPs utilization to the absolute limit, slashing the time it takes to bring a model from concept to deployment.
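To make "block-scale multiplication" concrete, here is a minimal NumPy sketch of block-scaled quantization, in which each block of values shares a single floating-point scale. This is an illustration of the general technique, not Google's implementation; the block size and bit width are assumptions for the example.

```python
# Illustrative sketch of block-scaled quantization (NOT Google's actual
# MXU implementation): each block of `block` values shares one FP scale.
# Folding the scale multiply into the matmul hardware is what lets the
# MXU skip the separate VPU pass the article describes.
import numpy as np

def quantize_blocks(x, block=32, n_bits=4):
    """Quantize a 1-D array in fixed-size blocks, one scale per block."""
    x = x.reshape(-1, block)
    qmax = 2 ** (n_bits - 1) - 1            # 7 for signed 4-bit values
    scales = np.abs(x).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0               # avoid divide-by-zero
    q = np.clip(np.round(x / scales), -qmax - 1, qmax)
    return q.astype(np.int8), scales

def dequantize_blocks(q, scales):
    """Rebuild an approximation of the original values."""
    return (q * scales).ravel()

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, scales = quantize_blocks(w)
w_hat = dequantize_blocks(q, scales)
err = np.abs(w - w_hat).max()
print(f"max reconstruction error: {err:.4f}")
```

The payoff is that the matmul can run on the tiny integer codes while the per-block scales are applied as a cheap fused multiply, rather than as a separate vector-unit pass over the full output.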


Interconnect breakthroughs and massive scaling

Scale is the only metric that matters in the current arms race. The new chip uses a breakthrough inter-chip interconnect that doubles the bandwidth of its predecessor. That allows Google to cluster up to 9,600 TPUs in a single 3D torus topology. The sheer density of this configuration produces a staggering 121 exaflops of FP4 compute per pod, a 2.8x improvement over the previous gold standard.
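Two quick sanity checks on those numbers. In a 3D torus, every chip links to six neighbors with wraparound at the edges (the 20 x 24 x 20 grid below is a hypothetical factorization of 9,600; Google has not published the actual pod dimensions), and dividing the headline figure by the chip count gives the implied per-chip throughput.

```python
# Hypothetical 20 x 24 x 20 = 9,600-chip grid; actual dims unpublished.
def torus_neighbors(x, y, z, dims=(20, 24, 20)):
    """Each chip in a 3D torus has 6 neighbors, with wraparound links."""
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

# Even a corner chip keeps 6 neighbors thanks to the wraparound.
print(torus_neighbors(0, 0, 0))

# 121 exaflops spread across 9,600 chips ≈ 12.6 petaflops of FP4 each.
per_chip = 121e18 / 9_600
print(f"{per_chip / 1e15:.1f} PFLOPs of FP4 per chip")
```

The wraparound links are the point of a torus: no chip sits on an "edge" with fewer connections, which keeps all-reduce traffic uniform across the pod.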

Memory capacity for the digital age

Raw compute is useless if the data can't move fast enough. Google addresses the memory wall by providing two petabytes of shared high-bandwidth memory within a single super pod. To illustrate the scale, that capacity could house the digital collection of a major national library 100 times over. Combined with new direct-storage capabilities, the architecture ensures that data-hungry models never have to wait for the next batch of information.
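For a rough sense of what two petabytes per pod means per chip, here is the arithmetic, assuming the memory is spread evenly across all 9,600 chips in the pod (the article does not state the per-chip split).

```python
# Back-of-the-envelope split of the pod's shared memory.
# Assumption: the 2 PB is distributed evenly over 9,600 chips.
PB = 1e15
pod_memory = 2 * PB
chips = 9_600
per_chip_gb = pod_memory / chips / 1e9
print(f"≈ {per_chip_gb:.0f} GB of shared memory per chip")
```

That works out to roughly 208 GB per chip, which is the kind of headroom that lets a large model's weights and activations stay resident instead of shuttling to and from storage mid-step.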

Source video: Google Next 2026's TPU 8 Chips Revealed (TechCrunch, 1:39)

