Google TPU 8t triples compute performance to hit 121 exaflops
The bifurcated strategy for silicon dominance
Google is fundamentally shifting its hardware playbook. By splitting its eighth-generation Tensor Processing Units into two distinct flavors, the TPU 8t for training and a dedicated inference counterpart, the search giant is targeting the specific bottlenecks that slow down AI development. This isn't just a refresh; it is a ground-up redesign built for the crushing computational demands of next-generation frontier models. For founders and investors, it signals a move toward extreme specialization, where general-purpose hardware no longer makes the cut.
Rethinking internal architecture for speed
The TPU 8t serves as the powerhouse of this duo. The engineering team achieved a massive performance leap by moving block-scaled multiplication directly into the Matrix Multiply Units (MXUs). This native quantization approach effectively eliminates the overhead typically offloaded to the Vector Processing Unit (VPU). The result is a system that delivers nearly three times the compute performance per pod of previous generations. That efficiency lets developers push model FLOPs utilization toward its theoretical limit, slashing the time it takes to bring a model from concept to deployment.
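Google hasn't published the details of its MXU-level quantization, but the general technique, block-scaled low-precision arithmetic, is easy to sketch. The JAX snippet below is a minimal illustration: the block size of 32, the signed 4-bit value grid, and the function names are all our assumptions, not anything Google has documented.

```python
# A minimal sketch of block-scaled quantization, the general technique
# behind "block-scaled multiplication." Block size, scale format, and
# the 4-bit range are illustrative assumptions, not a published spec.
import jax.numpy as jnp

BLOCK = 32   # assumed number of elements sharing one scale factor
QMAX = 7.0   # max magnitude representable on a signed 4-bit grid

def block_quantize(x):
    """Store one scale per block plus values rounded to a 4-bit range."""
    blocks = x.reshape(-1, BLOCK)
    scales = jnp.max(jnp.abs(blocks), axis=1, keepdims=True) / QMAX
    q = jnp.round(blocks / scales)   # values now lie in [-7, 7]
    return q, scales

def block_dequantize(q, scales):
    """Recover an approximation of the original tensor."""
    return (q * scales).reshape(-1)

x = jnp.linspace(-1.0, 1.0, 128)
q, s = block_quantize(x)
err = jnp.max(jnp.abs(block_dequantize(q, s) - x))
print(f"max reconstruction error: {err:.4f}")
```

Folding the per-block scales into the matrix unit itself, rather than dequantizing on a separate vector unit, is what would remove the VPU round trip the article describes.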

Interconnect breakthroughs and massive scaling
Scale is the only metric that matters in the current arms race. The TPU 8t uses a breakthrough inter-chip interconnect that doubles the bandwidth of its predecessor, allowing Google to cluster up to 9,600 TPUs in a single 3D torus topology. The sheer density of this configuration produces a staggering 121 exaflops of FP4 compute per pod, a 2.8x improvement over the previous gold standard.
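Those headline figures are easy to sanity-check. The arithmetic below simply divides the article's pod-level numbers; the per-chip and previous-generation values are our derivations, not quoted specs.

```python
# Back-of-envelope check on the quoted pod figures; derived values
# are our arithmetic, not official per-chip specifications.
pod_flops = 121e18       # 121 exaflops of FP4 compute per pod
chips_per_pod = 9_600    # TPUs in one 3D torus pod

per_chip_pflops = pod_flops / chips_per_pod / 1e15
prev_pod_eflops = pod_flops / 2.8   # implied previous-generation pod

print(f"implied FP4 per chip: {per_chip_pflops:.1f} petaflops")        # ~12.6
print(f"implied previous pod: {prev_pod_eflops / 1e18:.1f} exaflops")  # ~43.2
```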
Memory capacity for the digital age
Raw compute is useless if the data can't move fast enough. Google addresses the memory wall by providing two petabytes of shared high-bandwidth memory (HBM) within a single super pod, enough capacity to house the entire digital collection of a major national library 100 times over. Combined with new direct storage capabilities, the design ensures that data-hungry models never have to wait for the next batch of information.
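Spreading those two petabytes across the 9,600-chip pod gives a rough per-chip capacity. Again, this is our division of the article's numbers rather than a disclosed spec.

```python
# Implied per-chip memory from the quoted pod totals (our derivation).
pod_memory_bytes = 2e15    # two petabytes of shared HBM per super pod
chips_per_pod = 9_600

per_chip_gb = pod_memory_bytes / chips_per_pod / 1e9
print(f"implied HBM per chip: {per_chip_gb:.0f} GB")   # ~208 GB
```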

Video: Google Next 2026's TPU 8 Chips Revealed (TechCrunch, 1:39)