TPUs (Tensor Processing Units) are custom-designed AI accelerator Application-Specific Integrated Circuits (ASICs) developed by Google for machine learning workloads. First used internally by Google in 2015, they became available to third parties in 2018. TPUs are optimized for the rapid execution of matrix operations, which are fundamental to many AI tasks, making them well-suited for training and inference of deep learning models. They are used to power Google products like Search, Maps, and Gemini.
Key features of TPUs include a systolic array architecture for high-throughput multiply-accumulate operations, high-bandwidth on-chip memory, and specialized hardware for linear algebra, convolution, and other machine learning computations. Because the design targets the matrix-multiplication bottleneck directly, TPUs can deliver higher inference throughput and better performance per watt on many neural-network workloads than contemporary GPUs or CPUs. TPUs support frameworks such as TensorFlow, JAX, and PyTorch, and are integrated into Google Cloud, providing a managed service that lets development teams scale as needed without significant upfront infrastructure investment. As of February 2018, Google charged $6.50 per Cloud TPU per hour. Google offers various TPU versions, including Cloud TPU v5p, generally available in North America (US East region), and the upcoming Ironwood TPU, expected to be generally available in Q4 2025.
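The multiply-accumulate pattern at the heart of the systolic array can be illustrated with a short sketch. This is a hypothetical software simulation of the dataflow, not Google's actual hardware interface: each processing element (PE) holds a running accumulator and, on every cycle, multiplies an incoming activation by a weight and adds the product to its sum. The function name `systolic_matmul` and the loop structure are illustrative assumptions.

```python
# Hypothetical sketch of systolic-array-style matrix multiplication:
# a grid of PEs, one per output element, each performing one
# multiply-accumulate (MAC) per cycle as operands stream past.

def systolic_matmul(a, b):
    """Compute a @ b by accumulating one MAC per PE per cycle."""
    n, k = len(a), len(b)      # a is n x k, b is k x m
    m = len(b[0])
    # One accumulator per output element, mirroring the PE grid.
    acc = [[0] * m for _ in range(n)]
    for step in range(k):          # k "cycles" of streamed operands
        for i in range(n):         # PE row
            for j in range(m):     # PE column
                acc[i][j] += a[i][step] * b[step][j]  # multiply-accumulate
    return acc

result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# result == [[19, 22], [43, 50]]
```

In real hardware the three loops collapse into parallel PEs clocked in lockstep, so an N x N array completes N MACs per PE in N cycles rather than the O(n·k·m) serial steps simulated here.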