Google reveals the Tensor Processing Unit

On May 18, 2016, at its I/O developer conference, Google publicly revealed the Tensor Processing Unit (TPU), a custom application-specific integrated circuit (ASIC) it had designed to accelerate machine learning. In its announcement, Google said it had “been running TPUs inside our data centers for more than a year,” meaning the chips had been quietly powering live products since 2015 before the public ever heard of them. The company claimed the TPU delivered “an order of magnitude better-optimized performance per watt for machine learning,” which it described as “roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore’s Law).”

The TPU was a sharp departure from using general-purpose graphics chips for AI. Where a GPU is a flexible parallel processor originally built for rendering, the TPU was purpose-built for the specific math of neural networks. It tolerated reduced numerical precision, which let it use far fewer transistors per operation, and a single TPU board fit into the same slot as a hard disk drive in Google’s server racks. By the time of the announcement, TPUs were already powering RankBrain (the system that helps rank search results), Street View, and AlphaGo, the program that had defeated Go champion Lee Sedol two months earlier.
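To make the reduced-precision idea concrete, here is a minimal sketch of how a neural-network style calculation can be carried out with small integers instead of full floating-point numbers. It assumes 8-bit signed integers and one shared scale factor per list of values. Those choices, the function names `quantize` and `quantized_dot`, and the sample numbers are illustrative assumptions, not a description of Google's actual design.

```python
def quantize(values, bits=8):
    """Map floats to signed integers of the given width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0       # avoid dividing by zero
    return [round(v / scale) for v in values], scale


def quantized_dot(a, b, bits=8):
    """Dot product computed with integer multiply-adds, rescaled once at the end."""
    qa, scale_a = quantize(a, bits)
    qb, scale_b = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))   # integer-only arithmetic
    return acc * scale_a * scale_b


# Made-up values standing in for a few network weights and inputs.
weights = [0.12, -0.53, 0.91, 0.07]
inputs = [1.50, 0.25, -0.75, 2.00]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)

print(f"full-precision result: {exact:.4f}")
print(f"8-bit integer result:  {approx:.4f}")
```

The two printed results should differ only slightly, which is the point: if a workload can absorb that small error, the multiply-add hardware can be much simpler than a full floating-point unit.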

The technical details followed in 2017 in the paper “In-Datacenter Performance Analysis of a Tensor Processing Unit,” presented at the 44th International Symposium on Computer Architecture (ISCA) by Norman Jouppi, Cliff Young, David Patterson, and dozens of co-authors. The paper confirmed the chip had been “deployed in datacenters since 2015” to accelerate the inference phase of neural networks, and benchmarked it against contemporary Intel CPUs and NVIDIA GPUs on production workloads.

Why business readers should care: the TPU showed that the economics of AI at scale could justify designing your own silicon rather than buying off-the-shelf chips. It opened the era of custom AI accelerators that now shapes cloud pricing, hardware supply chains, and the strategic question of who controls the chips on which AI runs.