TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning

This 2023 paper, again led by Norman Jouppi and David Patterson with a Google team, describes the fourth-generation Tensor Processing Unit and the supercomputer built around it. The headline innovation is the use of optical circuit switches (OCSes) to dynamically reconfigure the interconnect topology that links the chips. Instead of a fixed wiring pattern, the network can be rewired on the fly to improve scale, availability, utilization, and fault isolation. The authors report that the optical switches and their underlying optical components account for less than 5 percent of system cost and less than 3 percent of system power.

A single TPU v4 supercomputer scales to 4,096 chips connected in a three-dimensional torus. The paper reports that v4 outperforms TPU v3 by about 2.1 times and improves performance per watt by roughly 2.7 times. It also introduces SparseCores, dedicated units that accelerate embedding-heavy workloads such as recommendation models by 5 to 7 times while using only about 5 percent of die area and power. Against competing accelerators of the era at similar system sizes, the authors report v4 as roughly 4.3 to 4.5 times faster than the Graphcore IPU Bow, and 1.2 to 1.7 times faster than the Nvidia A100 while using 1.3 to 1.9 times less power.
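
To make the three-dimensional torus concrete, the sketch below computes a chip's neighbors and the hop distance between chips when every axis wraps around. It is an illustration only, not code from the paper: the 16 x 16 x 16 shape is an assumption chosen because it multiplies out to 4,096 chips, and the function names torus_neighbors and torus_hops are hypothetical.

```python
from itertools import product

def torus_neighbors(coord, shape):
    """Return the six wraparound neighbors of a chip in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # The modulo implements the wraparound link on each axis.
            n[axis] = (n[axis] + step) % shape[axis]
            neighbors.append(tuple(n))
    return neighbors

def torus_hops(a, b, shape):
    """Minimum hop count between two chips, taking the shorter way around each axis."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, shape)
    )

# Assumed shape for illustration: 16 * 16 * 16 = 4096 chips.
SHAPE = (16, 16, 16)
num_chips = SHAPE[0] * SHAPE[1] * SHAPE[2]

corner_neighbors = torus_neighbors((0, 0, 0), SHAPE)

# Worst-case distance from one chip to any other chip in the torus.
max_hops = max(
    torus_hops((0, 0, 0), c, SHAPE)
    for c in product(*(range(s) for s in SHAPE))
)

print(num_chips, corner_neighbors, max_hops)
```

Under this assumed shape, every chip has six direct neighbors, and the wraparound links cut the worst-case distance to 24 hops, compared with 45 hops for a 16 x 16 x 16 mesh without wraparound.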

For a general reader, the paper shows that AI hardware progress is no longer just about faster chips: the interconnect, the way thousands of chips talk to each other, has become as important as the silicon itself, and optical networking is one of the levers large operators pull to keep scaling.
