NVIDIA Hopper and the H100: The Transformer Engine

In March 2022 NVIDIA announced the Hopper architecture and its flagship H100 GPU, the chip that would power much of the generative AI boom. Built with 80 billion transistors on a custom TSMC 4N process, the H100’s signature feature was a dedicated Transformer Engine paired with fourth-generation Tensor Cores supporting FP8, an 8-bit floating-point format. The engine dynamically chooses between FP8 and 16-bit calculations across the many layers of a Transformer, and NVIDIA reported that it could accelerate Transformer networks by as much as 6 times over the prior generation without losing accuracy.
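To see why 8-bit arithmetic needs active precision management, it helps to look at how little range and resolution FP8 offers. The sketch below is a simplified illustration in plain Python, not NVIDIA's implementation: it simulates rounding to the E4M3 variant of FP8 (largest finite value 448) and applies a per-tensor scale factor chosen from the largest absolute value in the tensor. The function names and the sample gradient values are hypothetical, and the real engine's scaling logic is more involved.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value (simplified model)."""
    if x == 0.0:
        return 0.0
    _, exp = math.frexp(abs(x))  # abs(x) = m * 2**exp with 0.5 <= m < 1
    # E4M3 keeps 4 significant bits (1 implicit + 3 stored). Values below the
    # normal range share the fixed subnormal spacing of 2**-9.
    step = 2.0 ** (max(exp, -5) - 4)
    q = round(x / step) * step  # round() uses round-half-to-even
    return max(-E4M3_MAX, min(E4M3_MAX, q))  # saturate at the format limits


def amax_scale(values):
    """Scale factor that maps the largest magnitude onto E4M3_MAX."""
    amax = max(abs(v) for v in values)
    return E4M3_MAX / amax if amax > 0 else 1.0


def fp8_round_trip(values, scale):
    """Scale, quantize to simulated FP8, then undo the scale."""
    return [quantize_e4m3(v * scale) / scale for v in values]


# Hypothetical small gradient values, typical of what FP8 cannot hold unscaled.
grads = [1e-4, 2e-4, -3e-4]

unscaled = fp8_round_trip(grads, 1.0)              # every value rounds to zero
scaled = fp8_round_trip(grads, amax_scale(grads))  # values survive with small error

print("unscaled:", unscaled)
print("scaled:  ", scaled)
```

Without scaling, every sample value falls below the smallest step FP8 can represent and is lost. With a scale factor, the same values survive with a relative error of a few percent, which is the kind of trade-off a precision-managing engine has to keep under control from layer to layer.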

The H100 was the first GPU to use HBM3 memory, reaching 3 terabytes per second of memory bandwidth. That matters because serving large models is often limited by how fast weights can be read from memory rather than by raw compute. Its fourth-generation NVLink, combined with an external NVLink Switch, could connect up to 256 H100 GPUs at what NVIDIA said was 9 times the bandwidth of the previous generation's InfiniBand-based approach, letting operators build large training and inference clusters.
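A rough back-of-the-envelope calculation shows why memory bandwidth matters so much for serving. When a model generates text one token at a time for a single request, it has to read essentially all of its weights from memory for each token, so bandwidth divided by model size gives an upper bound on speed. The sketch below uses the 3 terabytes per second figure from the text. The 30-billion-parameter model size is a hypothetical chosen for illustration, and the estimate ignores batching, caches, and compute time.

```python
def max_tokens_per_second(bandwidth_bytes_per_s, n_params, bytes_per_param):
    """Upper bound on single-stream decoding speed if weight reads dominate."""
    weight_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


HBM3_BANDWIDTH = 3e12  # 3 TB/s, the figure cited in the text
N_PARAMS = 30e9        # hypothetical 30-billion-parameter model (assumption)

fp16_bound = max_tokens_per_second(HBM3_BANDWIDTH, N_PARAMS, 2)  # 2 bytes per weight
fp8_bound = max_tokens_per_second(HBM3_BANDWIDTH, N_PARAMS, 1)   # 1 byte per weight

print(f"16-bit weights: at most ~{fp16_bound:.0f} tokens/s")
print(f"8-bit weights:  at most ~{fp8_bound:.0f} tokens/s")
```

Under these assumptions, halving the bytes per weight doubles the ceiling, which is one reason an 8-bit format and faster memory complement each other.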

Demand for the H100 became one of the defining business stories of the era. The chip was in acute shortage as labs and cloud providers raced to train and serve ever-larger models, and that demand drove NVIDIA’s valuation to historic highs. For a business reader, the H100 is the piece of hardware most directly associated with the wave of large language models that followed ChatGPT, and its FP8 Transformer Engine shows how chip design and model architecture now co-evolve.
