NVIDIA A100: The Ampere Data-Center GPU

In May 2020 NVIDIA announced the A100, the first GPU built on its Ampere architecture and the accelerator that came to define the large-model era. The A100 carried third-generation Tensor Cores and introduced a new number format called TF32, which keeps the 8-bit exponent range of standard 32-bit floats but shortens the mantissa to 10 bits. Because frameworks can route ordinary 32-bit matrix math through TF32, it gave a large training speedup, typically without changes to model code. The 40 gigabyte launch version used HBM2 high-bandwidth memory; an 80 gigabyte version with faster HBM2e followed later in 2020, offering about two terabytes per second of memory bandwidth.
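To make the TF32 trade-off concrete, the sketch below emulates it in software by rounding an ordinary 32-bit float to a 10-bit mantissa. The function name to_tf32 is hypothetical, and the rounding mode (round to nearest, ties to even) is an assumption made for illustration; this approximates the precision loss and does not model the Tensor Core hardware itself.

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 precision: keep FP32's sign and 8-bit exponent,
    round the 23-bit mantissa down to 10 bits.
    Assumption: round-to-nearest, ties-to-even. Finite inputs only;
    NaN and infinity are not handled in this sketch."""
    # Reinterpret the value as the raw 32 bits of an IEEE-754 float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # The low 13 mantissa bits are discarded; round on them first.
    lsb = (bits >> 13) & 1
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

if __name__ == "__main__":
    for value in (1.0, 0.1, 3.14159265, 1 + 2**-11):
        print(value, "->", to_tf32(value))
```

The values that survive unchanged are those that already fit in 10 mantissa bits, which is why TF32 tends to work for neural-network training, where range matters more than the last few digits of precision.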

A distinctive feature was Multi-Instance GPU (MIG), which lets a single A100 be partitioned into as many as seven isolated GPU instances, each with its own memory and compute, so that one chip can be shared across many smaller jobs or right-sized to demand. For the largest workloads, many A100s were linked together with NVLink and NVSwitch into multi-GPU systems. NVIDIA claimed up to 20 times the performance of the prior-generation V100 on selected AI workloads, a peak figure that depends on the number format used and on features such as structured sparsity.
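The headline multiple is easier to judge with the arithmetic written out. The sketch below uses peak throughput figures from NVIDIA's published datasheets (V100 FP32 and A100 TF32, with and without structured sparsity); these numbers are not stated in this article and are brought in as assumptions for illustration. Peak ratios are an upper bound, not a measured speedup on any real workload.

```python
# Peak throughput in TFLOPS, taken from NVIDIA datasheets (assumed here,
# not stated in the article).
V100_FP32_TFLOPS = 15.7          # prior generation, standard 32-bit math
A100_TF32_TFLOPS = 156.0         # A100 Tensor Cores, TF32, dense
A100_TF32_SPARSE_TFLOPS = 312.0  # A100 TF32 with structured sparsity

def peak_ratio(new_tflops: float, old_tflops: float) -> float:
    """Ratio of peak throughput between two chips."""
    return new_tflops / old_tflops

dense_ratio = peak_ratio(A100_TF32_TFLOPS, V100_FP32_TFLOPS)
sparse_ratio = peak_ratio(A100_TF32_SPARSE_TFLOPS, V100_FP32_TFLOPS)

if __name__ == "__main__":
    print(f"TF32 dense vs V100 FP32:  {dense_ratio:.1f}x")
    print(f"TF32 sparse vs V100 FP32: {sparse_ratio:.1f}x")
```

On these figures the multiple approaches 20 only when sparsity is counted; dense TF32 alone is closer to 10 times the V100's peak.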

The A100 was the workhorse behind a remarkable run of models, including Meta’s OPT-175B and LLaMA and many of the systems that followed, and demand for it helped drive NVIDIA’s data-center business to record levels. (GPT-3, announced the same month as the A100, was trained on the earlier V100.) For a business reader, the A100 is the chip that turned the abstract idea of scaling laws into a concrete supply chain: training a frontier model in this period largely meant assembling thousands of A100s.