Published in 2017 by Paulius Micikevicius and colleagues at NVIDIA and Baidu, this paper established a now-standard technique for training large neural networks faster and with less memory by using half-precision (16-bit) floating-point numbers instead of full 32-bit precision. The challenge is that 16-bit floats have a much narrower numerical range and fewer digits of precision, so small gradient values can underflow to zero during training.
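The underflow problem can be seen in a few lines of Python. This is a minimal sketch, not code from the paper: the helper `to_fp16` is a hypothetical name, and it uses the standard library's `struct` format `"e"` to round a value to IEEE half precision. The example gradient values are illustrative assumptions.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value.

    Hypothetical helper for illustration; struct's "e" format is the
    16-bit binary16 type.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# The smallest positive half-precision value is 2**-24 (about 6e-8).
smallest_fp16 = to_fp16(2.0 ** -24)

# An illustrative gradient below that threshold is rounded to zero...
tiny_gradient = to_fp16(1e-8)

# ...while a larger one survives, with some rounding error.
larger_gradient = to_fp16(1e-4)

print(smallest_fp16 > 0.0)    # True
print(tiny_gradient)          # 0.0
print(larger_gradient > 0.0)  # True
```

Any gradient that rounds to zero contributes nothing to the weight update, which is the failure the paper's techniques are designed to prevent.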
The authors propose two fixes that together make 16-bit training reliable. First, they keep a single master copy of the model weights in full 32-bit precision that accumulates the tiny updates from each optimizer step, so small changes are not lost; a 16-bit copy of these weights is used for the forward and backward passes. Second, they apply loss scaling: the loss is multiplied by a constant before gradients are computed, which pushes small gradient values into the representable range, and the gradients are divided by the same constant before the weight update. With these techniques, the method matches full-precision accuracy across image, speech, language, and generative models while roughly halving memory consumption.
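Both techniques can be illustrated with scalar arithmetic. The following is a minimal sketch under stated assumptions, not the authors' implementation: `to_fp16` and `to_fp32` are hypothetical helpers that round through the standard library's `struct` formats, and the update size, gradient value, and loss scale of 1024 are illustrative choices. For simplicity the sketch scales a gradient directly; in real training the loss is scaled, and the chain rule scales every gradient by the same factor.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to IEEE half precision (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to IEEE single precision (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# --- Technique 1: FP32 master copy of the weights ---
update = 1e-4  # assumed value of learning_rate * gradient

fp16_only_weight = to_fp16(1.0)  # weight stored and updated in FP16 only
master_weight = to_fp32(1.0)     # FP32 master copy

for _ in range(100):
    # Near 1.0, adjacent FP16 values are about 5e-4 apart, so each
    # 1e-4 update is rounded away and the weight never moves.
    fp16_only_weight = to_fp16(fp16_only_weight - update)
    # The FP32 master copy accumulates every update.
    master_weight = to_fp32(master_weight - update)

# FP16 copy of the master weight, as used in forward/backward passes.
working_weight = to_fp16(master_weight)

# --- Technique 2: loss scaling ---
LOSS_SCALE = 1024.0   # assumed constant scale factor
true_gradient = 1e-8  # assumed gradient, too small for FP16

unscaled_fp16_gradient = to_fp16(true_gradient)            # underflows
scaled_fp16_gradient = to_fp16(true_gradient * LOSS_SCALE)  # survives
# Unscale in FP32 before applying the update to the master weight.
recovered_gradient = to_fp32(scaled_fp16_gradient) / LOSS_SCALE

print(fp16_only_weight)        # 1.0 (all 100 updates were lost)
print(round(master_weight, 4)) # 0.99
print(unscaled_fp16_gradient)  # 0.0
print(recovered_gradient > 0)  # True
```

The FP16-only weight is unchanged after 100 steps, while the master copy has moved to roughly 0.99. Likewise, the unscaled gradient is lost entirely, while the scaled one is recovered to within about one percent of its true value.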
The timing was deliberate: NVIDIA’s Volta GPUs had just introduced Tensor Cores that ran 16-bit matrix math far faster than 32-bit. Mixed precision turned that hardware capability into a practical training recipe. For a business reader, this paper is a key reason modern AI models can be trained at all within reasonable cost and memory budgets, and the approach remains the default for nearly every large model trained today.