Broken Neural Scaling Laws

“Broken Neural Scaling Laws” was submitted to arXiv on October 26, 2022 by Ethan Caballero, Kshitij Gupta, Irina Rish, and David Krueger. It offered a more flexible mathematical description of how neural networks improve with scale than the simple power laws that had dominated earlier scaling-law work.

Earlier scaling laws modeled performance as a single smooth power law in compute, parameter count, or dataset size, which appears as a straight line on a log-log plot. That captures the broad trend but misses cases where the slope changes partway along the curve. The authors proposed a “broken neural scaling law” (BNSL): a sequence of power-law segments joined smoothly at break points. This richer functional form can fit behaviors a single power law cannot, including the non-monotonic dip and recovery of double descent and the sharp inflections seen when models abruptly become competent at tasks such as arithmetic.
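As we understand the paper's definition, a BNSL with n breaks has the form y = a + b * x^(-c0) * prod over i = 1..n of (1 + (x / d_i)^(1 / f_i))^(-c_i * f_i). Here x is the scale, a is the value the curve approaches at very large scale, b and c0 describe the first power-law segment, and each break i has a location d_i, a change in slope c_i, and a sharpness f_i (smaller f_i gives a sharper bend). The minimal NumPy sketch below evaluates that form and measures the local log-log slope on either side of a single break. The function names and parameter values are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def bnsl(x, a, b, c0, breaks):
    """Evaluate a broken neural scaling law (sketch of the published form).

    x      : scale values (compute, parameters, or data), all > 0
    a      : value approached as x grows, when c0 + sum(c_i) > 0
    b, c0  : offset and exponent of the first power-law segment
    breaks : list of (c_i, d_i, f_i) tuples, one per break:
             c_i = change in slope, d_i = break location,
             f_i = sharpness (smaller means a sharper bend)
    """
    x = np.asarray(x, dtype=float)
    y = b * x ** (-c0)
    for c, d, f in breaks:
        # Each factor is ~1 well below d and ~(x/d)**(-c) well above it.
        y = y * (1.0 + (x / d) ** (1.0 / f)) ** (-c * f)
    return a + y


def local_loglog_slope(x, y):
    # Finite-difference slope of log(y) against log(x).
    return np.diff(np.log(y)) / np.diff(np.log(x))


# Hypothetical parameters: one break at x = 1e4 that steepens the slope
# from -0.1 to -(0.1 + 0.5) = -0.6.
x = np.logspace(0, 8, 81)
y = bnsl(x, a=0.0, b=1.0, c0=0.1, breaks=[(0.5, 1e4, 0.1)])
slopes = local_loglog_slope(x, y)
print(round(slopes[0], 3), round(slopes[-1], 3))
```

Far below the break the local slope approaches -c0, and far above it approaches -(c0 + c1). This is the "power-law segments smoothly joined at a break point" picture in numerical form. With a single break and c1 = 0 the expression reduces to an ordinary power law plus the constant a.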

The authors reported that the BNSL fits and, more importantly, extrapolates scaling behavior considerably more accurately than competing functional forms across a wide range of domains, including large-scale vision, language, audio, video, diffusion, and generative modeling, and they released code for reproducing the fits.
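The extrapolation claim can be illustrated with a toy experiment. The sketch below is not the authors' fitting procedure. It assumes a single break and a floor a = 0. Under those assumptions, for a fixed break location d1 and sharpness f1, log y is linear in the remaining parameters, so those can be solved by ordinary least squares, with (d1, f1) chosen by grid search. It fits both a straight log-log line and this one-break BNSL to synthetic, noise-free points that span a break, then compares their errors at larger, unseen scales. All names, parameter values, and grids are hypothetical.

```python
import numpy as np

def bnsl_one_break(x, b, c0, c1, d1, f1):
    # One-break BNSL with the floor a fixed at 0 (simplifying assumption).
    return b * x ** (-c0) * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

def fit_power_law(x, y):
    # Straight-line fit on the log-log plot: log y = intercept + slope * log x.
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return lambda xs: np.exp(intercept) * xs ** slope

def fit_bnsl_one_break(x, y, d_grid, f_grid):
    # For fixed (d1, f1): log y = log b - c0*log x - c1*f1*log(1 + (x/d1)**(1/f1)),
    # which is linear in (log b, c0, c1). Solve by least squares, grid-search (d1, f1).
    logx, logy = np.log(x), np.log(y)
    best = None
    for d1 in d_grid:
        for f1 in f_grid:
            g = np.log1p((x / d1) ** (1.0 / f1))
            A = np.column_stack([np.ones_like(logx), -logx, -f1 * g])
            coef, *_ = np.linalg.lstsq(A, logy, rcond=None)
            sse = float(np.sum((A @ coef - logy) ** 2))
            if best is None or sse < best[0]:
                best = (sse, np.exp(coef[0]), coef[1], coef[2], d1, f1)
    _, b, c0, c1, d1, f1 = best
    return lambda xs: bnsl_one_break(xs, b, c0, c1, d1, f1)

# Synthetic, noise-free data from a hypothetical curve with a break at x = 1e3.
truth = dict(b=1.0, c0=0.1, c1=0.5, d1=1e3, f1=0.3)
x_fit = np.logspace(1, 5, 41)   # observed range, which includes the break
x_ext = np.logspace(6, 8, 21)   # larger scales we want to forecast
y_fit = bnsl_one_break(x_fit, **truth)
y_ext = bnsl_one_break(x_ext, **truth)

power_law = fit_power_law(x_fit, y_fit)
broken = fit_bnsl_one_break(x_fit, y_fit,
                            d_grid=np.logspace(1, 5, 17),
                            f_grid=[0.1, 0.2, 0.3, 0.5, 1.0])

def max_rel_err(predict):
    # Worst-case relative error over the extrapolation range.
    return float(np.max(np.abs(predict(x_ext) - y_ext) / y_ext))

pl_err, bnsl_err = max_rel_err(power_law), max_rel_err(broken)
print(f"power law: {pl_err:.3f}  BNSL: {bnsl_err:.2e}")
```

Because the data are noise-free and the grid happens to contain the true break, the one-break fit recovers the curve almost exactly, while the single power law averages the slopes on either side of the break and misses badly at larger scales. Real measurements are noisy, and a fit generally cannot locate a break that has not yet shown up in the observed range.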

For a general reader, the work matters because so much of modern AI strategy rests on forecasting: deciding how much to spend on a bigger model hinges on predicting what that model will be able to do. A scaling law that captures the bends and jumps, not just the average slope, makes those expensive bets a little less of a gamble.
