“Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets” was submitted to arXiv on January 6, 2022 by Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra of OpenAI. It documented a strange and now widely studied training phenomenon that the authors named “grokking.”
The setup is deliberately simple: small neural networks trained on algorithmically generated tasks such as modular arithmetic, where the rule is fully determined and the dataset is tiny. The networks first memorize the training data, reaching perfect training accuracy while remaining useless on held-out examples, which is textbook overfitting. The surprise is what happens if you keep training far past that point. After a long plateau, sometimes orders of magnitude more training steps later, validation accuracy abruptly climbs from chance level to near-perfect: the network suddenly “groks” the underlying rule and generalizes. The transition from memorization to generalization happens long after the training curves suggested that learning was over.
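To make the setup concrete, here is a minimal sketch of how such a dataset could be built for modular addition. It is an illustration, not the authors' code: the function name `modular_addition_dataset`, the modulus `p=97`, the 50% training fraction, and the seed are all assumed values chosen for the example.

```python
import random

def modular_addition_dataset(p=97, train_fraction=0.5, seed=0):
    """Enumerate every equation (a + b) mod p and split it into train/validation sets.

    All parameter defaults are illustrative assumptions, not values taken from the paper.
    """
    # The rule is fully determined, so the complete dataset is just the p * p table.
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)

    # The first part is used for training; the held-out remainder measures generalization.
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]

train, val = modular_addition_dataset()
print(len(train), len(val))
```

Because the whole table has only `p * p` entries, a network can memorize the training portion outright. The held-out portion is what shows whether it has learned the rule itself, and lowering `train_fraction` is the knob that makes that harder.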
The paper also found that smaller training sets required disproportionately more optimization to reach this generalizing state, and it proposed these clean algorithmic tasks as a laboratory for studying how overparameterized networks generalize rather than merely memorize.
Grokking became a magnet for interpretability research, because the clean before-and-after contrast lets researchers watch a network’s internal representations reorganize from a lookup table into an actual algorithm. The phenomenon complicates the simple picture that overfitting is permanent and that you should stop training when validation loss stops improving, and it remains one of the more intriguing open puzzles about why and when deep learning generalizes.