The Lottery Ticket Hypothesis

“The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks” was submitted to arXiv on March 9, 2018 by Jonathan Frankle and Michael Carbin of MIT. It won a Best Paper award at ICLR 2019 and reframed how researchers think about why large networks are easy to train but small ones often are not.

The puzzle it addressed comes from pruning. It has long been known that after a network is trained, most of its weights can be deleted with little loss of accuracy - the dense network was far larger than the final function needed. The natural question is why you cannot just train that small pruned network directly from scratch. Usually you cannot; trained from random initialization, the small network underperforms. Frankle and Carbin’s hypothesis is that a dense, randomly initialized network contains a small subnetwork - a “winning ticket” - that, if you keep its original initial weights, can be trained in isolation to match the full network’s accuracy, often in fewer iterations. The catch is the initialization: the winning ticket only wins with the specific random values it was born with.
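The hypothesis can be stated a little more formally. The block below is a compact restatement along the lines of the paper's own formulation; the symbols are introduced here for illustration and do not appear in the text above.

```latex
% f(x; \theta): dense network with parameters \theta, initialized to \theta_0
% m \in \{0,1\}^{|\theta|}: fixed binary mask selecting the subnetwork
% (j, a): iteration of minimum validation loss, and test accuracy, of the dense network
% (j', a'): the same quantities for the masked network trained from m \odot \theta_0
\exists\, m \in \{0,1\}^{|\theta|} \ \text{such that}\quad
j' \le j, \qquad a' \ge a, \qquad \lVert m \rVert_0 \ll |\theta|
```

In words: some mask exists whose subnetwork, started from the original initial values, trains at least as fast, reaches at least the same accuracy, and uses far fewer parameters.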

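They found these subnetworks by training the network, pruning the smallest-magnitude weights, resetting the survivors to their original initial values, and retraining, repeating the cycle over several rounds. The procedure is known as “iterative magnitude pruning.” On the MNIST and CIFAR-10 architectures they tested, the winning tickets were consistently less than 10 to 20 percent of the original size yet matched or beat its test accuracy.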
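The train, prune, reset loop can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses a toy linear model in pure Python rather than a neural network, and every name in it (train, prune_smallest, iterative_magnitude_pruning), the pruning fraction, the number of rounds, and the synthetic data are assumptions chosen for the example.

```python
import random

def train(init_weights, mask, data, lr=0.3, epochs=200):
    """Full-batch gradient descent on mean squared error for a linear model.
    Weights whose mask entry is 0 are held at zero throughout."""
    w = [wi * mi for wi, mi in zip(init_weights, mask)]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        # Re-apply the mask after each step so pruned weights stay pruned.
        w = [(wi - lr * gi) * mi for wi, gi, mi in zip(w, grad, mask)]
    return w

def prune_smallest(trained, mask, fraction):
    """Return a new mask with the smallest-magnitude surviving weights removed."""
    alive = [j for j, m in enumerate(mask) if m == 1]
    n_prune = int(len(alive) * fraction)
    doomed = set(sorted(alive, key=lambda j: abs(trained[j]))[:n_prune])
    return [0 if j in doomed else m for j, m in enumerate(mask)]

def iterative_magnitude_pruning(init_weights, data, rounds=2, fraction=0.5):
    """Hypothetical helper: repeat train -> prune -> reset for a few rounds."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Every round restarts from the ORIGINAL initial weights, not the trained ones.
        trained = train(init_weights, mask, data)
        mask = prune_smallest(trained, mask, fraction)
    # The candidate ticket: surviving weights at their original initial values.
    ticket_init = [w * m for w, m in zip(init_weights, mask)]
    return mask, ticket_init

# Toy data (an assumption): the target depends only on the first two of eight inputs.
rng = random.Random(0)
inputs = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(50)]
data = [(x, 3.0 * x[0] - 2.0 * x[1]) for x in inputs]
init_weights = [rng.uniform(-0.1, 0.1) for _ in range(8)]

mask, ticket_init = iterative_magnitude_pruning(init_weights, data)
ticket_trained = train(ticket_init, mask, data)  # train the sparse model in isolation
print("mask:", mask)
print("trained ticket:", ticket_trained)
```

The detail that matters is the reset: each round trains from init_weights again, so the mask is the only thing carried forward. A linear model cannot show the paper's central effect, that the same sparse structure trains worse from a fresh random initialization. For that, the comparison would need a real network and a second run with re-sampled initial weights.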

The result suggested that part of why overparameterization helps is that a bigger network buys more lottery tickets - more chances that some lucky subnetwork starts in a trainable configuration. The hypothesis sparked a large literature probing how far it holds at scale, and it remains a touchstone in the study of sparsity and why neural networks generalize.
