Random Search for Hyper-Parameter Optimization

“Random Search for Hyper-Parameter Optimization” by James Bergstra and Yoshua Bengio appeared in the Journal of Machine Learning Research in 2012. It made a simple but counterintuitive point that changed how practitioners search for good model settings.

For years the default way to tune hyperparameters was grid search, in which you pick a set of values for each hyperparameter and try every combination. Bergstra and Bengio showed, both empirically and with a clear geometric argument, that random search, which samples configurations at random from the same search space, finds models that are as good or better using a fraction of the computation. The intuition is that in most problems only a few hyperparameters truly matter. Grid search spends most of its trials re-testing the same few values of the important hyperparameters while it varies the unimportant ones, whereas random search draws a fresh value of every hyperparameter on each trial and so explores many more distinct values of the important ones.
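The geometric argument can be illustrated with a small sketch. This is not code from the paper: the two-dimensional unit-square search space, the toy objective `toy_score` (in which only the first hyperparameter matters), and all function names are assumptions chosen for illustration. With the same budget of nine trials, the grid tests only three distinct values of the important hyperparameter, while random sampling tests a different value on every trial.

```python
import itertools
import random


def grid_configs(values_per_axis, n_dims):
    """Every combination of evenly spaced values in [0, 1] on each axis."""
    axis = [i / (values_per_axis - 1) for i in range(values_per_axis)]
    return list(itertools.product(axis, repeat=n_dims))


def random_configs(n_trials, n_dims, seed=0):
    """Independent uniform samples from the same unit hypercube."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_dims)) for _ in range(n_trials)]


def toy_score(config):
    """Hypothetical objective: only the first hyperparameter matters.

    The peak at 0.37 is an arbitrary choice for this illustration.
    """
    return -(config[0] - 0.37) ** 2


def distinct_values(configs, dim=0):
    """Number of different values tried along one hyperparameter axis."""
    return len({config[dim] for config in configs})


grid = grid_configs(values_per_axis=3, n_dims=2)      # 3 x 3 = 9 trials
rand = random_configs(n_trials=len(grid), n_dims=2)   # same budget of 9 trials

grid_distinct = distinct_values(grid)   # 3: each value is repeated 3 times
rand_distinct = distinct_values(rand)   # one per trial (float collisions are negligible)

best_grid = max(grid, key=toy_score)
best_rand = max(rand, key=toy_score)

print("distinct values of the important hyperparameter:", grid_distinct, "vs", rand_distinct)
print("best grid config:", best_grid, "best random config:", best_rand)
```

The gap widens as the number of unimportant hyperparameters grows: a grid with k values per axis in d dimensions costs k to the power d trials but still tests only k values of each hyperparameter, while random search tests as many distinct values as it has trials.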

The paper became one of the most-cited works on hyperparameter tuning and established random search as a strong, embarrassingly parallel baseline that more sophisticated methods like Hyperband and Bayesian optimization are still measured against.

For a business reader, it is a memorable lesson that the obvious, exhaustive approach is often wasteful, and a smarter sampling strategy can save large amounts of compute.
