Hyperparameter Optimization

Hyperparameters are the configuration choices of a machine-learning method that are set before training rather than learned from data. Examples include the learning rate, the number of layers or trees, regularization strength, and batch size. Hyperparameter optimization is the task of searching for the combination of these settings that produces the best performance on held-out (validation) data.
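To make the distinction concrete, here is a minimal sketch in plain Python. The weight w is learned from training data, while the learning rate is fixed before training and chosen by comparing error on held-out data. The synthetic data, the helper names (make_data, train, mse), and the candidate values are all assumptions made for illustration, not part of any particular library.

```python
import random

def make_data(n, seed):
    """Synthetic data (an assumption for this sketch): y = 3x plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def train(xs, ys, learning_rate, steps=50):
    """Fit y = w * x by gradient descent.

    w is a parameter learned from the data; learning_rate and steps are
    hyperparameters fixed before training starts.
    """
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(200, seed=0)
valid_x, valid_y = make_data(100, seed=1)  # held-out data, never used for fitting

# Candidate values are illustrative, not a recommendation.
candidates = [0.001, 0.01, 0.1, 1.0]
scores = {lr: mse(train(train_x, train_y, lr), valid_x, valid_y)
          for lr in candidates}
best_lr = min(scores, key=scores.get)

for lr in candidates:
    print(f"learning_rate={lr:<6} validation MSE={scores[lr]:.4f}")
print("selected learning rate:", best_lr)
```

Trying each candidate and keeping the one with the lowest held-out error is already a very small hyperparameter search; the methods used in practice differ mainly in how they pick which candidates to try.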

The simplest approaches are grid search, which tries every combination from a predefined set of values, and random search, which samples combinations at random from specified ranges. Bergstra and Bengio’s 2012 JMLR paper showed that random search is usually more efficient than grid search: typically only a few hyperparameters matter for a given problem, and a grid spends most of its trials varying the unimportant ones while testing only a handful of distinct values of the important ones. More sophisticated methods include Bayesian optimization, which fits a surrogate model of how performance depends on the settings and uses it to choose the next trial, balancing promising regions against unexplored ones, and multi-fidelity methods such as Hyperband and BOHB, which save compute by giving every trial a small budget first and stopping unpromising trials early. Population Based Training goes further and adjusts hyperparameters during training, copying the weights of better-performing models and perturbing their settings. Open-source frameworks such as Optuna and Ray Tune package many of these strategies for everyday use.
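The argument for random search can be illustrated with a small sketch. It assumes a hypothetical objective, validation_loss, in which only the learning-rate exponent matters and dropout has no effect; the ranges, the optimum at -2.3, and the budget of nine trials are likewise invented for illustration. With the same budget, the grid tests three distinct values of the important setting, while random search tests nine.

```python
import itertools
import random

def validation_loss(lr_exponent, dropout):
    """Hypothetical objective (an assumption): dropout is deliberately ignored,
    so lr_exponent is the only hyperparameter that matters."""
    return (lr_exponent + 2.3) ** 2

BUDGET = 9  # both strategies get the same number of trials

# Grid search: 3 values per hyperparameter, every combination -> 9 trials.
grid_lr = [-4.0, -2.5, -1.0]
grid_dropout = [0.0, 0.25, 0.5]
grid_trials = list(itertools.product(grid_lr, grid_dropout))

# Random search: 9 independent draws from the same ranges.
rng = random.Random(0)
random_trials = [(rng.uniform(-4.0, -1.0), rng.uniform(0.0, 0.5))
                 for _ in range(BUDGET)]

def summarize(name, trials):
    """Report how many distinct values of the important setting were tried."""
    distinct_lr = len({lr for lr, _ in trials})
    best = min(validation_loss(lr, d) for lr, d in trials)
    print(f"{name:>6}: {len(trials)} trials, "
          f"{distinct_lr} distinct lr_exponent values, best loss {best:.4f}")
    return distinct_lr, best

grid_distinct, grid_best = summarize("grid", grid_trials)
random_distinct, random_best = summarize("random", random_trials)
```

Whether random search finds the lower loss in a single run depends on the draws, but it always spreads its budget over more values of each individual hyperparameter, which is what helps when it is not known in advance which settings matter.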

For a business reader, hyperparameter optimization is a routine but high-leverage step: the difference between default and well-tuned settings can be the difference between a model that works and one that does not, and automating the search saves expensive expert time.

Related