“Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization” was posted to arXiv in March 2016 by Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. It approaches hyperparameter tuning not by being clever about which configurations to pick, but by being clever about how much compute each one gets.
Hyperband builds on random search and the idea of successive halving: start many randomly chosen configurations with a tiny budget (a few epochs or a small data sample), keep only the best-performing fraction, give them more budget, and repeat. The key contribution is a principled way to hedge across different budget-versus-configuration trade-offs, so the method does not need to know in advance whether it is better to try a few configurations thoroughly or many configurations briefly. Because poor configurations are stopped early, the authors report that Hyperband can deliver over an order-of-magnitude speedup relative to the Bayesian optimization methods they compare against on deep-learning and kernel-based problems.
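The mechanics described above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' reference implementation: the names `successive_halving`, `hyperband_brackets`, `hyperband`, `sample_config`, and `evaluate` are hypothetical, the default `eta=3` and `max_budget=81` are illustrative choices, and the bracket-size formula follows a commonly cited form of the schedule that should be checked against the paper before being relied on. Lower loss is assumed to be better.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round and multiply the budget by eta."""
    survivors, budget = list(configs), min_budget
    while len(survivors) > 1:
        # evaluate(config, budget) is a hypothetical callback returning a loss.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def hyperband_brackets(max_budget=81, eta=3):
    """Return (num_configs, starting_budget) pairs, most aggressive bracket first."""
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:  # integer form of floor(log_eta(max_budget))
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # Assumed schedule: n = ceil((s_max + 1) * eta**s / (s + 1)).
        n = -(-(s_max + 1) * eta ** s // (s + 1))
        brackets.append((n, max_budget / eta ** s))
    return brackets


def hyperband(sample_config, evaluate, max_budget=81, eta=3):
    """Run successive halving in every bracket and track the lowest loss seen."""
    best_loss, best_config = float("inf"), None
    for n, budget in hyperband_brackets(max_budget, eta):
        survivors = [sample_config() for _ in range(n)]
        while True:
            scored = sorted(
                ((evaluate(c, budget), c) for c in survivors),
                key=lambda pair: pair[0],
            )
            if scored[0][0] < best_loss:
                best_loss, best_config = scored[0]
            if budget * eta > max_budget:
                break  # survivors have already been run at the largest budget
            survivors = [c for _, c in scored[: max(1, len(scored) // eta)]]
            budget *= eta
    return best_config, best_loss


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy objective (an assumption for the demo): the best config is near 0.3,
    # and a larger budget shrinks the loss.
    def toy_loss(config, budget):
        return abs(config - 0.3) + 1.0 / budget

    print(hyperband_brackets())
    print(hyperband(rng.random, toy_loss))
```

Each bracket represents one bet on the trade-off: the first starts many configurations on the smallest budget and prunes hard, while the last gives a handful of configurations the full budget with no early stopping. Running all of them is the hedge that removes the need to choose that trade-off up front.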
Hyperband became a standard early-stopping strategy and was later combined with Bayesian optimization in methods such as BOHB.
For a business reader, Hyperband captures a practical truth of large-scale experimentation: cut your losses fast on the approaches that are clearly losing, and pour resources into the ones that are winning.