“EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks” was submitted to arXiv in May 2019 by Mingxing Tan and Quoc Le of Google, and presented at the ICML conference that year. It tackled a practical question that earlier architecture work had treated ad hoc: when you have a bigger compute budget, how should you grow a network?
Researchers had typically scaled one dimension at a time: making a network deeper, making it wider, or feeding it higher-resolution images. The authors argued these three dimensions are coupled and should grow together in a balanced way. Their compound scaling method ties depth, width, and input resolution to a single coefficient, so that increasing the budget expands all three by fixed ratios, which are chosen once by a small grid search on the baseline network. Starting from a compact baseline found by neural architecture search, they produced a family of models, EfficientNet-B0 through B7, that traced out a much better accuracy-versus-cost curve than prior networks. The headline result, EfficientNet-B7, reached 84.3 percent top-1 accuracy on ImageNet while being roughly eight times smaller and six times faster at inference than the best previous convolutional model.
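The compound scaling rule can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `compound_scale` and `relative_flops` are hypothetical, and the constants are the values the paper reports for its baseline (1.2 for depth, 1.1 for width, 1.15 for resolution), chosen so that raising the coefficient by one roughly doubles the compute cost. The published B1 through B7 configurations were rounded by hand and need not match these numbers exactly.

```python
# Scaling bases reported in the EfficientNet paper for the B0 baseline.
ALPHA = 1.2   # depth base (number of layers)
BETA = 1.1    # width base (number of channels)
GAMMA = 1.15  # resolution base (input image side length)


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for phi.

    phi is the single compound coefficient; phi = 0 gives the baseline.
    """
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth, width, resolution


def relative_flops(phi):
    """Approximate compute cost relative to the baseline.

    Convolutional cost grows about linearly with depth and quadratically
    with width and resolution, so each unit of phi multiplies cost by
    ALPHA * BETA**2 * GAMMA**2, which is close to 2.
    """
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    for phi in range(4):
        depth, width, resolution = compound_scale(phi)
        print(phi, round(depth, 3), round(width, 3), resolution,
              round(relative_flops(phi), 2))
```

The point of the design is that one knob, the coefficient, controls the whole budget: a user picks how much more compute is available, and the three dimensions follow from it rather than being tuned separately.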
EfficientNet became a default choice when accuracy had to be balanced against model size and latency, and its compound-scaling recipe influenced how later models were sized. It also showcased neural architecture search as a way to find strong baseline designs rather than relying purely on human intuition.
For a general reader, EfficientNet is a clear case of efficiency as the headline: the same or better quality from a model that is a fraction of the size and several times faster to run, which is exactly what matters when a model has to run on a phone or at scale in a data center.