mixup: Beyond Empirical Risk Minimization

“mixup: Beyond Empirical Risk Minimization” was submitted to arXiv on October 25, 2017 by Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz at MIT and Facebook AI Research. It proposed a strikingly simple data-augmentation idea that improves how well neural networks generalize, most visibly in image classification.

Standard training, what the title calls empirical risk minimization, shows the network real examples paired with their true labels. mixup instead constructs synthetic training points by taking a convex combination of two examples and applying the same combination to their labels: blend two images, say seventy percent cat and thirty percent dog, and the target becomes seventy percent cat and thirty percent dog. Training on these mixtures encourages the model to behave linearly between examples, smoothing its predictions rather than letting it jump abruptly from one confident class to another.
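The blending step can be sketched in a few lines of Python. This is a minimal illustration using plain lists, not the authors' implementation: the helper names mixup_pair and sample_lambda, the toy four-pixel "images", and the default alpha of 0.2 are all assumptions made for the example. The paper draws the mixing weight from a Beta(alpha, alpha) distribution; here the weight is also fixed at 0.7 to reproduce the cat and dog example above.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two examples and their one-hot labels with weight lam in [0, 1].

    Hypothetical helper for illustration: inputs are flat lists of floats.
    """
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha).

    The default alpha is an illustrative value, not a recommendation.
    """
    return rng.betavariate(alpha, alpha)


# Toy data (assumed for the example): four-pixel "images", two classes.
cat_image = [1.0, 0.0, 0.5, 0.2]
dog_image = [0.0, 1.0, 0.5, 0.8]
cat_label = [1.0, 0.0]  # one-hot: [cat, dog]
dog_label = [0.0, 1.0]

# Fixed weight of 0.7: the target is roughly 70% cat and 30% dog.
x_mix, y_mix = mixup_pair(cat_image, cat_label, dog_image, dog_label, lam=0.7)

# During training the weight would be resampled for every pair or batch.
lam = sample_lambda()
x_rand, y_rand = mixup_pair(cat_image, cat_label, dog_image, dog_label, lam)
```

Because the same weight is applied to inputs and labels, the blended label always sums to one, so it can be used directly as a soft target for a cross-entropy loss.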

The reported benefits were broad. Across ImageNet, CIFAR-10, CIFAR-100, a speech commands dataset, and several UCI tabular datasets, mixup improved the generalization of state-of-the-art architectures. It also reduced the network’s tendency to memorize corrupted labels, increased robustness to adversarial examples, and helped stabilize the training of generative adversarial networks.

For a general reader, mixup is a clean illustration of a counterintuitive lesson in machine learning: training a model on slightly unrealistic, blended data can make it more reliable on the real thing, because it forces the model to interpolate sensibly rather than overfit to the exact points it was shown.

Last verified June 7, 2026