Large Scale GAN Training for High Fidelity Natural Image Synthesis (BigGAN)

“Large Scale GAN Training for High Fidelity Natural Image Synthesis,” posted to arXiv on September 28, 2018 by Andrew Brock, Jeff Donahue, and Karen Simonyan, showed what happens when generative adversarial networks (GANs) are trained at a much larger scale than before. The model, known as BigGAN, generated diverse, high-resolution images conditioned on ImageNet class labels, a far harder task than the single-domain datasets, such as faces, that earlier GANs handled well.

The central finding was that scale itself was a major driver of quality: training with much larger batch sizes and more model parameters substantially improved the realism and diversity of generated images. The authors paired this with orthogonal regularization on the generator, which makes the model amenable to a technique they called the truncation trick: restricting the range of the input noise at sampling time lets a user trade image variety for image fidelity. BigGAN set new records on standard image-quality metrics for class-conditional generation. On ImageNet at 128×128 resolution, the paper reports an Inception Score of 166.5 and a Fréchet Inception Distance of 7.4, against previous bests of 52.52 and 18.65.
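The truncation trick can be illustrated with a minimal sketch. The paper describes it as resampling the components of the noise vector whose magnitude exceeds a chosen threshold; the function name, the vector length of 128, and the two threshold values below are illustrative assumptions, not taken from the authors' code, and no generator network is included.

```python
import random


def truncated_noise(dim, threshold, rng=None):
    """Draw a noise vector from N(0, 1), resampling every component
    whose magnitude exceeds `threshold`.

    `truncated_noise` is a hypothetical helper name used for illustration.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rng = rng or random.Random()
    z = []
    for _ in range(dim):
        value = rng.gauss(0.0, 1.0)
        # Keep resampling until the value lies inside [-threshold, threshold].
        while abs(value) > threshold:
            value = rng.gauss(0.0, 1.0)
        z.append(value)
    return z


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Larger threshold: noise closer to the training distribution, more variety.
z_wide = truncated_noise(128, threshold=2.0, rng=rng)
# Smaller threshold: less variety, typically higher per-image fidelity.
z_narrow = truncated_noise(128, threshold=0.5, rng=rng)
```

In use, a vector like z_narrow or z_wide would be passed to a trained generator together with a class label; lowering the threshold narrows the range of inputs the generator sees, which is the variety-for-fidelity trade described above.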

BigGAN demonstrated that, like language models, image generators benefited dramatically from more compute and data, foreshadowing the scaling-driven progress that would define the field. It also exposed the high cost and instability of training very large GANs, which helped motivate the later shift toward diffusion models. For a general reader, BigGAN is a clear early data point in the broader story that, across modalities, bigger models trained on more data tend to produce qualitatively better results.
