Denoising Diffusion Probabilistic Models (DDPM)

“Denoising Diffusion Probabilistic Models,” posted to arXiv in June 2020 by Jonathan Ho, Ajay Jain, and Pieter Abbeel of UC Berkeley, is the paper that turned diffusion models from a theoretical curiosity into a practical way to generate high-quality images. The idea of diffusion, gradually destroying data with noise and learning to reverse the process, had been proposed in 2015 by Sohl-Dickstein and colleagues, but DDPM found the training recipe that made it produce sharp, realistic samples.

The method has two halves. In the forward process, an image is corrupted by adding small amounts of Gaussian noise over many steps until it becomes pure static. A neural network is then trained to reverse that corruption one step at a time by predicting the noise in a noisy image so that it can be removed; starting from random static, repeated denoising assembles a brand-new image. The authors’ key contribution was a simplified training objective, derived from a connection between diffusion and denoising score matching, that was stable to optimize and produced excellent results. On unconditional CIFAR-10 generation they reported an Inception score of 9.46 and a then state-of-the-art FID of 3.17.
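The two halves can be made concrete with a small sketch. The Python below uses only the standard library and treats an "image" as a short list of pixel values. It illustrates the general recipe and is not the authors' code: the function names are hypothetical, the linear noise schedule and its endpoint values are assumptions chosen for illustration, and the sampling variance is fixed to the per-step noise variance, which is one possible choice. The denoising network itself is left out; wherever a noise prediction is needed, the caller supplies it.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (endpoints are assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta_s) for all s up to and including t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process in closed form: jump from clean x0 straight to step t."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    x_t = [signal * x + noise * e for x, e in zip(x0, eps)]
    return x_t, eps


def simple_loss(predicted_eps, true_eps):
    """Simplified objective: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)


def reverse_step(x_t, t, predicted_eps, betas, alpha_bars, rng):
    """One denoising step from x_t to x_{t-1}, given a noise prediction for x_t."""
    beta = betas[t]
    alpha = 1.0 - beta
    coef = beta / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(x_t, predicted_eps)]
    if t == 0:
        return mean  # final step: return the mean without adding fresh noise
    sigma = math.sqrt(beta)  # assumed variance choice: sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


# Toy usage: noise a four-pixel "image" all the way to the last step.
rng = random.Random(0)
betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
x0 = [0.5, -0.25, 1.0, 0.0]
x_T, eps = add_noise(x0, len(betas) - 1, alpha_bars, rng)

# A real model would predict eps from (x_T, t); here the true noise stands in
# for a perfect prediction, so the loss is zero by construction.
loss = simple_loss(eps, eps)
x_prev = reverse_step(x_T, len(betas) - 1, eps, betas, alpha_bars, rng)
```

Training would repeat the `add_noise` and `simple_loss` pair over random images and random steps `t`, updating the network to lower the loss. Sampling would start from a list of Gaussian draws and call `reverse_step` from the last step down to step 0, each time using the network's noise prediction.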

DDPM launched the diffusion era that now dominates image, audio, and video generation. Within two years its descendants powered DALL-E 2, Stable Diffusion, Midjourney, and Imagen, displacing the generative adversarial networks that had led image synthesis for the previous half-decade. The denoising network at the heart of those systems is typically a U-Net, which ties this paper directly to the architecture introduced in 2015 for biomedical image segmentation.
