A diffusion model learns to generate data by reversing a gradual corruption process. In the foundational 2020 paper “Denoising Diffusion Probabilistic Models” by Jonathan Ho, Ajay Jain, and Pieter Abbeel, the idea is to take a clean image, add a small amount of random noise at each step until it becomes pure static, and then train a neural network to undo that corruption one step at a time. Once trained, the model can start from pure noise and apply the learned denoising steps one after another to produce a brand-new, realistic image.
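The noising half of this process can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: it assumes the linear noise schedule reported in the DDPM paper (1,000 steps, per-step variance rising from 0.0001 to 0.02), and the function names and the four-value `toy_image` are hypothetical stand-ins for a real image pipeline.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (assumed DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t]: running product of (1 - beta), i.e. how much of the
    original signal's variance survives after step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Jump directly to a noisy version of the image:
    sqrt(alpha_bar) * original + sqrt(1 - alpha_bar) * Gaussian noise.
    The returned noise is what the network is trained to predict."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(pixels, noise)
    ]
    return noisy, noise

rng = random.Random(0)
toy_image = [0.5, -0.25, 1.0, -1.0]  # hypothetical 4-pixel "image"
alpha_bars = cumulative_signal(linear_beta_schedule())

# Early steps barely change the image; by the last step almost no signal is left.
for t in (0, 499, 999):
    noisy, _ = add_noise(toy_image, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 5), [round(v, 2) for v in noisy])
```

Training then amounts to showing the network a noisy image and its step number and asking it to recover the noise that was added. Generation runs the learned correction repeatedly, starting from random static.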
The approach became practical at scale with the 2021 paper “High-Resolution Image Synthesis with Latent Diffusion Models” by Robin Rombach and colleagues. Rather than running the expensive noising and denoising process directly on full-resolution pixels, latent diffusion first compresses each image into a much smaller latent representation using an autoencoder, runs the diffusion process there, and decodes the result back into pixels, which dramatically cuts the computing cost. This efficiency breakthrough is what made it feasible to run high-quality image generation on consumer-grade hardware, and it underpins widely used systems such as Stable Diffusion.
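A rough back-of-the-envelope calculation shows the scale of the saving. The figures below are assumptions chosen for illustration: a 512×512 RGB image, and a latent space that is 8 times smaller along each side with 4 channels, a configuration commonly associated with Stable Diffusion. The count of values is only a proxy for computing cost, not a measurement of it.

```python
def num_values(height, width, channels):
    """Number of numbers the denoising network has to process per image."""
    return height * width * channels

# Assumed example sizes (not taken from the text above).
pixel_values = num_values(512, 512, 3)             # full-resolution RGB image
latent_values = num_values(512 // 8, 512 // 8, 4)  # 8x smaller per side, 4 channels

reduction = pixel_values / latent_values
print(pixel_values, latent_values, reduction)
```

Under these assumptions, every denoising step handles about 48 times fewer values than it would on raw pixels.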
Diffusion is now the dominant technique behind text-to-image tools and has spread into video, audio, and other domains. It is often compared with the earlier generative adversarial network (GAN) approach, but diffusion tends to be more stable to train and to produce a wider variety of outputs.
Why business readers should care: Diffusion models are the engine behind the consumer and enterprise image-generation products that emerged from 2022 onward, including DALL-E 2 and Stable Diffusion. Understanding that these tools share a common technical foundation helps leaders see why capabilities advanced so quickly across many vendors at once, and why the same method now extends to video and audio.