Consistency Models

“Consistency Models,” posted to arXiv on March 2, 2023 by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever of OpenAI, addressed the biggest practical drawback of diffusion models: slow sampling. A diffusion model normally generates an image by running many sequential denoising steps, often dozens or hundreds, and each step requires a full pass through the network, which makes generation slow and computationally expensive.

A consistency model is trained so that, from any point along the trajectory that connects noise to data, it maps directly back to the clean data in a single step. The name comes from the property the training enforces: every point on the same trajectory must map to the same output. This lets the model generate a high-quality sample in one forward pass instead of an iterative loop, while still allowing a few extra steps to be traded for better quality when desired. The authors showed two ways to train these models: distilling an existing pretrained diffusion model (consistency distillation), or training a consistency model from scratch as a standalone generator (consistency training). They reported strong results along with zero-shot capabilities such as image inpainting and colorization without task-specific training.
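The one-step and few-step sampling described above can be illustrated with a minimal sketch. It makes strong simplifying assumptions: the "dataset" is a one-dimensional Gaussian, for which the map from a noisy point back to clean data can be written in closed form, so an analytic function stands in for the trained network. The names consistency_fn and sample, and the constants EPS, T_MAX, MU, and SIGMA_DATA, are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random

EPS = 0.002    # smallest noise level; the map must be the identity here
T_MAX = 80.0   # largest noise level, where a sample is almost pure noise
MU, SIGMA_DATA = 3.0, 0.5  # toy 1-D "dataset": a Gaussian N(MU, SIGMA_DATA^2)


def consistency_fn(x, t):
    """Analytic stand-in for a trained network f(x, t).

    For Gaussian data noised as x_t = x_0 + t * z, the deterministic
    trajectory through (x, t) keeps (x - MU) proportional to
    sqrt(SIGMA_DATA^2 + t^2), so following it back to t = EPS is a rescaling.
    """
    scale = math.sqrt(SIGMA_DATA**2 + EPS**2) / math.sqrt(SIGMA_DATA**2 + t**2)
    return MU + (x - MU) * scale


def sample(rng, extra_times=()):
    """Draw one sample: a single jump from noise, plus optional refinements."""
    # One-step generation: start from pure noise and jump straight to data.
    x = consistency_fn(rng.gauss(0.0, T_MAX), T_MAX)
    # Optional extra steps: add back a smaller amount of noise, then jump again.
    for tau in extra_times:
        noisy = x + math.sqrt(tau**2 - EPS**2) * rng.gauss(0.0, 1.0)
        x = consistency_fn(noisy, tau)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print("one step:   ", sample(rng))
    print("three steps:", sample(rng, extra_times=(10.0, 1.0)))
```

In a real consistency model, consistency_fn would be a neural network, and the extra noise levels passed as extra_times are the knob that trades additional compute for sample quality.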

Consistency models were a milestone in the drive to make diffusion-based generation fast enough for interactive and on-device use, inspiring later fast samplers and real-time image tools. For a general reader, they represent the answer to a simple, important question: now that we know how to generate beautiful images by gradual denoising, can we get the same result almost instantly? Consistency models showed that, to a large degree, we can.
