SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

“SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis,” posted to arXiv on July 4, 2023 by Dustin Podell and colleagues at Stability AI, described the model behind Stable Diffusion XL, a substantial upgrade to the open Stable Diffusion line that had popularized accessible text-to-image generation.

The paper details several concrete improvements over earlier Stable Diffusion models. SDXL uses a UNet backbone roughly three times larger, with the added parameters coming mainly from more attention blocks and a larger cross-attention context, and it combines two text encoders to better understand prompts. It introduces additional conditioning on original image size and crop coordinates to make better use of training data, and it pairs a base model with a separate refinement model that runs as a second stage to sharpen fine detail. Together these changes produced images that were markedly more detailed and prompt-faithful, competitive with leading commercial systems while remaining openly available.
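The size and crop conditioning can be illustrated with a small sketch. As the paper describes it, each conditioning value is embedded independently with a Fourier feature encoding, and the encodings are concatenated into one vector that is added to the model's timestep embedding. The function names, the tiny embedding width of 8, and the frequency schedule below are illustrative assumptions, not the released implementation.

```python
import math


def fourier_features(value, dim=8, max_period=10000.0):
    """Sinusoidal (Fourier) encoding of one scalar into `dim` numbers.

    `dim` must be even. The geometric frequency schedule is an assumption
    borrowed from common timestep embeddings, not taken from SDXL's code.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    cos_part = [math.cos(value * f) for f in freqs]
    sin_part = [math.sin(value * f) for f in freqs]
    return cos_part + sin_part


def size_crop_conditioning(original_size, crop_top_left, dim=8):
    """Embed (height, width) and (crop_top, crop_left) one scalar at a time,
    then concatenate the encodings into a single conditioning vector."""
    scalars = list(original_size) + list(crop_top_left)
    vector = []
    for s in scalars:
        vector.extend(fourier_features(float(s), dim))
    return vector


# Hypothetical training image: 512x768 originally, cropped from its top-left corner.
cond = size_crop_conditioning(original_size=(512, 768), crop_top_left=(0, 0))
print(len(cond))  # 4 scalars x 8 features = 32
```

In a full model this vector would be projected and added to the timestep embedding, so the network learns what small or cropped training images look like; at inference time a user can then request a large, uncropped image by supplying the corresponding values.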

Because it was released with open weights, SDXL became a workhorse of the open generative-AI ecosystem, the base on which a large community built fine-tunes, LoRA adapters, and controllable-generation tools. For a general reader, SDXL illustrates how the open-source side of image generation kept pace with closed commercial offerings, and it is the version many independent developers and artists built their workflows around.
