DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

“DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation,” submitted to arXiv on August 25, 2022 by Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman of Google Research, tackled the personalization problem for image generators. A model like Stable Diffusion or Imagen can draw a generic dog, but it cannot draw your dog, because it has never seen it. DreamBooth fine-tunes the model on just a few images of a specific subject (typically three to five) and binds that subject to a unique identifier token that is paired with its class noun, as in “a [V] dog.” Afterward, a prompt using that token can place the same subject in new scenes, poses, and lighting.
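The paper writes the identifier as [V] in prompts such as “a [V] dog,” keeping the class noun next to a rarely used token so the model can attach the new subject to what it already knows about the class. The sketch below only shows how such prompts are assembled. The identifier "sks" (a common community choice) and both function names are hypothetical illustrations, not part of the paper or of any specific library.

```python
# Hypothetical helpers: the identifier "sks" and these function names are
# illustrative assumptions, not taken from the paper or any library.


def instance_prompt(identifier: str, class_noun: str) -> str:
    """Caption attached to every photo of the subject during fine-tuning."""
    return f"a {identifier} {class_noun}"


def recontext_prompt(identifier: str, class_noun: str, context: str) -> str:
    """Inference-time prompt that places the learned subject in a new scene."""
    return f"{instance_prompt(identifier, class_noun)} {context}"


if __name__ == "__main__":
    print(instance_prompt("sks", "dog"))                   # a sks dog
    print(recontext_prompt("sks", "dog", "on the beach"))  # a sks dog on the beach
```

The same identifier string is reused unchanged at training and inference time; only the surrounding context words vary.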

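The technical hurdle is that naive fine-tuning on a handful of images causes the model to overfit and forget how to draw the broader class: train it on one corgi and it may start drawing every dog as that corgi, a failure the authors call language drift. DreamBooth counters this with a class-specific prior-preservation loss. Before fine-tuning, the frozen original model generates its own images of the general class from a plain class prompt such as “a dog”; during fine-tuning, the model is trained to reconstruct both the subject photos (captioned with the identifier) and these self-generated class images. The second term keeps the broader class knowledge intact and helps preserve output diversity while the new subject is learned.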
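The combined objective can be sketched as the sum of two reconstruction terms, with a weight (the paper's λ) on the class-prior term. The paper states the loss in terms of the model's predicted clean image with per-timestep weights; this framework-free sketch instead assumes the simpler noise-prediction form used by many open-source trainers, drops the timestep weights, and uses flat Python lists in place of tensors. The function names, argument names, and example numbers are hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized flat vectors."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(
    subject_pred: Sequence[float],
    subject_noise: Sequence[float],
    prior_pred: Sequence[float],
    prior_noise: Sequence[float],
    prior_weight: float = 1.0,
) -> float:
    """Subject term plus a weighted class-prior term (hypothetical names).

    subject_pred: noise predicted for a noised subject photo, prompt "a sks dog"
    prior_pred:   noise predicted for a noised self-generated class image, prompt "a dog"
    *_noise:      the noise actually added to each image, i.e. the regression targets
    """
    subject_term = mse(subject_pred, subject_noise)  # learns the new subject
    prior_term = mse(prior_pred, prior_noise)        # keeps the class prior in place
    return subject_term + prior_weight * prior_term


if __name__ == "__main__":
    loss = prior_preservation_loss(
        [0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0], prior_weight=0.5
    )
    print(loss)  # 0.75
```

Setting `prior_weight` to 0 reduces this to naive fine-tuning on the subject photos alone, the setting in which the drift described above appears.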

DreamBooth became one of the most widely used personalization techniques in the open image-generation community, often combined with lightweight adapter methods like LoRA to make the fine-tuning cheaper. It is a common way for hobbyists and studios to build models that reproduce a particular person, product, character, or art style. The paper was published at CVPR 2023.
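LoRA makes fine-tuning cheap by freezing each pretrained weight matrix W and learning only a low-rank update, so a layer computes y = W x + (alpha / r) B A x, where A has r rows and B has r columns. The sketch below illustrates that idea in plain Python; it is not the code of any particular DreamBooth trainer, and the 768 x 768 layer size, rank 4, and toy matrices are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    rank = len(A)
    base = matvec(W, x)                  # frozen pretrained path
    delta = matvec(B, matvec(A, x))      # low-rank trainable path
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Trainable parameters for full fine-tuning vs. a rank-r LoRA update."""
    return d_out * d_in, rank * (d_in + d_out)


if __name__ == "__main__":
    print(lora_param_count(768, 768, 4))  # (589824, 6144)

    W = [[1.0, 0.0], [0.0, 2.0]]
    A = [[0.5, 0.5]]                      # rank 1
    x = [3.0, 4.0]
    # B starts at zero, so the adapted layer initially matches the frozen one.
    print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=1.0))  # [3.0, 8.0]
    print(lora_forward(W, A, [[1.0], [0.0]], x, alpha=1.0))  # [6.5, 8.0]
```

For this hypothetical layer, the rank-4 update trains about 1% as many parameters as updating the full matrix, which is why LoRA files are small and quick to train. Initializing B to zero, as the LoRA paper does, means fine-tuning starts exactly from the pretrained model.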

Why business readers should care: DreamBooth is the bridge from a general image model to a brand-specific or product-specific one. The same few-shot personalization idea underlies many custom-avatar, product-photography, and consistent-character features in commercial creative tools.
