Stability AI launches Stable Audio for text-to-music

Stability AI, the company behind the open image model Stable Diffusion, launched Stable Audio on September 13, 2023 as its first product for generating music and sound effects. Users type a text prompt (for example, “Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric”), specify a duration, and receive a matching audio track at 44.1 kHz, the standard sample rate for CD-quality sound.

Under the hood, Stable Audio uses a latent diffusion model, the same broad approach as Stable Diffusion, but trained on audio rather than images. Its distinctive feature is timing control: the model is conditioned on both the desired duration and a start time, so it can produce tracks of a specified length with structure rather than fixed-size clips. At launch, the free tier generated tracks of up to 45 seconds, while a paid tier extended that to 90 seconds and included a license for commercial use. The model was trained on music and metadata licensed from the music library AudioSparx.
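
For readers who want to see what the timing control amounts to, here is a minimal Python sketch. It is not Stability AI's code. The names `TimingCondition`, `training_condition`, `inference_condition` and `samples_for` are hypothetical, and the fixed training-window length is an illustrative parameter, not a figure from this article. The sketch shows only the bookkeeping: each training excerpt is labeled with where it starts and how long the full track is, and at generation time the user's requested duration fills the same two slots.

```python
import random
from dataclasses import dataclass

SAMPLE_RATE = 44_100  # Hz, the output sample rate cited in the article


@dataclass(frozen=True)
class TimingCondition:
    """Two numbers fed to the model alongside the text prompt (hypothetical names)."""
    seconds_start: float  # where the excerpt begins within the full track
    seconds_total: float  # length of the full track


def training_condition(track_seconds: float, window_seconds: float,
                       rng: random.Random) -> TimingCondition:
    """Pick a random fixed-size excerpt of a track and record its position.

    window_seconds is an assumed training-window length, for illustration only.
    """
    latest_start = max(0.0, track_seconds - window_seconds)
    start = rng.uniform(0.0, latest_start)
    return TimingCondition(seconds_start=start, seconds_total=track_seconds)


def inference_condition(requested_seconds: float) -> TimingCondition:
    """At generation time, ask for a track that starts at 0 and lasts as requested."""
    return TimingCondition(seconds_start=0.0, seconds_total=requested_seconds)


def samples_for(seconds: float) -> int:
    """Number of audio samples per channel for a clip of the given length."""
    return round(seconds * SAMPLE_RATE)


if __name__ == "__main__":
    rng = random.Random(0)
    print(training_condition(track_seconds=240.0, window_seconds=60.0, rng=rng))
    print(inference_condition(45.0))
    print(samples_for(45.0), samples_for(90.0))
```

At 44.1 kHz, the 45-second free-tier limit works out to 1,984,500 samples per channel and the 90-second paid limit to 3,969,000. The point of the two conditioning numbers is that a model trained on fixed-size excerpts can still be told where a piece begins and ends.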

Why business readers should care: Stable Audio extended a leading image-generation lab into music and showed a path toward generative audio built on licensed training data, in contrast to the unlicensed-scraping disputes engulfing other AI music systems. The licensing arrangement was itself a product decision, aimed at making the output safe for commercial use.