Stability AI, the company behind the open image model Stable Diffusion, launched Stable Audio on September 13, 2023 as its first product for generating music and sound effects. Users type a text prompt (for example, "Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric") and choose a duration, and the system generates a matching audio track at 44.1 kHz, the standard sample rate of CD-quality audio.
Under the hood, Stable Audio uses a latent diffusion model, the same broad approach as Stable Diffusion, but trained on audio. Its distinctive feature was timing conditioning: during training, the model was told both the total length of each source track and where each training clip started within it. At generation time, that lets users request a complete track of a chosen length, with a beginning and an ending, rather than an arbitrary fixed-size excerpt. The free tier generated tracks of up to 45 seconds, while a paid tier extended that to 90 seconds and included a license for commercial use. The model was trained on music and accompanying text metadata licensed from the stock-music library AudioSparx.
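The timing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Stability AI's code: the names `TimingConditioning`, `training_crop`, `inference_timing`, `num_samples`, and `trim_to_duration` are hypothetical. The sketch assumes that each training clip is a random crop labeled with its offset (`seconds_start`) and the full length of its source file (`seconds_total`), that generation sets the start to zero and the total to the requested duration, and that the model fills a fixed-length window whose excess is trimmed afterward.

```python
import random
from dataclasses import dataclass

SAMPLE_RATE = 44_100  # CD-quality sample rate, in samples per second


@dataclass
class TimingConditioning:
    # Hypothetical labels fed to the model alongside the text prompt.
    seconds_start: float  # where the clip begins within its source track
    seconds_total: float  # full length of the source track


def training_crop(track_seconds: float, window_seconds: float,
                  rng: random.Random) -> TimingConditioning:
    """Pick a random training window from a track and return its timing labels.

    Assumption: tracks shorter than the window always start at 0.0.
    """
    max_start = max(0.0, track_seconds - window_seconds)
    start = rng.uniform(0.0, max_start)
    return TimingConditioning(seconds_start=start, seconds_total=track_seconds)


def inference_timing(desired_seconds: float) -> TimingConditioning:
    """At generation time, request a whole track that starts at the beginning."""
    return TimingConditioning(seconds_start=0.0, seconds_total=desired_seconds)


def num_samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a duration to a per-channel sample count."""
    return int(round(seconds * sample_rate))


def trim_to_duration(samples: list, desired_seconds: float,
                     sample_rate: int = SAMPLE_RATE) -> list:
    """Keep only the requested duration from a fixed-length generated window."""
    return samples[: num_samples(desired_seconds, sample_rate)]
```

For example, `inference_timing(45.0)` asks for a 45-second piece starting at its opening, and at 44.1 kHz that duration corresponds to 45 × 44,100 = 1,984,500 samples per channel. The design choice the sketch illustrates is that conditioning on position and total length lets one model produce clips of different lengths that begin and end like complete pieces.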
Why business readers should care: Stable Audio extended a leading image-generation lab's business into music and showed a path toward generative audio built on licensed training data, in contrast to the disputes over unlicensed scraping engulfing other AI music systems. The licensing arrangement was itself a product decision, intended to make the output safe for commercial use.