“Simple and Controllable Music Generation,” submitted to arXiv on June 8, 2023 by Jade Copet, Alexandre Défossez, and colleagues at Meta AI, introduced MusicGen, a text-to-music model whose selling point was in its title: simplicity. Where competing systems cascaded several models together, MusicGen used a single-stage transformer language model operating over discrete audio tokens from Meta’s EnCodec codec. A token-interleaving pattern let that one model predict EnCodec’s parallel token streams together, removing the need for separate hierarchical or upsampling stages.
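To make the interleaving idea concrete, here is a minimal sketch of one such pattern, the "delay" arrangement discussed in the paper, in which each codebook stream is shifted one step later than the one before it. This is an illustration under stated assumptions, not Meta's implementation: the function names `delay_interleave` and `undo_delay`, the use of `None` as a padding marker, and the toy token values are all hypothetical, and four codebooks is simply an illustrative choice.

```python
from typing import List, Optional

PAD = None  # hypothetical marker for positions that hold no token


def delay_interleave(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook k right by k steps.

    codes[k][t] is the token from codebook k at audio frame t.
    The result has one row per codebook and T + K - 1 columns,
    where each column is one decoding step of the language model.
    """
    num_codebooks = len(codes)
    pattern = []
    for k, stream in enumerate(codes):
        # k pads in front, and enough pads behind to equalize row lengths
        row = [PAD] * k + list(stream) + [PAD] * (num_codebooks - 1 - k)
        pattern.append(row)
    return pattern


def undo_delay(pattern: List[List[Optional[int]]]) -> List[List[Optional[int]]]:
    """Reverse the shift to recover frame-aligned codebook streams."""
    num_codebooks = len(pattern)
    num_frames = len(pattern[0]) - (num_codebooks - 1)
    return [row[k:k + num_frames] for k, row in enumerate(pattern)]


if __name__ == "__main__":
    # Toy input: 4 codebooks, 3 audio frames (values are made up)
    toy_codes = [
        [10, 11, 12],
        [20, 21, 22],
        [30, 31, 32],
        [40, 41, 42],
    ]
    for row in delay_interleave(toy_codes):
        print(row)
```

With K codebooks and T frames, this layout needs T + K - 1 decoding steps, compared with K × T steps if every token were flattened into one long sequence, which is the kind of saving that lets a single model stand in for a cascade.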
MusicGen generates music conditioned on a text description, and can additionally be steered by a reference melody so the output follows a given tune in a new style. It produces both mono and stereo audio. Meta released the code and model weights through its AudioCraft library on GitHub, making a capable music generator freely available at a time when Google had not released the weights for its comparable MusicLM model.
Why business readers should care: MusicGen put a controllable, openly available music generator into the hands of developers and researchers, accelerating both creative experimentation and the supply of tools that downstream startups could build on. Its open release stood in deliberate contrast to the closed posture of rival labs and fit Meta’s broader strategy of shipping open models.