SeamlessM4T (Meta speech-and-text translation)

SeamlessM4T is a foundational multilingual and multitask model announced by Meta on August 22, 2023, designed to translate and transcribe both speech and text within a single system. Earlier speech-translation pipelines chained separate models for speech recognition, translation, and speech synthesis; SeamlessM4T folds these steps into one model, which reduces the compounding of errors that occurs when each component inherits the mistakes of the one before it.
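
To make the error-compounding argument concrete, the sketch below multiplies per-stage accuracies under a deliberately simplified model. It assumes that stage errors are independent and that an error in any stage ruins the final output. The function name and the accuracy figures are hypothetical and chosen only for illustration. They are not measurements of SeamlessM4T or of any real pipeline.

```python
from math import prod


def cascade_success_rate(stage_accuracies):
    """Chance that every stage of a chained pipeline gets its step right.

    Simplifying assumptions (hypothetical model, not a real benchmark):
    stage errors are independent, and an error in any stage corrupts the
    final output because later stages cannot repair it.
    """
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies for a chained system:
# speech recognition -> text translation -> speech synthesis.
stages = {"asr": 0.90, "translation": 0.90, "synthesis": 0.95}

end_to_end = cascade_success_rate(stages.values())
print(f"weakest single stage: {min(stages.values()):.2f}")
print(f"chained pipeline:     {end_to_end:.2f}")
```

With these made-up numbers the chain succeeds about 77% of the time, below even its weakest stage. That gap is the motivation for collapsing the stages into one model, although real systems do not fail as independently as this toy model assumes.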

Per Meta’s announcement, the model supports five capabilities: automatic speech recognition for about 100 languages; speech-to-text translation for about 100 input and output languages; speech-to-speech translation for about 100 input languages and 35 output languages plus English; text-to-text translation for about 100 languages; and text-to-speech translation with the same coverage as speech-to-speech translation. At release, that breadth made it one of the most broadly capable publicly available speech-translation systems. Meta released it under a Creative Commons non-commercial license (CC BY-NC 4.0) to support research.
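
One way to read the five capabilities is as a grid of input and output modalities. Every speech/text pairing is covered as a translation task, and same-language transcription of speech is added as a fifth. The sketch below encodes that grid as a lookup table. The shorthand labels (ASR, S2TT, S2ST, T2TT, T2ST) and the `pick_task` helper are illustrative assumptions, not an official SeamlessM4T API.

```python
# Hypothetical lookup table: (input modality, output modality, translate?) -> task label.
# The labels are shorthand for the five capabilities listed above.
TASKS = {
    ("speech", "text", False): "ASR",    # transcribe speech in the same language
    ("speech", "text", True): "S2TT",    # speech-to-text translation
    ("speech", "speech", True): "S2ST",  # speech-to-speech translation
    ("text", "text", True): "T2TT",      # text-to-text translation
    ("text", "speech", True): "T2ST",    # text-to-speech translation
}


def pick_task(source: str, target: str, translate: bool = True) -> str:
    """Return the shorthand task label for an input/output modality pair."""
    try:
        return TASKS[(source, target, translate)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {source} -> {target}, translate={translate}"
        ) from None


print(pick_task("speech", "speech"))                  # S2ST
print(pick_task("speech", "text", translate=False))   # ASR
```

In a cascaded design each row of this table would typically route through a different chain of components. In a single multitask model, the same network serves every row.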

SeamlessM4T builds on Meta’s earlier translation work, including the No Language Left Behind (NLLB) text translation models and the Massively Multilingual Speech (MMS) project, and moves toward the long-standing goal of a universal translator that works in spoken conversation, not only on written text.

For organizations building multilingual voice products, the significance is that a single model can listen, translate, and speak across many languages, simplifying systems that previously required several specialized components.
