mT5: A Massively Multilingual Text-to-Text Transformer

mT5 is the multilingual version of Google’s T5 model, introduced in “mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer,” posted to arXiv on October 22, 2020 by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. It applies T5’s idea of casting every task as text-in, text-out, but pre-trains across 101 languages using a Common Crawl-based corpus called mC4.

The model reached state-of-the-art results on several multilingual benchmarks at the time of release. The paper also identified a specific failure mode of multilingual generation, “accidental translation,” in which a model fine-tuned on one language and evaluated zero-shot on another sometimes translates part of its answer into the wrong language. The authors showed that mixing a small amount of the multilingual pre-training task into fine-tuning discourages this behavior, which made zero-shot cross-lingual generation more reliable.
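One way to picture the mitigation is as a sampler that interleaves a few unsupervised multilingual pre-training examples with the fine-tuning data. The Python sketch below makes that concrete under stated assumptions: the function name `mix_examples`, the placeholder example strings, and the default ratio of roughly one unsupervised example per hundred task examples are all illustrative, not the authors' code or exact settings.

```python
import random
from collections import Counter
from itertools import islice
from typing import Iterator, Sequence, Tuple

Example = Tuple[str, str]  # (input text, target text)


def mix_examples(
    task_examples: Sequence[Example],
    unsupervised_examples: Sequence[Example],
    unsup_per_task: float = 0.01,  # assumed ratio, for illustration only
    seed: int = 0,
) -> Iterator[Tuple[str, Example]]:
    """Yield an endless stream of (source, example) pairs.

    Most items come from the fine-tuning task; a small fraction come
    from the unsupervised multilingual data.
    """
    rng = random.Random(seed)
    # Convert "unsupervised examples per task example" into a probability.
    p_unsup = unsup_per_task / (1.0 + unsup_per_task)
    while True:
        if rng.random() < p_unsup:
            yield "unsupervised", rng.choice(unsupervised_examples)
        else:
            yield "task", rng.choice(task_examples)


# Placeholder data: English-only task pairs plus multilingual unlabeled pairs.
task = [("question: Who wrote Hamlet?", "Shakespeare")]
unsup = [
    ("<corrupted Swahili text>", "<missing spans>"),
    ("<corrupted Thai text>", "<missing spans>"),
]

sample = list(islice(mix_examples(task, unsup), 10_000))
counts = Counter(source for source, _ in sample)
print(counts)
```

The intent of such mixing is that the model keeps seeing non-English targets during fine-tuning, so it is less likely to drift toward answering in the fine-tuning language.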

Because mT5 was released openly in five sizes, from Small (about 300 million parameters) to XXL (about 13 billion), it became a common starting point for researchers and companies building multilingual summarization, question answering, and classification systems.

For practitioners, mT5 demonstrated that the same simple text-to-text framing that worked in English could be scaled to a hundred-plus languages, lowering the barrier to serving non-English users with a single, well-understood architecture.
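The text-to-text framing can be illustrated with a short Python sketch in which classification, question answering, and summarization all reduce to the same (input string, target string) pair. The class name, helper functions, and task prefixes here are hypothetical choices for illustration; mT5 was pre-trained only on unlabeled text, so any such prefix takes on meaning only through fine-tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToTextExample:
    """A single training pair: the model reads `source` and writes `target`."""
    source: str
    target: str


def frame_classification(text: str, label: str) -> TextToTextExample:
    # The label is emitted as plain text rather than a class index.
    return TextToTextExample(f"classify: {text}", label)


def frame_qa(question: str, context: str, answer: str) -> TextToTextExample:
    return TextToTextExample(f"question: {question} context: {context}", answer)


def frame_summarization(document: str, summary: str) -> TextToTextExample:
    return TextToTextExample(f"summarize: {document}", summary)


# Three different tasks, in three languages, share one data shape.
examples = [
    frame_classification("Das Essen war ausgezeichnet.", "positive"),
    frame_qa("¿Dónde está el Louvre?", "El Louvre está en París.", "París"),
    frame_summarization("長い記事の本文", "短い要約"),
]

for ex in examples:
    print(repr(ex.source), "->", repr(ex.target))
```

Because every task shares this shape, one model, one loss, and one decoding procedure can serve all of them, regardless of language.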
