NLLB-200 (No Language Left Behind)

NLLB-200 is the translation model released by Meta AI in 2022 as part of the No Language Left Behind project, announced on Meta’s research site and detailed in the paper “No Language Left Behind: Scaling Human-Centered Machine Translation” (arXiv 2207.04672). It is a single model that translates directly between 200 languages, supporting roughly 40,000 translation directions, with a deliberate focus on the 150 low-resource languages that commercial systems had largely ignored.
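The "roughly 40,000" figure follows from simple counting, sketched below: every ordered pair of distinct languages is one translation direction. The function name translation_directions is hypothetical, introduced only for this illustration, and the sketch assumes every language pairs with every other in both directions.

```python
def translation_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs where source != target."""
    # Each language can be a source, and each of the remaining
    # languages can be its target, so the count is n * (n - 1).
    return num_languages * (num_languages - 1)


# Assumption: all 200 languages translate to and from one another.
directions = translation_directions(200)
print(directions)  # 39800
```

With 200 languages this gives 39,800 directions, which the text rounds to 40,000.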

The headline numbers are large. The model has about 54 billion parameters, roughly five times the size of the earlier M2M-100, and was trained on around 18 billion parallel sentences. It is reported to improve translation quality by an average of 44% in BLEU over prior state-of-the-art systems, as measured on the FLORES-200 benchmark released alongside it. To get there, the team mined and cleaned parallel data for languages with very little of it and built tooling to add languages without degrading the rest.

Meta open-sourced the models and benchmark and integrated NLLB into real products, including translation on Facebook and Instagram and the content-translation tool on Wikipedia, which extended automated translation to languages that had never had it.

NLLB-200 is a landmark because it reframed translation progress around the languages left out of the AI boom, showing that one open model could serve a large slice of the world’s languages at usable quality.

