BLOOM (BigScience multilingual LLM)

BLOOM is a 176-billion-parameter open-access language model described in “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model” (arXiv 2211.05100, first posted November 9, 2022). It was produced by the BigScience Workshop, a year-long collaboration of hundreds of researchers. The paper carries 392 authors, reflecting that unusually broad effort.

What set BLOOM apart was both its openness and its multilingual scope. Most large language models of its era were English-dominated and either closed or restrictively licensed. BLOOM was trained on the ROOTS corpus, which combined hundreds of sources covering 46 natural languages and 13 programming languages (59 in total), with strong representation of languages from Africa, South Asia, and other regions that other models underserved. It was released under a Responsible AI License intended to allow broad use while discouraging harmful applications.

The model showed competitive performance across many benchmarks, with stronger results after multitask prompted finetuning (the finetuned variants were released as BLOOMZ). It also demonstrated that a frontier-scale model could be built transparently and in the open rather than only inside a few large companies.

For organizations, BLOOM matters as proof that openly governed, community-built models can reach large scale and serve languages that commercial vendors deprioritize, widening who gets to benefit from large language models.

Last verified June 7, 2026