Masakhane is an open, continent-wide, distributed research community focused on natural language processing for African languages. It was introduced to the wider research world in “Masakhane — Machine Translation For Africa,” a paper accepted at the AfricaNLP workshop at ICLR 2020 and posted to arXiv on March 13, 2020, by Iroro Orife, Julia Kreutzer, Blessing Sibanda, and more than twenty other contributors. The name comes from isiZulu and means “we build together.”
The problem Masakhane set out to address is stark: Africa is home to over 2,000 languages, yet almost none are well supported by modern NLP. Progress has been held back by limited funding, scattered data, and a lack of standard benchmarks. The community’s answer is a distributed, online model in which researchers across the continent contribute translations, datasets, and models for their own native languages. Its stated mission is to strengthen NLP research in African languages “for Africans, by Africans,” and it explicitly rejects “parachute research,” where outside groups extract data without building local capacity.
Masakhane has since produced widely used resources, including benchmark datasets for African-language tasks such as named entity recognition (MasakhaNER), and is often cited as a model for community-driven, data-sovereign AI development in other regions.
For anyone thinking about who AI serves, Masakhane is a concrete example that closing the language gap is as much about community, ownership, and incentives as it is about model architecture.