MTEB: Massive Text Embedding Benchmark

MTEB (Massive Text Embedding Benchmark), introduced by Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers in October 2022, has become the de facto standard for comparing text embedding models. Before MTEB, embeddings were usually judged on a handful of datasets from a single task type. That left it unclear whether a model that excelled at, say, semantic similarity would also do well at clustering or reranking.

MTEB spans 8 embedding task categories (bitext mining, classification, clustering, pair classification, reranking, retrieval, semantic textual similarity, and summarization), 58 datasets, and 112 languages. The authors evaluated 33 models and found that no single embedding method dominated across all tasks. This was an important early signal that there is no universally best embedding and that the right choice depends on the job.

The benchmark ships with open-source code and a public Hugging Face leaderboard, which quickly became the scoreboard the embedding community competes on. New releases, from OpenAI's and Cohere's embedding models to open model families such as BGE and E5, routinely report MTEB scores at launch.

For a business choosing an embedding model to power search or a RAG system, MTEB is the practical reference point: it lets teams compare candidates on the kinds of tasks they actually run, rather than trusting a vendor’s single cherry-picked number.
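One way to act on this is to weight category-level scores by how much each task type matters to your own workload. The Python sketch that follows is a hypothetical illustration, not part of the MTEB codebase or its leaderboard. The model names and scores are made-up placeholders, not real MTEB results, and the weighted average is an assumed way a team might combine results; the public leaderboard uses its own averaging. The sketch also shows two models trading places depending on which categories count, echoing the finding that no single embedding dominates.

```python
from typing import Dict

# Hypothetical per-category scores (0-100) for two made-up models.
# These are placeholders for illustration only, NOT real MTEB results.
SCORES: Dict[str, Dict[str, float]] = {
    "model_a": {"Retrieval": 50.0, "Clustering": 40.0, "STS": 80.0},
    "model_b": {"Retrieval": 45.0, "Clustering": 46.0, "STS": 82.0},
}


def weighted_score(category_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean over only the categories the team cares about (assumed scheme)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(category_scores[cat] * w for cat, w in weights.items()) / total_weight


def rank_models(scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> list:
    """Model names sorted best-first for a given workload weighting."""
    return sorted(scores, key=lambda m: weighted_score(scores[m], weights), reverse=True)


def best_per_category(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """The top-scoring model in each category, to see whether any model wins everywhere."""
    categories = next(iter(scores.values())).keys()
    return {cat: max(scores, key=lambda m: scores[m][cat]) for cat in categories}


if __name__ == "__main__":
    # A search/RAG-heavy team versus a team that mostly clusters documents.
    retrieval_heavy = {"Retrieval": 0.8, "STS": 0.2}
    clustering_heavy = {"Clustering": 0.7, "STS": 0.3}
    print("retrieval-heavy ranking:", rank_models(SCORES, retrieval_heavy))
    print("clustering-heavy ranking:", rank_models(SCORES, clustering_heavy))
    print("best per category:", best_per_category(SCORES))
```

With these placeholder numbers, model_a ranks first for the retrieval-heavy weighting and model_b for the clustering-heavy one. With real per-category numbers from the leaderboard and weights that reflect a team's actual traffic, the same idea yields a workload-specific shortlist rather than a single headline number.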
