Mistral releases Mixtral 8x7B, an open mixture-of-experts model

On December 11, 2023, Mistral AI released Mixtral 8x7B, which it described as “a high-quality sparse mixture of experts model (SMoE) with open weights,” licensed under Apache 2.0. Instead of passing every token through one dense network, Mixtral has eight expert feed-forward blocks in each layer, and a router picks two of them for each token. The model has 46.7 billion parameters in total but uses only about 12.9 billion per token, so it runs at roughly the speed and cost of a 12.9B dense model while drawing on the capacity of a much larger one.
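The routing idea can be illustrated with a toy sketch. It assumes Mistral's description of picking 2 of 8 experts per token, with the selected router scores normalized by a softmax. The router vectors, the `moe_layer` and `split_params` names, and the scaling "experts" are all hypothetical stand-ins; real Mixtral experts are learned feed-forward blocks, and routing happens separately in every layer. The `split_params` helper is a back-of-envelope calculation. It assumes every non-expert parameter is shared and all experts are the same size, and it uses the reported 46.7B total and 12.9B active figures to estimate the shared and per-expert parameter budgets.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token: Vector, router: Sequence[Vector],
              experts: Sequence[Expert], k: int = 2) -> Vector:
    """Sparse MoE step for one token: evaluate only the top-k experts.

    router[i] is a toy gating vector for expert i (hypothetical; a real model
    learns a gating matrix). The router logit is its dot product with the token.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k gate weights sum to 1.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)  # the other experts are never run for this token
        out = [o + gate * v for o, v in zip(out, y)]
    return out


def split_params(total: float, active: float, n_experts: int = 8,
                 k: int = 2) -> Tuple[float, float]:
    """Assumes total = shared + n*E and active = shared + k*E; solves for shared, E."""
    per_expert = (total - active) / (n_experts - k)
    shared = total - n_experts * per_expert
    return shared, per_expert


# Usage: 8 toy experts that count how often they are called.
random.seed(0)
DIM, N_EXPERTS = 4, 8
router = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_EXPERTS)]
calls = [0] * N_EXPERTS


def make_expert(i: int) -> Expert:
    def expert(v: Vector) -> Vector:
        calls[i] += 1
        return [(i + 1) * x for x in v]  # toy stand-in for a feed-forward block
    return expert


experts = [make_expert(i) for i in range(N_EXPERTS)]
y = moe_layer([1.0, 0.5, -0.5, 2.0], router, experts)
print(sum(calls))  # 2: only two of the eight experts ran

shared, per_expert = split_params(46.7, 12.9)
print(f"shared ≈ {shared:.2f}B, per expert ≈ {per_expert:.2f}B")  # 1.63B, 5.63B
```

Under these simplifying assumptions, each expert slot (summed across layers) holds about 5.6B parameters and the shared attention and embedding weights about 1.6B. That explains why "8x7B" totals 46.7B rather than 56B: the shared weights are counted once, not eight times.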

Mistral reported that Mixtral “outperforms Llama 2 70B on most benchmarks with 6x faster inference” and “matches or outperforms GPT-3.5 on most standard benchmarks.” The instruction-tuned variant, Mixtral 8x7B Instruct, scored 8.30 on MT-Bench, which Mistral said made it the best open-source model at the time, with performance comparable to GPT-3.5.

Mixtral was one of the first high-quality mixture-of-experts models with openly downloadable weights, putting a competitive MoE model directly into the hands of the open-weights ecosystem. It reinforced Mistral’s position as Europe’s leading independent AI lab and helped popularize MoE as a practical way to scale capability without scaling per-token compute.
