Meta releases the Llama 4 mixture-of-experts models

On April 5, 2025, Meta released the first models in the Llama 4 family, saying they “will enable people to build more personalized multimodal experiences.” Llama 4 marked Meta’s shift to a mixture-of-experts (MoE) architecture. In an MoE model, a learned router sends each token to a small subset of specialized sub-networks (“experts”), so only a fraction of the model’s parameters are active for any given token. Llama 4 Scout has 17 billion active parameters across 16 experts (109 billion total), and Llama 4 Maverick has 17 billion active parameters across 128 experts (400 billion total). Meta also previewed Llama 4 Behemoth, a teacher model that was still in training, with 288 billion active parameters and nearly two trillion total.
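The routing idea can be sketched in a few lines of Python. This is a toy, not Meta’s implementation. The expert blocks are random linear maps, the router is a plain dot-product score with a softmax over only the selected experts (a Mixtral-style gating choice assumed here), and the names `ToyMoELayer`, `linear_map`, and `active_fraction` are hypothetical. Meta described Maverick as sending each token to a shared expert plus one of its routed experts, and the toy loosely mirrors that with `top_k=1` and an always-on `shared` expert.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a short list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear_map(seed, dim):
    """Hypothetical stand-in for an expert feed-forward block: a fixed random matrix."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [sum(w[i][j] * x[j] for j in range(dim)) for i in range(dim)]


class ToyMoELayer:
    """Toy top-k MoE layer with one always-on shared expert (illustrative only)."""

    def __init__(self, num_experts, dim, top_k=1, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.experts = [linear_map(seed + 1 + e, dim) for e in range(num_experts)]
        self.shared = linear_map(seed + 100_000, dim)
        # Router: one score vector per expert; a token's logit for expert e is a dot product.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
        self.calls = [0] * num_experts  # how many tokens each routed expert processed

    def forward(self, x):
        logits = [sum(r_j * x_j for r_j, x_j in zip(r, x)) for r in self.router]
        # Keep only the top_k highest-scoring experts; the others are never evaluated.
        chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[: self.top_k]
        gates = softmax([logits[e] for e in chosen])  # assumed gating scheme
        out = self.shared(x)  # the shared expert runs for every token
        for gate, e in zip(gates, chosen):
            self.calls[e] += 1
            y = self.experts[e](x)
            out = [o + gate * y_i for o, y_i in zip(out, y)]
        return out, chosen


def active_fraction(active_params_b, total_params_b):
    """Share of parameters touched per token, from published active/total counts."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    dim, num_tokens = 8, 32
    layer = ToyMoELayer(num_experts=16, dim=dim, top_k=1, seed=42)
    rng = random.Random(7)
    for _ in range(num_tokens):
        token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        layer.forward(token)
    print("tokens per routed expert:", layer.calls)
    print("routed-expert evaluations:", sum(layer.calls), "for", num_tokens, "tokens")
    # Apply the same ratio to the parameter counts Meta published.
    for name, active, total in [("Scout", 17, 109), ("Maverick", 17, 400)]:
        print(f"{name}: {active_fraction(active, total):.1%} of parameters active per token")
```

Each token evaluates the shared expert plus a single routed expert, so the number of routed-expert evaluations equals the number of tokens no matter how many experts exist. The same logic is why Scout touches roughly 16% of its parameters per token and Maverick roughly 4%.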

The headline capability was context length: Meta called Scout’s 10-million-token context window “industry-leading,” far beyond the 128,000 tokens supported by Llama 3.1. The models were also natively multimodal, using early fusion to process text and image tokens in a single model backbone from pretraining onward rather than bolting a vision component onto a text-only model afterward.

Llama 4 mattered as Meta’s bid to keep its open-weight model line competitive with closed frontier systems while adopting the MoE efficiency techniques that rivals such as Mistral, DeepSeek, and Qwen had already embraced. It pushed the open ecosystem further toward sparse, long-context, multimodal designs.


Last verified June 7, 2026