Mixture-of-Agents (MoA) is a method that arranges several large language models in layers so that each model can draw on the others' outputs and the combined system can outperform any single member. It was introduced in “Mixture-of-Agents Enhances Large Language Model Capabilities,” posted to arXiv on June 7, 2024 by Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou, in work associated with Together AI and Stanford.
The architecture is built from stacked layers, each containing several LLM agents. Every agent in a layer receives all of the outputs from the previous layer as auxiliary context and uses them to generate a refined response, so the answer is progressively improved as it passes upward through the stack. The headline result was that an MoA built only from open-source models led AlpacaEval 2.0 with a length-controlled win rate of 65.1 percent, against 57.5 percent for GPT-4 Omni; the authors also reported state-of-the-art results on MT-Bench and FLASK. The name deliberately echoes mixture-of-experts, but the unit being mixed is a whole model treated as an agent rather than a sub-network inside one model.
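The layering described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Agent`, `build_prompt`, and `run_moa` are hypothetical, the prompt wording is invented for the example, and the single final aggregator that reduces the last layer's outputs to one answer is an assumption of this sketch. An agent is modeled as any function from a prompt string to a response string, so real model calls would have to be wired in separately.

```python
from typing import Callable, List

# Hypothetical type: an "agent" is any callable mapping a prompt to a response.
# In practice it would wrap a call to an LLM; that wiring is omitted here.
Agent = Callable[[str], str]


def build_prompt(question: str, previous: List[str]) -> str:
    """Attach the previous layer's responses to the question as auxiliary context."""
    if not previous:
        # First layer: there is no earlier layer, so agents see only the question.
        return question
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(previous, start=1))
    # The wording of this template is illustrative, not taken from the paper.
    return (
        "Responses from other models:\n"
        f"{numbered}\n\n"
        "Use them to write a refined answer to the question:\n"
        f"{question}"
    )


def run_moa(question: str, layers: List[List[Agent]], aggregator: Agent) -> str:
    """Pass the question through each layer, then let one agent write the final answer."""
    previous: List[str] = []
    for layer in layers:
        prompt = build_prompt(question, previous)
        # Every agent in this layer sees all outputs of the layer before it.
        previous = [agent(prompt) for agent in layer]
    # Assumed final step: one aggregator condenses the last layer's outputs.
    return aggregator(build_prompt(question, previous))
```

Calling `run_moa("Q?", [[a, b], [c, d]], agg)` would invoke `a` and `b` on the bare question, `c` and `d` on the question plus the two first-layer responses, and `agg` on the question plus the two second-layer responses. Only the immediately preceding layer's outputs are forwarded, which matches the description above.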
Why business readers should care: MoA is concrete evidence that orchestrating several cheaper models can, on some tasks, outperform a single frontier model, which implies a different cost curve from simply buying the biggest model available. The tradeoff is latency and total token spend, since every layer runs multiple models and each layer must wait for the one before it, so the approach suits offline or quality-critical work more than low-latency interactive use.
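The cost side of that tradeoff can be made concrete with simple arithmetic. The sketch below uses hypothetical helper names and assumes one final aggregator call after the last layer; the figures in the usage note are illustrative, not measurements. It counts total model calls, a rough proxy for token spend, and sequential steps, a rough proxy for latency if the agents within a layer run in parallel.

```python
def moa_call_count(num_layers: int, agents_per_layer: int,
                   final_aggregator: bool = True) -> int:
    """Total model calls for one query: every agent in every layer, plus the aggregator."""
    return num_layers * agents_per_layer + (1 if final_aggregator else 0)


def moa_sequential_steps(num_layers: int, final_aggregator: bool = True) -> int:
    """Steps that cannot overlap: each layer waits for the previous one to finish."""
    return num_layers + (1 if final_aggregator else 0)
```

For example, a stack of 3 layers with 6 agents each would make 3 × 6 + 1 = 19 model calls per query and take 4 sequential steps, compared with 1 call and 1 step for a single model. Later calls also carry longer prompts, because they include the previous layer's responses, so token spend grows faster than the call count alone suggests.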