LMArena (Chatbot Arena)

LMArena, originally known as Chatbot Arena, measures which AI chatbot people actually prefer. A user types a prompt, sees responses from two anonymous models side by side, and votes for the better one. Aggregating hundreds of thousands of these pairwise votes produces a ranking, computed with statistical rating methods in the same family as the Elo system used for chess. Because real users judge responses to their own prompts, the ranking captures overall helpfulness in a way that fixed test sets cannot.
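To make the aggregation step concrete, here is a minimal sketch of how pairwise votes can be turned into a ranking with a chess-style Elo update. It is an illustration under stated assumptions, not the platform's actual pipeline: the model names, the vote data, the K-factor of 32, and the starting rating of 1000 are all hypothetical, and the real leaderboard's estimation procedure may differ.

```python
from collections import defaultdict


def expected_score(r_a, r_b, scale=400.0):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_ratings(votes, k=32.0, initial=1000.0):
    """Process votes in order; each vote is (model_a, model_b, outcome).

    outcome is "a" (A preferred), "b" (B preferred), or "tie".
    The k and initial values are illustrative assumptions.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome in votes:
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[outcome]
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (score_a - exp_a)
        # Zero-sum update: what A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical votes between three made-up models.
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-x", "tie"),
]

ratings = elo_ratings(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

An online update like this depends on the order in which votes arrive. Fitting a single model to all votes at once, for example a Bradley-Terry model, removes that order dependence at the cost of a slightly more involved estimation step.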

The platform and methodology were described by Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica in “Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference,” posted in March 2024. The paper reported over 240,000 votes and found that crowd judgments agreed well with those of expert raters. The live site presents itself as an official AI ranking and LLM leaderboard.

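LMArena became influential because it provides a continuously updated, preference-based ranking. Since the prompts come from live users rather than a fixed test set, the ranking is harder to game by optimizing narrowly for a known benchmark, which has made it a go-to reference for comparing flagship models.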

The live rankings are hosted on the official site and change constantly, so they are not reproduced here.
