⚔️ LMSYS Chatbot Arena (Multimodal): Benchmarking LLMs and VLMs in the Wild


New Launch! Jailbreak models at RedTeam Arena.

📜 Rules

  • Ask any question to two anonymous models (e.g., ChatGPT, Gemini, Claude, Llama) and vote for the better one!
  • You can continue chatting until you identify a winner.
  • Votes won't be counted if a model's identity is revealed during the conversation.
  • NEW Image Support: Upload an image on your first turn to unlock the multimodal arena! Images should be smaller than 15 MB.

🏆 Chatbot Arena Leaderboard

  • We've collected 1,000,000+ human votes to compute an LLM Elo leaderboard for 100+ models. Find out who is the 🥇LLM Champion here!
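
As a rough illustration of how pairwise votes can become an Elo-style ranking, here is a minimal sketch of the textbook online Elo update. The vote tuple format, the K-factor of 4, the starting rating of 1000, and the model names are all assumptions made for this example; the leaderboard's actual computation may differ.

```python
from collections import defaultdict


def elo_ratings(votes, k=4.0, initial=1000.0, scale=400.0):
    """Online Elo over (model_a, model_b, winner) votes.

    winner is "a", "b", or "tie". All parameters are illustrative
    assumptions, not the leaderboard's real settings.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Probability that model_a wins under the logistic Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / scale))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)


# Hypothetical votes between three placeholder models.
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "a"),
]
ratings = elo_ratings(votes)
leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

Because each update depends on the ratings at that moment, online Elo is sensitive to vote order; a small K-factor keeps individual votes from swinging the ranking.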

👇 Chat now!

  • GPT-4o: The flagship model across audio, vision, and text by OpenAI
  • Grok-2: Grok-2 by xAI
  • Gemini: Gemini by Google
  • Claude 3.5: Claude by Anthropic
  • Llama 3.1: Open foundation and chat models by Meta
  • Mixtral of experts: A Mixture-of-Experts model by Mistral AI
  • GPT-4-Turbo: GPT-4-Turbo by OpenAI
  • Jamba 1.5: Jamba by AI21 Labs
  • Gemma 2: Gemma 2 by Google
  • Claude: Claude by Anthropic
  • DeepSeek Coder v2: An advanced code model by DeepSeek
  • Nemotron-4 340B: Cutting-edge open model by Nvidia
  • Llama 3: Open foundation and chat models by Meta
  • Athene-70B: A large language model by NexusFlow
  • Qwen Max: The frontier Qwen model by Alibaba
  • GPT-3.5: GPT-3.5-Turbo by OpenAI
  • Yi-Large: State-of-the-art model by 01 AI
  • Yi-Chat: A large language model by 01 AI
  • Phi-3: Capable and cost-effective small language models (SLMs) by Microsoft
  • Reka Core: Frontier multimodal language model by Reka
  • Reka Flash: Multimodal model by Reka
  • Command-R-Plus: Command R+ by Cohere
  • Command R: Command R by Cohere
  • Qwen 1.5: The first 100B+ model of the Qwen1.5 series
  • GLM-4: Next-gen foundation model by Zhipu AI
  • DBRX Instruct: DBRX by Databricks Mosaic AI
  • InternVL 2: Multimodal model developed by OpenGVLab
  • internlm2_5-20b-chat: Register the description at fastchat/model/model_registry.py

Terms of Service

Users are required to agree to the following terms before using the service:

The service is a research preview. It only provides limited safety measures and may generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. Please do not upload any private information. The service collects user dialogue data, including both text and images, and reserves the right to distribute it under a Creative Commons Attribution (CC-BY) or a similar license.

Please report any bugs or issues on our Discord (arena-feedback).

Acknowledgment

We thank UC Berkeley SkyLab, Kaggle, MBZUAI, a16z, Together AI, Hyperbolic, RunPod, Anyscale, and HuggingFace for their generous sponsorship.