Chatbot Arena has seen 40,000 people vote on the best AI

Joel Loynds

A new analysis tool from UC Berkeley, the Chatbot Arena, pits two AI chatbots against each other to determine which one is better.

Chatbot Arena is a new AI testing ground designed by UC Berkeley to figure out which language model is the best. The AI battleground pits two randomly chosen AI models against each other on the same prompt, and you then vote on which one gave the better answer.

All of this is then tallied up in a leaderboard, where GPT-4, which powers ChatGPT, currently reigns supreme. Chatbot Arena currently houses 20 different language models, including open-source models from around the web.
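The article does not describe how individual votes become a ranking. As a rough illustration only, here is a minimal Python sketch that assumes an Elo-style update, where each head-to-head vote nudges the winner's rating up and the loser's down. The starting rating, the step size `K`, and the sample votes are all made up for the example; only the model names come from the article.

```python
from collections import defaultdict

K = 32          # assumed step size for each rating update
BASE = 1000.0   # assumed starting rating for every model


def expected_score(r_a, r_b):
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def tally(votes, k=K, base=BASE):
    """Turn (model_a, model_b, winner) votes into a sorted leaderboard.

    winner is 'a', 'b' or 'tie'. Returns (model, rating) pairs, best first.
    """
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in votes:
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (score_a - exp_a)
        ratings[model_a] += delta   # what A gains...
        ratings[model_b] -= delta   # ...B loses, so the total stays constant
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical votes, invented purely for illustration.
votes = [
    ("gpt-4", "guanaco-33b", "a"),
    ("palm-2", "gpt-4", "b"),
    ("palm-2", "guanaco-33b", "tie"),
]
leaderboard = tally(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

With these invented votes, the model that wins both of its match-ups ends up at the top. A real leaderboard would need far more votes before the ordering means much.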

In our own tests, we were introduced to models that we wouldn’t usually interact with. These included PaLM 2 and guanaco-33b.

Speaking with PCMag, Chatbot Arena's creator, Hao Zhang, said that 40,000 people have taken part in the votes. Zhang sees validation by humans as instrumental in the development of language models and generative AI:

“It mostly measures human preference, and its ability to follow instructions and do the task the human wants, which is a very important factor in making a model useful.”

AI boom has led to multiple chatbots

Since the AI boom began, language models have developed rapidly. One example is DarkBERT, a language model designed to analyze the dark web to keep users safe.

Meanwhile, Microsoft has invested billions into ChatGPT creator OpenAI, which has resulted in Windows 11 having GPT-4 AI fully integrated into the operating system.

When we tested Chatbot Arena, we found that some of the lesser-known models are still early in development. However, it was fascinating to see them fail in unusual ways. When we asked, “Can you give me a list of Duke Nukem 3D weapons?”, some of the models confused the classic boomer shooter with the two games that came before it, while others made up weapons entirely.

About The Author

E-Commerce Editor. He's written extensively about video games and tech for over a decade for various sites. Previously seen on Scan, WePC, PCGuide, Eurogamer and Digital Foundry. A deep love for old tech, bad games and even jankier MTG decks.