Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless.
Benchmarks used to rank AI models are several years old, often sourced from amateur websites, and, experts worry, lending automated systems a dubious sense of authority
This is the way:
https://chat.lmsys.org/?arena