Technology @lemmy.world ModerateImprovement @sh.itjust.works 4mo ago

Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless.

themarkup.org Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless – The Markup

Benchmarks used to rank AI models are several years old, often sourced from amateur websites, and, experts worry, lend automated systems a dubious sense of authority.


26 comments

This is the way:

https://chat.lmsys.org/?arena
