Benchmarks used to rank AI models are several years old, are often sourced from amateur websites, and, experts worry, lend automated systems a dubious sense of authority
The article makes the valid argument that LLMs simply predict the next token based on their training and the query.
But is that actually true of the latest models from OpenAI, Anthropic (Claude), etc.?
And even if it is true, what solid proof do we have that humans aren’t doing the same? I’ve met endless people who could waffle for hours without seeming to do any reasoning.
I think I know enough about these concepts to know that there isn’t any conclusive proof, observed in output or system state, to establish consensus that human speech output is generated differently to how LLMs generate output. If you have links to any papers that claim otherwise, I’ll be happy to read them.
I've just skimmed a link I found on Google, and the way it describes humans working with language appears to me to be very similar to GPT in broad strokes. Only the human brain does a lot more than language. Hence the comparisons to the Mechanical Turk.
I’m not saying humans and LLMs generate language the same way.
I’m not saying humans and LLMs don’t generate language the same way.
I’m saying I don’t know and I haven’t seen clear data/evidence/papers/science to lean one way or the other.
A lot of people seem to believe humans and LLMs don’t generate language the same way. I’m challenging that belief in the absence of data/evidence/papers/science.
You're actually incorrect with regard to Russell's teapot in this instance. The correct approach is to admit to yourself and others that you don't know, not to assume a negative because you can't prove a positive when you can't prove the negative either.
I know I don't know, but this is a continuous system, and the probability of something being in one particular state is infinitely small. The probability of it being in a certain range around that particular state is, ahem, not. But with the number of moving parts in LLMs and in human brains, there are most likely quite a few radical differences between the laws describing them.
Why am I incorrect? You can't disprove that there is a teapot flying in a certain orbit either. Or you can, but not for all such statements.
What would be the criterion for saying that yes, the human brain works with language in just the same way as LLMs do? What would "same" mean? Logic exists inside defined constraints in the continuous world.
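To put rough numbers on the point-versus-range idea, here is a minimal sketch. It assumes, purely for illustration, that the "state" is a single number following a standard normal distribution; the names `interval_probability` and `state` are made up for this example and say nothing about how brains or LLMs are actually distributed.

```python
from statistics import NormalDist


def interval_probability(dist, low, high):
    """Probability that a continuous variable falls in [low, high]."""
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical one-dimensional "state", standard normal by assumption.
state = NormalDist(mu=0.0, sigma=1.0)

# A single exact value is an interval of zero width, so it has probability 0.
point = interval_probability(state, 0.5, 0.5)

# A small range around that value has a small but non-zero probability.
narrow = interval_probability(state, 0.49, 0.51)

# A wide range (one standard deviation either side of the mean) has a large one.
wide = interval_probability(state, -1.0, 1.0)

print(point, narrow, wide)
```

So "exactly the same" has probability zero for any continuous quantity, and the question only becomes meaningful once you say how wide a range still counts as "the same".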
Unless you define what would prove something, you can't disprove it, but it's also not a scientific hypothesis. That's Popper's criterion.
what solid proof do we have that humans aren’t doing the same?
Humans are not computers. Brains are not LLMs...
Given a totally reasonable hypothesis (humans =/= computers) and a completely outlandish hypothesis (humans = computers), I would need much more 'proof' for the latter.
Well, brains are a network of neurons (we can verify this empirically) trained on … eyes, ears, sense of touch, taste, smell and balance (rewarded by endorphins released by the old brain on certain hardcoded stimuli). LLMs are a network of artificial neurons trained on text and images (rewarded for producing text that mimics the input text and for passing some reasoning tests).
It’s not given that this results in the same way of dealing with language, given the wider set of input data for a human, but it’s not given that it doesn’t either.
Humans predict things by assigning meaning to events and things, because in nature, we're constantly trying to guess what other creatures are planning. An LLM does not hypothesize what your plans are when you communicate with it; it's just trying to predict the next set of tokens with the greatest reward value. Even if you were to use literal human neurons to build your LLM, you would still have a stochastic parrot.
Why should I need to prove a negative? The burden is on the ones claiming an LLM is sentient. LLMs are token predictors; do I need to present evidence of this?
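For anyone unsure what "token predictor" means mechanically, here is a toy sketch. It is a plain bigram counter, not how a real LLM is built (those use neural networks over subword tokens), and `train_bigram`, `predict_next` and the tiny corpus are made up for illustration. It only shows the shape of the task: given what came before, pick the most likely next token.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; whitespace-separated words stand in for tokens.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" once
print(predict_next(model, "slept"))  # nothing ever follows "slept"
```

Whether scaling this idea up by many orders of magnitude produces something that differs in kind from what humans do is exactly what the thread is arguing about; the sketch doesn't settle it either way.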