The linked PDF lists the deficiencies of the LLM responses. They are varied: the model sometimes misses the mark completely or can't grasp vital context.
Still a pretty useless comparison: they tested 10 university-level humans against Llama2-70B. The model has fallen out of use completely by now and was never really great at summarization. The study didn't fine-tune it either, so this isn't really representative of the current situation.
There are far better models out that were either specifically trained for summarization or can easily be fine-tuned to excel at it. Not to mention the Llama3 and 3.1 series, with the crazy 405B model.
This is an old study; they tested university-level adults against the standard Llama2-70B.
Kinda obsolete now: the model has completely fallen out of use in favor of the newer and far better 3 and 3.1 versions. It also wasn't fine-tuned for summarization, and while base L2-70B was OK, it wasn't great at anything without fine-tuning.
This clickbait title also sounds like self-gratification; the abysmal reading comprehension on the Internet runs directly counter to it. The average human found on the Internet doesn't approach the level of literacy that those ten human testers showed in the study.