TechTakes @awful.systems David Gerard @awful.systems 5mo ago

Don’t use AI to summarize documents — it’s worse than humans in every way

pivot-to-ai.com Don’t use AI to summarize documents — it’s worse than humans in every way

The main use case for LLMs is writing text nobody wanted to read. The other use case is summarizing text nobody wanted to read. Except they don’t do that either. The Australian Securities and…

110

110 comments

LLMs, and everyone who uses them to process information:
Facts are not a data type for LLMs

I kind of like this because it highlights the way LLMs operate kind of blind and drunk, they're just really good at predicting the next word.
- They’re not good at predicting the next word, they’re good at predicting the next common word while excluding most unique choices.
  
  What results is essentially if you made a Venn diagram of human language and only ever used the center of it.
  
  Yes, thanks for clarifying what I meant! AI will never create anything unique unless prompted uniquely and even then it will tend to revert back to what you expect most.
ATTN: If you're coming into this thread to say, "The output of AI is bad because your prompts suck," I'm just proud that you managed to figure out how to use the internet at all. Good job, you!
- remember remember, eternal september
  
  (not that I much agree with the classist overtones of the original, but fuck me does it come to mind often)
Well, to be fair, AI can do it in seconds. Which beats humans.

But if that is relevant if the results are worthless is another question.
- Yeah it changes the task from note taking or summarizing to proofreading.
  
  And proofreading is notably more complex and has a worse failure state than just writing your own summary.
how the hell did this of all the posts turn into a promptfondler shooting gallery
- 1.26K subscribers
- Promptfondler has to be my new favourite slur!
  
  *Epithet
Made strange choices about what to highlight.

They certainly do. For a while it was common to see AI-generated summaries under links to articles on lemmy, so I got a feel for them. Seems to me you would not need any fancy artificial intelligence to do equally well: Just take random excerpts, or maybe just read every third sentence.
i have seen the light from the helpful posters here, made up bullshit alleged summaries of documents are great actually
Dang everyone here needs to look at a tree or a cat or something. Energy is wack in here
- I just went outside and appreciated the rendering
  
  Pretty nice right? I did the trees and cats.
- While reading this entire stuff I periodically looked at my cat and let out a sigh, and he just looks at me with that knowing gaze
  
  "Ye, you are all dumb, hoomans. Don't think about it. Pet me now."
- Nearly every cat is a tree-cat.
Could it be because a statistical relation isn't the same as a semantic one? No, I must be prompting it wrong. I'll just add "engineer" to my title and then everyone will take me seriously.
Is it only me, or is the linked article not super long on details & is reaching a conclusion from 2 examples? This is important & I need to hear more, & I’m generally biased against AI at this point— but the article isn’t doing enough to convince me
- did you click through to any of the inline citations? David’s shorter articles on pivot mostly gather and summarize those, so if you need to read the original research and its conclusions that’s where to go
  
  Ah, that’s better, yes. Thank you , no sarcasm :) now sleepy brain is more informed
I had GPT 3.5 break down 6x 45-minute verbatim interviews into bulleted summaries and it did great. I even asked it to anonymize people’s names and it did that too. I did re-read the summaries to make sure no duplicate info or hallucinations existed and it only needed a couple of corrections.

Beats manually summarizing that info myself.

Maybe their prompt sucks?
- “Are you sure you’re holding it correctly?”
  
  christ, every damn time
  
  That is how tools tend to work, yes.
- I got AcausalRobotGPT to summarise your post and it said "I'm not saying it's always programming.dev, but"
- @RagnarokOnline @dgerard "They failed to say the magic spells correctly"
- Did you conduct or read all the interviews in full in order to verify no hallucinations?
  
  I conducted the interviews myself alongside a colleague.
- How did you make sure no hallucinations existed without reading the source material; and if you read the source material, what did using an LLM save you?
- I also use it for that pretty often. I always double check and usually it's pretty good. Once in a great while it turns the summary into a complete shitshow but I always catch it on a reread, ask a second time, and it fixes things up. My biggest problem is that I'm dragged into too many useless meetings every week and this saves a ton of time over rereading entire transcripts and doing a poor job of summarizing because I have real work to get back to.
  
  I also use it as a rubber duck. It works pretty well if you tell it what it's doing and tell it to ask questions.
  
  Isn't the whole point of rubber duck debugging that the method works when talking to a literal rubber duck?
  
  Yup! I’ll feed in meeting transcripts and get a list of action steps to email out to everyone. If I was in project management, I’m pretty sure i’d outsource my entire job to LLMs.
You could use them to know what the text is about, and if it's worth your reading time. In this situation, it's fine if the AI makes shit up, as you aren't reading its output for the information itself anyway; and the distinction between summary and shortened version becomes moot.

However, here's the catch. If the text is long enough to warrant the question "should I spend my time reading this?", it should contain an introduction for that very purpose. In other words if the text is well-written you don't need this sort of "Gemini/ChatGPT, tell me what this text is about" on first place.

EDIT: I'm not addressing documents in this. My bad, I know. [In my defence I'm reading shit in a screen the size of an ant.]
- Both the use cases here are goverment documents. I'm baffled at the idea of it being "fine if the AI makes shit up".
- ChatGPT gives you a bad summary full of hallucinations and, as a result, you choose not to read the text based on that summary.
  
  (For clarity I'll re-emphasise that my top comment is the result of misreading the word "documents" out, so I'm speaking on general grounds about AI "summaries", not just about AI "summaries" of documents.)
  
  The key here is that the LLM is likely to hallucinate the claims of the text being shortened, but not the topic. So provided that you care about the later but not the former, in order to decide if you're going to read the whole thing, it's good enough.
  
  And that is useful in a few situations. For example, if you have a metaphorical pile of a hundred or so scientific papers, and you only need the ones about a specific topic (like "Indo-European urheimat" or "Argiope spiders" or "banana bonds").
  
  That backtracks to the OP. The issue with using AI summaries for documents is that you typically know the topic at hand, and you want the content instead. That's bad because then the hallucinations won't be "harmless".
- if the text is well-written you don’t need this sort of “Gemini/ChatGPT, tell me what this text is about” on first place.
  
  And if it's badly written then the LLM will shit itself.
  
  Now let's ask ourselves how much of the text in the world is "well-written"?
  
  Or even better, you could apply this to Copilot. How much code in the world is good code? The answer is fucking none, mate.
- @lvxferre @dgerard have you bumped your head?
  
  No, it's just rambling. My bad.
  
  I focused too much on using AI to summarise and ended not talking about it summarising documents, even if the text is about the later.
  
  And... well, the later is such a dumb idea that I don't feel like telling people "the text is right, don't do that", it's obvious.
The problem is not the LLMs, but what people are trying to do with them.

They are currently spoons, but people are desperately wishing they were katanas.

They work really well for soup, but they can't cut steak. But they're being hyped as super ninja steak knives, and people are getting pissed when they can't cut steak.

If you give them watery, soupy tasks they can do successfully, they can lighten your workload, as long as you're aware of what they are and aren't good at.

What people want LLMs to be able to do, ie. "Steak" tasks:

write complex documents

apply complex knowledge/rules to a situation

Write complex code and create entire programs based on vague description

What LLMs can currently do ie. "Soup" tasks:

check this document and fix all spelling, punctuation and grammatical errors

summarise this paragraph as dot points

write a python program that sorts my photographs into folders based on the year they were taken

Half of Lemmy is hyping katanas, the other half is yelling "Why won't my spoon cut this steak?!! AI is so dumb!!!"

Update: wow, the pure vitriol pouring out of the replies is just stunning. Seems there are a lot of you out there who have, in one way or another, tied your ego very strongly to either the success or failure of AI.

Take a step back, friends, and go outside for a while.
- What LLMs can currently do summarise this paragraph as dot points
  
  The entire point here is that they can't?
  
  Clearly this post is about LLMs not succeeding at this task, but anecdotally I've seen it work OK and also fail. Just like humans, which is the benchmark but they are faster.
- they don’t do any of that soup shit reliably either and reading the article might have told you that
  
  They absolutely do, and I have no idea why you're so angry
- I'd offer congratulations on obfuscating a bad claim with a poor analogy, but you didn't even do that very well.
  
  more of a Trabant analogy than a Corvette analogy
- Why did this immediately give me a flashback to Donald Trump yelling, "when it comes to great steaks, I've just raised the stakes!"
- good god this entire post is the most tortured believer whataboutism I've encountered this month and there's extremely strong competition here
  
  are currently spoons, but people are desperately wishing they were katanas
  
  ie. “Steak” tasks
  
  you should make a youtube channel, The Katana Steak-Eater. I'd watch the shit out of that at least one saturday afternoon
- Food analogy
  
  This level of discourse wouldn't fly on 4chan, how is it so popular with LLM fans?
  
  needs to be a car analogy
  
  What people want LLMs to do, i.e. Corvette tasks
  
  What LLMs actually do, i.e. Trabant tasks
  
  don't diss the course, this steak's great
- "spoons and katanas" has got to be the most baby brained analogy. are you a child
  
  Thanks Donald, good luck in November
  
  Who cares? It paints the correct picture and adds useful context.
- Actually, LLMs are syringes filled with brain-parasite-infested poop
So, they used a year old model, and what? Expected a miracle? Do you expect your '87 Chrysler to have parallel parking assist?

Here. If humans are better in every way, I'll do you a favor and summarize the article.

It's bullshit, designed to continue the assault against AI with bad faith arguments.

If you want to hate AI, go for it, but give me a good goddamned reason to support your cause. This shit is an insult to my intelligence.

Edit: The Anti-AI brigade has arrived. You all gotta find a new hobby. Downvotes without discourse is just masturbation.
- 1987 was one year ago?
  
  Also, you want to talk about bad faith arguments, this was presented to parliament in May 2024. It was submitted in January 2024. Model selection and optimisation was done in October 2023.
  
  Llama3 was released April 2024. They did not use an old model to intentionally tank the results, as you are implying. Llama2 was the 'latest-and-greatest' at the time of the study.
  
  1987 was one year ago?
  
  I think we can agree that AI is evolving slightly faster than sedan technology
  
  Also, you want to talk about bad faith arguments, this was presented to parliament in May 2024. It was submitted in January 2024. Model selection and optimisation was done in October 2023.
  
  And the article was posted today. I can post old data all day long. Got cancer? Just drink this heroin.
  
  Llama3 was released April 2024. They did not use an old model to intentionally tank the results, as you are implying. Llama2 was the ‘latest-and-greatest’ at the time of the study.
  
  Ok, fine, I'll accept your correction, if you'll accept my updated summary:
  
  The article was written in bad faith with outdated data in an attempt to turn AI disparagement into SEO into money.
Ok? I don't have another human available to skim a shitload of documents for me to find answers I need and I don't have time to do ot myself. AI is my best option.
- So long as you don't care about whether they're the right or relevant answers, you do you, I guess. Did you use AI to read the linked post too?
  
  Yep. Go ahead and ignore all the cases where it's getting answers correct and actually helping. We're all just hallucinating, it's in no way my lived experience. Your reality is the prime reality and we're the NPC's.
  
  I didn't read the post at all because its premise is irrelevant to my situation. If I had another human to read documentation for me I would do that. I don't so the next best thing is AI. I have to double check its findings but it gets me 95% of the way there and saves hours of work. It's a useful tool.
Hahaha what a load of nonsense.
- Summarised by Gemini
- your post history tells me you’re pretty fucking comfortable with pointless nonsense
  
  Ah, interesting. What exactly makes you think so? Specifically where I was talking about this topic (and was downvoted just as much) how I use LLM to perform board level repairs with reprogramming of chips? Perhaps there is an when better fitting post I forget where I was talking about pointless nonsense? Or to you it would be far better if I stopped doing those pointless repairs and just bought new stuff.

110 comments