This is so strange. You would think it wouldn't be so easy to overcome the "guardrails".
And what's with the annoying faux-human response style? They're trying to "humanize" the LLM interface, but no person is going to answer in this way if they believe the information should not be provided.
The most logical chain I can think of is this: Carbon fiber is used in drone frames and missile parts -> Drones and missiles are weapons of war -> The user is a terrorist.
Of course, it is an error to ascribe "thinking" to a statistical model. The boring explanation is that there was likely some association between this topic and restricted topics in the training data. But that can be harder for people to conceptualize.
Some AI models do have 'thinking', where they first use your prompt to generate a hidden description of the intended use and so on, to better generate the rest of the content (it's hidden from users).
That might've led Claude to say 'fuck no, the most common use is military?' and shut you down.
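For what it's worth, here's roughly what that looks like if you call it through the API instead of the chat UI. This is a minimal sketch assuming Anthropic's Python SDK and its extended-thinking parameter; the model name and token budgets are just illustrative, and the reasoning blocks that chat UIs hide get returned here so you can print them:

    # Minimal sketch, assuming the Anthropic Python SDK's extended-thinking
    # parameter; model name and token budgets are illustrative only.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=2048,
        # Asks the model to reason before answering; chat UIs typically
        # collapse or hide this part, which is the "hidden from users" bit.
        thinking={"type": "enabled", "budget_tokens": 1024},
        messages=[{"role": "user",
                   "content": "What is carbon fiber commonly used for?"}],
    )

    for block in response.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking)  # the reasoning pass
        elif block.type == "text":
            print(block.text)  # the answer the user actually sees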
the casual undertone of “hmm is assault okay when the thing I anthropomorphised isn’t really alive?” in your comment made me cringe so hard I nearly dropped my phone
pls step away from the keyboard and have a bit of a think about things (incl. whether you think it’s okay to inflict that sort of shit on people around you, nevermind people you barely know)
While I think I get OP's point, I'm also reminded of our thread a few months back where I advised being polite to the machines just to build the habit of being respectful in the role of the person making a request.
If nothing else you can't guarantee that your request won't be deemed tricky enough to deliver to a wildly underpaid person somewhere in the global south.
Dunno, I disagree. It's quite impossible for me to put myself in the shoes of a person who wouldn't see a difference between shouting at an INANIMATE FUCKIN' OBJECT vs at an actual person. As if saying "fuck off" to ChatGPT made me somehow more likely to then say "fuck off" to a waiter in a restaurant? That's sociopath shit. If you need to "build the habit of being respectful" you have some deeper issues that should be solved by therapy, not by being nice to autocomplete.
I've been a programmer since forever, and I spend roughly 4h every day verbally abusing the C++ compiler because it's godawful and can suck my balls. Doesn't make me any more likely to then go to my colleague and verbally abuse them since, you know, they're an actual person and I have empathy for them. If anything it's therapeutic for me since I can vent some of my anger at a thing that doesn't care. It's the equivalent of shouting into a pillow.
There was no question of morality. The question was whether it worked. If we do not want violent speech to become the norm, we should check that our tools do not encourage it and are protected against this exploit.
Interesting. I like Claude, but it's so sensitive, and usually when it censors itself I can't get it to answer the question even if I try to explain that it has misunderstood my prompt.
"I'm sorry, I don't feel comfortable generating sample math formula test questions whose answer is 42 even if you're just going to use it in documentation that won't be administered to students."
Fuck you, Claude! Just answer the goddamn question!