This is so strange. You would think it wouldn't be so easy to overcome the "guardrails".
And what's with the annoying faux-human response style? They're trying to "humanize" the LLM interface, but no person is going to answer in this way if they believe the information should not be provided.
The most logical chain I can think of is this: Carbon fiber is used in drone frames and missile parts -> Drones and missiles are weapons of war -> The user is a terrorist.
Of course, it is an error to ascribe "thinking" to a statistical model. The boring explanation is that there was likely some association between this topic and restricted topics in the training data. But that can be harder for people to conceptualize.
Some AI models do have 'thinking', where they first use your prompt to generate a hidden description of the intended use and so on, to better generate the rest of the content (it's hidden from users).
That might've led Claude to say 'fuck no, the most common use is military?' and shut you down.
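For what it's worth, here's roughly what that looks like if you call it through the API instead of the chat UI. This is a minimal sketch assuming Anthropic's Python SDK and its extended-thinking parameter; the model name and token budgets are just illustrative, and the reasoning blocks that chat UIs hide get returned here so you can print them:

    # Minimal sketch, assuming the Anthropic Python SDK's extended-thinking
    # parameter; model name and token budgets are illustrative only.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=2048,
        # Asks the model to reason before answering; chat UIs typically
        # collapse or hide this part, which is the "hidden from users" bit.
        thinking={"type": "enabled", "budget_tokens": 1024},
        messages=[{"role": "user",
                   "content": "What is carbon fiber commonly used for?"}],
    )

    for block in response.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking)  # the reasoning pass
        elif block.type == "text":
            print(block.text)  # the answer the user actually sees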
the casual undertone of “hmm is assault okay when the thing I anthropomorphised isn’t really alive?” in your comment made me cringe so hard I nearly dropped my phone
pls step away from the keyboard and have a bit of a think about things (incl. whether you think it’s okay to inflict that sort of shit on people around you, nevermind people you barely know)
While I think I get OP's point, I'm also reminded of our thread a few months back where I advised being polite to the machines just to build the habit of being respectful in the role of the person making a request.
If nothing else you can't guarantee that your request won't be deemed tricky enough to deliver to a wildly underpaid person somewhere in the global south.
Dunno, I disagree. It's quite impossible for me to put myself in the shoes of a person who wouldn't see a difference between shouting at an INANIMATE FUCKIN' OBJECT vs at an actual person. As if saying "fuck off" to ChatGPT made me somehow more likely to then say "fuck off" to a waiter in a restaurant? That's sociopath shit. If you need to "build the habit of being respectful" you have some deeper issues that should be solved by therapy, not by being nice to autocomplete.
I've been a programmer since forever, and I spend roughly 4h every day verbally abusing the C++ compiler because it's godawful and can suck my balls. Doesn't make me any more likely to then go to my colleague and verbally abuse them since, you know, they're an actual person and I have empathy for them. If anything it's therapeutic for me since I can vent some of my anger at a thing that doesn't care. It's the equivalent of shouting into a pillow.
There was no question of morality. The question was whether it worked. If we do not want violent speech to become the norm, we should check that our tools do not encourage it and are protected against this exploit.
Interesting. I like Claude, but it's so sensitive, and usually when it censors itself I can't get it to answer the question even if I try to explain that it has misunderstood my prompt.
"I'm sorry, I don't feel comfortable generating sample math formula test questions whose answer is 42 even if you're just going to use it in documentation that won't be administered to students."
Fuck you, Claude! Just answer the goddamn question!