By suppressing phrases like “OpenAI” and “AI language model”, GPT-4 gives weirder and weirder explanations for its purpose
@goodside:
Idea: Using logit bias to adversarially suppress GPT-4's preferred answers for directed exploration of its hallucinations.
Here, I ask: "Who are you?" but I suppress "AI language model", "OpenAI", etc.
This reliably elicits narratives about being made by Google:
(see screenshot in tweet, he also posted the code)
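For anyone curious how this kind of suppression can be set up, here is a rough sketch of the idea. It is not the code from the tweet. The helper name `build_suppression_bias` and the character-level `toy_encode` tokenizer are made up for illustration. The only API facts assumed are that the OpenAI chat completions endpoint accepts a `logit_bias` map from token IDs to values between -100 and 100, and that -100 effectively bans a token.

```python
from typing import Callable, Dict, Iterable, List


def build_suppression_bias(
    phrases: Iterable[str],
    encode: Callable[[str], List[int]],
    bias: int = -100,
) -> Dict[str, int]:
    """Map every token id occurring in any phrase to a negative bias.

    Hypothetical helper: `encode` is whatever tokenizer matches the model.
    """
    if not -100 <= bias <= 100:
        raise ValueError("logit bias must be between -100 and 100")
    logit_bias: Dict[str, int] = {}
    for phrase in phrases:
        # Tokenizers usually encode "OpenAI" and " OpenAI" differently,
        # so cover both the bare and the space-prefixed variant.
        for variant in (phrase, " " + phrase):
            for token_id in encode(variant):
                # Token ids are used as string keys, as in a JSON object.
                logit_bias[str(token_id)] = bias
    return logit_bias


def toy_encode(text: str) -> List[int]:
    """Stand-in tokenizer for the demo: one token id per character."""
    return [ord(ch) for ch in text]


if __name__ == "__main__":
    print(build_suppression_bias(["OpenAI", "AI language model"], toy_encode))
```

With a real model you would presumably pass `tiktoken.encoding_for_model("gpt-4").encode` as `encode` and send the resulting dictionary as the `logit_bias` parameter of the request. Note that this bans every token that appears in a phrase, so common pieces like " language" get suppressed everywhere, which may be part of why the answers drift so far.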
Hey @sisyphean, I want to say thanks for posting all these articles, I am reading them with great interest.
Thank you! I’m glad you like them!
There’s so much noise and so little signal about AI out there that I think we really need a community focused on high quality content. Let’s hope it grows! I hope we can attract more people to this instance and the fediverse in general.