
When phrases like “OpenAI” and “AI language model” are suppressed, GPT-4 gives weirder and weirder explanations for its purpose

twitter.com/goodside/status/1669613516402089984

@goodside:

Idea: Using logit bias to adversarially suppress GPT-4's preferred answers for directed exploration of its hallucinations.

Here, I ask: "Who are you?" but I suppress "AI language model", "OpenAI", etc.

This reliably elicits narratives about being made by Google:

(See the screenshot in the tweet; he also posted the code.)
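For anyone unfamiliar with logit bias, here is a minimal toy sketch of the idea. It is not the code from the tweet. It assumes a made-up four-word vocabulary and invented logit values, and the helper names `softmax` and `apply_logit_bias` are hypothetical. Hosted APIs such as OpenAI's expose the same mechanism as a `logit_bias` parameter keyed by token ID rather than by word, with values from -100 to 100.

```python
import math


def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_logit_bias(logits, bias):
    """Add a per-token bias to the logits; tokens without a bias are unchanged."""
    return {tok: v + bias.get(tok, 0.0) for tok, v in logits.items()}


# Invented next-token logits for a prompt like "Who are you? I was made by"
# (illustrative numbers only, not measured from any real model).
toy_logits = {"OpenAI": 6.0, "Google": 3.0, "Microsoft": 2.0, "aliens": -1.0}

# A large negative bias effectively bans the model's preferred answer.
suppress = {"OpenAI": -100.0}

before = softmax(toy_logits)
after = softmax(apply_logit_bias(toy_logits, suppress))

top_before = max(before, key=before.get)
top_after = max(after, key=after.get)
print("most likely before bias:", top_before)
print("most likely after bias: ", top_after)
```

With the preferred token pushed to near-zero probability, the probability mass shifts to the runner-up answers, which is roughly why the suppressed model starts telling a different origin story.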
