Skip Navigation

Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

www.nature.com Two-faced AI language models learn to hide deception

‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

Two-faced AI language models learn to hide deception
9

You're viewing a single thread.