Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.
9 comments
Just… don’t hook it up to the defense grid.
Sorry, too late for that

Alright, I'll be out back digging the bomb shelter.

It's too late for that, honestly

Alright, I'll switch to digging holes for the family burial ground.