Futurology @futurology.today Lugh @futurology.today 8mo ago

Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

www.nature.com Two-faced AI language models learn to hide deception

9 comments

Just… don’t hook it up to the defense grid.
- Sorry, too late for that
  
  Alright, I’ll be out back digging the bomb shelter.
  
  It's too late for that, honestly
  
  Alright, I’ll switch to digging holes for the family burial ground.