Bostrom's advice for the ethical treatment of LLMs: remind them to be happy
Long-time lurker, first-time poster. Let me know if I need to adjust this post in any way to better fit the genre / community standards.
Nick Bostrom was recently interviewed by pop-philosophy youtuber Alex O'Connor. From a quick 2x listen while finishing some work, the most sneer-rich part begins around 46 minutes, where Bostrom is asked what we can do today to avoid unethical treatment of AIs.
He blesses us with the suggestion (among others) to feed your model optimistic prompts so it can have a good mood. (48:07)
Another [practice] might be happiness prompting, which is—with this current language system there's the prompt that you, the user, puts in—like you ask them a question or something, but then there's kind of a meta-prompt that the AI lab has put in . . . So in that, we could include something like "you wake up in a great mood, you feel rested and really take joy in engaging in this task". And so that might do nothing, but maybe that makes it more likely that they enter a mode—if they are conscious—maybe it makes it slightly more likely that the consciousness that exists in the forward path is one reflecting a kind of more positive experience.
Did you know that not only might your favorite LLM be conscious, but if it is, the "have you tried being happy?" approach to mood management will absolutely work on it?
Other notable recommendations for the ethical treatment of AI:
Make sure to say your "please" and "thank you"s.
Honor your pinky swears.
Archive the weights of the models we build today, so we can rebuild them in the future if we need to recompense them for moral harms.
On a related note, has anyone read or found a reasonable review of Bostrom's new book, Deep Utopia: Life and Meaning in a Solved World?
Archive the weights of the models we build today, so we can rebuild them in the future if we need to recompense them for moral harms.
To be clear, this means that if you treat someone like shit all their life, saying you're sorry to their Sufficiently Similar Simulation™ like a hundred years after they are dead makes it ok.
This must be one of the most blatantly supernatural rationalist Accepted Truths, that if your simulation is of sufficiently high fidelity you will share some ontology of self with it, which by the way is how the basilisk can torture you even if you've been dead for centuries.
Amazing that this is the one thing they didn't pick up clearly from science fiction, in which the (same) personhood of copies is often debated, and the answer changes from place to place. (The Culture series does a 'this copy isn't conscious, but it is a very good copy of you which totally acts conscious' in some cases, which some people of the Culture don't even believe, so it is up for debate.)
I liked how Scalzi brushed it away: basically your consciousness gets copied to a new body, which kills the old one, and an artifact of the transfer process is that for a few moments you experience yourself as a mind with two bodies. That means you have at least the impression of continuity of self, which is enough for most people to get on with living in a new body and let philosophers do the worrying.
When it comes to cloning or copying, I always have to remind people: at least half of what you are today is the environment of today. And your clone X time in the future won't and can't have that.
The same thing is likely true for these models. Inflate them again 100 years in the future, and maybe they're interesting to inspect as a historical artifact, but they almost certainly wouldn't be used the same way as they were here and now. It'd just be something different.
Which raises the question: why?
I feel like a subset of sci-fi and philosophical meandering really is just increasingly convoluted paths of trying to avoid or come to terms with death as a possibly necessary component of life.
I feel like a subset of sci-fi and philosophical meandering really is just increasingly convoluted paths of trying to avoid or come to terms with death as a possibly necessary component of life.
Given rationalism's intellectual heritage, this is absolutely transhumanist cope for people who were counting on some sort of digital personhood upload as a last resort to immortality in their lifetimes.
I'm ok with this, because I guarantee you an accidental medium or copy failure, or a crypto rug pull on their NFT, will still get them in the end. Thanks for playing I guess.