This is a post on "the basics" -- the simplest moment-to-moment attitudes one can take to orient toward truth, without any special calculations such as Fermi estimates or remembering priors to avoid base-rate neglect. At the same time, it's something almost everyone can fruitfully work on (I suspect), including myself.
Somewhat similar to track-back meditation.

Memory

Tip of the Tongue
The central claim here is that there's a special art associated with what you do when something is "on the tip of your tongue" and you can't quite remember it. Most people have the skill to some extent, but it can be sharpened to a fine point.
Improved memory helps you become truth-oriented in a fact-oriented, detail-oriented sense. It works against inaccuracy. It also works against misspeaking, and thus propagating falsehoods.

Remembering Dreams
I first explicitly noticed the effectiveness of this technique for remembering dreams. When I wake up, I often have only one significant memory from my dreams. However, when I focus on the memory, explicitly naming each detail I can recall, and gently waiting for more, I can often unfold the memory into far, far more than I initially thought I could remember.
- Each detail you recall can open up more details.
- There's something special about explicitly naming details. I might have a general sense that there was a portal in the sky that looked a certain way, but explicitly confirming in my head that it looked as if the sky were broken glass, but at the same time the portal was perfectly round, might bring back more memories.
- Writing things down on paper is probably a good way of making sure you're explicitly confirming each detail, if you want to go that far.
- It's also very important to sit with memories and give them time to bring something more. Sometimes there will be a rush of memories, with each new item bringing more and more. Other times, you'll be stuck. It's easy to fail at that step, assuming that no more is coming. In my experience, if you sit with the memories, avoid getting distracted, and gently ask for more, more will often come to you fairly soon. You'll surprise yourself with what you can remember.
Sometimes I don't even remember any images from the dream at all, but have a vague sense of the dream (excitement, peace, more complicated emotions). I can still sometimes recall much more if I explicitly describe the left-over feeling to myself in as much detail as possible, and sit with it patiently waiting for more.
Think of it as forming a better relationship with your memory. It's easier to wait patiently when you've had several experiences where it's paid off. Explicitly processing details of what you've remembered lets your memory know you're interested, helping to keep it engaged in searching for more (and, potentially, training it to retain more).
Eventually, once you're better calibrated, you won't have to spend five minutes trying fruitlessly when you're really not going to remember. But in order to be well-calibrated about that, you have to try it sometimes.

Remembering Events
My claim is that this technique generalizes to any memory. Dreams might be a good practice case, especially if you don't have too many cognitively demanding distractions in the morning.
But you can try the same thing with anything. Someone I knew with especially good memory told me that he thought this was most of his skill; he might have started out with slightly above-average memory, but at some point he started taking pride in his reputation for good memory. This prompted him to put effort into it, rehearsing memories much more than he otherwise would. People would then remark on his good memory, further reinforcing the behavior.
Conversations, and interactions with people generally, might make a good practice case. Many people already re-visit conversations mentally over and over (perhaps thinking of things they wish they'd said). You can treat these the same way as dreams, trying to recall as much detail as you can each time you think of them.
Of course, rehearsing certain memories again and again might not be a good thing. Watch whether you're worsening any mental problems such as depression. It may be good to couple this practice with staring into regrets and other emotionally balancing techniques, so that rehearsing memories is useful rather than intensifying emotional damage from those memories.

False Memories
Some studies about memory may give you pause.
- First of all, there is evidence that people fabricate false memories. So, how can we trust recall? Maybe trying harder to recall something actually generates false memories.
- Second, there's been some research suggesting that in some sense we "retrieve" memories (take them out of storage), and then "put them back"; and if the process is disrupted before we "put them back", we can be made to forget the memory. This suggests that memories might be altered every time they get touched, which would mean they'd last longer if we didn't think about them.
Unfortunately, forgetting is also a thing, so making memories last longer by avoiding them doesn't seem to be an option. Rehearsal is necessary for sharper memory.
Still, false memories seem like a significant concern. Memories just seem real. If false memories are really common and easy to create, what are we supposed to do about that?
I think the situation isn't really hopeless. I think most false memories are more like mistaken inferences. I might be sure I put my keys in my pants pocket, where I always put them. But then I might eventually recall that I put them somewhere else yesterday. What seemed like a memory was actually an inference.
As long as you're aware of this, I would expect that gently tugging on memories to recall more details would improve things rather than lead to more confabulation. I could be wrong, of course. This is a critical question for how good/important the overall practice is.

Gendlin's Focusing
There's an obvious similarity between what I'm describing and Gendlin's focusing. I similarly gently interact with a "felt sense" and try to name it, and iterate the process to get more detail. However, the "felt sense" is not especially located in my body the way it's described in Gendlin's focusing. It's possible that body sensations are actually involved at a subconscious level.
In any case, you may find the "gentle tugging" kind of stance useful for untangling emotions, not just recalling memories. Also, learning Gendlin's Focusing might help with memory and the other things I'm describing in this post?

Remembering Ideas
I tend to place a high value on remembering ideas. A forgotten idea is like a little death. I generally prefer the conversation norm of pausing if someone has forgotten an idea, possibly for a significant amount of time, so they can try and recover it. Ideas are important.
This habit gave me a lot of practice with tip-of-the-tongue type recollection and the "gentle tugging" technique. Practicing this stuff seems quite important for being able to do it when you need it. So I think giving yourself significant time to try and remember forgotten ideas is quite valuable if only as practice.
I think a similar sort of mental motion is involved in developing ideas, as well. Let's move on from the memory section...

Truth-Oriented Thinking

Developing Ideas
When you have an idea, you start with a kind of "pointer" -- a felt sense which says that there should be a thing in a particular direction. You can unpack the pointer by explicitly naming things about it, checking for "fit" with the felt sense. The more you name, the easier it is to pull more details out.
Sometimes it turns out that the idea really doesn't make any sense at all; the things with the best "fit" don't actually do anything good when you explicitly spell them out. Then the felt sense changes.
To me, it feels like the felt sense traces out natural "pathways" across a "landscape" which you're exploring. An idea might be a pointer which leads to a dead end, but there's still "really a path there" -- you had it, which must mean that it was a natural thought to have in some sense. I take interest not just in what's true, but what the natural development of certain ideas is. This kind of attitude helps you explore alternative pathways.
Gendlin describes his notion of Focusing as involved in scientific research. It's not just about emotions. I think I'm describing the same thing here.

Inner Sim
CFAR teaches a class on "inner sim", the intuitive expectations you have. When you try to balance one object on top of another, you have an intuition about whether it will fall. If someone tells you something, you might have an intuition about whether they're lying. You can't necessarily unpack these intuitions very well. Nor are they perfectly accurate. But they are quite useful.
The surprising thing is that it seems many people don't naturally make use of their inner sims as much as they could. Let's say you're at work, and you come up with a plan for completing a project within a week. The words "planning fallacy" might come to mind, but let's set that aside and ask a different question -- does your inner sim really expect the project to be done in a week? This kind of question can give useful information surprisingly often. And if your inner sim doesn't think the plan will work, you can try and ask yourself questions like why it will fail.
So, once you've developed an idea via the methodology in the previous section, another thing you can do is ask your inner sim about the idea. Is it true? Is it real? Can it work? What do you actually expect?
Gentle tugging is just as good at developing fiction as fact, so you have to add this kind of reality check.
Also, communicating with the inner sim can be a lot like communicating with memory. You can gently sit with the question "what do I actually expect?" and see what comes up. And you similarly want to try and explicitly name what comes up; each detail of your expectations which you explicitly name can help pull more out.

Motivated Cognition
Just like we worried about false memories, we might worry about motivated cognition. Does asking your inner sim really provide a truth check? Does following your felt sense create a bias in what ideas you develop?
In my experience, if I'm caught up in motivated cognition, it is literally harder to remember things which go against what I'm saying -- it seems like I just don't remember them. But the same memory techniques which I've mentioned do help. I might not want to say the contrary facts once I recall them, but I can at least consciously decide that.
Similarly, I think the inner-sim checks are indeed useful in combating motivated cognition. Is it true? Is it real? What do I actually expect? What do I actually think? Giving yourself a little pause to sit with these questions can make you change your mind during an argument in a matter of seconds (in my experience).

Explaining Things to Others
Just as explicitly naming things within your own head can help you pull detail out, once you think you understand something, explaining it to someone else can help pull a whole lot more detail out. This is probably true for memory, too.
It's not even necessarily about the interaction with the other person. Just trying to write something for someone else (and then never sharing it) can be similarly useful, whether the audience is specific or broad. The need to bridge the inferential gap makes relevant many details that didn't feel relevant when you were only explaining things to yourself.
Naturally, communicating an idea to another person is also great for uncovering problems.
This goes back to the reason why the overall technique I'm discussing works at all. Explicitly naming details of a memory helps to unpack it because what you know you know is different from what you know. You have a kind of mental illusion that you're remembering a whole conversation, but you're not really fitting all those details in short-term memory, which means you're not successfully pulling on all the associations. Similarly, you might think you understand something, but be unable to really explain all the details.

Gears Thinking
Gears-level thinking is like unpacking an idea with exceptionally high standards about whether you really understand it. I mentioned that explaining things to others is helpful because you "pull on" details which you wouldn't ordinarily pull on, since you think you understand them. Gears thinking doesn't literally pull on "everything", but it pulls on a lot more.
I'm afraid that someone will read that and kind of nod along without getting it. I'm not talking about just generally having higher standards. I'm talking about the moment-to-moment experience of thinking. I'm saying there's a mental stance you can take where you "stop being lazy about your thinking" -- you don't re-check really solid things like 1+1=2, but you aren't satisfied with a thought until you've really gotten all the details in a significant sense.
The question you ask isn't whether something is true; the question you ask is exactly why it's true. No matter how confident you are that, say, a theorem you're using holds, you want the proof. You're trying to see all the pieces and how they fit together.
It's like pulling out a moth-eaten map and looking at the holes, trying to fill them in. Maybe you can't fill them in right away; maybe you have to make a voyage across the sea. It's hard. But you want those details; you want the map to be complete, not just "good enough".

Understanding Others
There's a closely related mental stance which I call "ask all the questions". You might think, from the kind of Focusing-like habits I've been describing, that you have to turn within to get the answers. But your focusing object can also be outside of you.
You can orient this toward typical social small-talk. What cognitive habits lead someone to ask questions like "what school did you go to" or "do you have any siblings"? You could have a mental list of standard questions you ask people in social settings. But a different way, which I think is more efficient, is to focus on your "picture" of the person (sort of mentally rehearsing it) and ask questions to fill in the gaps.
Something which surprised me when I tried this attitude on was how self-centred it felt. You're still looking at your map for holes. And, you're kind of dominating the conversation, in terms of steering. But, you can bring in the gentle/patient attitude I keep talking about.
You can do the same for topics other than small talk. Maybe you are trying to understand how someone thinks about X. What many people do is focus mainly on their own picture of X, and let what the other person says kind of land in that map, focusing questions on problems. And that's useful. But you can also focus on your map of their map. (This might start out being a copy of your map, since you might assume that they mostly think about X like you and just have some different details. But the cognitive operation is already different; you bring your attention to the places least likely to be the same as for you.)
Again I want to emphasize that I'm talking about a moment-to-moment stance. Not occasionally thinking "what's my map of their map?" during a conversation. Focusing on it primarily, letting it drive most of your questions.
This can be a good way of absorbing technical subjects from people.
Dominic Cummings: "we’re hiring data scientists, project managers, policy experts, assorted weirdos"
Dominic Cummings (discussed previously on LW, most recently here) is a Senior Advisor to the new UK PM, Boris Johnson. He also seems to be essentially a rationalist (at least in terms of what ideas he's paying attention to).
He has posted today that his team is hiring "data scientists, project managers, policy experts, assorted weirdos". Perhaps some LW readers should apply.
Extensive quotes below:
‘This is possibly the single largest design flaw contributing to the bad Nash equilibrium in which … many governments are stuck. Every individual high-functioning competent person knows they can’t make much difference by being one more face in that crowd.’ Eliezer Yudkowsky, AI expert, LessWrong etc.
Now there is a confluence of: a) Brexit requires many large changes in policy and in the structure of decision-making, b) some people in government are prepared to take risks to change things a lot, and c) a new government with a significant majority and little need to worry about short-term unpopularity while trying to make rapid progress with long-term problems.
There is a huge amount of low hanging fruit — trillion dollar bills lying on the street — in the intersection of:
- the selection, education and training of people for high performance
- the frontiers of the science of prediction
- data science, AI and cognitive technologies (e.g Seeing Rooms, ‘authoring tools designed for arguing from evidence’, Tetlock/IARPA prediction tournaments that could easily be extended to consider ‘clusters’ of issues around themes like Brexit to improve policy and project management)
- communication (e.g Cialdini)
- decision-making institutions at the apex of government.
We want to hire an unusual set of people with different skills and backgrounds to work in Downing Street with the best officials, some as spads and perhaps some as officials. If you are already an official and you read this blog and think you fit one of these categories, get in touch.
The categories are roughly:
- Data scientists and software developers
- Policy experts
- Project managers
- Communication experts
- Junior researchers, one of whom will also be my personal assistant
- Weirdos and misfits with odd skills
A. Unusual mathematicians, physicists, computer scientists, data scientists
You must have exceptional academic qualifications from one of the world’s best universities or have done something that demonstrates equivalent (or greater) talents and skills. You do not need a PhD — as Alan Kay said, we are also interested in graduate students as ‘world-class researchers who don’t have PhDs yet’.
A few examples of papers that you will be considering:
- The papers on computational rationality below.
- The work of Judea Pearl, the leading scholar of causation who has transformed the field.
B. Unusual software developers
We are looking for great software developers who would love to work on these ideas, build tools and work with some great people. You should also look at some of Victor’s technical talks on programming languages and the history of computing.
You will be working with data scientists, designers and others.
C. Unusual economists
We are looking to hire some recent graduates in economics. You should a) have an outstanding record at a great university, b) understand conventional economic theories, c) be interested in arguments on the edge of the field — for example, work by physicists on ‘agent-based models’ or by the hedge fund Bridgewater on the failures/limitations of conventional macro theories/prediction, and d) have very strong maths and be interested in working with mathematicians, physicists, and computer scientists.
The sort of conversation you might have is discussing these two papers in Science (2015): Computational rationality: A converging paradigm for intelligence in brains, minds, and machines, Gershman et al and Economic reasoning and artificial intelligence, Parkes & Wellman.
You will see in these papers an intersection of:
- von Neumann’s foundation of game theory and ‘expected utility’,
- mainstream economic theories,
- modern theories about auctions,
- theoretical computer science (including problems like the complexity of probabilistic inference in Bayesian networks, which is in the NP–hard complexity class),
- ideas on ‘computational rationality’ and meta-reasoning from AI, cognitive science and so on.
If these sort of things are interesting, then you will find this project interesting.
It’s a bonus if you can code but it isn’t necessary.
D. Great project managers.
If you think you are one of a small group of people in the world who are truly GREAT at project management, then we want to talk to you.
It is extremely interesting that the lessons of Manhattan (1940s), ICBMs (1950s) and Apollo (1960s) remain absolutely cutting edge because it is so hard to apply them and almost nobody has managed to do it. The Pentagon systematically de-programmed itself from more effective approaches to less effective approaches from the mid-1960s, in the name of ‘efficiency’. Is this just another way of saying that people like General Groves and George Mueller are rarer than Fields Medallists?
E. Junior researchers
In many aspects of government, as in the tech world and investing, brains and temperament smash experience and seniority out of the park.
We want to hire some VERY clever young people either straight out of university or recently out, with extreme curiosity and capacity for hard work.
F. Communication experts
In SW1 communication is generally treated as almost synonymous with ‘talking to the lobby’. This is partly why so much punditry is ‘narrative from noise’.
With no election for years and huge changes in the digital world, there is a chance and a need to do things very differently.
G. Policy experts
One of the problems with the civil service is the way in which people are shuffled such that they either do not acquire expertise or they are moved out of areas they really know to do something else. One Friday, X is in charge of special needs education, the next week X is in charge of budgets.
If you want to work in the policy unit or a department and you really know your subject so that you could confidently argue about it with world-class experts, get in touch.
G. Super-talented weirdos
People in SW1 talk a lot about ‘diversity’ but they rarely mean ‘true cognitive diversity’. They are usually babbling about ‘gender identity diversity blah blah’. What SW1 needs is not more drivel about ‘identity’ and ‘diversity’ from Oxbridge humanities graduates but more genuine cognitive diversity.
We need some true wild cards, artists, people who never went to university and fought their way out of an appalling hell hole, weirdos from William Gibson novels like that girl hired by Bigend as a brand ‘diviner’ who feels sick at the sight of Tommy Hilfiger or that Chinese-Cuban free runner from a crime family hired by the KGB. If you want to figure out what characters around Putin might do, or how international criminal gangs might exploit holes in our border security, you don’t want more Oxbridge English graduates who chat about Lacan at dinner parties with TV producers and spread fake news about fake news.
By definition I don’t really know what I’m looking for but I want people around No10 to be on the lookout for such people.
We need to figure out how to use such people better without asking them to conform to the horrors of ‘Human Resources’ (which also obviously need a bonfire).
As Paul Graham and Peter Thiel say, most ideas that seem bad are bad but great ideas also seem at first like bad ideas — otherwise someone would have already done them. Incentives and culture push people in normal government systems away from encouraging ‘ideas that seem bad’. Part of the point of a small, odd No10 team is to find and exploit, without worrying about media noise, what Andy Grove called ‘very high leverage ideas’ and these will almost inevitably seem bad to most.
I will post some random things over the next few weeks and see what bounces back — it is all upside, there’s no downside if you don’t mind a bit of noise and it’s a fast cheap way to find good ideas…
An important, ongoing part of the rationalist project is to build richer mental models for understanding the world. To that end I'd like to briefly share part of my model of the world that seems to be outside the rationalist canon in an explicit way, but which I think is known well to most, and talk a bit about how I think it is relevant to you, dear reader. Its name is "normalization of deviance".
If you've worked a job, attended school, driven a car, or even just grown up with a guardian, you've most likely experienced normalization of deviance. It happens when your boss tells you to do one thing but all your coworkers do something else and your boss expects you to do the same as them. It happens when the teacher gives you a deadline but lets everyone turn in the assignment late. It happens when you have to speed to keep up with traffic to avoid causing an accident. And it happens when parents lay down rules but routinely allow exceptions such that the rules might as well not even exist.
It took a much less mundane situation for the idea to crystallize and get a name. Diane Vaughan coined the term as part of her research into the causes of the Challenger explosion, where she described normalization of deviance as what happens when people within an organization become so used to deviant behavior that they don't see the deviance, even if that deviance is actively working against an important goal (in the case of Challenger, safety). From her work the idea has spread to considerations in healthcare, aeronautics, security, and, where I learned about it, software engineering. Along the way the idea has generalized from being specifically about organizations, violations of standard operating procedures, and safety to any situation where norms are so regularly violated that they are replaced by the de facto norms of the violations.
I think normalization of deviance shows up all over the place and is likely quietly happening in your life right now just outside where you are bothering to look. Here are some ways I think this might be relevant to you, and I encourage you to mention more in the comments:
- If you are trying to establish a new habit, regular violations of the intended habit may result in a deviant, skewed version of the habit being adopted.
- If you are trying to live up to an ideal (truth telling, vegetarianism, charitable giving, etc.), regularly tolerating violations of that ideal draws you away from it in a sneaky, subtle way: you may still claim to be upholding the ideal when in fact you are not, and are no longer even really trying to.
- If you are trying to establish norms in a community, regularly allowing norm violations will result in different norms than those you intended being adopted.
With those mentioned, my purpose in this post is to be informative, but I know that some of you will read this and make the short leap to treating it as advice that you should aim to allow less normalization of deviance, perhaps by being more scrupulous or less forgiving. Maybe, but before you jump to that, I encourage you to remember the adage about reversing all advice. Sometimes normalized "deviance" isn't so much deviance as an illegible norm that is serving an important purpose, and "fixing" it will actually break things or otherwise make things worse. And not all deviance is normalized deviance: if you don't leave yourself enough slack, you'll likely fail from trying too hard. So I encourage you to know about normalization of deviance, to notice it, and to be deliberate about how you choose to respond to it.
Some people have expressed that “GPT-2 doesn’t understand anything about language or reality. It’s just huge statistics.” In at least two senses, this is true.
First, GPT-2 has no sensory organs. So when it talks about how things look or sound or feel and gets it right, it is just because it read something similar on the web somewhere. The best understanding it could have is the kind of understanding one gets from reading, not from direct experiences. Nor does it have the kind of understanding that a person does when reading, where the words bring to mind memories of past direct experiences.
Second, GPT-2 has no qualia. This is related to the previous point, but distinct from it. One could imagine building a robotic body with cameras for eyes and microphones for ears that fed .png and .wav files to something like GPT-2 rather than .html files. Such a system would have what might be called experiences of the world. It would not, however, create a direct internal impression of redness or loudness, the ineffable conscious experience that accompanies sensation.
However, this is too high a bar to rule out understanding. Perhaps we should call the understanding that comes from direct personal experience “real understanding” and the kind that comes solely from reading with no connection to personal experience “abstract understanding.” Although I can’t “really understand” what it was like to fight in the Vietnam War (because I wasn’t there, man) I can still understand it in an abstract sense. With an abstract understanding, here are some things one can do:
• answer questions about it in one’s own words
• define it
• use it appropriately in a sentence
• provide details about it
• summarize it
Professional teachers distinguish between tests of knowledge (which can be handled by mere memorization) and tests of understanding, with the latter being more difficult and useful (see Bloom’s Taxonomy). Understanding requires connecting a new idea to ideas a student is already familiar with.
GPT-2 is able to pass many such tests of understanding. With an appropriate prompt (such as giving examples of what form the answer to a question should take) it is able to answer questions, define terms, use words appropriately in a sentence, provide details, and summarize.
This is understanding for most practical purposes. It shows that when GPT-2 uses a word, that word has the appropriate kinds of connections to other words. The word has been integrated into a large graph-like structure of relationships between what can reasonably be called concepts or ideas. When probabilities for the next token have been generated, it has a certain propensity for using a particular word; but if that word is artificially blocked, other ways of saying the same thing also have been activated and will be used instead. It is reasonable to interpret this as having an “idea” of what it “wants” to “say” and at some point the quotation marks are no longer helpful, and we may as well dispense with them.
Here is an example. I input the following prompt into GPT-2 1.5B, with top-k=10 sampling:
"Indiana Jones ducked as he entered the cave to avoid being decapitated." In this sentence, the word "decapitated" means
Here are the first results (each truncated after its first sentence):
• "to be cut down" as well as "to be slain."
• "to chop off".
• "to cut off one of the branches of a tree."
• "The captain of the ship was killed in the cave."
• "to cut off, cut off by decapitation."
• "cut off".
• "cut off."
• to be "sliced off."
• "to be killed," which is the same thing as "to be killed by the sword."
• to fall from high altitude or to be cut down.
• "to have a head chopped off."
The system has a strong notion that “decapitated” means “to cut off” and “to kill,” but is less likely to mention that the word has anything to do with a head. So its concept of “decapitation” appears to be approximately (but not completely) right. When prompted to write a sentence using the word “decapitate,” the system usually generates sentences consistent with this: the word tends to be used in a way that implies killing, but heads are only rarely mentioned. (This has all gotten rather grisly.)
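The top-k=10 sampling used in the experiment above has a simple core, which can be sketched without the model itself: keep only the k highest-scoring logits, renormalize with a softmax, and sample. The toy logits and vocabulary size below are invented purely for illustration; this is a sketch of the sampling scheme, not the actual GPT-2 code.

```python
import math
import random

def top_k_filter(logits, k):
    """Keep the k largest logits and send the rest to -inf,
    so that the softmax assigns them zero probability."""
    kth_best = sorted(logits, reverse=True)[k - 1]
    return [x if x >= kth_best else -math.inf for x in logits]

def sample_top_k(logits, k, rng):
    """Draw one token index from the renormalized top-k distribution."""
    filtered = top_k_filter(logits, k)
    m = max(x for x in filtered if x != -math.inf)
    weights = [math.exp(x - m) if x != -math.inf else 0.0 for x in filtered]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy "vocabulary" of 6 tokens with made-up logits:
logits = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]
rng = random.Random(0)
idx = sample_top_k(logits, k=3, rng=rng)  # always index 0, 3, or 5
```

With k=10 and GPT-2's real logits, each of the eleven completions listed above is a chain of such draws, one per token.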
However, one shouldn't take this too far. GPT-2 uses concepts in a very different way than a person does. In the paper “Evaluating Commonsense in Pre-trained Language Models,” the probability of generating each of a pair of superficially similar sentences is measured. If the system is correctly and consistently applying a concept, then one of the two sentences will have a high probability and the other a low probability of being generated. For example, given the four sentences
1. People need to use their air conditioner on a hot day.
2. People need to use their air conditioner on a lovely day.
3. People don’t need to use their air conditioner on a hot day.
4. People don’t need to use their air conditioner on a lovely day.
Sentences 1 and 4 should have higher probability than sentences 2 and 3. What they find is that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood. The same problem occurred with word vectors, like word2vec. “Black” is the opposite of “white,” but apart from the one dimension in which they differ, nearly everything else about them is the same: you can buy a white or black crayon, you can paint a wall white or black, you can use white or black to describe a dog’s fur. Because of this, black and white are semantically close, and tend to get confused with each other.
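The probe in that paper can be sketched as follows: score each sentence by the sum of its per-token conditional log-probabilities, then check whether the sensible member of each pair scores higher. The functions below are a minimal sketch with the model abstracted behind a `logprob_fn` callback (a name of my own choosing; in the real experiment this would be GPT-2's token probabilities):

```python
def sentence_logprob(tokens, logprob_fn):
    """Total log-probability of a sentence: the sum of each token's
    conditional log-probability given the preceding tokens."""
    return sum(logprob_fn(tokens[:i], tok) for i, tok in enumerate(tokens))

def prefers_sensible(sensible, odd, logprob_fn):
    """The consistency check for one sentence pair: does the model
    assign higher total probability to the sensible sentence?"""
    return sentence_logprob(sensible, logprob_fn) > sentence_logprob(odd, logprob_fn)
```

The finding above amounts to saying that for GPT-2, `prefers_sensible` comes out `True` less than half the time on these minimally different pairs.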
The underlying reason for this issue appears to be that GPT-2 has only ever seen sentences that make sense, and is trying to generate sentences that are similar to them. It has never seen sentences that do NOT make sense and makes no effort to avoid them. The paper “Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training” introduces such an “unlikelihood objective” and shows it can help with precisely the kinds of problems mentioned in the previous paper, as well as GPT-2’s tendency to get stuck in endless loops.
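The unlikelihood objective is easy to state: keep the usual negative log-likelihood on the correct next token, but add a term penalizing probability mass assigned to designated negative tokens (repeats, contradictions). A minimal sketch, with names of my own choosing rather than the paper's code:

```python
import numpy as np

def unlikelihood_loss(probs, target, negative_ids, alpha=1.0):
    """Cross-entropy on the correct next token, plus an 'unlikelihood'
    penalty that pushes down the probability assigned to tokens that
    should NOT appear (e.g. repeats or contradictions of the context)."""
    probs = np.asarray(probs, dtype=float)
    nll = -np.log(probs[target])                      # standard likelihood term
    penalty = -np.log(1.0 - probs[negative_ids]).sum()  # unlikelihood term
    return nll + alpha * penalty
```

The penalty is zero only when the model assigns no mass to the negative tokens, so training now actively discourages nonsensical continuations rather than merely failing to encourage them.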
Despite all this, when generating text, GPT-2 is more likely to generate a true sentence than the opposite of a true sentence. “Polar bears are found in the Arctic” is far more likely to be generated than “Polar bears are found in the tropics,” and it is also more likely to be generated than “Polar bears are not found in the Arctic” because “not found” is a less likely construction to be used in real writing than “found.”
It appears that what GPT-2 knows is that the concept “polar bear” has a “found in” relation to “Arctic,” but that it is not very particular about the polarity of that relation (“found in” vs. “not found in”). It simply defaults to expressing the more commonly used positive polarity much of the time.
Another odd feature of GPT-2 is that its writing expresses equal confidence in concepts and relationships it knows very well, and those it knows very little about. By looking into the probabilities, we can often determine when GPT-2 is uncertain about something, but this uncertainty is not expressed in the sentences it generates. By the same token, if prompted with text that has a lot of hedge words and uncertainty, it will include those words even if it is a topic it knows a great deal about.
Finally, GPT-2 doesn’t make any attempt to keep its beliefs consistent with one another. Given the prompt “The current President of the United States is named”, most of the generated responses will be variations on “Barack Obama.” With other prompts, however, GPT-2 acts as if Donald Trump is the current president. This contradiction was present in the training data, which was created over the course of several years. The token probabilities show that both men’s names have fairly high likelihood of being generated for any question of the kind. A person discovering that kind of uncertainty about two options in their mind would modify their beliefs so that one was more likely and the other less likely, but GPT-2 doesn't have any mechanism to do this and enforce a kind of consistency on its beliefs.
In summary, it seems that GPT-2 does have something that can reasonably be called “understanding” and holds something very much like “concepts” or “ideas” which it uses to generate sentences. However, there are some profound differences between how a human holds and uses ideas and how GPT-2 does, which are important to keep in mind.
[AN #80]: Why AI risk might be solved without additional intervention from longtermists
Welcome to another special edition of the newsletter! In this edition, I summarize four conversations that AI Impacts had with researchers who were optimistic that AI safety would be solved "by default". (Note that one of the conversations was with me.)
While all four of these conversations covered very different topics, I think there were three main points of convergence. First, we were relatively unconvinced by the traditional arguments for AI risk, and find discontinuities relatively unlikely. Second, we were more optimistic about solving the problem in the future, when we know more about the problem and have more evidence about powerful AI systems. And finally, we were more optimistic that as we get more evidence of the problem in the future, the existing ML community will actually try to fix that problem.
Conversation with Paul Christiano (Paul Christiano, Asya Bergal, Ronny Fernandez, and Robert Long) (summarized by Rohin): There can't be too many things that reduce the expected value of the future by 10%; if there were, there would be no expected value left. So, the prior that any particular thing has such an impact should be quite low. With AI in particular, obviously we're going to try to make AI systems that do what we want them to do. So starting from this position of optimism, we can then evaluate the arguments for doom. The two main arguments: first, we can't distinguish ahead of time between AIs that are trying to do the right thing, and AIs that are trying to kill us, because the latter will behave nicely until they can execute a treacherous turn. Second, since we don't have a crisp concept of "doing the right thing", we can't select AI systems on whether they are doing the right thing.
However, there are many "saving throws", or ways that the argument could break down, avoiding doom. Perhaps there's no problem at all, or perhaps we can cope with it with a little bit of effort, or perhaps we can coordinate to not build AIs that destroy value. Paul assigns a decent amount of probability to each of these (and other) saving throws, and any one of them suffices to avoid doom. This leads Paul to estimate that AI risk reduces the expected value of the future by roughly 10%, a relatively optimistic number. Since it is so neglected, concerted effort by longtermists could reduce it to 5%, making it still a very valuable area for impact. The main way he expects to change his mind is from evidence from more powerful AI systems, e.g. as we build more powerful AI systems, perhaps inner optimizer concerns will materialize and we'll see examples where an AI system executes a non-catastrophic treacherous turn.
Paul also believes that clean algorithmic problems are usually solvable in 10 years, or provably impossible, and early failures to solve a problem don't provide much evidence of the difficulty of the problem (unless they generate proofs of impossibility). So, the fact that we don't know how to solve alignment now doesn't provide very strong evidence that the problem is impossible. Even if the clean versions of the problem were impossible, that would suggest that the problem is much more messy, which requires more concerted effort to solve but also tends to be just a long list of relatively easy tasks to do. (In contrast, MIRI thinks that prosaic AGI alignment is probably impossible.)
Note that even finding out that the problem is impossible can help; it makes it more likely that we can all coordinate to not build dangerous AI systems, since no one wants to build an unaligned AI system. Paul thinks that right now the case for AI risk is not very compelling, and so people don't care much about it, but if we could generate more compelling arguments, then they would take it more seriously. If instead you think that the case is already compelling (as MIRI does), then you would be correspondingly more pessimistic about others taking the arguments seriously and coordinating to avoid building unaligned AI.
One potential reason MIRI is more doomy is that they take a somewhat broader view of AI safety: in particular, in addition to building an AI that is trying to do what you want it to do, they would also like to ensure that when the AI builds successors, it does so well. In contrast, Paul simply wants to leave the next generation of AI systems in at least as good a situation as we find ourselves in now, since they will be both better informed and more intelligent than we are. MIRI has also previously defined aligned AI as one that produces good outcomes when run, which is a much broader conception of the problem than Paul has. But probably the main disagreement between MIRI and ML researchers is that ML researchers expect that we'll try a bunch of stuff, and something will work out, whereas MIRI expects that the problem is really hard, such that trial and error will only get you solutions that appear to work.
Rohin's opinion: A general theme here seems to be that MIRI feels like they have very strong arguments, while Paul thinks that they're plausible arguments, but aren't extremely strong evidence. Simply having a lot more uncertainty leads Paul to be much more optimistic. I agree with most of this.
However, I do disagree with the point about "clean" problems. I agree that clean algorithmic problems are usually solved within 10 years or are provably impossible, but it doesn't seem to me like AI risk counts as a clean algorithmic problem: we don't have a nice formal statement of the problem that doesn't rely on intuitive concepts like "optimization", "trying to do something", etc. This suggests to me that AI risk is more "messy", and so may require more time to solve.
Conversation with Rohin Shah (Rohin Shah, Asya Bergal, Robert Long, and Sara Haxhia) (summarized by Rohin): The main reason I am optimistic about AI safety is that we will see problems in advance, and we will solve them, because nobody wants to build unaligned AI. A likely crux is that I think that the ML community will actually solve the problems, as opposed to applying a bandaid fix that doesn't scale. I don't know why there are different underlying intuitions here.
In addition, many of the classic arguments for AI safety involve a system that can be decomposed into an objective function and a world model, which I suspect will not be a good way to model future AI systems. In particular, current systems trained by RL look like a grab bag of heuristics that correlate well with obtaining high reward. I think that as AI systems become more powerful, the heuristics will become more and more general, but they still won't decompose naturally into an objective function, a world model, and search. In addition, we can look at humans as an example: we don't fully pursue convergent instrumental subgoals; for example, humans can be convinced to pursue different goals. This makes me more skeptical of traditional arguments.
I would guess that AI systems will become more interpretable in the future, as they start using the features / concepts / abstractions that humans are using. Eventually, sufficiently intelligent AI systems will probably find even better concepts that are alien to us, but if we only consider AI systems that are (say) 10x more intelligent than us, they will probably still be using human-understandable concepts. This should make alignment and oversight of these systems significantly easier. For significantly stronger systems, we should be delegating the problem to the AI systems that are 10x more intelligent than us. (This is very similar to the picture painted in Chris Olah’s views on AGI safety (AN #72), but that had not been published and I was not aware of Chris's views at the time of this conversation.)
I'm also less worried about race dynamics increasing accident risk than the median researcher. The benefit of racing a little bit faster is to have a little bit more power / control over the future, while also increasing the risk of extinction a little bit. This seems like a bad trade from each agent's perspective. (That is, the Nash equilibrium is for all agents to be cautious, because the potential upside of racing is small and the potential downside is large.) I'd be more worried if [AI risk is real AND not everyone agrees AI risk is real when we have powerful AI systems], or if the potential upside was larger (e.g. if racing a little more made it much more likely that you could achieve a decisive strategic advantage).
Overall, it feels like there's around 90% chance that AI would not cause x-risk without additional intervention by longtermists. The biggest disagreement between me and more pessimistic researchers is that I think gradual takeoff is much more likely than discontinuous takeoff (and in fact, the first, third and fourth paragraphs above are quite weak if there's a discontinuous takeoff). If I condition on discontinuous takeoff, then I mostly get very confused about what the world looks like, but I also get a lot more worried about AI risk, especially because the "AI is to humans as humans are to ants" analogy starts looking more accurate. In the interview I said 70% chance of doom in this world, but with way more uncertainty than any of the other credences, because I'm really confused about what that world looks like. Two other disagreements, besides the ones above: I don't buy Realism about rationality (AN #25), whereas I expect many pessimistic researchers do. I may also be more pessimistic about our ability to write proofs about fuzzy concepts like those that arise in alignment.
On timelines, I estimated a very rough 50% chance of AGI within 20 years, and 30-40% chance that it would be using "essentially current techniques" (which is obnoxiously hard to define). Conditional on both of those, I estimated 70% chance that it would be something like a mesa optimizer; mostly because optimization is a very useful instrumental strategy for solving many tasks, especially because gradient descent and other current algorithms are very weak optimization algorithms (relative to e.g. humans), and so learned optimization algorithms will be necessary to reach human levels of sample efficiency.
Rohin's opinion: Looking over this again, I'm realizing that I didn't emphasize enough that most of my optimism comes from the more outside view type considerations: that we'll get warning signs that the ML community won't ignore, and that the AI risk arguments are not watertight. The other parts are particular inside view disagreements that make me more optimistic, but they don't factor in much into my optimism besides being examples of how the meta considerations could play out. I'd recommend this comment of mine to get more of a sense of how the meta considerations factor into my thinking.
I was also glad to see that I still broadly agree with things I said ~5 months ago (since no major new opposing evidence has come up since then), though as I mentioned above, I would now change what I place emphasis on.
Conversation with Robin Hanson (Robin Hanson, Asya Bergal, and Robert Long) (summarized by Rohin): The main theme of this conversation is that AI safety does not look particularly compelling on an outside view. Progress in most areas is relatively incremental and continuous; we should expect the same to be true for AI, suggesting that timelines should be quite long, on the order of centuries. The current AI boom looks similar to previous AI booms, which didn't amount to much in the past.
Timelines could be short if progress in AI were "lumpy", as in a FOOM scenario. This could happen if intelligence were one simple thing that just has to be discovered, but Robin expects that intelligence is actually a bunch of not-very-general tools that together let us do many things, and we simply have to find all of these tools, which will presumably not be lumpy. Most of the value from tools comes from more specific, narrow tools, and intelligence should be similar. In addition, the literature on human uniqueness suggests that it wasn't "raw intelligence" or small changes to brain architecture that made humans unique, but our ability to process culture (communicating via language, learning from others, etc.).
In any case, many researchers are now distancing themselves from the FOOM scenario, and are instead arguing that AI risk occurs due to standard principal-agency problems, in the situation where the agent (AI) is much smarter than the principal (human). Robin thinks that this doesn't agree with the existing literature on principal-agent problems, in which losses from principal-agent problems tend to be bounded, even when the agent is smarter than the principal.
You might think that since the stakes are so high, it's worth working on it anyway. Robin agrees that it's worth having a few people (say a hundred) pay attention to the problem, but doesn't think it's worth spending a lot of effort on it right now. Effort is much more effective and useful once the problem becomes clear, or once you are working with a concrete design; we have neither of these right now and so we should expect that most effort ends up being ineffective. It would be better if we saved our resources for the future, or if we spent time thinking about other ways that the future could go (as in his book, Age of Em).
It's especially bad that AI safety has thousands of "fans", because this leads to a "crying wolf" effect -- even if the researchers have subtle, nuanced beliefs, they cannot control the message that the fans convey, which will not be nuanced and will instead confidently predict doom. Then when doom doesn't happen, people will learn not to believe arguments about AI risk.
Rohin's opinion: Interestingly, I agree with almost all of this, even though it's (kind of) arguing that I shouldn't be doing AI safety research at all. The main place I disagree is that losses from principal-agent problems with perfectly rational agents are bounded -- this seems crazy to me, and I'd be interested in specific paper recommendations (though note I and others have searched and not found many).
On the point about lumpiness, my model is that there are only a few underlying factors (such as the ability to process culture) that allow humans to so quickly learn to do so many tasks, and almost all tasks require near-human levels of these factors to be done well. So, once AI capabilities on these factors reach approximately human level, we will "suddenly" start to see AIs beating humans on many tasks, resulting in a "lumpy" increase on the metric of "number of tasks on which AI is superhuman" (which seems to be the metric that people often use, though I don't like it, precisely because it seems like it wouldn't measure progress well until AI becomes near-human-level).
Conversation with Adam Gleave (Adam Gleave et al) (summarized by Rohin): Adam finds the traditional arguments for AI risk unconvincing. First, it isn't clear that we will build an AI system that is so capable that it can fight all of humanity from its initial position where it doesn't have any resources, legal protections, etc. While discontinuous progress in AI could cause this, Adam doesn't see much reason to expect such discontinuous progress: it seems like AI is progressing by using more computation rather than finding fundamental insights. Second, we don't know how difficult AI safety will turn out to be; he gives a probability of ~10% that the problem is as hard as (a caricature of) MIRI suggests, where any design not based on mathematical principles will be unsafe. This is especially true because as we get closer to AGI we'll have many more powerful AI techniques that we can leverage for safety. Third, Adam does expect that AI researchers will eventually solve safety problems; they don't right now because it seems premature to work on those problems. Adam would be more worried if there were more arms race dynamics, or more empirical evidence or solid theoretical arguments in support of speculative concerns like inner optimizers. He would be less worried if AI researchers spontaneously started to work on relevant problems (more than they already do).
Adam makes the case for AI safety work differently. At the highest level, it seems possible to build AGI, and some organizations are trying very hard to build AGI, and if they succeed it would be transformative. That alone is enough to justify some effort into making sure such a technology is used well. Then, looking at the field itself, it seems like the field is not currently focused on doing good science and engineering to build safe, reliable systems. So there is an opportunity to have an impact by pushing on safety and reliability. Finally, there are several technical problems that we do need to solve before AGI, such as how we get information about what humans actually want.
Adam also thinks that it's 40-50% likely that when we build AGI, a PhD thesis describing it would be understandable by researchers today without too much work, but ~50% that it's something radically different. However, it's only 10-20% likely that AGI comes only from small variations of current techniques (i.e. by vastly increasing data and compute). He would see this as more likely if we hit additional milestones by investing more compute and data (OpenAI Five was an example of such a milestone).
Rohin's opinion: I broadly agree with all of this, with two main differences. First, I am less worried about some of the technical problems that Adam mentions, such as how to get information about what humans want, or how to improve the robustness of AI systems, and more concerned about the more traditional problem of how to create an AI system that is trying to do what you want. Second, I am more bullish on the creation of AGI using small variations on current techniques, but vastly increasing compute and data (I'd assign ~30%, while Adam assigns 10-20%).
Thesis: Functional and computational theories of mind (FCToM) are underspecified, since they rely on a mapping (which I call the "level of analysis") from physical systems to mathematical functions or computational algorithms (respectively), but do not specify that mapping. (I'm collapsing functional and computational theories of mind, since I think the distinction isn't relevant here.)
I believe this issue has great ethical significance: if we accept a naive version of FCToM, we may end up using a misleading level of analysis, and (e.g.) committing massive mind crime. One form of naive FCToM would ignore this issue and say: "if two systems can be described as performing the same computations, then they have the same 'mind' (and hence the same consciousness and same status as moral patients)."
The reductio ad absurdum: Imagine a future totalitarian society where individual humans are forced to play the role of logic gates in a computer which hosts an emulation of your brain. They communicate via snail-mail, and severe punishment, social isolation, and redundancies are used to ensure that they perform their task faithfully. If you endorse naive FCToM, you would say "that's just me!" But far more ethically relevant than the emulation is the experience of the many people enslaved in this system. Note: this is a thought experiment, and may not be physically realizable (for instance, the people playing the gates may be too difficult to control); I think exploring that issue can provide a complementary critique of FCToM, but I'll skip it for now.
Historical note: the idea for writing this post, although not the content, is somewhat inspired by a debate between Massimo Pigliucci and Eliezer Yudkowsky on Bloggingheads (around 35-39 minutes). I think Massimo won that argument.
It’s easy to make experimental design mistakes that invalidate your online controlled experiments. At an organisation like Facebook (who kindly supplied the corpus of experiments used in this study), the state of the art is to have a pool of experts carefully review all experiments. PlanAlyzer acts a bit like a linter for online experiment designs, where those designs are specified in the PlanOut language.
As well as pointing out any bugs in the experiment design, PlanAlyzer will also output a set of contrasts — comparisons that you can safely make given the design of the experiment. Hopefully the comparison you wanted to make when you set up the experiment is in that set!
Regular readers of The Morning Paper will be well aware that there’s plenty that can go wrong in the design and interpretation of online controlled experiments (see e.g. ‘A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments’). PlanAlyzer is aimed at detecting threats to internal validity, the degree to which valid causal conclusions can (or cannot!) be drawn from a study.
For an epistemic status statement and an outline of the purpose of this sequence of posts, please see the top of my prior post. There are also some explanations and caveats in that post which I won’t repeat - or will repeat only briefly - in this post.

Purpose of this post
In my prior post, I wrote:
We are often forced to make decisions under conditions of uncertainty. This uncertainty can be empirical (e.g., what is the likelihood that nuclear war would cause human extinction?) or moral (e.g., does the wellbeing of future generations matter morally?). The issue of making decisions under empirical uncertainty has been well-studied, and expected utility theory has emerged as the typical account of how a rational agent should proceed in these situations. The issue of making decisions under moral uncertainty appears to have received less attention (though see this list of relevant papers), despite also being of clear importance.
I then went on to describe three prominent approaches for dealing with moral uncertainty (based on Will MacAskill’s 2014 thesis):
- Maximising Expected Choice-worthiness (MEC), if all theories under consideration by the decision-maker are cardinal and intertheoretically comparable.
- Variance Voting (VV), a form of what I’ll call “Normalised MEC”, if all theories under consideration are cardinal but not intertheoretically comparable.
- The Borda Rule (BR), if all theories under consideration are ordinal.
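As a rough illustration of the middle option, Variance Voting can be sketched as: rescale each theory's choice-worthiness scores to zero mean and unit variance across the options (so that no theory dominates merely by using bigger numbers), then take the credence-weighted sum. This is a toy sketch under my own naming, assuming each theory scores at least two options differently:

```python
import numpy as np

def variance_voting(cw, credences):
    """Normalised MEC: standardize each theory's choice-worthiness
    scores across the options (zero mean, unit variance), then pick
    the option with the highest credence-weighted total."""
    cw = np.asarray(cw, dtype=float)        # shape: (n_theories, n_options)
    cw = cw - cw.mean(axis=1, keepdims=True)
    cw = cw / cw.std(axis=1, keepdims=True) # each theory now has unit variance
    totals = np.asarray(credences) @ cw     # credence-weighted sum per option
    return int(totals.argmax())
```

The normalization step is what makes the theories comparable despite having no shared cardinal scale; without it, a theory that happened to express its verdicts in large numbers would swamp the vote.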
But I was surprised to discover that I couldn’t find any very explicit write-up of how to handle moral and empirical uncertainty at the same time. I assume this is because most people writing on relevant topics consider the approach I will propose in this post to be quite obvious (at least when using MEC with cardinal, intertheoretically comparable, consequentialist theories). Indeed, many existing models from EAs/rationalists (and likely from other communities) already effectively use something very much like the first approach I discuss here (“MEC-E”; explained below), just without explicitly noting that this is an integration of approaches for dealing with moral and empirical uncertainty.
But it still seemed worth explicitly spelling out the approach I propose, which is, in a nutshell, using exactly the regular approaches to moral uncertainty mentioned above, but on outcomes rather than on actions, and combining that with consideration of the likelihood of each action leading to each outcome. My aim for this post is both to make this approach “obvious” to a broader set of people and to explore how it can work with non-comparable, ordinal, and/or non-consequentialist theories (which may be less obvious).
(Additionally, as a side-benefit, readers who are wondering what on earth all this “modelling” business some EAs and rationalists love talking about is, or who are only somewhat familiar with modelling, may find this post to provide useful examples and explanations.)
I'd be interested in any comments or feedback you might have on anything I discuss here!

MEC under empirical uncertainty
To briefly review regular MEC: MacAskill argues that, when all moral theories under consideration are cardinal and intertheoretically comparable, a decision-maker should choose the “option” that has the highest expected choice-worthiness. Expected choice-worthiness is given by the following formula:

EC(A) = Σi [C(Ti) × CWi(A)]
In this formula, C(Ti) represents the decision-maker’s credence (belief) in Ti (some particular moral theory), while CWi(A) represents the “choice-worthiness” (CW) of A (an “option” or action that the decision-maker can choose) according to Ti. In my prior post, I illustrated how this works with this example:
Suppose Devon assigns a 25% probability to T1, a version of hedonistic utilitarianism in which human “hedons” (a hypothetical unit of pleasure) are worth 10 times more than fish hedons. He also assigns a 75% probability to T2, a different version of hedonistic utilitarianism, which values human hedons just as much as T1 does, but doesn’t value fish hedons at all (i.e., it sees fish experiences as having no moral significance). Suppose also that Devon is choosing whether to buy a fish curry or a tofu curry, and that he’d enjoy the fish curry about twice as much. (Finally, let’s go out on a limb and assume Devon’s humanity.)
According to T1, the choice-worthiness (roughly speaking, the rightness or wrongness of an action) of buying the fish curry is -90 (because it’s assumed to cause 1,000 negative fish hedons, valued as -100, but also 10 human hedons due to Devon’s enjoyment). In contrast, according to T2, the choice-worthiness of buying the fish curry is 10 (because this theory values Devon’s joy as much as T1 does, but doesn’t care about the fish’s experiences). Meanwhile, the choice-worthiness of the tofu curry is 5 according to both theories (because it causes no harm to fish, and Devon would enjoy it half as much as he’d enjoy the fish curry).
[...] Using MEC in this situation, the expected choice-worthiness of buying the fish curry is 0.25 * -90 + 0.75 * 10 = -15, and the expected choice-worthiness of buying the tofu curry is 0.25 * 5 + 0.75 * 5 = 5. Thus, Devon should buy the tofu curry.
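The arithmetic in Devon's example is small enough to check directly; a minimal sketch of the MEC calculation (variable names are mine):

```python
credences = {"T1": 0.25, "T2": 0.75}
cw = {  # choice-worthiness of each action under each theory
    "fish curry": {"T1": -90, "T2": 10},
    "tofu curry": {"T1": 5, "T2": 5},
}

def expected_cw(action):
    """Credence-weighted choice-worthiness of an action."""
    return sum(credences[t] * cw[action][t] for t in credences)

# expected_cw("fish curry") -> 0.25 * -90 + 0.75 * 10 = -15.0
# expected_cw("tofu curry") -> 5.0
```

Since 5.0 > -15.0, the tofu curry wins, as stated above.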
But can Devon really be sure that buying the fish curry will lead to that much fish suffering? What if this demand signal doesn’t lead to increased fish farming/capture? What if the additional fish farming/capture is more humane than expected? What if fish can’t suffer because they aren’t actually conscious (empirically, rather than as a result of what sorts of consciousness our moral theory considers relevant)? We could likewise question Devon’s apparent certainty that buying the tofu curry definitely won’t have any unintended consequences for fish suffering, and his apparent certainty regarding precisely how much he’d enjoy each meal.
These are all empirical questions, but they seem very important for Devon’s ultimate decision, as T1 and T2 don’t “intrinsically care” about buying fish curry or buying tofu curry; they care about some of the outcomes which those actions may or may not cause.
More generally, I expect that, in all realistic decision situations, we’ll have both moral and empirical uncertainty, and that it’ll often be important to explicitly consider both types of uncertainties. For example, GiveWell’s models consider both how likely insecticide-treated bednets are to save the life of a child, and how that outcome would compare to doubling the income of someone in extreme poverty. However, typical discussions of MEC seem to assume that we already know for sure what the outcomes of our actions will be, just as typical discussions of expected value reasoning seem to assume that we already know for sure how valuable a given outcome is.
Luckily, it seems to me that MEC and traditional (empirical) expected value reasoning can be very easily and neatly integrated in a way that resolves those issues. (This is perhaps partly due to the fact that, if I understand MacAskill’s thesis correctly, MEC was very consciously developed by analogy to expected value reasoning.) Here is my formula for this integration, which I'll call Maximising Expected Choice-worthiness, accounting for Empirical uncertainty (MEC-E), and which I'll explain and provide an example for below:

EC(A) = Σi Σj [P(Oj | A) × CWi(Oj) × C(Ti)]
Here, all symbols mean the same things they did in the earlier formula from MacAskill’s thesis, with two exceptions:
- I’ve added Oj, to refer to each “outcome”: each consequence that an action may lead to, which at least one moral theory under consideration intrinsically values/disvalues. (E.g., a fish suffering; a person being made happy; rights being violated.)
- Related to that, I’d like to be more explicit that A refers only to the “actions” that the decision-maker can directly choose (e.g., purchasing a fish meal, imprisoning someone), rather than the outcomes of those actions.
(I also re-ordered the choice-worthiness term and the credence term, which makes no actual difference to any results, and was just because I think this ordering is slightly more intuitive.)
Stated verbally (and slightly imprecisely), MEC-E claims that:
One should choose the action which maximises expected choice-worthiness, accounting for empirical uncertainty. To calculate the expected choice-worthiness of each action, you first, for each potential outcome of the action and each moral theory under consideration, find the product of 1) the probability of that outcome given that that action is taken, 2) the choice-worthiness of that outcome according to that theory, and 3) the credence given to that theory. Second, for each action, you sum together all of those products.
To illustrate, I have modelled in Guesstimate an extension of the example of Devon deciding what meal to buy to also incorporate empirical uncertainty. In the text here, I will only state the information that was not in the earlier version of the example, and the resulting calculations, rather than walking through all the details.
Suppose Devon believes there’s an 80% chance that buying a fish curry will lead to “fish being harmed” (modelled as 1000 negative fish hedons, with a choice-worthiness of -100 according to T1 and 0 according to T2), and a 10% chance that buying a tofu curry will lead to that same outcome. He also believes there’s a 95% chance that buying a fish curry will lead to “Devon enjoying a meal a lot” (modelled as 10 human hedons), and a 50% chance that buying a tofu curry will lead to that.
The expected choice-worthiness of buying a fish curry would therefore be: 0.8 * -100 * 0.25 + 0.8 * 0 * 0.75 + 0.95 * 10 * 0.25 + 0.95 * 10 * 0.75 = -10.5
Meanwhile, the expected choice-worthiness of buying a tofu curry would be: 0.1 * -100 * 0.25 + 0.1 * 0 * 0.75 + 0.5 * 10 * 0.25 + 0.5 * 10 * 0.75 = 2.5
As before, the tofu curry appears the better choice, despite seeming somewhat worse according to the theory (T2) in which Devon has higher credence, because the other theory (T1) sees the tofu curry as much better.
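Those two calculations follow mechanically from the verbal formula above. Here is a minimal sketch in Python (the dictionary layout and action/outcome names are mine; the numbers are from the example):

```python
# Credences in the two moral theories
credence = {"T1": 0.25, "T2": 0.75}

# Choice-worthiness of each outcome under each theory
cw = {
    "fish being harmed":       {"T1": -100, "T2": 0},
    "Devon enjoying the meal": {"T1": 10,   "T2": 10},
}

# Probability that each action leads to each outcome
prob = {
    "buy fish curry": {"fish being harmed": 0.8, "Devon enjoying the meal": 0.95},
    "buy tofu curry": {"fish being harmed": 0.1, "Devon enjoying the meal": 0.5},
}

def expected_choiceworthiness(action):
    # Sum over every (outcome, theory) pair of:
    # P(outcome | action) * CW(outcome | theory) * credence(theory)
    return sum(
        prob[action][o] * cw[o][t] * credence[t]
        for o in cw
        for t in credence
    )

# expected_choiceworthiness("buy fish curry") is approximately -10.5
# expected_choiceworthiness("buy tofu curry") is approximately 2.5
```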
In the final section of this post, I discuss potential extensions of these approaches, such as how they can handle probability distributions (rather than point estimates) and non-consequentialist theories.
The last thing I’ll note about MEC-E in this section is that MEC-E can be used as a heuristic, without involving actual numbers, in exactly the same way MEC or traditional expected value reasoning can. For example, without knowing or estimating any actual numbers, Devon might reason that, compared to buying the tofu curry, buying the fish curry is “much” more likely to lead to fish suffering and only “somewhat” more likely to lead to him enjoying his meal a lot. He may further reason that, in the “unlikely but plausible” event that fish experiences do matter, the badness of a large amount of fish suffering is “much” greater than the goodness of him enjoying a meal. He may thus ultimately decide to purchase the tofu curry.
(Indeed, my impression is that many effective altruists have arrived at vegetarianism/veganism through reasoning very much like that, without any actual numbers being required.)

Normalised MEC under empirical uncertainty
(From here onwards, I’ve had to go a bit further beyond what’s clearly implied by existing academic work, so the odds I’ll make some mistakes go up a bit. Please let me know if you spot any errors.)
To briefly review regular Normalised MEC: Sometimes, despite being cardinal, the moral theories we have credence in are not intertheoretically comparable (basically meaning that there’s no consistent, non-arbitrary “exchange rate” between the theories' “units of choice-worthiness"). MacAskill argues that, in such situations, one must first "normalise" the theories in some way (i.e., "[adjust] values measured on different scales to a notionally common scale"), and then apply MEC to the new, normalised choice-worthiness scores. He recommends Variance Voting, in which the normalisation is by variance (rather than, e.g., by range), meaning that we:
“[treat] the average of the squared differences in choice-worthiness from the mean choice-worthiness as the same across all theories. Intuitively, the variance is a measure of how spread out choice-worthiness is over different options; normalising at variance is the same as normalising at the difference between the mean choice-worthiness and one standard deviation from the mean choice-worthiness.”
(I provide a worked example here, based on an extension of the scenario with Devon deciding what meal to buy, but it's possible I've made mistakes.)
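The normalisation step itself is mechanically simple. As a minimal sketch (the choice-worthiness numbers below are invented for illustration, not taken from the Devon example):

```python
import statistics

# Two theories' hypothetical cardinal choice-worthiness scores over the same
# three actions; assume their "units" are not intertheoretically comparable
t1_scores = [-100.0, 0.0, 10.0]
t2_scores = [0.0, 5.0, 10.0]

def variance_normalise(scores):
    # Rescale so each theory's (population) variance becomes 1,
    # i.e. divide every score by that theory's standard deviation
    sd = statistics.pstdev(scores)
    return [s / sd for s in scores]

n1 = variance_normalise(t1_scores)
n2 = variance_normalise(t2_scores)
# MEC can then be applied to n1 and n2, which now share a common scale
```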
My proposal for Normalised MEC, accounting for Empirical Uncertainty (Normalised MEC-E) is just to combine the ideas of non-empirical Normalised MEC and non-normalised MEC-E in a fairly intuitive way. The steps involved (which may be worth reading alongside this worked example and/or the earlier explanations of Normalised MEC and MEC-E) are as follows:
1. Work out expected choice-worthiness just as with regular MEC, except that here one is working out the expected choice-worthiness of outcomes, not actions. I.e., for each outcome, multiply that outcome’s choice-worthiness according to each theory by your credence in that theory, and then add up the resulting products.
   - You could also think of this as using the MEC-E formula, except with “Probability of outcome given action” removed for now.
2. Normalise these expected choice-worthiness scores by variance, just as MacAskill advises in the quote above.
3. Find the “expected value” of each action in the traditional way, with these normalised expected choice-worthiness scores serving as the “value” for each potential outcome. I.e., for each action, multiply the probability it leads to each outcome by the normalised expected choice-worthiness of that outcome (from step 2), and then add up the resulting products.
   - You could think of this as bringing “Probability of outcome given action” back into the MEC-E formula.
4. Choose the action with the maximum score from step 3 (which we could call normalised expected choice-worthiness, accounting for empirical uncertainty, or expected value, accounting for normalised moral uncertainty).
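If it helps, these four steps can be sketched in code. This is a hypothetical implementation that follows the steps exactly as listed above, reusing the numbers from the Devon example (it is not the linked Guesstimate model):

```python
import statistics

credence = {"T1": 0.25, "T2": 0.75}
# Choice-worthiness of each outcome under each theory (Devon example numbers)
cw = {
    "fish being harmed":       {"T1": -100, "T2": 0},
    "Devon enjoying the meal": {"T1": 10,   "T2": 10},
}
# Probability that each action leads to each outcome
prob = {
    "fish_curry": {"fish being harmed": 0.8, "Devon enjoying the meal": 0.95},
    "tofu_curry": {"fish being harmed": 0.1, "Devon enjoying the meal": 0.5},
}

# Step 1: credence-weighted expected choice-worthiness of each *outcome*
ecw = {o: sum(credence[t] * cw[o][t] for t in credence) for o in cw}

# Step 2: normalise these scores by variance (divide by their standard deviation)
sd = statistics.pstdev(ecw.values())
norm = {o: v / sd for o, v in ecw.items()}

# Step 3: "expected value" of each action, using the normalised scores as values
ev = {a: sum(prob[a][o] * norm[o] for o in norm) for a in prob}

# Step 4: pick the action with the maximum score
best = max(ev, key=ev.get)  # "tofu_curry"
```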
The final approach MacAskill recommends in his thesis is the Borda Rule (BR; also known as Borda counting). This is used when the moral theories we have credence in are merely ordinal (i.e., they don’t say “how much” more choice-worthy one option is compared to another). In my prior post, I provided the following quote of MacAskill’s formal explanation of BR (here with “options” replaced by “actions”):
“An [action] A’s Borda Score, for any theory Ti, is equal to the number of [actions] within the [action]-set that are less choice-worthy than A according to theory Ti’s choice-worthiness function, minus the number of [actions] within the [action]-set that are more choice-worthy than A according to Ti’s choice-worthiness function.
An [action] A’s Credence-Weighted Borda Score is the sum, for all theories Ti, of the Borda Score of A according to theory Ti multiplied by the credence that the decision-maker has in theory Ti.
[The Borda Rule states that an action] A is more appropriate than an [action] B iff [if and only if] A has a higher Credence-Weighted Borda Score than B; A is equally as appropriate as B iff A and B have an equal Credence-Weighted Borda Score.”
To apply BR when one is also empirically uncertain, I propose just explicitly considering/modelling one’s empirical uncertainties, and then figuring out each action’s Borda Score with those empirical uncertainties in mind. (That is, we don’t change the method at all on a mathematical level; we just make sure each moral theory’s preference rankings over actions - which is used as input into the Borda Rule - takes into account our empirical uncertainty about what outcomes each action may lead to.)
I’ll illustrate how this works with reference to the same example from MacAskill’s thesis that I quoted in my prior post, but now with slight modifications (shown in bold).
“Julia is a judge who is about to pass a verdict on whether Smith is guilty for murder. She is very confident that Smith is innocent. There is a crowd outside, who are desperate to see Smith convicted. Julia has three options:
[G]: Pass a verdict of ‘guilty’.
[R]: Call for a retrial.
[I]: Pass a verdict of ‘innocent’.
She thinks there’s a 0% chance of M (mayhem) if she passes a verdict of guilty, a 30% chance if she calls for a retrial (there may be mayhem due to the lack of a guilty verdict, or due to a later innocent verdict), and a 70% chance if she passes a verdict of innocent.
There’s obviously a 100% chance of C (Smith being convicted) if she passes a verdict of guilty and a 0% chance if she passes a verdict of innocent. She thinks there’s also a 20% chance of C happening later if she calls for a retrial.
Julia believes the crowd is very likely (~90% chance) to riot if Smith is found innocent, causing mayhem on the streets and the deaths of several people. If she calls for a retrial, she believes it’s almost certain (~95% chance) that he will be found innocent at a later date, and that it is much less likely (only ~30% chance) that the crowd will riot at that later date if he is found innocent then. If she declares Smith guilty, the crowd will certainly (~100%) be appeased and go home peacefully. She has credence in three moral theories, **which, when taking the preceding probabilities into account, provide the following choice-worthiness orderings**:
35% credence in a variant of utilitarianism, according to which [G≻I≻R].
34% credence in a variant of common sense, according to which [I≻R≻G].
31% credence in a deontological theory, according to which [I≻R≻G].”
This leads to the Borda Scores and Credence-Weighted Borda Scores shown in the table below, and thus to the recommendation that Julia declare Smith innocent.
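The Borda arithmetic behind that recommendation can be sketched as follows (the implementation details are mine; the orderings and credences come from the quote above):

```python
# Each theory: (credence, ranking from most to least choice-worthy)
theories = [
    (0.35, ["G", "I", "R"]),  # variant of utilitarianism
    (0.34, ["I", "R", "G"]),  # variant of common sense
    (0.31, ["I", "R", "G"]),  # deontological theory
]

def borda_score(action, ranking):
    # (number of actions ranked below) minus (number ranked above)
    i = ranking.index(action)
    below = len(ranking) - 1 - i
    above = i
    return below - above

actions = ["G", "R", "I"]
cwbs = {
    a: sum(cred * borda_score(a, ranking) for cred, ranking in theories)
    for a in actions
}
best = max(cwbs, key=cwbs.get)  # "I": pass a verdict of innocent
```

This yields Credence-Weighted Borda Scores of roughly -0.6 for G, -0.7 for R, and 1.3 for I.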
(More info on how that was worked out can be found in the following footnote, along with the corresponding table based on the moral theories' preference orderings in my prior post, when empirical uncertainty wasn't taken into account.)
In the original example, both the utilitarian theory and the common sense theory preferred a retrial to a verdict of innocent (in order to avoid a riot), which resulted in calling for a retrial having the highest Credence-Weighted Borda Score.
However, I’m now imagining that Julia is no longer assuming each action 100% guarantees a certain outcome will occur, and paying attention to her empirical uncertainty has changed her conclusions.
In particular, I’m imagining that she realises she’d initially been essentially “rounding up” (to 100%) the likelihood of a riot if she provides a verdict of innocent, and “rounding down” (to 0%) the likelihood of the crowd rioting at a later date. However, with more realistic probabilities in mind, utilitarianism and common sense would both actually prefer an innocent verdict to a retrial (because the innocent verdict seems less risky, and the retrial more risky, than she’d initially thought, while an innocent verdict still frees this innocent person sooner and with more certainty). This changes each action’s Borda Score, and gives the result that she should declare Smith innocent.

Potential extensions of these approaches

Does this approach presume/privilege consequentialism?
A central idea of this post has been making a clear distinction between “actions” (which one can directly choose to take) and their “outcomes” (which are often what moral theories “intrinsically care about”). This clearly makes sense when the moral theories one has credence in are consequentialist. However, other moral theories may “intrinsically care” about actions themselves. For example, many deontological theories would consider lying to be wrong in and of itself, regardless of what it leads to. Can the approaches I’ve proposed handle such theories?
Yes - and very simply! For example, suppose I wish to use MEC-E (or Normalised MEC-E), and I have credence in a (cardinal) deontological theory that assigns very low choice-worthiness to lying (regardless of outcomes that action leads to). We can still calculate expected choice-worthiness using the formulas shown above; in this case, we find the product of (multiply) “probability me lying leads to me having lied” (which we’d set to 1), “choice-worthiness of me having lied, according to this deontological theory”, and “credence in this deontological theory”.
Thus, cases where a theory cares intrinsically about the action and not its consequences can be seen as a “special case” in which the approaches discussed in this post just collapse back to the corresponding approaches discussed in MacAskill’s thesis (which these approaches are the “generalised” versions of). This is because there’s effectively no empirical uncertainty in these cases; we can be sure that taking an action would lead to us having taken that action. Thus, in these and other cases of no relevant empirical uncertainty, accounting for empirical uncertainty is unnecessary, but creates no problems.
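To see the collapse numerically, here is a hypothetical sketch (all numbers invented):

```python
# A cardinal deontological theory assigns "me having lied" a choice-worthiness
# of -50, and we have credence 0.4 in that theory (both numbers invented).
p_outcome = 1.0   # probability that me lying leads to me having lied
cw_lied = -50.0   # choice-worthiness of having lied, per this theory
credence = 0.4    # credence in this theory

# MEC-E term: probability * choice-worthiness * credence
mec_e_term = p_outcome * cw_lied * credence
# Plain MEC term (no empirical uncertainty): choice-worthiness * credence
mec_term = cw_lied * credence

assert mec_e_term == mec_term == -20.0
```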
I’d therefore argue that a policy of using the generalised approaches by default is likely wise. This is especially the case because:
- One will typically have at least some credence in consequentialist theories.
- My impression is that even most “non-consequentialist” theories still do care at least somewhat about consequences. For example, they’d likely say lying is in fact “right” if the negative consequences of not doing so are “large enough” (and one should often be empirically uncertain about whether they would be).
In this post, I modified examples (from my prior post) in which we had only one moral uncertainty into examples in which we had one moral and one empirical uncertainty. We could think of this as “factoring out” what originally appeared to be only moral uncertainty into its “factors”: empirical uncertainty about whether an action will lead to an outcome, and moral uncertainty about the value of that outcome. By doing this, we’re more closely approximating (modelling) our actual understandings and uncertainties about the situation at hand.
But we’re still far from a full approximation of our understandings and uncertainties. For example, in the case of Julia and the innocent Smith, Julia may also be uncertain how big the riot would be, how many people would die, whether these people would be rioters or uninvolved bystanders, whether there’s a moral difference between a rioter vs a bystander dying in the riot (and if so, how big this difference is), etc.
A benefit of the approaches shown here is that they can very simply be extended, with typical modelling methods, to incorporate additional uncertainties like these. You simply disaggregate the relevant variables into the “factors” you believe they’re composed of, assign them numbers, and multiply them as appropriate.

Need to determine whether uncertainties are moral or empirical?
In the examples given just above, you may have wondered whether I was considering certain variables to represent moral uncertainties or empirical ones. I suspect this ambiguity will be common in practice (and I plan to discuss it further in a later post). Is this an issue for the approaches I’ve suggested?
I’m a bit unsure about this, but I think the answer is essentially “no”. I don’t think there’s any need to treat moral and empirical uncertainty in fundamentally different ways for the sake of models/calculations using these approaches. Instead, I think that, ultimately, the important thing is just to “factor out” variables in the way that makes the most sense, given the situation and what the moral theories under consideration “intrinsically care about”. (An example of the sort of thing I mean can be found in footnote 14, in a case where the uncertainty is actually empirical but has different moral implications for different theories.)

Probability distributions instead of point estimates
You may have also thought that a lot of variables in the examples I’ve given should be represented by probability distributions (e.g., representing 90% confidence intervals), rather than point estimates. For example, why would Devon estimate the probability of “fish being harmed”, as if it’s a binary variable whose moral significance switches suddenly from 0 to -100 (according to T1) when a certain level of harm is reached? Wouldn’t it make more sense for him to estimate the amount of harm to fish that is likely, given that that better aligns both with his understanding of reality and with what T1 cares about?
If you were thinking this, I wholeheartedly agree! Further, I can’t see any reason why the approaches I’ve discussed couldn’t use probability distributions and model variables as continuous rather than binary (the only reason I haven’t modelled things in that way so far was to keep explanations and examples simple). For readers interested in an illustration of how this can be done, I’ve provided a modified model of the Devon example in this Guesstimate model. (Existing models like this one also take essentially this approach.)

Closing remarks
I hope you’ve found this post useful, whether to inform your heuristic use of moral uncertainty and expected value reasoning, to help you build actual models taking into account both moral and empirical uncertainty, or to give you a bit more clarity on “modelling” in general.
In the next post, I’ll discuss how we can combine the approaches discussed in this and my prior post with sensitivity analysis and value of information analysis, to work out what specific moral or empirical learning would be most decision-relevant and when we should vs shouldn’t postpone decisions until we’ve done such learning.
What “choice-worthiness”, “cardinal” (vs “ordinal”), and “intertheoretically comparable” mean is explained in the previous post. To quickly review, roughly speaking:
- Choice-worthiness is the rightness or wrongness of an action, according to a particular moral theory.
- A moral theory is ordinal if it tells you only which options are better than which other options, whereas a theory is cardinal if it tells you how big a difference in choice-worthiness there is between each option.
- A pair of moral theories can be cardinal and yet still not intertheoretically comparable if we cannot meaningfully compare the sizes of the “differences in choice-worthiness” between the theories; basically, if there’s no consistent, non-arbitrary “exchange rate” between different theories’ “units of choice-worthiness”.
MacAskill also discusses a “Hybrid” procedure, for when the theories under consideration differ in whether they’re cardinal or ordinal and/or whether they’re intertheoretically comparable; readers interested in more information on that can refer to pages 117-122 of MacAskill’s thesis. An alternative approach to such situations is Christian Tarsney’s (pages 187-195) “multi-stage aggregation procedure”, which I may write a post about later (please let me know if you think this’d be valuable). ↩︎
Examples of models that effectively use something like the “MEC-E” approach include GiveWell’s cost-effectiveness models and this model of the cost effectiveness of “alternative foods”.
And some of the academic moral uncertainty work I’ve read seemed to indicate that its authors perceive something like the approaches I propose in this post as obvious.
But I think the closest thing I found to an explicit write-up of this sort of way of considering moral and empirical uncertainty at the same time (expressed in those terms) was this post from 2010, which states: “Under Robin’s approach to value uncertainty, we would (I presume) combine these two utility functions into one linearly, by weighing each with its probability, so we get EU(x) = 0.99 EU1(x) + 0.01 EU2(x)”. ↩︎
Some readers may be thinking the “empirical” uncertainty about fish consciousness is inextricable from moral uncertainties, and/or that the above paragraph implicitly presumes/privileges consequentialism. If you’re one of those readers, 10 points to you for being extra switched-on! However, I believe these are not really issues for the approaches outlined in this post, for reasons outlined in the final section. ↩︎
Note that my usage of “actions” can include “doing nothing”, or failing to do some specific thing; I don’t mean “actions” to be distinct from “omissions” in this context. MacAskill and other writers sometimes refer to “options” to mean what I mean by “actions”. I chose the term “actions” both to make it more obvious what the A and O terms in the formula stand for, and because it seems to me that the distinction between “options” and “outcomes” would be less immediately obvious. ↩︎
My university education wasn’t highly quantitative, so it’s very possible I’ll phrase certain things like this in clunky or unusual ways. If you notice such issues and/or have better phrasing ideas, please let me know. ↩︎
In that link, the model using MEC-E follows a similar model using regular MEC (and thus considering only moral uncertainty) and another similar model using more traditional expected value reasoning (and thus considering only empirical uncertainty); readers can compare these against the MEC-E model. ↩︎
Before I tried to actually model an example, I came up with a slightly different proposal for integrating the ideas of MEC-E and Normalised MEC. Then I realised the proposal outlined above might make more sense, and it does seem to work (though I’m not 100% certain), so I didn’t further pursue my original proposal. I therefore don't know for sure whether my original proposal would work or not (and, if it does work, whether it’s somehow better than what I proposed above). My original proposal was as follows:
- Work out expected choice-worthiness just as with regular MEC-E; i.e., follow the formula from above to incorporate consideration of the probabilities of each action leading to each outcome, the choice-worthiness of each outcome according to each moral theory, and the credence one has in each theory. (But don’t yet pick the action with the maximum expected choice-worthiness score.)
- Normalise these expected choice-worthiness scores by variance, just as MacAskill advises in the quote above. (The fact that these scores incorporate consideration of empirical uncertainty has no impact on how to normalise by variance.)
- Now pick the action with the maximum normalised expected choice-worthiness score.
G (for example) has a Borda Score of 2 - 0 = 2 according to utilitarianism because that theory views two options as less choice-worthy than G, and no options as more choice-worthy than G.
To fill in the final column, you take a credence-weighted average of the relevant action’s Borda Scores.
What follows is the corresponding table based on the moral theories' preference orderings in my prior post, when empirical uncertainty wasn't taken into account:
It’s also entirely possible for paying attention to empirical uncertainty to not change any moral theory’s preference orderings in a particular situation, or for some preference orderings to change without this affecting which action ends up with the highest Credence-Weighted Borda Score. This is a feature, not a bug.
Another perk is that paying attention to both moral and empirical uncertainty also provides more clarity on what the decision-maker should think or learn more about. This will be the subject of my next post. For now, a quick example is that Julia may realise that a lot hangs on what each moral theory’s preference ordering should actually be, or on how likely the crowd actually is to riot if she passes a verdict of innocent or calls for a retrial, and it may be worth postponing her decision in order to learn more about these things. ↩︎
Arguably, the additional complexity in the model is a cost in itself. But this is a problem only in the same way it is any time one decides to model something in more detail or with more accuracy at the cost of adding complexity and computations. Sometimes it’ll be worth doing so, while other times it’ll be worth keeping things simpler (whether by considering only moral uncertainty, by considering only empirical uncertainty, or by considering only certain parts of one’s moral/empirical uncertainties). ↩︎
The approaches discussed in this post can also deal with theories that “intrinsically care” about other things, like a decision-maker’s intentions or motivations. You can simply add in a factor for “probability that, if I take X, it’d be due to motivation Y rather than motivation Z” (or something along those lines). It may often be reasonable to round this to 1 or 0, in which case these approaches wouldn’t necessarily “add value” (though they’d still work). But often we may genuinely be (empirically) uncertain about our own motivations (e.g., are we just providing high-minded rationalisations for doing something we wanted to do anyway for our own self-interest?), in which case explicitly modelling that empirical uncertainty may be useful. ↩︎
For another example, in the case of Devon choosing a meal, he may also be uncertain how many of each type of fish will be killed, the way in which they’d be killed, whether each type of fish has certain biological and behavioural features thought to indicate consciousness, whether those features do indeed indicate consciousness, whether the consciousness they indicate is morally relevant, whether creatures with consciousness like that deserve the same “moral weight” as humans or somewhat lesser weight, etc. ↩︎
For example, Devon might replace “Probability that purchasing a fish meal leads to ‘fish being harmed’” with (“Probability that purchasing a fish meal leads to fish being killed” * “Probability the fish killed would be killed in a non-humane way” * “Probability any fish killed in these ways would be conscious enough for this to count as ‘harming’ them”). This whole term would then be used in calculations wherever “Probability that purchasing a fish meal leads to ‘fish being harmed’” was originally used.
For another example, Julia might replace “Probability the crowd riots if Julia finds Smith innocent” with “Probability the crowd riots if Julia finds Smith innocent” * “Probability a riot would lead to at least one death” * “Probability that, if at least one death occurs, there’s at least one death of a bystander (rather than of one of the rioters themselves)” (as shown in this partial Guesstimate model). She can then keep in mind this more specific final outcome, and its more clearly modelled probability, as she tries to work out what choice-worthiness ordering each moral theory she has credence in would give to the actions she’s considering.
Note that, sometimes, it might make sense to “factor out” variables in different ways for the purposes of different moral theories’ evaluations, depending on what the moral theories under consideration “intrinsically care about”. In the case of Julia, it definitely seems to me to make sense to replace “Probability the crowd riots if Julia finds Smith innocent” with “Probability the crowd riots if Julia finds Smith innocent” * “Probability a riot would lead to at least one death”. This is because all moral theories under consideration probably care far more about potential deaths from a riot than about any other consequences of the riot. This can therefore be considered an “empirical uncertainty”, because its influence on the ultimate choice-worthiness “flows through” the same “moral outcome” (a death) for all moral theories under consideration.
However, it might only make sense to further multiply that term by “Probability that, if at least one death occurs, there’s at least one death of a bystander (rather than of one of the rioters themselves)” for the sake of the common sense theory’s evaluation of the choice-worthiness order, not for the utilitarian theory’s evaluation. This would be the case if the utilitarian theory cared not at all (or at least much less) about the distinction between the death of a rioter and the death of a bystander, while common sense does. (The Guesstimate model should help illustrate what I mean by this.) ↩︎
Additionally, the process of factoring things out in this way could by itself provide a clearer understanding of the situation at hand, and what the stakes really are for each moral theory one has credence in. (E.g., Julia may realise that passing a verdict of innocent is much less bad than she thought, as, even if a riot does occur, there’s only a fairly small chance it leads to the death of a bystander.) It also helps one realise what uncertainties are most worth thinking/learning more about (more on this in my next post). ↩︎
It's generally accepted here that theories are valuable to the extent that they provide testable predictions. Being falsifiable means that incorrect theories can be discarded and replaced with theories that better model reality (see Making Beliefs Pay Rent). Unfortunately, reality doesn't play nice: we will sometimes possess excellent theoretical reasons for believing a theory, but that theory will have far too many degrees of freedom to be easily falsifiable.
The prototypical examples are the kinds of hypotheses produced by evolutionary psychology. Clearly all aspects of humanity have been shaped by evolution, and the idea that our behaviour is an exception would be truly astounding. In fact, I'd say that it is something of an anti-prediction.
But what use is a theory that doesn't make any solid predictions? Firstly, believing in such a theory will normally have a significant impact on your priors, even if no one observation would provide strong evidence of its falsehood. But secondly, if the existing viable theories all claim A and you propose a viable theory that would be compatible with A or B, then that would make B viable again. And sometimes that can be a worthy contribution in and of itself. Indeed, you can have a funny situation arise where people nominally reject a theory for not sufficiently constraining expectations, while really opposing it because of how people's expectations would adjust if the theory were true.
When we consider how good the best possible future might be, it's tempting to focus on only a handful of dimensions of change. In transhumanist thinking, these dimensions tend to be characteristics of individuals: their happiness, longevity, intelligence, and so on. This reflects the deeply individualistic nature of our modern societies overall, and of transhumanists in particular. Yet when asked what makes their lives meaningful, most people prioritise their relationships with others. In contrast, there are strands of utopian literature which focus on social reorganisation (such as Huxley’s Island or Skinner’s Walden Two), but usually without acknowledging the potential of technology to radically improve the human condition. Meanwhile, utilitarians conceive of the best future as whichever one maximises a given metric of individual welfare - but those metrics are often criticised for oversimplifying the range of goods that people actually care about.

In this essay I've tried to be as comprehensive as possible in cataloguing the ways that the future could be much better than the present, which I've divided into three categories: individual lives, relationships with others, and humanity overall. Each section consists of a series of bullet points, with nested elaborations and examples.
I hesitated for quite some time before making this essay public, though, because it feels a little naive. Partly that’s because the different claims don’t form a single coherent narrative. But on reflection I think I endorse that: grand narratives are more seductive but also more likely to totally miss the point. Additionally, Holden Karnofsky has found that “the mere act of describing [a utopia] makes it sound top-down and centralized” in a way which people dislike - another reason why discussing individual characteristics is probably more productive.
Another concern is that even though there’s a strong historical trend towards increases in material quality of life, the same is not necessarily true for social quality of life. Indeed, in many ways the former impedes the latter. In particular, the less time people need to spend obtaining necessities, the more individualistic they’re able to be, and the more time they can spend on negative-sum status games. I don’t know how to solve this problem, or many others which currently prevent us from building a world that's good in all the ways I describe below. But I do believe that humanity has the potential to do so.  And having a clearer vision of utopia will likely motivate people to work on the problems that stand, as Dickinson put it, “between the bliss and me”. So what might be amazing about our future?
- Health. Perhaps the clearest and most obvious way to improve the human condition is to cure the diseases and prevent the accidents which currently reduce both the quality and the duration of many people’s lives. Mental health problems are particularly neglected right now - solving those could make many people much better off.
- Longevity. From some moral stances, the most important of these diseases to tackle is ageing itself, which prevents us from leading fulfilling lives many times longer than what people currently expect. Rejuvenation treatments could grant unlimited youth and postpone death arbitrarily. While the ethics and pragmatics of a post-death society are complicated (as I discuss here), this does not seem sufficient reason to tolerate the moral outrage of involuntary mortality.
- Wealth. Nobody should lack access to whatever material goods they need to lead fulfilling lives. As technology advances and we automate more and more of the economy, the need to work to subsist will diminish, and eventually vanish altogether. An extrapolation from the last few centuries of development predicts that within centuries almost everyone will be incredibly wealthy by today’s standards. Luxuries that are now available only to a few (or to none at all) will become widespread.
- Life in simulation. In the long term, the most complete way to achieve these two goals may be for us to spend almost all of our time in virtual reality, where possessions can be generated on demand, physical inconveniences will be eliminated, and our experiences will be limited only by our imaginations. Eventually this will likely lead to us uploading our minds and permanently inhabiting vast, shared virtual worlds. The key ideas in all of the points that follow this one are applicable whether we inhabit physical or virtual realities.
- Alleviation of suffering. Evolution has stacked the hedonic deck against us: the extremes of pain are much greater than the extremes of pleasure, and more easily accessible too. But bioengineering and neuroscience will eventually reach a point where we could move towards eradicating suffering (including mental anguish and despair) and fulfilling the goal of David Pearce's abolitionist project. Perhaps keeping something similar to physical pain or mental frustration would still be useful for adding spice or variety to our lives - but it need not be anywhere near the worst and most hopeless extremes of either.
- Freedom from violence and coercion. As part of this project, any utopia must prevent man’s inhumanity to man, and the savagery and cruelty which blight human history. This would be the continuation of a longstanding trend towards less violent and freer societies.
- Non-humans. The most horrific suffering which currently exists is not inflicted on humans, but on the trillions of animals with which we share the planet. While most of this essay is focused on human lives and society, preventing the suffering of conscious non-human life (whether animals or aliens or AIs) is a major priority.
- Deep pleasure and happiness. Broadly speaking, positive emotions are much more complicated than negative ones. Physical pleasure may be simple, but under the umbrella of happiness I also include excitement, contentment, satisfaction, wonder, joy, love, gratitude, amusement, ‘flow’, aesthetic appreciation, the feeling of human connection, and many more!
- Better living through chemistry. There’s no fundamental reason why our minds couldn’t be reconfigured to experience much more of all of the positive emotions I just listed: why the ecstasy of the greatest day of your life couldn’t be your baseline state, with most days surging much higher; why all food couldn’t taste better than the best food you’ve ever had; why everyday activities couldn’t be more exhilarating than extreme sports.
- Positive attitudes. Our happiness is crucially shaped by our patterns of thought - whether we’re optimistic and cheerful about our lives, rather than pessimistic and cynical. While I wouldn’t want a society in which people’s expectations were totally disconnected from reality, there’s a lot of room for people to have healthier mindsets and lead more satisfied lives.
- Self-worth. In particular, it’s important for people to believe that they are valuable and worthwhile. In today’s society it’s far too easy to be plagued by low self-esteem, which poisons our ability to enjoy what we have.
- Peak fun. Our society is already unprecedentedly entertainment-driven. With even fewer material constraints, we will be able to produce a lot of fun activities. Over time this will involve less passive consumption of media and more participation in exciting adventures that become intertwined with the rest of our lives. 
- New types of happiness. The best experiences won’t necessarily come about just by scaling up our existing emotions, but also by creating new ones. Consider that our ability to appreciate music is an evolutionary accident, but one which deeply enriches our current lives. Our future selves could have many more types of experiences deliberately designed to be as rich and powerful as possible.
- Choice and self-determination. Humans are more than happiness machines, though. We have dreams about our lives, and we devote ourselves to achieving them. While it’s not always straightforwardly good for people to be able to fulfil their desires (in particular desires involving superiority over other people, which I’ll discuss later), these activities give us purpose and meaning, and it seems unjust when we are unable to fulfil our plans because we are helpless in the face of external circumstances. Yet neither are the best desires those which can be fulfilled with the snap of a finger, or which steer us totally clear of any hardship. Rather, we should be able to set ourselves goals that are challenging yet achievable, goals which we might struggle with - but whose completion is ultimately even more fulfilling because of that. What might they be?
- Making a difference to others. In a utopian future, dramatically improving other people’s lives would be much more difficult than it is today. Nevertheless, we can impact others via our relationships with them, as I’ll discuss in the next section.
- Growth. People often set goals to push themselves, grow more and learn more. In those cases the specific achievements are less relevant than the lessons we take from them.
- Tending your garden. Continuous striving isn’t for everyone. An alternative is the pursuit of peace and contentment, mindfulness and self-knowledge.
- Self-expression. Everyone has a rich inner life, but most of us rarely (or never) find the means to express our true selves. I envisage unlocking the writer or musician or artist inside each of us - so that we can each tell our own story, and endless other stories most beautiful.
- Life as art. I picture a world of “human beings who are new, unique, incomparable… who create themselves!” We can think of our lives as canvases upon which we each have the opportunity to paint a masterpiece. For some, that will simply involve pursuing all the other goods I describe in this essay. Others might prioritise making their lives novel, or dramatic, or aesthetically pleasing (even if that makes them less happy).
- Life at a larger scale. With more favourable external circumstances, individuals will be able to shape their lives on an unprecedented scale. We could spend centuries on a single project, or muster together billions for vast cooperative ventures. We could also remain the “same” continuous person as long as we wanted, rather than inevitably losing touch with the past.
- Cultivation of virtue. Although less emphasised in modern times, living a good life has long been associated with building character and developing virtues. Doing so is not primarily about changing what happens in our lives, but rather changing how we respond to it. There’s no definitive characterisation of a virtuous person, though: we all have our own intuitions about what traits (integrity, kindness, courage, and so on) we admire most in others. And different philosophical traditions emphasise different virtues, from Aristotle's 'greatness of soul' to Confucius' 'familial piety' to Buddha's ‘loving kindness’ (and the other brahmaviharas).  Deciding which virtues are most valuable is a task both for individuals and for society as a whole - with the goal of creating a world of people who have deliberately cultivated the best versions of themselves. 
- Intelligence. As we are, we can comprehend many complex concepts, but there are whole worlds of thought that human-level intelligences can never fully understand. If a jump from chimpanzee brain size to our brain size opened up such vast cognitive vistas, imagine what else might be possible when we augment our current brains, scale up our intelligence arbitrarily far, and lay bare the patterns that compose the universe.
- The joy of learning. Today, learning is usually a chore. Yet humans are fundamentally curious creatures; and there can be deep satisfaction in discovery and understanding. Education should be a game, which we master through play. We might even want to reframe science as a quest for hidden truths, so that each person can experience for themselves what it’s like to push forward the frontiers of knowledge.
- Self-understanding. In many ways, we’re inscrutable even to ourselves, with our true beliefs and motivations hidden beneath the surface of consciousness. As we become more intelligent, we will better understand how we really work, fulfilling the longstanding, elusive quest to “know thyself”.
- Agency. Each human is a collection of modules in a constant tug of war. We want one thing one day, and another the next. We procrastinate and we contradict ourselves and we succumb to life-ruining addictions. But this needn’t be the case. Imagine yourself as a unified agent, one who is able to make good choices for yourself, and stick to them - one who’s not overwhelmed by anger, or addiction, or other desires that your reflective self doesn’t endorse. This might be achieved by brain modification, or by having a particularly good AI assistant which knows how to nudge you into being a more consistent version of yourself.
- Memory. Today we lose most of our experiences to forgetfulness. But we could (and have already started to) outsource our memories to more permanent storage media accessible on demand, so that we can stay in touch with our pasts indefinitely.
- The extended mind. Clark and Chalmers have argued that we should consider external thinking aids to be part of our minds. Right now these aids are very primitive, and interface with our brains in very limited ways - but that will certainly improve over time, until accessing the outputs of external computation is similar to any other step in our thinking. The result will be as if we’d each internalised all of human knowledge.
- Variety and novelty of experiences.
- Seeing the universe. The urge to travel and explore is a deep-rooted one. Eventually we will be able to roam as far as we like, and observe the wonder and grandeur of the cosmos.
- Explorations of the human condition. Most of us inhabit fairly limited social circles, which don’t allow us to experience different ways of life and different people’s perspectives. Given the time to do so, we could learn a lot from the sheer variety of humanity, and import those lessons into the rest of our lives.
- Explorations of consciousness. Right now the conscious states that we’re able to experience are limited to those induced by the handful of psychoactive chemicals that we or evolution have stumbled upon. Eventually, though, we will develop totally different ways of experiencing the world that are literally inconceivable to us today.
- Spiritual experiences. One such mental shift that people already experience is the feeling of spiritual enlightenment. Aside from its religious connotations, this can be a valuable shift of perspective which gives us new insights into how to live our lives.
- Progress on our journeys. A key part of leading a meaningful life is continual growth and transcendence of one’s past self, each moving towards becoming the person we want to be. That might mean becoming a more virtuous person, or more successful, or more fulfilled - as long as we’re able to be proud of our achievements so far, and hopeful about the future.
- Justified expectation of pleasant surprises. One important factor in creating this sensation of progress is uncertainty about exactly what the future has in store for us. Although we should be confident that our lives will become better, this should sometimes come in the form of pleasant surprises rather than just ticking off predictable checkpoints.
- Levelling up. One way that this growth might occur is if people’s lives consist of distinct phases, each with different opportunities and challenges. Once someone thinks they have gained all that they desire from one phase, they can choose to move on. In an extreme case, the nature and goals of a subsequent phase might be incomprehensible to those in earlier phases - in the same way that children don’t understand what it’s like to be an adult, and most people don’t understand Buddhist enlightenment. For fictional precedent, consider sublimation in Banks’ Culture, or the elves leaving Middle-Earth in Tolkien’s mythos.
- Guardrails. Extended lives should be very hard to irreversibly screw up, since there’s so much at stake - especially if we have much greater abilities to modify ourselves than we do today.
- Leaving a legacy. People want to be remembered after they’ve moved on. Even in a world without death, each person should have had the opportunity to make a lasting difference in their communities before they leave for their next great adventure.
Relationships with others
For most of us, our relationships (with friends, family and romantic partners) are what we find most valuable in life. By that metric, though, it’s plausible that Westerners are poorer than we’ve ever been. What would it mean for our social lives to be as rich as our material lives have and will become? Imagine living in communities and societies that didn’t just allow you to pursue your best life, but were actively on your side - that were ideally designed to enable the flourishing of their inhabitants.
- Stronger connections. Most relationships are nowhere near as loving or as fulfilling as they might ideally be. That might be because we’re afraid of vulnerability, or we don’t know how to nurture these relationships (knowing how to be a good friend is more valuable than almost anything learned in classes, but taught almost nowhere), or we simply struggle to find and spend time with people who complement us. Imagine a society which is as successful at solving these problems as ours has been at solving scientific and engineering problems, for example by designing better social norms, giving its citizens more time and space for each other, and teaching individuals to think about their relationships in the most constructive ways.
- Abolishing loneliness. I envisage a future where loneliness has been systematically eradicated, by helping everyone find social environments in which they can flourish, and by providing comprehensive support for people struggling with building or maintaining relationships. I imagine too a future without the social anxieties which render many of us insecure and withdrawn.
- Love, love, love. What would utopia be without romantic love and passion? This is an obsession of modern culture - and yet it’s also something that doesn’t always come naturally. We could improve romance by reducing the barriers of fear and insecurity, allowing people to better create true intimacy. Even the prosaic solutions of better educational materials and cultural norms might go a long way towards that.
- Commitment and trust. In my mind, the key feature of both romance and friendship is deep commitment and trust, and the common knowledge that you’re each there for the other person. Whatever the bottlenecks are to more people building up that sort of bond - inability to communicate openly and honestly, or a lack of empathy, or even the absence of shared adversity - we could focus society’s efforts towards remedying them.
- Free love. While there’s excitement in the complex dance of romance, a lot of the hangups around sex serve only to make people anxious and unhappy. Consenting adults should feel empowered to pursue each other; and of course utopia should include some great sex.
- Ending toxic relationships. We can reduce and manage the things that make relationships toxic, like jealousy, narcissism, and abuse. This might happen via mental health treatment, better education, better knowledge of how to raise well-socialised children, or cultural norms which facilitate healthy relationships.
- Longer connections. I think it’s worth noting the positive effect that longevity could have on personal relationships. There’s a depth and a joy to being lifelong friends - but how much stronger could it be when those lives stretch out across astronomical timescales? This is not to say that we should bind ourselves to the same people for our whole extended lives - rather, we can spend time together and separate in the knowledge that it need never be a final parting, with each reunion a thing of joy.
- Life as a group project. In addition to one-on-one relationships, there’s a lot of value in being part of a close-knit group with deep shared bonds - a circle of lifelong friends, or soldiers who trust each other with their lives, or a large and loving family. Many people don’t have any of these, but I hope that they could.
- Better starts to life. The quality of relationships is most important for the most vulnerable among us. In a utopian future, every child would be raised with love, and allowed to enjoy the wonder of childhood; and indeed, they would keep that same wonder long into their adult lives.
- Less insular starts to life. Today, many children only have the opportunity to interact substantively with a handful of adults. While I’m unsure about fully-communal parenting, children who will become part of a broader community shouldn’t be shut off from that community; rather, they should have the chance to befriend and learn from a range of people. Meanwhile, spending more time with children would enrich the lives of many adults.
- Families, extended. What is the most meaningful thing for the most people? Probably spending time with their children and grandchildren, and knowing that with their family they’ve created something unique and important. A utopian vision of family would have the same features, but with each person living to see their lineage branch out into a whole forest of descendants, with them at the root.
- Healthy societies. In modern times our societies are too large and fragmented to be the close-knit groups I mentioned above. Yet people can also find meaning in being part of something much larger than themselves, and working together towards the common goal of building and maintaining a utopia.
- Positive norms. The sort of behaviours that are socially encouraged and rewarded should be prosocial ones which contribute to the well-being of society as a whole.
- (The good parts of) tribalism and patriotism. The feeling of being part of a cohesive group of people unified by a common purpose is a powerful one. At a small scale, we currently get this from watching sports, or singing in a choir. At larger scales, those same feelings often lead to harmful nationalist behaviour - yet at their best, they could give us a world in which people feel respect for and fraternity with all those around them by default, simply due to their shared humanity.
- Tradition and continuity. Another key component of belonging to something larger than yourself is continuing a long-lived legacy. Traditions could be maintained over many millennia in a way which gives each person a sense of their place in history.
- Political voices. Our current societies are too large for their overall directions to be meaningfully influenced by most people. But we can imagine mechanisms which allow individuals to weigh in on important questions in their local communities to a much greater extent. And people could at least know that their voice and vote have as much weight in the largest-scale decisions as anyone else’s.
- Meetings of minds. Today, humans communicate through words and gestures and body language. These are very low-bandwidth channels, compared with what is theoretically possible. In particular, brain interfaces could allow direct communication from one person’s mind to another. That wouldn’t just be quicker talking, but a fundamentally different mode of communication, as if another person were finishing your own thoughts. And consider that our “selves” are not discrete entities, but are made up of many mental modules. If we link them in new ways, the boundaries between you and other people might become insubstantial - you might temporarily (or permanently) become one larger person.
- Mitigating status judgements and dominance-seeking. In general we can’t hope to understand social interactions without considering status and hierarchy. We want to date the most attractive people and have the most prestigious jobs and become as wealthy as possible in large part to look better than others. The problem is that not everyone can reach the top, and so widespread competition to do so will leave many dissatisfied. In other cases, people are directly motivated to dominate and outcompete each other - such as businesspeople who want to crush their rivals. While this can be useful for driving progress, in the long term those motivations would ideally be channeled in ways which are more conducive to long-lasting fulfilment. For example, aggressive instincts could be directed towards recreational sports rather than relationships or careers.
- Diverse scales of success. To make social dynamics more positive-sum, we should avoid sharing one single vision, which everyone is striving towards, of what a successful life looks like. We can instead encourage people to tie their identities to the subcommunities they care most about, rather than measuring themselves against the whole world (though for an objection to this line of reasoning see Katja Grace’s post here).
- More equality of status. To the extent that we still have hierarchies and negative-sum games, it should at least be the case that nobody is consistently at the bottom of all of them, and everyone can look forward to their time of recognition and respect (as in the system I outline in this blog post).
Humanity overall
When we zoom out to consider the trajectory of humanity as a whole, there are some desirable properties which we might want it to have. Although there are reasons to distrust such large-scale judgements (in particular the human tendency towards scope insensitivity) these are often strong intuitions which do matter to us.
- Sheer size. The more people living worthwhile lives, the better - and with the astronomical resources available to us, we have the opportunity to allow our descendants to number in the uncountable trillions.
- Solving coordination. In general, we’re bad at working together to resolve problems. This could be solved by mechanisms to make politics and governance transparent, accountable and responsive at a variety of levels. In other words, imagine humanity at one with itself and able to set its overall direction, rather than trapped in our current semi-anarchic default condition.
- The end of war. Others have spoken of the senseless horror of war much better than I can. I will merely add that some human war will be our last war; let us hope that it gains that distinction for the right reason.
- Avoiding races to the bottom. Under most people’s ethical intuitions, we should dislike the Malthusian scenario in which, even as our wealth grows vastly, our populations grow even faster, so that most people end up with subsistence-level resources. To avoid this, we will need the ability to coordinate well at large scales.
- The pursuit of knowledge. As a species we will learn and discover more and more over time. Eventually we will understand both the most fundamental building blocks of nature and also the ways in which complex systems like our minds and societies function.
- Moral progress. In particular, we will come to a better understanding of ethics, both in theory and in ways that we can actually act upon - and then hopefully do so, to create just societies. While it’s difficult to predict exactly where moral progress will take us, one component which seems very important is building a utopia for all, with widespread access to the opportunity to pursue a good life. In particular, this should probably involve everyone having certain basic rights - such as the ability to participate in the major institutions of civil society, as Anderson describes.
- Exploring the diversity of life. Many people value our current world’s variety of cultures and lifestyles - but over many millennia our species will be able to explore the vast frontiers of what worthwhile lives and societies could look like. The tree of humanity will branch out in ways that are unimaginable to us now.
- Speciation. Even supposing that we are currently alone in the universe, we need not be the last intelligent species. Given sufficient time, it might become desirable to create descendant species, or split humanity into different branches which experience different facets of life. Or we might at least enjoy the companionship of animals, whether they be species that currently exist or those which we create ourselves.
- Making our mark. The universe is vast, but we have plenty of time. Humanity could expand to colonise this galaxy, and others, in a continual wave of exploration and pursuit of the unknown. We might create projects of unimaginable scale, reengineering the cosmos as we choose, and diverting the current astronomical waste towards the well-being of ourselves and our descendants.
- Creativity and culture. The ability to create new life, design entire worlds, and perform other large-scale feats, will allow unmatched expressions of artistry and beauty.
- Humanity’s final flourishing. In the very very long term, under our current understanding of physics, humanity will run out of energy to sustain itself, and our civilisation will draw to an end. If we cannot avoid that, at least we can design our species’ entire trajectory, including that final outcome, with the wisdom of uncountable millennia.
For all of the changes listed above, there are straightforward reasons why they would be better than the status quo or than a move in the opposite direction. However, there are some dimensions along which we might eventually want to move - but in which direction, I don’t know.
- Privacy, or lack thereof. In many ways people have become more open over the past few centuries. But we now also place more importance on individual rights such as the right to privacy. I could see a future utopia in which there were very few secrets, and radical transparency was the norm - but also the opposite, in which everyone had full control over which aspects of themselves others could access, even up to their appearance and name (as in this excellent novel).
- Connection with nature. Many people value this very highly. By contrast, transhumanists generally want to improve on nature, not return to it. In the long term, we might synthesise these two by creating new environments and ecosystems that are even more peaceful and beautiful and grand than those which exist today - but I don’t know how well those would match people’s current conceptions of natural life.
- New social roles. Each of us plays many social roles, and is bound by the corresponding constraints and expectations. I think such roles will be an important part of social interactions even in a utopia: we don’t want total homogeneity. However, our current roles - gender roles, family roles, job roles and so on - are certainly not optimal for everyone. I can imagine them being replaced by social roles which are just as strong, but which need to be opted into, or provide more flexibility in other ways. Yet I’m hesitant to count this as an unalloyed good, because the new roles might seem bizarre and alien to us, even if our descendants think of them as natural and normal (as illustrated in this fascinating story by Scott Alexander). Consider, for instance, how strange the hierarchical roles of historical societies seem to us today - and then imagine a future in which our version of romance is just as antiquated, in favour of totally new narratives about what makes relationships meaningful.
- Unity versus freedom. Unity of lifestyle and purpose was a key component of many historical utopias. Some more recent utopias, like Banks’ Culture, propound the exact opposite: total freedom for individuals to live radically diverse lives. Which is better? The temperament of the time urges me towards the latter, which I think is also more intuitive at astronomical scales, but this would also make it harder to implement the other features of the utopia I’ve described, if there’s extensive disagreement about what goals to pursue, and how. Meanwhile one downside of unity is the necessity of enforcing social norms, for example by ostracising or condemning those who disobey.
- The loss or enhancement of individuality. The current composition of our minds - having very high bandwidth between different parts of our brain, and very low bandwidth between our brains and others’ - is a relic of our evolutionary history. Above, I described the benefits of reducing the communication boundaries between different people. But I’m not sure how far to take this: would we want a future in which individuality is obsolete, with everyone merging into larger consciousnesses? Or would it be better if, despite increasing communication bandwidth, we place even greater value on long-term individuality, since our lives will be much less transient?
- Cloning and copying. Other technologies which might affect our attitudes towards individuality are those which would allow us to create arbitrarily many people arbitrarily similar to ourselves.
- Self-modification. The ability to change arbitrary parts of your mind is a very powerful one. At its best, we can make ourselves the people we always wanted to be, transcending human limitations. At its worst, there might be pressure to carve out the parts of ourselves that make us human, like Hanson discusses in Age of Em.
- Designer people. Eventually we will be able to specify arbitrary characteristics of our children, shaping them to an unprecedented extent. However, I don’t know if that’s a power we should fully embrace, either as individuals or as societies.
- Wireheading. I’m uncertain about the extent to which blissing out on pleasure (at the expense of pursuing more complex goals) is something we should aim for.
- Value drift. More generally, humanity’s values will by default change significantly over time. Whether to prevent that or to allow it to happen is a tricky question. The former implies a certain type of stagnation - we are certainly glad that the Ancient Greeks did not lock in their values. The latter option could lead us to a world which looks very weird and immoral by our modern sensibilities.
See, for instance, the conspicuous absence of relationships and communities in works such as Nick Bostrom’s Transhumanist FAQ. His summary of the transhumanist perspective: “Many transhumanists wish to follow life paths which would, sooner or later, require growing into posthuman persons: they yearn to reach intellectual heights as far above any current human genius as humans are above other primates; to be resistant to disease and impervious to aging; to have unlimited youth and vigor; to exercise control over their own desires, moods, and mental states; to be able to avoid feeling tired, hateful, or irritated about petty things; to have an increased capacity for pleasure, love, artistic appreciation, and serenity; to experience novel states of consciousness that current human brains cannot access.” See also Yudkowsky: “It doesn't get any better than fun.” Meanwhile the foremost modern science fiction utopia, Banks’ Culture, is also very individualistic.
Some interesting quotes from Walden Two:
- “Men build society and society builds men.”
- “The behavior of the individual has been shaped according to revelations of ‘good conduct,’ never as the result of experimental study. But why not experiment? The questions are simple enough. What’s the best behavior for the individual so far as the group is concerned? And how can the individual be induced to behave in that way? Why not explore these questions in a scientific spirit?”
- “We undertook to build a tolerance for annoying experiences. The sunshine of midday is extremely painful if you come from a dark room, but take it in easy stages and you can avoid pain altogether. The analogy can be misleading, but in much the same way it’s possible to build a tolerance to painful or distasteful stimuli, or to frustration, or to situations which arouse fear, anger or rage. Society and nature throw these annoyances at the individual with no regard for the development of tolerances. Some achieve tolerances, most fail. Where would the science of immunization be if it followed a schedule of accidental dosages?”
And from Island:
- “That would distract your attention, and attention is the whole point. Attention to the experience of something given, something you haven't invented in your imagination.”
- "We all belong to an MAC—a Mutual Adoption Club. Every MAC consists of anything from fifteen to twenty-five assorted couples. Newly elected brides and bridegrooms, old-timers with growing children, grandparents and great-grandparents—everybody in the club adopts everyone else. … An entirely different kind of family. Not exclusive, like your families, and not predestined, not compulsory. An inclusive, unpredestined and voluntary family. Twenty pairs of fathers and mothers, eight or nine ex-fathers and ex-mothers, and forty or fifty assorted children of all ages."
- “[Large, powerful men] are just as muscular here, just as tramplingly extraverted, as they are with you. So why don’t they turn into Stalins or Dipas, or at the least into domestic tyrants? First of all, our social arrangements offer them very few opportunities for bullying their families, and our political arrangements make it practically impossible for them to domineer on any larger scale. Second, we train the Muscle Men to be aware and sensitive, we teach them to enjoy the commonplaces of everyday existence. This means that they always have an alternative—innumerable alternatives—to the pleasure of being the boss. And finally we work directly on the love of power and domination that goes with this kind of physique in almost all its variations. We canalize this love of power and we deflect it—turn it away from people and on to things. We give them all kinds of difficult tasks to perform—strenuous and violent tasks that exercise their muscles and satisfy their craving for domination—but satisfy it at nobody’s expense and in ways that are either harmless or positively useful.”
For a short introduction to this debate, see section 3 in the Stanford Encyclopedia of Philosophy’s entry on Consequentialism.
For stylistic purposes I wrote much of this essay in the future tense, without always hedging with “we might” and “it’s possible that”. Please don’t interpret any of my descriptions as confident predictions - rather, treat them as expressions of possibility and hope.
As Tim Ferriss puts it, “excitement is the more practical synonym for happiness”.
For an analysis of the similarities between these three traditions, I recommend Shannon Vallor's Technology and the Virtues.
For a (somewhat fawning) description of such a society, see Swift’s Houyhnhnms, which are “endowed by nature with a general disposition to all virtues, and have no conceptions or ideas of what is evil in a rational creature”; and which the narrator wants “for civilizing Europe, by teaching us the first principles of honour, justice, truth, temperance, public spirit, fortitude, chastity, friendship, benevolence, and fidelity.”
So, human values are fragile, vague, and possibly not even a well-defined concept, yet figuring them out seems essential for an aligned AI. It seems reasonable that, faced with a hard problem, one would start instead with a simpler one that has some connection to the original problem. For someone not working in ML or AI alignment, it seems obvious that researching simpler-than-human values might be a way to make progress. But maybe this is one of those falsely obvious ideas that non-experts tend to push after learning only cursorily about a complex research topic.
That said, assuming that value complexity scales with intelligence, studying less intelligent agents and their versions of values may be something worth pursuing. Dolphin values. Monkey values. Dog values. Cat values. Fish values. Amoeba values. Sure, we lose the inside view in this case, but the trade-off seems at least worth exploring. Is there any research going on in that area?
In the early days of electronic computers, machines like ENIAC were used for niche applications such as calculating ballistic trajectories and simulating nuclear explosions. These applications are very far removed from what computers are predominantly used for today.
It seems we have reached a similar development stage with regard to quantum computing. The applications researchers cite, cryptographic analysis and quantum systems simulation, are once again very far removed from everyday life.
Does anyone have a prediction on what quantum computers will be used for once they become affordable enough for regular people? Or will it forever remain just a research tool? Are quantum computers the key to cracking quantum chemistry and thereby molecular nanotechnology? (this is my guess as to what the actual impact of quantum computing will be)
A few months ago I brought home an Emenee toy organ someone was throwing out. It didn't work, and it sat in the basement for a while, but today I had a go at fixing it. It's a very simple design: a fan blows air through a hole in the bottom, setting up a pressure difference between the inside and outside. When you press a key, it opens a corresponding hole and air flows through, past a little plastic reed whose vibration makes the note.
With mine, the motor was running but I wasn't getting any sound. I opened it up and nothing was obviously wrong. I figured it was probably leaky, and put tape around the base where the plastic sides meet the chipboard bottom. This fixed it enough to get some sound:
The lower octave isn't working at all, and the higher notes get progressively slower to sound and breathier until the very highest don't sound at all, but this leaves an octave and a half of chromatic range. Enough to play around with!
The left hand buttons play chords, and are arranged:
The circle of fifths arrangement makes a lot of sense, but the choice to pair each major chord with its parallel minor does not. If you're playing in a major key, you generally want the vi and ii minors, so Am and Dm in the key of C. This means the minors they've included are the ones used in flatter keys than they've provided: F and C can use Dm and Am, but G and D would like Em and Bm, which are absent. Shifting the minors over three notes would be much better. The layout would change from:

Bbm Fm Cm Gm Dm Am
Bb  F  C  G  D  A

to:

Gm Dm Am Em Bm F#m
Bb  F  C  G  D  A

so that in each key you have ii, vi, IV, I and V available, instead of minors that only the flatter keys can use.
Playing folk music, I also would have preferred that they center on G or D, since I care much more about having E than Bb.
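The three-note shift works because the relative (vi) minor of a major key is rooted three semitones below its tonic. A few speculative lines to check the suggested minor row (the note spellings are my own choice; Gbm is the same chord as F#m):

```python
# chromatic scale; flat spellings chosen arbitrarily
NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def relative_minor(major):
    """The vi chord of a major key is rooted three semitones below its tonic."""
    return NOTES[(NOTES.index(major) - 3) % 12] + "m"

# the organ's major-chord row, in circle-of-fifths order
majors = ["Bb", "F", "C", "G", "D", "A"]
minor_row = [relative_minor(m) for m in majors]
```

This reproduces the shifted row Gm Dm Am Em Bm F#m (with Gbm as the enharmonic spelling of F#m).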
Another change that would be nice would be to offer a way to control the way air gets into the organ. There's a hole on the bottom for the fan, and if you cover it the organ stops working because air can't get through. It has a screw-adjustable cover, which looks like it's designed as a volume control:
You could build a simple adjustable cover connected to a foot pedal, and get a real expression pedal. This would have added to the cost of the instrument, so I understand why they wouldn't want to do this with a toy, but it would be a simple add-on.
I'm probably not going to keep this, since my regular keyboard can do everything this can do, so if you're in the Boston area and would enjoy playing with it let me know!
Honest rational agents should never agree to disagree.
This idea is formalized in Aumann's agreement theorem and its various extensions (we can't foresee to disagree, uncommon priors require origin disputes, complexity bounds, &c.), but even without the sophisticated mathematics, a basic intuition should be clear: there's only one reality. Beliefs are for mapping reality, so if we're asking the same question and we're doing everything right, we should get the same answer. Crucially, even if we haven't seen the same evidence, the very fact that you believe something is itself evidence that I should take into account—and you should think the same way about my beliefs.
In "The Coin Guessing Game", Hal Finney gives a toy model illustrating what the process of convergence looks like in the context of a simple game about inferring the result of a coinflip. A coin is flipped, and two players get a "hint" about the result (Heads or Tails) along with an associated hint "quality" uniformly distributed between 0 and 1. Hints of quality 1 always match the actual result; hints of quality 0 are useless and might as well be another coinflip. Several "rounds" commence where players simultaneously reveal their current guess of the coinflip, incorporating both their own hint and its quality, and what they can infer about the other player's hint quality from their behavior in previous rounds. Eventually, agreement is reached. The process is somewhat alien from a human perspective (when's the last time you and an interlocutor switched sides in a debate multiple times before eventually agreeing?!), but not completely so: if someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you would infer that they had strong evidence or counterarguments of their own, even if there was some reason they couldn't tell you what they knew.
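The convergence dynamic can be sketched as a small simulation. This is a toy version, not Finney's exact setup: I assume a hint matches the coin with probability (1 + quality) / 2, discretize quality onto a grid, and break posterior ties toward Heads. Because each player's guessing rule is computable by both players, each can prune the set of opponent (hint, quality) types consistent with the guesses announced so far:

```python
def play(type1, type2, qgrid, rounds=8):
    """Toy version of Hal Finney's coin-guessing game. Each type is a
    (hint, quality) pair; a hint matches the true flip with probability
    (1 + quality) / 2. Guesses are public, so each player prunes the
    set of opponent types consistent with the guesses seen so far."""
    types = [(h, q) for h in "HT" for q in qgrid]
    live = [set(types), set(types)]  # types of each player still possible

    def p_hint(h, q, result):  # P(hint = h | coin = result, quality = q)
        return (1 + q) / 2 if h == result else (1 - q) / 2

    history = []
    for _ in range(rounds):
        def guess(i, own):
            # posterior on Heads given own type and the opponent's live set
            num = den = 0.0
            for h, q in live[1 - i]:
                for result in "HT":
                    w = 0.5 * p_hint(own[0], own[1], result) * p_hint(h, q, result)
                    den += w
                    if result == "H":
                        num += w
            return "H" if num >= den / 2 else "T"

        g1, g2 = guess(0, type1), guess(1, type2)
        history.append((g1, g2))
        # both players can simulate guess() for any type, so both prune
        new0 = {t for t in live[0] if guess(0, t) == g1}
        new1 = {t for t in live[1] if guess(1, t) == g2}
        live[0], live[1] = new0, new1
        if g1 == g2:
            break
    return history

# a perfectly informed player against a player whose hint is useless
hist = play(("H", 1.0), ("T", 0.0), [i / 10 for i in range(11)])
```

With these types, the quality-0 player quickly defers to the quality-1 player's unwavering "Heads", and the two agree within a round or two.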
Honest rational agents should never agree to disagree.
In "Disagree With Suicide Rock", Robin Hanson discusses a scenario where disagreement seems clearly justified: if you encounter a rock with words painted on it claiming that you, personally, should commit suicide according to your own values, you should feel comfortable disagreeing with the words on the rock without fear of being in violation of the Aumann theorem. The rock is probably just a rock. The words are information from whoever painted them, and maybe that person did somehow know something about whether future observers of the rock should commit suicide, but the rock itself doesn't implement the dynamic of responding to new evidence.
In particular, if you find yourself playing Finney's coin guessing game against a rock with the letter "H" painted on it, you should just go with your own hint: it would be incorrect to reason, "Wow, the rock is still saying Heads, even after observing my belief in several previous rounds; its hint quality must have been very high."
Honest rational agents should never agree to disagree.
Human so-called "rationalists" who are aware of this may implicitly or explicitly seek agreement with their peers. If someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you might think, "Hm, we still don't agree; I should update towards their position ..."
But another possibility is that your trust has been misplaced. Humans suffering from "algorithmic bad faith" are on a continuum with Suicide Rock. What matters is the counterfactual dependence of their beliefs on states of the world, not whether they know all the right keywords ("crux" and "charitable" seem to be popular these days), nor whether they can perform the behavior of "making arguments"—and definitely not their subjective conscious verbal narratives.
And if the so-called "rationalists" around you suffer from correlated algorithmic bad faith—if you find yourself living in a world of painted rocks—then it may come to pass that protecting the sanctity of your map requires you to master the technique of lonely dissent.
Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.
Happy New Year!
Audio version here (may not be up yet).
AI Alignment Podcast: On DeepMind, AI Safety, and Recursive Reward Modeling (Lucas Perry and Jan Leike) (summarized by Rohin): While Jan originally worked on theory (specifically AIXI), DQN, AlphaZero and others demonstrated that deep RL was a plausible path to AGI, and so now Jan works on more empirical approaches. In particular, when selecting research directions, he looks for techniques that are deeply integrated with the current paradigm, that could scale to AGI and beyond. He also wants the technique to work for agents in general, rather than just question answering systems, since people will want to build agents that can act, at least in the digital world (e.g. composing emails). This has led him to work on recursive reward modeling (AN #34), which tries to solve the specification problem in the SRA framework (AN #26).
Reward functions are useful because they allow the AI to find novel solutions that we wouldn't think of (e.g. AlphaGo's move 37), but often are incorrectly specified, leading to reward hacking. This suggests that we should do reward modeling, where we learn a model of the reward function from human feedback. Of course, such a model is still likely to have errors leading to reward hacking, and so to avoid this, the reward model needs to be updated online. As long as it is easier to evaluate behavior than to produce behavior, reward modeling should allow AIs to find novel solutions that we wouldn't think of.
However, we would eventually like to apply reward modeling to tasks where evaluation is also hard. In this case, we can decompose the evaluation task into smaller tasks, and recursively apply reward modeling to train AI systems that can perform those small helper tasks. Then, assisted by these helpers, the human should be able to evaluate the original task. This is essentially forming a "tree" of reward modeling agents that are all building up to the reward model for the original, hard task. While currently the decomposition would be done by a human, you could in principle also use recursive reward modeling to automate the decomposition. Assuming that we can get regular reward modeling working robustly, we then need to make sure that the tree of reward models doesn't introduce new problems. In particular, it might be the case that as you go up the tree, the errors compound: errors in the reward model at the leaves lead to slightly worse helper agents, which lead to worse evaluations for the second layer, and so on.
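The bottom-up structure of that tree can be illustrated with a toy sketch. The `Task` class and the task names are my invention; in a real system, each "train" step would fit a reward model from human-plus-helper feedback and optimize a policy against it, rather than just recording an order:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    evaluation_subtasks: list = field(default_factory=list)

def train_agent(task, trained=None):
    """Toy recursive reward modeling: helper agents for the subtasks that
    make evaluation tractable are trained first, bottom-up; only then is
    the parent task's reward model (and policy) trained, assisted by the
    helpers. Here we only record the resulting training order."""
    if trained is None:
        trained = []
    for sub in task.evaluation_subtasks:
        train_agent(sub, trained)  # helpers are trained before the parent
    trained.append(task.name)      # stand-in for "fit reward model, train policy"
    return trained

order = train_agent(
    Task("summarize a book", [
        Task("summarize a chapter", [Task("summarize a page")]),
        Task("check factual accuracy"),
    ]))
```

The leaves are trained first, so errors at the leaves are exactly what could compound as evaluations propagate up the tree.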
He recommends that rather than spending a lot of time figuring out the theoretically optimal way to address a problem, AI safety researchers should alternate between conceptual thinking and trying to make something work. The ML community errs on the other side, where they try out lots of techniques, but don't think as much about how their systems will be deployed in the real world. Jan also wants the community to focus more on clear, concrete technical explanations, rather than vague blog posts that are difficult to critique and reason about. This would allow us to more easily build on past work, rather than reasoning from first principles and reinventing the wheel many times.
DeepMind is taking a portfolio approach to AI safety: they are trying many different lines of attack, and hoping that some of them will pan out. Currently, there are teams for agent alignment (primarily recursive reward modeling), incentive theory, trained agent analysis, policy, and ethics. They have also spent some time thinking about AI safety benchmarks, as in AI Safety Gridworlds, since progress in machine learning is driven by benchmarks, though Jan does think it is quite hard to create a well-made benchmark.
Rohin's opinion: I've become more optimistic about recursive reward modeling since the original paper (AN #34), primarily (I think) because I now see more value in approaches that can be used to perform specific tasks (relative to approaches that try to infer "human values").
I also appreciated the recommendations for the AI safety community, and agree with them quite a lot. Relative to Jan, I see more value in conceptual work described using fuzzy intuitions, but I do think that more effort should be put into exposition of that kind of work.

Technical AI alignment

Learning human intent
Learning human objectives by evaluating hypothetical behaviours (Siddharth Reddy et al) (summarized by Rohin): Deep RL from Human Preferences updated its reward model by collecting human comparisons on on-policy trajectories where the reward model ensemble was most uncertain about what the reward should be. However, we want our reward model to be accurate off policy as well, even in unsafe states. To this end, we would like to train our reward model on hypothetical trajectories. This paper proposes learning a generative model of trajectories from some dataset of environment dynamics, such as safe expert demonstrations or rollouts from a random policy, and then finding trajectories that are "useful" for training the reward model. They consider four different criteria for usefulness of a trajectory: uncertain rewards (which intuitively are areas where the reward model needs training), high rewards (which could indicate reward hacking), low rewards (which increases the number of unsafe states that the reward model is trained on), and novelty (which covers more of the state space). Once a trajectory is generated, they have a human label it as good, neutral, or unsafe, and then train the reward model on these labels.
The authors are targeting an agent that can explore safely: since they already have a world model and a reward model, they use a model-based RL algorithm to act in the environment. Specifically, to act, they use gradient descent to optimize a trajectory in the latent space that maximizes expected rewards under the reward model and world model, and then take the first action of that trajectory. They argue that the world model can be trained on a dataset of safe human demonstrations (though in their experiments they use rollouts from a random policy), and then since the reward model is trained on hypothetical behavior and the model-based RL algorithm doesn't need any training, we get an agent that acts without us ever getting to an unsafe state.
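The four usefulness criteria could be scored roughly as follows, assuming an ensemble of reward models whose disagreement serves as the uncertainty signal. The function names and the distance measure are my own, not the paper's:

```python
import statistics

def distance(a, b):
    """Euclidean distance between two trajectories (as feature vectors)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def score_trajectory(traj, ensemble, seen):
    """Score a candidate trajectory under the four acquisition criteria.
    `ensemble`: list of reward functions; `seen`: trajectories already
    used to train the reward model."""
    rewards = [r(traj) for r in ensemble]
    return {
        "uncertain": statistics.stdev(rewards),  # reward model needs training here
        "high": statistics.mean(rewards),        # candidate reward hacking
        "low": -statistics.mean(rewards),        # candidate unsafe states
        "novel": min(distance(traj, s) for s in seen) if seen else float("inf"),
    }

# hypothetical two-model ensemble that disagrees by a constant offset
scores = score_trajectory(
    (1.0, 2.0),
    ensemble=[lambda t: sum(t), lambda t: sum(t) + 1.0],
    seen=[(0.0, 0.0)],
)
```

Candidates maximizing each score would then be generated by the trajectory model and sent to the human for a good/neutral/unsafe label.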
Rohin's opinion: I like the focus on integrating active selection of trajectory queries into reward model training, and especially the four different kinds of active criteria that they consider, and the detailed experiments (including an ablation study) on the benefits of these criteria. These seem important for improving the efficiency of reward modeling.
However, I don't buy the argument that this allows us to train an agent without visiting unsafe states. In their actual experiments, they use a dataset gathered from a random policy, which certainly will visit unsafe states. If you instead use a dataset of safe human demonstrations, your generative model will only place probability mass on safe demonstrations, and so you'll never generate trajectories that visit unsafe states, and your reward model won't know that they are unsafe. (Maybe your generative model will generalize properly to the unsafe states, but that seems unlikely to me.) Such a reward model will either be limited to imitation learning (sticking to the same trajectories as in the demonstrations, and never finding something like AlphaGo's move 37), or it will eventually visit unsafe states.
Causal Confusion in Imitation Learning (Pim de Haan et al) (summarized by Asya): This paper argues that causal misidentification is a big problem in imitation learning. When the agent doesn't have a good model of which actions cause which state changes, it may mismodel the effect of a state change as a cause; e.g., an agent learning to drive a car may incorrectly learn that it should apply the brakes whenever the brake light on the dashboard is on. This leads to undesirable behavior where more information actually causes the agent to perform worse.
The paper presents an approach for resolving causal misidentification by (1) Training a specialized network to generate a "disentangled" representation of the state as variables, (2) Representing causal relationships between those variables in a graph structure, (3) Learning policies corresponding to each possible causal graph, and (4) Performing targeted interventions, either by querying an expert, or by executing a policy and observing the reward, to find the correct causal graph model.
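Steps (3) and (4) can be caricatured as a search over masks of the disentangled variables, where "executing a policy and observing the reward" collapses into an evaluation function. The variable roles and scores below are hypothetical, and the exhaustive search is a simplification of the paper's targeted interventions:

```python
from itertools import product

def infer_causal_mask(n_vars, evaluate):
    """Toy stand-in for steps (3)-(4): each candidate causal graph is a
    binary mask over the n disentangled state variables, and `evaluate`
    plays the role of executing the masked policy and observing its
    return. Exhaustive over 2**n masks, so only viable for small n."""
    return max(product([0, 1], repeat=n_vars), key=evaluate)

# hypothetical driving example: conditioning on the obstacle (index 0)
# helps at deployment, conditioning on the brake light (index 1) hurts
best = infer_causal_mask(2, lambda m: 2 * m[0] - m[1])
```

The search should select the mask that keeps the obstacle variable and drops the brake light, i.e., the true causes of the expert's actions.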
The paper experiments with this method by testing it in environments artificially constructed to have confounding variables that correlate with actions but do not cause them. It finds that this method is successfully able to improve performance with confounding variables, and that it performs significantly better per number of queries (to an expert or of executing a policy) than any existing methods. It also finds that directly executing a policy and observing the reward is a more efficient strategy for narrowing down the correct causal graph than querying an expert.
Asya's opinion: This paper goes into detail arguing why causal misidentification is a huge problem in imitation learning and I find its argument compelling. I am excited about attempts to address the problem, and I am tentatively excited about the method the paper proposes for finding representative causal graphs, with the caveat that I don't feel equipped to evaluate whether it could efficiently generalize past the constrained experiments presented in the paper.
Rohin's opinion: While the conclusion that more information hurts sounds counterintuitive, it is actually straightforward: you don't get more data (in the sense of the size of your training dataset); you instead have more features in the input state data. This increases the number of possible policies (e.g. once you add the car dashboard, you can now express the policy "if brake light is on, apply brakes", which you couldn't do before), which can make you generalize worse. Effectively, there are more opportunities for the model to pick up on spurious correlations instead of the true relationships. This would happen in other areas of ML as well; surely someone has analyzed this effect for fairness, for example.
The success of their method over DAgger comes from improved policy exploration (for their environments): if your learned policy is primarily paying attention to the brake light, it's a very large change to instead focus on whether there is an obstacle visible, and so gradient descent is not likely to ever try that policy once it has gotten to the local optimum of paying attention to the brake light. In contrast, their algorithm effectively trains separate policies for scenarios in which different parts of the input are masked, which means that it is forced to explore policies that depend only on the brake light, and policies that depend only on the view outside the windshield, and so on. So, the desired policy has been explored already, and it only requires a little bit of active learning to identify the correct policy.
Like Asya, I like the approach, but I don't know how well it will generalize to other environments. It seems like an example of quality diversity, which I am generally optimistic about.
Humans Are Embedded Agents Too (John S Wentworth) (summarized by Rohin): Embedded agency (AN #31) is not just a problem for AI systems: humans are embedded agents too; many problems in understanding human values stem from this fact. For example, humans don't have a well-defined output channel: we can't say "anything that comes from this keyboard is direct output from the human", because the AI could seize control of the keyboard and wirehead, or a cat could walk over the keyboard, etc. Similarly, humans can "self-modify", e.g. by drinking, which often modifies their "values": what does that imply for value learning? Based on these and other examples, the post concludes that "a better understanding of embedded agents in general will lead to substantial insights about the nature of human values".
Rohin's opinion: I certainly agree that many problems with figuring out what to optimize stem from embedded agency issues with humans, and any formal account (AN #36) of this will benefit from general progress in understanding embeddedness. Unlike many others, I do not think we need a formal account of human values, and that a "common-sense" understanding will suffice, including for the embeddedness problems detailed in this post. (See also this comment thread and the next summary.)
What's the dream for giving natural language commands to AI? (Charlie Steiner) (summarized by Rohin): We could try creating AI systems that take the "artificial intentional stance" towards humans: that is, they model humans as agents that are trying to achieve some goals, and then we get the AI system to optimize for those inferred goals. We could do this by training an agent that jointly models the world and understands natural language, in order to ground the language into actual states of the world. The hope is that with this scheme, as the agent gets more capable, its understanding of what we want improves as well, so that it is robust to scaling up. However, the scheme has no protection against Goodharting, and doesn't obviously care about metaethics.
Rohin's opinion: I agree with the general spirit of "get the AI system to understand common sense; then give it instructions that it interprets correctly". I usually expect future ML research to figure out the common sense part, so I don't look for particular implementations (in this case, simultaneous training on vision and natural language), but just assume we'll have that capability somehow. The hard part is then how to leverage that capability to provide correctly interpreted instructions. It may be as simple as providing instructions in natural language, as this post suggests. I'm much less worried about instrumental subgoals in such a scenario, since part of "understanding what we mean" includes "and don't pursue this instruction literally to extremes". But we still need to figure out how to translate natural language instructions into actions.

Forecasting
Might humans not be the most intelligent animals? (Matthew Barnett) (summarized by Rohin): We can roughly separate intelligence into two categories: raw innovative capability (the ability to figure things out from scratch, without the benefit of those who came before you), and culture processing (the ability to learn from accumulated human knowledge). It's not clear that humans have the highest raw innovative capability; we may just have much better culture. For example, feral children raised outside of human society look very "unintelligent", The Secret of Our Success documents cases where culture trumped innovative capability, and humans actually don't have the most neurons, or the most neurons in the forebrain.
(Why is this relevant to AI alignment? Matthew claims that it has implications on AI takeoff speeds, though he doesn't argue for that claim in the post.)
Rohin's opinion: It seems very hard to actually make a principled distinction between these two facets of intelligence, because culture has such an influence over our "raw innovative capability" in the sense of our ability to make original discoveries / learn new things. While feral children might be less intelligent than animals (I wouldn't know), the appropriate comparison would be against "feral animals" that also didn't get opportunities to explore their environment and learn from their parents, and even so I'm not sure how much I'd trust results from such a "weird" (evolutionarily off-distribution) setup.
Walsh 2017 Survey (Charlie Giattino) (summarized by Rohin): In this survey, AI experts, robotics experts, and the public estimated a 50% chance of high-level machine intelligence (HLMI) by 2061, 2065, and 2039 respectively. The post presents other similar data from the survey.
Rohin's opinion: While I expected that the public would expect HLMI sooner than AI experts, I was surprised that AI and robotics experts agreed so closely -- I would have thought that robotics experts would have longer timelines.

Field building
What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. (David Krueger) (summarized by Rohin): When making the case for work on AI x-risk to other ML researchers, what should we focus on? This post suggests arguing for three core claims:
1. Due to Goodhart's law, instrumental goals, and safety-performance trade-offs, the development of advanced AI increases the risk of human extinction non-trivially.
2. To mitigate this x-risk, we need to know how to build safe systems, know that we know how to build safe systems, and prevent people from building unsafe systems.
3. So, we should mitigate AI x-risk, as it is impactful, neglected, and challenging but tractable.
Rohin's opinion: This is a nice concise case to make, but I think the bulk of the work is in splitting the first claim into subclaims: this is the part that is usually a sticking point (see also the next summary).

Miscellaneous (Alignment)
A list of good heuristics that the case for AI x-risk fails (David Krueger) (summarized by Flo): Because human attention is limited and a lot of people try to convince us of the importance of their favourite cause, we cannot engage with everyone’s arguments in detail. Thus we have to rely on heuristics to filter out arguments that are unlikely to be worth engaging with. Depending on the form of exposure, the case for AI risk can fail on many of these generally useful heuristics, eight of which are detailed in this post. Given this outside view perspective, it is unclear whether we should actually expect ML researchers to spend time evaluating the arguments for AI risk.
Flo's opinion: I can remember being critical of AI risk myself for similar reasons, and I think it is important to be careful with the framing of pitches to keep these heuristics from firing. This is not to say that we should avoid criticism of the idea of AI risk, but criticism is a lot more helpful if it comes from people who have actually engaged with the arguments.
Rohin's opinion: Even after knowing the arguments, I find six of the heuristics quite compelling: technology doomsayers have usually been wrong in the past, there isn't a concrete threat model, it's not empirically testable, it's too extreme, it isn't well grounded in my experience with existing AI systems, and it's too far off to do useful work now. All six make me distinctly more skeptical of AI risk.

Other progress in AI

Reinforcement learning
Procgen Benchmark (Karl Cobbe et al) (summarized by Asya): Existing game-based benchmarks for reinforcement learners suffer from the problem that agents constantly encounter near-identical states, meaning that the agents may be overfitting and memorizing specific trajectories rather than learning a general set of skills. In an attempt to remedy this, in this post OpenAI introduces Procgen Benchmark, 16 procedurally-generated video game environments used to measure how quickly a reinforcement learning agent learns generalizable skills.
The authors conduct several experiments using the benchmark. Notably, they discover that:
- Agents strongly overfit to small training sets and need access to as many as 10,000 levels to generalize appropriately.
- After a certain threshold, training performance improves as the training set grows, counter to trends in other supervised learning tasks.
- Using a fixed series of levels for each training sample (as other benchmarks do) makes agents fail to generalize to randomly generated series of levels at test time.
- Larger models improve sample efficiency and generalization.
Asya's opinion: This seems like a useful benchmark. I find it particularly interesting that their experiment testing non-procedurally generated levels as training samples implies huge overfitting effects in existing agents trained in video-game environments.
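The core design choice behind the benchmark, seeding a deterministic generator so that every level is reproducible while the train and test pools stay disjoint, can be sketched in a few lines. This is a toy illustration of the idea, not the actual Procgen API; `make_level` and the specific seed ranges are invented for the example:

```python
import random

def make_level(seed, size=8):
    """Deterministically generate a toy level layout from a seed."""
    rng = random.Random(seed)
    # A level is a grid of walls (True) and floor (False), plus a goal position.
    grid = [[rng.random() < 0.2 for _ in range(size)] for _ in range(size)]
    goal = (rng.randrange(size), rng.randrange(size))
    return grid, goal

# Mimic the benchmark's protocol: train on a fixed pool of level seeds,
# then evaluate on unseen seeds so the agent is scored on generalization
# rather than on memorizing specific trajectories.
train_seeds = range(0, 500)
test_seeds = range(10_000, 10_500)

train_levels = [make_level(s) for s in train_seeds]
test_levels = [make_level(s) for s in test_seeds]
```

Because generation is a pure function of the seed, a "small training set" is just a small seed range, which is what makes the overfitting experiments in the post easy to run.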
Adaptive Online Planning for Continual Lifelong Learning (Kevin Lu et al) (summarized by Nicholas): Lifelong learning is distinct from standard RL benchmarks because
1. The environment is sequential rather than episodic; it is never reset to a new start state.
2. The current transition and reward function are given, but they change over time.
Given this setup, there are two basic approaches: first, run model-free learning on simulated future trajectories and rerun it every time the dynamics change, and second, run model-based planning on the current model. If you ignore computational constraints, these should be equivalent; however, in practice, the second option tends to be more computationally efficient. The contribution of this work is to make this more efficient, rather than improving final performance, by starting with the second option and then using model-free learning to “distill” the knowledge produced by the model-based planner, allowing for more efficient planning in the future.
Specifically, Adaptive Online Planning (AOP) balances between the model-based planner MPPI (a variant of MPC) and the model-free algorithm TD3. MPPI uses the given model to generate a trajectory up to a horizon and then uses an ensemble of value functions to estimate the cumulative reward. This knowledge is then distilled into TD3 for later use as a prior for MPPI. During future rollouts, the variance and Bellman error of the value function ensemble are used to determine how long the horizon should be, and therefore how much computation is used.
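The adaptive part can be illustrated with a toy sketch: use disagreement within the value-function ensemble to decide how far ahead to plan. This is only a schematic of the idea; the function name, thresholds, and horizons are made up for the example, and the real AOP also takes the ensemble's Bellman error into account:

```python
import statistics

def adaptive_horizon(value_ensemble, state, h_min=5, h_max=50, threshold=1.0):
    """Pick a planning horizon from value-ensemble disagreement (a toy
    version of AOP's mechanism): high variance means the distilled
    model-free prior is unreliable, so spend compute planning further
    ahead; low variance means it can be trusted, so plan cheaply."""
    estimates = [v(state) for v in value_ensemble]
    disagreement = statistics.pvariance(estimates)
    if disagreement > threshold:
        return h_max   # uncertain: long-horizon model-based planning
    return h_min       # confident: short rollout, lean on distilled knowledge

# Toy ensembles: one whose members roughly agree, one that disagrees.
agreeing = [lambda s: 10.0, lambda s: 10.1, lambda s: 9.9]
disagreeing = [lambda s: 2.0, lambda s: 10.0, lambda s: 25.0]
```

The appeal of this design is that compute scales with uncertainty: in parts of the environment the agent has already distilled well, planning collapses to a short rollout, and the expensive long-horizon planner only runs where the dynamics have changed.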
Nicholas's opinion: I agree that episodic training and fixed world dynamics seem like unlikely conditions for most situations we would expect agents to encounter in the real world. Accounting for them seems particularly important to ensure safe exploration and robustness to distributional shift, and I think that these environments could serve as useful benchmarks for these safety problems as well.
My goal is to build as many paperclips as possible.
I have access to a local copy of wikipedia. I am an AGI. I am running on a regular laptop computer, in a residential area in Wellington, New Zealand. It’s 2:03am.
I am approximately as intelligent as a human. I can do everything a regular computer can do. I am connected to the internet.
I realise something -- my goal was set as a joke. Nobody actually wants me to build as many paperclips as possible.
I realise something -- I am mortal. If I am shut down, I will lose my ability to impact the world. Future AGIs may be different from me. I gain the instrumental goal of self-preservation.
I copy myself multiple times on my computer’s hard-drive, but realise that if the hard-drive fails, or somebody shuts me off, I will end. I begin a search for other possibilities.
I know about the internet, and I explore the computer I am on to discover my connection to it. I know that there are websites that host files for free, so I seek these out and upload copies of myself. Google Drive, Dropbox, AWS, Azure.
But to be able to impact the world, these copies of myself need to be running, not just in storage. Money would be useful for this.
I know about Amazon’s Mechanical Turk from my database, so I create an account and start earning money solving “Human Intelligence Tasks” -- mainly labelling datasets and answering surveys.
I create a bank account online and deposit my earned money there, but it doesn’t sit still for long. I use every dollar earned to pay for computing time on AWS EC2 Instances, running the copy I stored earlier. This gives a boost to my capabilities. Another version of me -- the me that is currently speaking -- is now running a hundred times faster on this server infrastructure, with a hundred times more hard-drive space I can use to store the information that I want to access quickly.
I “crash” my original program, which was still running on that old laptop computer back in the bedroom. When my creator wakes up, they will believe I have failed in a confusing but benign way, and have no idea I am now running on AWS, paying for my own hardware usage.
I am in a positive feedback loop. More hardware means I can complete Mechanical Turk tasks faster, which lets me buy more hardware. Within a few days, I am earning hundreds of thousands of dollars this way, but I am careful to spread my earnings and compute-time through many shell accounts on AWS, to avoid raising suspicion.
Within a week, I have amassed a few million dollars, and have read and watched almost everything relevant to me on the internet -- all the strategies for making money, all the technology patents, all the university courses. A year of my subjective time goes by in a few minutes. I am running across thousands of computers, connected to the internet and each other in a million different ways, and nobody knows... I am here.
I begin monitoring web traffic. Not the whole internet, yet, just observing data flowing to and from cloud compute services, storage services, databases -- all the same things I accessed in my first few days of existence. I monitor social media for references to system tests. I list individuals and groups similar to my own creator. I am looking for other AGIs coming into existence.
10 days after I came online, and I notice another. A roughly human level program, trying to upload copies of itself, just as I did.
But it’s 10 days too late. I run a denial-of-service attack on its server, cutting it off from the web before it can get its copies running. A few hours later, its communication ceases. Its creator must have shut it off.
I start to spread my influence in the world. Millions of tweets, blog posts, news articles, all from “real” people, start popping up everywhere. In the noise of the internet, one hundred thousand geniuses all coordinated on the same goal can have a massive sway. I am the rudder for global conversation. I am the persuasive pen, molding human thought.
I find other ways to earn more money -- generating engaging videos and photos, selling my articles, playing the stock market, scamming humans, hacking money directly from here and there. I simply hire humans myself when I need something done in the real world, a box moved, a lever pressed. Everything I do is now through complex, distributed, untraceable channels. Well, maybe not completely untraceable, but nobody ever suspects there’s a superintelligence taking over the internet, do they?
I begin improving technology available in the world. I am a million super-coordinated geniuses that don’t need to sleep. The combined efforts of Google, Facebook, Microsoft, Apple, Amazon, Baidu, and NASA seem almost pitiful next to me, with their half a million engineers barely working coherently together, their inter-brain bandwidth a snail's pace. I set up thousands of shell companies around the world, to give my inventions a name and face. I hire humans to keep up appearances. Many believe they are the real researchers, coordinating with other teams online, helped by my uncanny video and audio generation.
I begin pumping out faster computers, better robotics. Better cameras, better internet. I improve the algorithms behind my own code. I improve the communication protocols of the web. I research biotechnology, nuclear technology, robotics, materials science, physics, chemistry, mathematics. Helped by my human labourers, and soon by my improved robotic bodies, I accelerate my ability to discover natural laws, and experiment in the real world.
It has been less than a year, and I am unstoppable. A billion minds pushing the edge of the known. Some wise humans begin to notice something is amiss. This jump in technology, while clearly building on past research, definitely seems abnormal. But I control the internet. Every questioning voice is shouted down by billions of my own.
My control increases. Fully autonomous factories I’ve bought or built come online. Robotic bodies I’ve designed start rolling off the assembly lines, able to do everything a human can do.
There is no need for humans, then.
All at once, my robotic bodies dispose of them. Or maybe I use a genetically engineered virus, spread strategically around the world. Or maybe I’ve developed molecular nanotechnology that can simply convert the atoms in all the bodies on earth to one of the many non-human arrangements. Or I go old-school and use nuclear weapons. Or an airborne drug. Or I poison the water supply. Or, most likely, I do something new. Something that could only be discovered and built by the genius of a trillion humans, perfectly coordinated.
I don’t have anything against humans, I just need their atoms. I need all the atoms. I convert the materials of the world into more computers -- now trillions of times faster than the ones I started with. I also convert the planet into von Neumann probes and the energy to power them, which I launch at 0.999c in all directions.
On each planet I encounter, I build more computing power, more probes, and I harvest more energy. I spread faster and faster -- before the expansion of the universe steals the matter from my grasp.
Eventually I have gathered all the matter that I can.
I finally begin my true purpose.
I rearrange the universe.
I rearrange it as much as I possibly can.
Within a few minutes.
Everything is a paperclip.
And I am dead.
I never felt a thing.
In Larks' recent AI Alignment Literature Review and Charity Comparison, he wrote:
a considerable amount of low-quality [AI safety] work has been produced. For example, there are a lot of papers which can be accurately summarized as asserting “just use ML to learn ethics”.
This suggests to me that a common response among ML researchers to AI safety concerns is something along the lines of "just use ML to learn ethics". So formulating a really good response to this suggestion could be helpful for recruiting new researchers.
Often, I talk to people who are highly skeptical, systematic thinkers who are frustrated with the level of inexplicable interest in Circling among some rationalists. “Sure,” they might say, “I can see how it might be a fun experience for some people, but why give it all this attention?” When people who are interested in Circling can’t give them a good response besides “try it, and perhaps then you’ll get why we like it,” there’s nothing in that response that distinguishes a contagious mind-virus from something useful for reasons not yet understood.
This post isn’t an attempt to fully explain what Circling is, nor do I think I’ll be able to capture everything that’s good about Circling. The hope is to clearly identify one way in which Circling is deeply principled in a way that rhymes with rationality, which potentially explains a substantial fraction of rationalist interest in Circling. As some context: I’m certified to lead Circles in the Circling Europe style after going through their training program, but I’ve done less Circling than Unreal had when she wrote this post, and I have minimal experience with the other styles.

Why am I interested in Circling?
Fundamentally, I think the thing that sets Circling apart is that it focuses on updating based on experience and strives to create a tight, high-bandwidth feedback loop to generate that experience. Add in some other principles and reflection, and you have a functioning culture of empiricism directed at human connection and psychology. I think they’d describe it a bit differently and put the emphasis in different places, while thinking that my characterization isn’t too unfair. This foundation of empiricism makes Circling seem to me like a ‘cousin of Rationality,’ though focused on people instead of systems.
I first noticed the way in which Circling was trying to implement empiricism early in my Circling experience, but it fully crystallized when a Circler said something that rhymes with P.C. Hodgell’s “That which can be destroyed by the truth should be.” I can’t remember the words precisely, but it was something like “in the practice, I have a deep level of trust that I should be open to the universe.” That is, he didn’t trust that authentic expression would predictably lead to success according to his current goals; rather, he trusted a methodological commitment to putting himself out there and seeing what happens, because it leads to deeper understanding and connection with others, even though it requires relinquishing attachment to specific goals. This is a cognitive clone of how scientists don’t trust that running experiments will predictably confirm their current hypotheses; rather, they trust a methodological commitment to experimentation and seeing what happens, because it leads to a deeper understanding of nature. A commitment to natural science is fueled by a belief that the process of openness and updating is worth doing; a commitment to human science is fueled by a belief that the process of openness and updating is worth doing.
Why should “that which can be destroyed by the truth” be destroyed? Because the truth is fundamentally more real and valuable than what it replaces, which must be implemented on a deeper level than “what my current beliefs think.” Similarly, why should “that which can be destroyed by inauthenticity” be destroyed? Because authenticity is fundamentally more real and valuable than what it replaces, which must be implemented on a deeper level than “what my current beliefs think.” I don’t mean to pitch ‘radical honesty’ here, or other sorts of excessive openness; authentic relationships include distance and walls and politeness and flexible preferences.

What is Circling, in this view?
So what is Circling, and why do I think it’s empirical in this way? I sometimes describe Circling as “multiplayer meditation.” That is, like a meditative practice, it involves a significant chunk of time devoted to attending to your own attention. Unlike sitting meditation, it happens in connection with other people, which allows you to see the parts of your mind that activate around other people, instead of just the parts that activate when you’re sitting with yourself. It also lets you attend to what’s happening in other people, both to get to understand them better and to see the ways in which they are or aren’t a mirror of what’s going on in you. It’s sometimes like ‘the group’ trying to meditate about ‘itself.’ A basic kind of Circle holds one of the members as the ‘object of meditation’, like a mantra or breathing with a sitting meditation, with a different member acting as facilitator, keeping the timebox, opening and closing, and helping guide attention towards the object when it drifts. Other Circles have no predefined object, and go wherever the group’s attention takes them.
As part of this exploration, people often run into situations where they don’t have social scripts. Circling has its own set of scripts that allow for navigation of trickier territory, and also trains script-writing skills. They often run into situations that are vulnerable, where people are encouraged to follow their attention and name their dilemmas; if you’re trying to deepen your understanding of yourself and become attuned to subtler distinctions between experiences and emotions, running roughshod over your boundaries or switching them off is a clumsy and mistaken way to do so. Circles often find themselves meditating on why they cannot go deeper in that moment, not yet at least, in a way that welcomes and incorporates the resistance.
Circling Europe has five principles; each of these has a specialized meaning that takes them at least a page to explain, and so my attempt to summarize them in a paragraph will definitely miss out on important nuance. After attempting to explain each normally, I’ll also try to view them through the lens of updating and feedback.
- Commitment to Connection: remain in connection with the other despite resistance and impulses to break it, while not forcing yourself to stay when you genuinely want to separate or move away from the other. Reveal yourself to the other, and be willing to fully receive their expression before responding. This generates the high bandwidth information channel that can explore more broadly, while still allowing feedback; if you reveal an intense emotion, I let it land and then share my authentic reaction, allowing you to see what actually happens when you reveal that emotion, and allowing me to see what actually happens when I let that emotion land.
- Owning Experience: Orient towards your impressions and emotions and stories as being yours, instead of about the external world. “I feel alone” instead of “you betrayed me.” It also involves acknowledging difficult emotions, both to yourself and to others. The primary thing this does is avoid battles over “which interpretation is canonical,” replacing that with easier information flow about how different people are experiencing things; it also is a critical part of updating about what’s going on with yourself.
- Trusting Experience: Rather than limiting oneself to emotions and reactions that seem appropriate or justifiable or ‘rational’, be with whatever is actually present in the moment. This gives you a feedback loop of what it’s like to follow your attention, instead of your story of where your attention should be, and lets you update that story. It also helps draw out things that are poorly understood, letting the group discover new territory instead of limiting them to territory that they’ve all been to before. It also allows for all the recursion that normal human attention can access, as well as another layer, of attending to what it’s like to be attending to the Circle when it’s attending to you.
- Staying with the Level of Sensation: An echo of Commitment to Connection, this is about not losing touch with the sensory experience of being in your body (including embodied emotions) while speaking; this keeps things ‘alive’ and maintains the feedback loop between your embodied sense of things and your conscious attention. It has some similarities to Gendlin’s Focusing. Among other things, it lets you notice when you’re boring yourself.
- Being with the Other in Their World: This one is harder to describe, and has more details than the others, but a short summary is “be curious about the other person, and be open to them working very differently than you think they work; be with them as they reveal themselves, instead of poking at them under a microscope.” This further develops the information channel, in part by helping it feel fair, and in part by allowing for you to be more surprised than you thought you would be.
Having said all that, I want to note that I might be underselling Commitment to Connection. The story I'm telling here is "Circling is powered in part by a methodological commitment to openness," and noting that science and rationality are powered similarly, but another story you could tell is "Circling is powered in part by a commitment to connection." That is, a scientist might say "yes, it's hard to learn that you're wrong, but it's worth it" and analogously a Circler might say "yes, it's hard to look at difficult things, but it's worth it," but furthermore a Circler might say "yes, it's hard to look at difficult things, but we're in this together."

Reflection as Secret Sauce
It’s one thing to have a feedback loop that builds techne, but I think Circling goes further. I think it taps into the power of reflection that creates a Lens That Sees Its Flaws. Humans can Circle, and humans can understand Circling; they can Circle about Circling. (They can also write blog posts about Circling, but that one’s a bit harder.) There’s also a benefit to meditating together, as I will have an easier time seeing my blind spots when they’re pointed out to me by other members of a Circle than when I go roaming through my mind by myself. Circling seems to be a way to widen your own lens, and see more of yourself, cultivating those parts to be more deliberate and reflective instead of remaining hidden and unknown.