LessWrong.com News

A community blog devoted to refining the art of rationality

Effective Altruism and Rationalist Philosophy Discussion Group

Published on September 16, 2020 2:45 AM GMT

I've created a group on Facebook for rationalists and effective altruists to discuss philosophy (currently has about 120 members). To be clear, this isn't a group where the philosophy has to be related to effective altruism or rationality, but instead a group for Rationalists and Effective Altruists to discuss philosophy.

I know it's already possible to post about philosophy here, but people tend to be reluctant to post unless they've invested a huge amount of effort in writing it up, so it's useful to have somewhere else where there's a lower barrier.


Memorizing a Deck of Cards

Published on September 16, 2020 1:31 AM GMT

I just memorized a deck of cards, because I was told doing so would improve my focus.

The process took 2 hours and 13 minutes, including breaks. I used 20-5 pomodoros (20 minutes of work, 5 of break) at first, and switched to 25-5 partway through.

I initially wrote a script to simulate a randomly shuffled deck. After memorizing the first 18 cards, I switched to a physical deck (putting the same cards on top).
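The author's script isn't shown; a minimal version of such a deck simulator might look like this (the card names and encoding are my own):

```python
import random

RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8",
         "9", "10", "jack", "queen", "king"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]

def shuffled_deck(seed=None):
    """Return all 52 cards in a random order."""
    deck = [f"{rank} of {suit}" for suit in SUITS for rank in RANKS]
    random.Random(seed).shuffle(deck)
    return deck

deck = shuffled_deck(seed=0)
print(len(deck))   # 52
print(deck[:3])    # the first three cards to memorize
```

Passing a seed makes the shuffle reproducible, which is handy when switching from the simulated deck to a physical one mid-task.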

I chose not to use any strategy from an outside source (e.g. memory palace), because I wanted to see what I would come up with on my own, and I thought that using an external strategy might make it too easy.


I began by memorizing cards in groups of four. After memorizing a group, I would go back to the previous group and check that I still knew all of the cards in both groups. Then I would go back some arbitrary distance (to wherever I had cut the deck) and begin from there. Occasionally, I would return to the beginning and run through the deck up to the point I had memorized. If I messed up in an earlier group, I would repeat the process for that group.

However, after three such groups (12 cards), it became clear that four was too many to memorize at once, so I switched to groups of three. I then ran into another problem: I could memorize individual groups fairly easily, but I would have trouble remembering the order in which the groups themselves came. I solved this by “linking” the groups together: memorizing the last card of the previous group along with the next group.
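The linking trick can be made concrete. This sketch (my formalization, not the author's actual procedure) splits a sequence of cards into groups of three and prefixes each group with the last card of the group before it:

```python
def linked_groups(cards, size=3):
    """Split cards into groups of `size`, prefixing each group
    (after the first) with the last card of the previous group."""
    groups = [cards[i:i + size] for i in range(0, len(cards), size)]
    linked = [groups[0]]
    for prev, cur in zip(groups, groups[1:]):
        linked.append([prev[-1]] + cur)
    return linked

cards = ["4S", "7S", "8S", "2H", "KD", "9C"]
print(linked_groups(cards))
# [['4S', '7S', '8S'], ['8S', '2H', 'KD', '9C']]
```

Because each group carries one card of overlap, recalling a group also cues the start of the next one, which is exactly what fixes the group-ordering problem.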

A few attributes of groups made them easier to memorize:

  • two or more cards in the group share a face or suit
    • fortunately, this is true for most groups
  • the group is monotonically increasing or decreasing
  • two consecutive cards in the group have numbers that are related to one another:
    • one is a factor of the other (e.g. 3 and 9)
    • they are consecutive (e.g. 10 and jack)
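These attributes can be checked mechanically. A sketch of my own, encoding cards as (rank, suit) pairs with ace=1 and jack through king as 11-13:

```python
def shares_face_or_suit(group):
    """True if at least two cards share a rank or share a suit."""
    ranks = [r for r, s in group]
    suits = [s for r, s in group]
    return len(set(ranks)) < len(ranks) or len(set(suits)) < len(suits)

def is_monotonic(group):
    """True if ranks only increase or only decrease through the group."""
    ranks = [r for r, s in group]
    return ranks == sorted(ranks) or ranks == sorted(ranks, reverse=True)

def has_related_pair(group):
    """True if two consecutive cards have consecutive ranks or
    one rank divides the other."""
    ranks = [r for r, s in group]
    return any(abs(a - b) == 1 or max(a, b) % min(a, b) == 0
               for a, b in zip(ranks, ranks[1:]))

easy = [(4, "S"), (7, "S"), (8, "S")]  # the "easiest group" from the text
print(shares_face_or_suit(easy), is_monotonic(easy))  # True True
```

A heuristic like this could flag which groups will need extra rehearsal before you start memorizing.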

The easiest group was 4 of spades, 7 of spades, 8 of spades: increasing and all of the same suit.

Some of my most common mistakes were:

  • mixing up suits of the same color
  • getting the order of the groups wrong
  • mixing up aces and queens

About three hours after my first success, I went through the deck again, with no practice in between. I only got three of the cards wrong, though I frequently had to think for a while.


Sometime into the task, I grew anxious that I was wasting my time, or that it wouldn’t help, or that I should be working on something else. There were times when I felt an urge to stop and do anything else. I often get such urges when doing other activities, but this time I was able to resist them. I think this might be because there was always a clear next step in the task, and because I knew that the point of the task was to train my focus.


Did this activity improve my focus? I suppose we’ll have to wait and see. I was able to write this post, which isn’t something I would normally have the willpower to do, but that may have been me riding the high of memorizing the deck (it felt so good when I finally memorized the whole thing). Maybe I’ll post an update next week.


Three kinds of competitiveness

Published on September 15, 2020 10:30 PM GMT

By Daniel Kokotajlo

In this post, I distinguish between three different kinds of competitiveness — Performance, Cost, and Date — and explain why I think these distinctions are worth the brainspace they occupy. For example, they help me introduce and discuss a problem for AI safety proposals having to do with aligned AIs being outcompeted by unaligned AIs.

Distinguishing three kinds of competitiveness and competition

A system is performance-competitive insofar as its ability to perform relevant tasks compares with competing systems. If it is better than any competing system at the relevant tasks, it is very performance-competitive. If it is almost as good as the best competing system, it is less performance-competitive. 

(For AI in particular, “speed,” “quality,” and “collective” intelligence as Bostrom defines them all contribute to performance-competitiveness.)

A system is cost-competitive to the extent that it costs less to build and/or operate than its competitors. If it is more expensive, it is less cost-competitive, and if it is much more expensive, it is not at all cost-competitive. 

A system is date-competitive to the extent that it can be created sooner than (or not much later than) its competitors. If it can only be created after a prohibitive delay, it is not at all date-competitive.

A performance competition is a competition that performance-competitiveness helps you win. The more important performance-competitiveness is to winning, the more intense the performance competition is.

Likewise for cost and date competitions. Most competitions are all three types, to varying degrees. Some competitions are none of the types; e.g. a “competition” where the winner is chosen randomly.
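One way to make "how intense is each kind of competition" concrete is a weighted score over the three axes. The weights and scores below are purely illustrative, not part of the author's framework:

```python
def win_score(system, weights):
    """Score a system in a competition whose intensity along each
    axis (performance, cost, date) is given by `weights`."""
    return sum(weights[axis] * system[axis] for axis in weights)

# A FOOM-like competition: winning is almost entirely about being first.
foom_weights = {"performance": 0.1, "cost": 0.1, "date": 0.8}

fast_but_weak = {"performance": 0.5, "cost": 0.5, "date": 1.0}
strong_but_slow = {"performance": 1.0, "cost": 1.0, "date": 0.2}

print(round(win_score(fast_but_weak, foom_weights), 3))    # 0.9
print(round(win_score(strong_but_slow, foom_weights), 3))  # 0.36
```

Under date-heavy weights the faster, weaker system wins; shifting the weights toward performance and cost models the Gradual Economic Takeover scenario instead.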

I briefly searched the AI alignment forum for uses of the word “competitive.” It seems that when people talk about competitiveness of AI systems, they usually mean performance-competitiveness, but sometimes mean cost-competitiveness, and sometimes both at once. Meanwhile, I suspect that this important post can be summarized as “We should do prosaic AI alignment in case only prosaic AI is date-competitive.”

Putting these distinctions to work

First, I’ll sketch some different future scenarios. Then I’ll sketch how different AI safety schemes might be more or less viable depending on which scenario occurs. For me at least, having these distinctions handy makes this stuff easier to think and talk about.

Disclaimer: The three scenarios I sketch aren’t supposed to represent the scenarios I think most likely; similarly, my comments on the three safety proposals are mere hot takes. I’m just trying to illustrate how these distinctions can be used.

Scenario: FOOM: There is a level of performance which leads to a localized FOOM, i.e. very rapid gains in performance combined with very rapid drops in cost, all within a single AI system (or family of systems in a single AI lab). Moreover, these gains & drops are enough to give decisive strategic advantage to the faction that benefits from them. Thus, in this scenario, control over the future is mostly a date competition. If there are two competing AI projects, and one project is building a system which is twice as capable and half the price but takes 100 days longer to build, that project will lose.

Scenario: Gradual Economic Takeover: The world economy gradually accelerates over several decades, and becomes increasingly dominated by billions of AGI agents. However, no one entity (AI or human, individual or group) has most of the power. In this scenario, control over the future is mostly a cost and performance competition. The values which shape the future will be the values of the bulk of the economy, and that in turn will be the values of the most popular and successful AGI designs, which in turn will be the designs that have the best combination of performance- and cost-competitiveness. Date-competitiveness is mostly irrelevant.

Scenario: Final Conflict: It’s just like the Gradual Economic Takeover scenario, except that several powerful factions are maneuvering and scheming against each other, in a Final Conflict to decide the fate of the world. This Final Conflict takes almost a decade, and mostly involves “cold” warfare, propaganda, coalition-building, alliance-breaking, and that sort of thing. Importantly, the victor in this conflict will be determined not so much by economic might as by clever strategy; a less well resourced faction that is nevertheless more far-sighted and strategic will gradually undermine and overtake a larger/richer but more dysfunctional faction. In this context, having the most capable AI advisors is of the utmost importance; having your AIs be cheap is much less important. In this scenario, control of the future is mostly a performance competition. (Meanwhile, in this same scenario, popularity in the wider economy is a moderately intense competition of all three kinds.)

Proposal: Value Learning: By this I mean schemes that take state-of-the-art AIs and train them to have human values. I currently think of these schemes as not very date-competitive, but pretty cost-competitive and very performance-competitive. I say value learning isn’t date-competitive because my impression is that it is probably harder to get right, and thus slower to get working, than other alignment proposals. Value learning would be better for the gradual economic takeover scenario because the world will change slowly, so we can afford to spend the time necessary to get it right, and once we do it’ll be a nice add-on to the existing state-of-the-art systems that won’t sacrifice much cost or performance.

Proposal: Iterated Distillation and Amplification: By this I mean… well, it’s hard to summarize. It involves training AIs to imitate humans, and then scaling them up until they are arbitrarily powerful while still human-aligned. I currently think of this scheme as decently date-competitive but not as cost-competitive or performance-competitive. But lack of performance-competitiveness isn’t a problem in the FOOM scenario because IDA is above the threshold needed to go FOOM; similarly, lack of cost-competitiveness is only a minor problem because if they don’t have enough money already, the first project to build FOOM-capable AI will probably be able to attract a ton of investment (e.g. via being nationalized) without even using their AI for anything, and then reinvest that investment into paying the extra cost of aligning it via IDA.

Proposal: Impact regularization: By this I mean attempts to modify state-of-the-art AI designs so that they deliberately avoid having a big impact on the world. I think of this scheme as being cost-competitive and fairly date-competitive. I think of it as being performance-uncompetitive in some competitions, but performance-competitive in others. In particular, I suspect it would be very performance-uncompetitive in the Final Conflict scenario (because AI advisors of world leaders need to be impactful to do anything), yet nevertheless performance-competitive in the Gradual Economic Takeover scenario.

Putting these distinctions to work again

I came up with these distinctions because they helped me puzzle through the following problem:

Lots of people worry that in a vastly multipolar, hypercompetitive AI economy (such as described in Hanson’s Age of Em or Bostrom’s “Disneyland without children” scenario) eventually pretty much everything of merely intrinsic value will be stripped away from the economy; the world will be dominated by hyper-efficient self-replicators of various kinds, performing their roles in the economy very well and seeking out new roles to populate but not spending any time on art, philosophy, leisure, etc. Some value might remain, but the overall situation will be Malthusian.
Well, why not apply this reasoning more broadly? Shouldn’t we be pessimistic about any AI alignment proposal that involves using aligned AI to compete with unaligned AIs? After all, at least one of the unaligned AIs will be willing to cut various ethical corners that the aligned AIs won’t, and this will give it an advantage.

This problem is more serious the more the competition is cost-intensive and performance-intensive. Sacrificing things humans value is likely to lead to cost- and performance-competitiveness gains, so the more intense the competition is in those ways, the worse our outlook is.

However, it’s plausible that the gains from such sacrifices are small. If so, we need only worry in scenarios of extremely intense cost and performance competition.

Moreover, the extent to which the competition is date-intensive seems relevant. Optimizing away things humans value, and gradually outcompeting systems which didn’t do that, takes time. And plausibly, scenarios which are not at all date competitions are also very intense performance and cost competitions. (Given enough time, lots of different designs will appear, and minor differences in performance and cost will have time to overcome differences in luck.) On the other hand, aligning AI systems might take time too, so if the competition is too date-intensive things look grim also. Perhaps we should hope for a scenario in between, where control of the future is a moderate date competition.

Concluding thoughts

These distinctions seem to have been useful for me. However, I could be overestimating their usefulness. Time will tell; we shall see if others make use of them.

If you think they would be better if the definitions were rebranded or modified, now would be a good time to say so! I currently expect that a year from now my opinions on which phrasings and definitions are most useful will have evolved. If so, I’ll come back and update this post.

30 March 2020

Thanks to Katja Grace and Ben Pace for comments on a draft.


Acausal blackmail between AIs?

Published on September 15, 2020 7:05 PM GMT

There's been a lot of discussion on here about potential acausal blackmail between humans and AIs, and about positive-sum acausal trade between AIs, but has there been any significant discussion of blackmail trade between AIs?

The scenarios I'm imagining are something like "paperclipper instantiates a large number of suffering minds & acausally negotiates with FAIs that it will end the minds' suffering in exchange for FAI creating paperclips." Or something similar. The obvious answer is that the FAI would just ignore it, but I'm not 100% sure on that; is this a topic that's been talked about somewhere?


The Axiological Treadmill

Published on September 15, 2020 6:36 PM GMT

The obvious reason that Moloch is the enemy is that it destroys everything we value in the name of competition and survival. But this is missing the bigger picture. We value what we value because, in our ancestral environment, those tended to be the things that helped us with competition and survival. If the things that help us compete and survive end up changing, then evolution will ensure that the things we value change as well.

To borrow a metaphor: Elua cheats. The hedonic treadmill has nothing on the axiological treadmill.

Consider a thought experiment. In Meditations on Moloch, Scott Alexander dreams up a dictatorless dystopia:

Imagine a country with two rules: first, every person must spend eight hours a day giving themselves strong electric shocks. Second, if anyone fails to follow a rule (including this one), or speaks out against it, or fails to enforce it, all citizens must unite to kill that person. Suppose these rules were well-enough established by tradition that everyone expected them to be enforced. So you shock yourself for eight hours a day, because you know if you don’t everyone else will kill you, because if they don’t, everyone else will kill them, and so on. Every single citizen hates the system, but for lack of a good coordination mechanism it endures. From a god’s-eye-view, we can optimize the system to “everyone agrees to stop doing this at once”, but no one within the system is able to effect the transition without great risk to themselves.

Even if this system came into being ex nihilo it probably wouldn’t be stable in reality; a population that spends eight hours a day receiving strong shocks isn’t going to be able to feed itself, or reproduce. But assume for a moment that this system starts out economically and biologically stable (that is, people can still eat, and reproduce at the rate of replacement, despite the electric shocks, and that there are no outside countries ready to invade). What do we expect to happen over the long run?

Well, obviously there’s a strong evolutionary pressure to be tolerant to electric shocks. People who can tolerate those shocks better will do better on average than those who can’t. However, there’s another more subtle pressure at play: the pressure to ensure you shock yourself. After all, if you forget to shock yourself, or choose not to, then you are immediately killed. So the people in this country will slowly evolve reward and motivational systems such that, from the inside, it feels like they want to shock themselves, in the same way (though maybe not to the same degree) that they want to eat. Shocking themselves every day becomes an intrinsic value to them. Eventually, it’s no longer a dystopia at all.

They would be aghast at a society like ours, where Moloch has destroyed the value of receiving electrical shocks, all in the name of more perfect competition.

[Cross-posted from Grand, Unified, Empty.]


God in the Loop: How a Causal Loop Could Shape Existence

Published on September 15, 2020 2:40 PM GMT

Crossposted from Vessel Project.

My last article, “Life Through Quantum Annealing,” was an exploration of how a broad range of physical phenomena — and possibly the whole universe — can be mapped to a quantum computing process. But the article simply accepts that quantum annealing behaves as it does; it does not attempt to explain why. That answer lies somewhere within a “true” description of quantum mechanics, which is still an outstanding problem.

Despite the massive predictive success of quantum mechanics, physicists still can’t agree on how its math corresponds to reality. Any such proposal, called an “interpretation” of quantum mechanics, tends to straddle the line between physics and philosophy. There is no shortage of interpretations, and in the words of physicist David Mermin, “New interpretations appear every year. None ever disappear.” Am I going to throw one more on that pile? You bet.

I’m not going to start from scratch though; I simply propose an ever-so-slight modification to an existing forerunner: the many-worlds interpretation, where other “worlds” or timelines exist in parallel to our own. My modification is this: the only worlds that can exist are those that exist within a causal loop. Stated another way: our universe, or any possible universe, must be a causal loop.

I will introduce the relevant concepts and provide an argument for my proposal, but my goal is not to once-and-for-all prove this interpretation as true. Rather, my goal is to explore what happens if we accept the interpretation as true. If we start with the assumption that only causal loop universes can exist, then several interesting things follow — we find parallels to our own universe, and we might even find God.

Causality & Quantum Interpretations

Before talking about causal loops, let’s take a step back and talk about causality — perhaps the single most fundamental concept in all the sciences. It plays a starring role in the two most important theories in physics: general relativity and quantum mechanics.

General relativity, developed by Einstein, combines space, time, and gravity in a geometric description of spacetime. In spacetime, two observers might not agree on the space between two events or time between two events — but they always agree on the spacetime interval, which corresponds to a causal relationship between two events. As such, causality is the only thing that is universally agreed on, making it the only proper description of objective reality. Another phenomenon predicted by general relativity is the cosmic speed limit — the speed of light — which is more appropriately understood as the speed of causality, more fundamental than light alone. Here we see that spacetime and the speed of light are not inherently real; they are just useful ways of describing causality, the only objective reality.

But if general relativity is interesting because we can only agree on causality, then quantum mechanics is interesting because we can’t agree on causality.

As I alluded to earlier, the full explanation of quantum mechanics is still a mystery, and that mystery has everything to do with causality — specifically how objective, causal reality relates to the wave function. The wave function of a quantum system is most easily understood as a probability distribution, where the probability of the system being in any given state is given by the squared magnitude of the wave function’s amplitude for that state. The wave function is in a “superposition” of all possible states until it is measured, after which we observe a single state.
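The squaring step (the Born rule) can be shown directly. A toy two-state wave function of my own, with complex amplitudes turned into probabilities:

```python
# Toy wave function for a two-state system (amplitudes are illustrative).
amplitudes = [0.6 + 0j, 0.8j]   # normalized: 0.6**2 + 0.8**2 == 1

# Born rule: probability of each state = |amplitude| squared.
probabilities = [abs(a) ** 2 for a in amplitudes]

print([round(p, 10) for p in probabilities])  # [0.36, 0.64]
print(round(sum(probabilities), 10))          # 1.0, a valid distribution
```

Note that the second amplitude is purely imaginary; taking the absolute value before squaring is what makes the rule work for complex amplitudes.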

Simple depiction of a quantum wave function with a single crest. (Image by Louay Fatoohi)

On the surface, it appears that quantum physics is inherently random if it can only be described by probability, and somehow the act of measuring or observing the system causes it to assume an objective state. If you’re convinced of this, then you basically agree with the Copenhagen interpretation of quantum mechanics, which posits an interaction between the system and observer that causes the wave function to randomly “collapse.” This has been the standard interpretation for a long time, although other interpretations have been steadily gaining steam.

As an alternative, maybe you don’t think the universe is random at all — it’s deterministic, but there are “hidden” variables we don’t yet know about. In this case, there is no wave function collapse, so we don’t need to introduce any extra physics to explain what happens when we observe the system. If you’re on board with that, then you just signed up for the de Broglie-Bohm theory, also known as the pilot wave theory.

But maybe you’re still not quite convinced. Let’s make things even simpler: the universe isn’t random, but there aren’t hidden variables either. You have the wave function, and that’s it — what you see is what you get. That, it turns out, pretty much sums up the many-worlds interpretation. In this theory, all possible “worlds” described by the wave function do exist; we just happen to occupy one of them. While it requires the least explanation, the bizarre implication is that many parallel worlds exist as branches of different possible outcomes.

These three interpretations tend to be the top contenders, and they each take a different approach to answer the question of “what causes what?” The fact that a basic causal structure of physics cannot gain consensus makes this interesting territory, plus it has far-reaching implications. A proper explanation doesn’t just account for the non-locality of entanglement or the apparent uncertainty of superposition — it explains how humans fit into reality.

As observers, do we cause the wave function to collapse? That would certainly seem to elevate the role of consciousness in the causal nature of reality (which has not gone unnoticed by experts and cranks alike). Or is the wave function itself the only causal, objective reality? If so, that's one more reason to believe we’re at the mercy of a universe that’s indifferent to our existence.

The third interpretation is the one I want to revisit later in this article: the many-worlds interpretation. Keep it in mind. While it appears to threaten our sense of importance and potentially free will, it's not so bad if we just add a twist — or better yet, a loop.

Causal Loops

It may be fairly self-explanatory, but I’ll nonetheless define a causal loop as follows: a closed causal chain of events, where each event is the effect of another event on the chain. A simple example is a loop of three distinct events where Event A causes Event B, which causes Event C, which in turn causes Event A — each event is causally connected to another on the loop, and there is no “first” event. If you start at any one event, the sequence that follows inevitably leads back to the same event as if it caused itself.

Causal loops sound absurd, but their possibility has been successfully defended, particularly by philosopher Richard Hanley in his paper, “No End in Sight: Causal Loops in Philosophy, Physics, and Fiction” (and any mention of Hanley moving forward is in reference to this paper). Hanley points out that causal loops are not logically inconsistent nor physically impossible — at worst, they simply require grand coincidences. Causal loops as a whole aren’t created, they simply exist, and any strangeness about this is merely apparent — they’re in no worse a position concerning the question of why anything exists. Interestingly, Hanley also mentions that the idea of a causal loop universe is taken very seriously in cosmology, and that causal loops are actually more likely to occur in a universe like ours which hosts intelligent agents.

Let’s unpack that a bit. If intelligent agents discover their universe is indeed a giant causal loop, they may have incentive to maintain the loop they inhabit by causing the very events that lead to their own existence. Furthermore, they can intentionally make events happen that would otherwise require coincidence. But there is nothing coincidental or mysterious about an intentional action; we intentionally do things every day. Hanley notes, “the existence of agency may be the very thing that permits causal loops to obtain.”

This is where I’ll take it one step further than Hanley: not only are causal loop universes possible, but all possible universes must be causal loops. As I mentioned earlier, I’m going to run with this as an assumption, but I’ll still attempt to provide some reasoning.

That reasoning boils down to two propositions: the first is that all events must have causes; the second is that only in closed causal chains do all events have causes. We saw that in a causal loop all events have definitive causes — other events on the loop. There is no issue. But in an open causal chain (imagine a straight line), one more event is always required to explain causation. We’re left with a case of infinite regress, which isn’t inherently problematic, but its “openness” implies there must be an event without a cause, which is impossible. Furthermore, a series of causes and effects cannot, by definition, be split across separate causal systems. If we define a universe to be a causal system, then it follows that all universes must be causal loops, including our own.

Using a causal-loop-only starting point, we can dive into some pretty interesting things.

Different Paths: Curved Spacetime & Clever Demons

In general relativity, causal loops are permissible in the context of a “block universe.” In causal loops, all events in the loop are equally real all the time; they must be for the loop to exist. This closely aligns with a block universe, where all past, present, and future points in spacetime exist “at once”; we simply find ourselves at one point along its progression. In both views, travelling back to the past is possible, but you cannot change the past. If you do travel back to the past though, you may find yourself travelling along a different timeline after that — which brings us back to quantum mechanics.

You took note of the many-worlds interpretation (MWI) of quantum mechanics, right? If the universe is a causal loop, then whatever interpretation we use must be deterministic since all events along a causal loop are equally real — they do not spontaneously become real only after another event. Technically any deterministic interpretation suffices to meet that criterion, but I think the MWI best illustrates the range — and restrictions — of how possible universes can unfold. The MWI entails the universe “splitting” into alternate histories at every point in time. If we introduce a constraint where only causal loop universes can exist, that directly impacts the range of possible universes we can ever split into. Nothing else about the MWI needs to change; there are still many parallel worlds, but they all must maintain a causal loop. So if we somehow traveled back in time, we may find ourselves splitting into a different looped timeline than before.

Things get interesting when we start to look at possible loops. There are only two ways a causal loop can be maintained in practice: closed timelike curves (CTCs), and reverse causation. While the two entail similarities, they are slightly distinct.

CTCs are theoretically possible in certain solutions of spacetime. One example, popular in science fiction, is a wormhole. In some wormholes, you’ll enter one end and exit the other at a previous point in time. But there is serious doubt about whether they could be feasibly traveled through, plus they’re just local anomalies. If we’re talking about the whole universe, we need to go bigger.

The great logician Kurt Gödel did find one solution to Einstein’s equations, now called the Gödel universe, where the entire universe is a CTC. In such a universe, all points in spacetime return to themselves as we’d expect in a causal loop, but it requires that all galaxies have a preferred direction of rotation, for which there is no evidence. When Gödel found his solution, the tools used to study cosmology were not powerful enough to confirm if our universe was a Gödel universe. As the technology became more sophisticated throughout his life up until his death in 1978, Gödel would ask, “Is the universe rotating yet?” The answer was always no. As best as we can tell, our universe is not a giant CTC, but Gödel might not be out of luck just yet.

A Gödel universe represented by “light cones” and a possible path of light through spacetime.

Reverse causation, as Hanley defines it, is simply “a cause and effect relation where effect precedes cause.” Any notion of reverse causation, or causal loops in general, is intimately tied to information. Every single event or state of the universe exists in terms of information, as does each causal relationship. Information is also what makes events distinct and unique. If the universe were to suddenly return to some state X that existed an hour ago — informationally identical in every way — then we’re not talking about another state similar to X; that is X. Each event in a causal loop is fully and uniquely described by information.

One feature of our universe is that information becomes increasingly diffuse, a natural result of the second law of thermodynamics, which holds that the universe always trends toward maximum entropy, or equilibrium. Entropy can be understood as a measure of disorder; it tends to increase over time, even as the underlying information is conserved. Said another way: although information is never actually lost, it tends to become more disordered.
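The "concentrated versus diffuse" contrast can be illustrated with Shannon entropy, the information-theoretic cousin of thermodynamic entropy (the toy distributions are my own example):

```python
from math import log2

def entropy(dist):
    """Shannon entropy, in bits, of a probability distribution."""
    return sum(-p * log2(p) for p in dist if p > 0)

ordered = [1.0, 0.0, 0.0, 0.0]      # all probability on one microstate
diffuse = [0.25, 0.25, 0.25, 0.25]  # spread evenly over four microstates

print(entropy(ordered))  # 0.0 bits: perfectly ordered
print(entropy(diffuse))  # 2.0 bits: the maximum for four states
```

The second law says closed systems drift from distributions like the first toward distributions like the second, even though no probability mass (no information) is destroyed along the way.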

Therein lies our grand dilemma. As physicist Lee Smolin writes in The Singular Universe, “The fact to be explained is why the universe, even 13.8 billion years after the Big Bang, has not reached equilibrium, which is by definition the most probable state, and it hardly suffices to explain this by asserting that the universe started in an even less probable state than the present one.” How did the universe ever arrive at a more ordered state when it clearly prefers the opposite? Obviously it's a conundrum in our existing models, but doubly so if we are to imagine a future in our causal loop that goes totally against a law of nature. This question has already drawn some eyebrow-raising proposals.

Ludwig Boltzmann, the 19th-century physicist who gave the second law of thermodynamics its statistical formulation, offered one proposal: the second law is a statistical phenomenon, so given enough time, there’s a non-zero chance the universe will randomly fluctuate back into a low-entropy state. But according to Boltzmann’s own principles, something like the Big Bang is literally the least likely thing that can happen; while not necessarily impossible, we’re going to explore a more probable scenario.

A contemporary of Boltzmann, James Clerk Maxwell, devised a thought experiment called “Maxwell’s demon” in an attempt to violate the second law. He imagined a demon that controlled a small door between two gas chambers. As individual gas molecules approached the door, the demon would quickly open and close it so that all the fast molecules became trapped in one chamber, and the slow molecules in the other. In doing so, Maxwell proposed the second law was violated since the chamber system became more ordered; one side became hotter and the other became cooler, even though it was totally mixed before. With regard to information, entropy had been lowered — or so he thought.

In this Maxwell’s demon setup, chambers A and B both start with mixed gas, but over time chamber A becomes cold and chamber B becomes hot. (Source: Htkym / CC BY-SA)

Others said not so fast. Although entropy in the chambers decreased, the entropy in the demon’s memory increased. Imagine that the demon’s memory started as a blank slate — highly ordered. As it observed the system, it had to fill its memory with information about the gas molecules to know how to operate the door. In doing so, the information in its memory became more disordered, thereby preserving the second law.

But the demon can just forget that information, right? In doing so, its memory goes back to a blank slate, but the gas is still highly ordered. Seems like an easy solution. Again, not so fast — the loss of information entails a dissipation of heat, which increases the entropy of its surroundings. Alas, it seems the second law cannot be slain. But maybe it doesn’t need to be.
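The bookkeeping above can be mimicked in a toy simulation: a "demon" sorts molecules by speed into two chambers, and the order it creates is paid for by the bits written into its memory. Everything here (speeds, threshold, counts) is an arbitrary illustration, not real physics:

```python
import random

random.seed(0)

# Toy Maxwell's demon: molecules are just speeds. The demon opens the
# door only for fast molecules heading into chamber B and slow ones
# heading into chamber A, sorting the gas without doing work on it.
molecules = [random.gauss(1.0, 0.4) for _ in range(1000)]
threshold = sum(molecules) / len(molecules)  # the demon's notion of "fast"

chamber_a = [m for m in molecules if m < threshold]   # slow: the cold side
chamber_b = [m for m in molecules if m >= threshold]  # fast: the hot side

mean = lambda xs: sum(xs) / len(xs)
print(f"A (cold) mean speed: {mean(chamber_a):.2f}")
print(f"B (hot)  mean speed: {mean(chamber_b):.2f}")

# The demon "paid" for this order by recording one observation per
# molecule in its memory, which is where the entropy went.
bits_recorded = len(molecules)
print(f"bits written to the demon's memory: {bits_recorded}")
```

The two chambers end up at different mean speeds (temperatures), but only because the demon's memory filled with a record of every molecule it sorted.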

When looking at the entire system in Maxwell’s thought experiment — which really includes the chambers, the demon, and the demon’s environment — we notice several things. One is that information can take several forms, such as the properties of gas, memories in a brain, and the effects of heat. Another is that although the second law is maintained and entropy’s trend toward disorder never ceases, local arrangements of information can become more ordered, thus local entropy can decrease. To reiterate an earlier point: overall entropy never changes, but local entropy can. A third observation concerns what is required to produce local order: the demon. More generally, knowledge about the system, or memory, as well as the ability to act upon it to rearrange information. In fact, if an agent has perfect knowledge of a system, it can rearrange it in any way it desires.

Maybe you can see where this is going — intelligence can manipulate information, and enough intelligence can hypothetically recreate a prior state of information in its own system, maintaining a causal loop.

Let’s recap a bit: if we assume our universe is a causal loop, but it is not a CTC, and it probably did not randomly fluctuate to a highly-ordered state, then the only option left is to think that intelligence was used to cause a previous, highly-ordered state in the loop.

You may think “yeah, but what are the odds of that?” I’m inclined to respond with, “better than the alternatives.” Remember, Hanley tells us these things are not impossible, they are merely coincidental; and causal loops are more likely to happen in a universe with intelligent agents. If a causal loop is the only type of universe that can exist, then it’s not coincidental at all; it’s simply how anything must exist. That alone eliminates the apparent absurdity. And although we’re working with a sample size of one, the fact that our universe hosts intelligent life already makes the “intentional causality” path more probable than a random fluctuation.

I’ll also add that this aligns with my discussion on quantum annealing, where a quantum annealing universe converges on its highest probability state. If the many parallel timelines in the MWI follow a probability distribution, and all timelines must form causal loops, then not only are the most probable loops those that contain intelligence, as Hanley suggests, but each loop that takes the intelligence “route” must ultimately land on a set of common characteristics — they must all have the ability to manipulate information, or reality itself, in order to maintain a causal loop. If any one of them did not converge on this knowledge or technological sophistication, then the timeline would not exist in the first place, thus would not be included in the probability distribution. As such, any timeline we follow in the causal-loop-MWI formulation must converge on those traits too.

I also explained how a reward function within quantum annealing would result in the system having incentive to “restart” itself in order to maximize reward. Both causal loops and a quantum annealing universe involve a convergence on intelligence to facilitate a restart, and they involve the act of “forgetting” in order to restore a previous informational state. And although it’s a far cry from any firm proof, this heat-releasing forgetting process sounds a lot like our early universe — a hot universe with a highly-ordered state.

From my perspective, causal loops and quantum annealing look like two sides of the same coin. Is it a coincidence that we seem to arrive at the same conclusions from two entirely different approaches? Or have we done away with coincidences?

Make It Loop: A How-To Guide

We manipulate information every day, whether it be physically, mentally, or digitally, but we could use more guidance in the way of resetting a universe — it's a tall order. Information and entropy can take many forms, but there does appear to be one form that rules them all: Von Neumann entropy. I couldn’t possibly summarize it better than physicist Matt O’Dowd from PBS Space Time, so I won’t try:

Quantum entropy, also known as Von Neumann entropy . . . describes the hidden information in quantum systems, but more accurately, it's a measure of entanglement within quantum systems. In fact, the evolution of quantum entanglement may be the ultimate source of entropy, the second law, the limits of information processing, and even the arrow of time.
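For a small numerical illustration (assuming NumPy is available): the Von Neumann entropy of a density matrix comes straight from its eigenvalues, and it cleanly distinguishes a pure state from the maximally mixed state left behind when you trace out half of an entangled Bell pair:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]  # 0 * log(0) contributes nothing
    return float(-np.sum(eigs * np.log2(eigs)))

# A pure single-qubit state has zero entropy...
pure = np.array([[1.0, 0.0],
                 [0.0, 0.0]])
print(von_neumann_entropy(pure))

# ...but tracing out half of a maximally entangled Bell pair leaves a
# maximally mixed qubit: one full bit of entanglement entropy.
reduced_bell = np.array([[0.5, 0.0],
                         [0.0, 0.5]])
print(von_neumann_entropy(reduced_bell))
```

That one bit is exactly the "hidden information" O'Dowd describes: it lives in the entanglement between the two qubits, not in either one alone.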

Von Neumann entropy is of particular interest in the study of quantum information — namely, in black holes and quantum computing. One foundational tenet of quantum theory is that quantum information is never lost or destroyed. This presented a real problem in the “black hole information paradox,” where physicist Stephen Hawking pointed out that information seemed to be forever lost through what he called Hawking radiation: information-carrying particles fall into a black hole, adding to its mass, but this same mass can escape through informationless photons, thereby erasing information.

Many physicists thought this paradox couldn’t possibly be, so they devised several solutions to resolve it. Hawking himself eventually abandoned the paradox, convinced that information was preserved. One promising solution uses entanglement, the phenomenon whereby two particles, or qubits, must be described as a single state. In this solution, the photons escaping through a black hole’s radiation are imprinted with information through entanglement — information that can theoretically be retrieved. Norman Yao, from the University of California, Berkeley, told Quanta Magazine, “If you were God and you collected all these Hawking photons, there is in principle some ungodly calculation you can do to re-extract the information in [each swallowed] qubit.”

Is a literal God required to gather the information needed to connect our loop? Maybe, but I’m only human, so it’s beyond me. Perhaps it's not the only option though. What if we don’t need to know everything; we just need to know enough? As intelligent beings, we do have the ability to reason after all. Can we arrive at the necessary information by means of deduction, without having all the raw data? A step in that direction might concern entanglement; it doesn’t just save information from being lost in our universe — it might show us how to build a new one.

One implication of the entanglement solution to the black hole paradox is that our universe may be a hologram. It sounds rather strange, but the “holographic principle” is taken quite seriously and is of great interest in the quest for quantum gravity. In this approach, spacetime emerges from a network of entangled particles, and our entire universe may be a hologram of information encoded on the surface of a black hole. This is where we may be able to make some progress.

As I mentioned, Von Neumann entropy is also relevant to quantum computing. In fact, there are remarkable parallels between black holes and quantum computing, and the more we study one, the more we tend to learn about the other. Advancements in quantum computers allow us to probe the mysteries of our universe. We’ve already been able to do some pretty mind-bending things with experimental systems, like those that mysteriously “snap back” into order from equilibrium, entangle particles over time (not just space), reverse time, and challenge our notion of normal causal order. In time, we may come to find that we actually live in a quantum computer; which means — in keeping with a causal loop — we’ll recreate the universe through quantum computing too.

It’s no secret I’m a strong proponent of one particular form of quantum computing as a model of our universe: quantum annealing. In alignment with the holographic principle, quantum annealing utilizes a network of entangled qubits, where entanglement steadily increases in accordance with our observations of Von Neumann entropy. There are many other similarities (and I promise I’ll stop mentioning quantum annealing now), but my point, more generally, is that there are reasons to believe we can indeed recreate the universe through some form of quantum computing. For simplicity, I’ll discuss this in terms of a “simulation,” but I want to emphasize that this doesn't imply a simulation is any less “real” than anything else — it’s all quantum information at the end of the day, and existence within a causal loop could just be simulation in perpetuity anyway.

From my vantage, this could go one of two ways. In each scenario, the goal is to create a matching “first” moment within a simulation; as long as that configuration of information is always the same between simulations, and a nested simulation remains coherent, then the causal loop is maintained. Again, an event on the loop is simply a specific arrangement of information. Both options require a super-advanced civilization in our distant future; relatively speaking, they may even seem like gods, but these options don’t require capital-G God.

The first way is that we’re able to deduce some set of parameters and initial conditions of our universe. If we exactly calculate its information capacity (the Bekenstein bound), universal constants, and laws, and find a grand unified theory, then we can also find some entanglement geometry that permits all of those properties. We’d then create a quantum system with matching parameters and run it from the same “starting” point as our own — that might be some point of minimum entropy where the system couldn’t possibly be any simpler, similar to how we view the singularity before the big bang. This assumes that some simulation “before” us chose the same starting point as the obvious choice, since any possible timeline can then follow as it trends back towards equilibrium. The enormous energy required for such a task might spell the annihilation of the parent simulation, like a cosmic self-sacrifice, but maybe that’s the point.
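For a sense of scale, that information capacity is straightforward to evaluate: the Bekenstein bound says a system of radius R and energy E can hold at most I ≤ 2πRE/(ħc ln 2) bits. The brain-sized example below is a commonly quoted illustration, not a measurement:

```python
import math

# Physical constants (SI units)
HBAR = 1.054571817e-34  # reduced Planck constant, J*s
C = 2.99792458e8        # speed of light, m/s

def bekenstein_bound_bits(radius_m, mass_kg):
    """Upper bound on the bits storable in a sphere of the given radius
    whose energy content is the rest-mass energy E = m c^2."""
    energy = mass_kg * C**2
    return 2 * math.pi * radius_m * energy / (HBAR * C * math.log(2))

# Much-quoted example: a human brain, ~1.5 kg in a sphere of radius ~6.7 cm.
bits = bekenstein_bound_bits(0.067, 1.5)
print(f"{bits:.2e} bits")  # on the order of 10^42
```

Running the same calculation for the observable universe gives the total information budget any faithful simulation of it would need to match.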

The second and possibly more intriguing way is the “message in a bottle” approach. Imagine that when a simulated universe is programmed, instructions are left for the inhabitants of that simulation to then recreate the same simulation. This makes sense if intelligent life has a vested interest in maintaining the causal loop it occupies. They would leave instructions in something ubiquitous and unchanging like the universal constants, the cosmic microwave background, or in our DNA. In fact, all human DNA differs by less than 1%, and about 98% of our DNA is considered to be non-coding, or “junk” DNA — it’s an ingenious place to pass along crucial information. And DNA is simply a pattern of information that can be easily programmed; meaning DNA would be encoded into the initial conditions, so the universe emerges around DNA-based life, differing from the “absolute simplicity” initial conditions of the first option. Though, we’d still require a universal language that can be understood by any intelligent life to decode the instructions; maybe it all really is in the maths and options 1 and 2 are more alike than we think.
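As a purely hypothetical sketch of the "message in a bottle" idea: with four bases, DNA can carry two bits per nucleotide, so any binary message can be mapped into a strand. The mapping below is my own arbitrary choice for illustration, not a claim about any real biological encoding:

```python
# Arbitrary illustrative mapping: 2 bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {v: k for k, v in BITS_TO_BASE.items()}

def encode(message: str) -> str:
    """Pack a text message into a strand of bases, 2 bits per base."""
    bits = "".join(f"{b:08b}" for b in message.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> str:
    """Recover the original text from a strand produced by encode()."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

strand = encode("loop")
print(strand)          # 16 bases for a 4-byte message
print(decode(strand))  # "loop"
```

At two bits per base, even a small fraction of the roughly three billion base pairs of non-coding DNA would dwarf the size of any plausible instruction set.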

It’s also worth noting that Hanley specifically cites the use of genetics in an example of a “person loop,” where, “Given the normal recycling of cells, it may be that a person’s body has entirely replaceable parts.” Yet genetic code (ideally) remains unchanged, so that information could remain consistent in a loop. In fact, if DNA is the focal point of a causal loop, then it seems the only information that needs to be simulated is that which constitutes the experience and collective memory of DNA-based life. If information changes outside of that, who would ever notice? The information requirement for this simulation becomes much more manageable since we don’t need to render every property of every particle throughout the observable universe.

Of course, this is all wild speculation, but it does make for a fun exercise. Maybe there are alternate routes that will become obvious as we learn more about reality. I’m just trying to get the ball rolling in case we do live within a causal loop. It's my loop too, after all.

Finding God

When exploring the idea of a causal-loop-only universe, it's almost impossible to ignore some of the implications for life within that universe.

For one, it appears to make intelligent life necessary for anything to exist — at least in any universe that’s not a CTC. From this view, life isn’t rare: it's required. If no intelligence emerges, there is no feasible way for a causal loop to remain informationally consistent. This also means that any life-carrying universe must follow a series of causes and effects that enable a minimum degree of intelligence and agency — life must gain the ability to manipulate the information of the universe itself. So not only is intelligence required, but highly advanced intelligence is required.

What does this all look like from the perspective of life within such a universe? Well, look at our own — the entire universe is “cooling down” towards disorder, but intelligent life and what it touches are the only things that trend towards more order. Over time, our knowledge and technological capabilities increase. What’s the upper limit to this trend? Is it a coincidence that we’ve come to a point where we can start exploring and controlling quantum information, the very fabric of reality? How much more will we achieve in the next century, millennium, or ten millennia?

Maybe we really have just gotten lucky, but in a causal loop this trajectory is not a coincidence — it's a certainty. Life doesn’t just veer off the rails into oblivion; it’s locked on a path, or lots of equivalent paths that are all destined to tell the same story — the same universal archetype. The loop cannot be broken, else it would have never existed. Life is bound to persist, bound to overcome, bound to exist again — isn’t this the kind of hope people normally place in God?

I’m not saying God literally exists. Maybe an omniscient being exists as the highest expression of intelligence on a loop right before it must reset, but that seems like a distraction from a more meaningful point: existing in a causal loop — at any point — is practically like living in a universe where God exists too.

Isn’t that the case if nearly everything about existence takes the shape of a series of unending coincidences? Otherwise, the odds of life arising in our universe are astronomically unfavorable, as is the fact that life has evaded extinction for a few billion years to become what it is today. If you recognize coincidence after coincidence, it's not much of a leap for a rational mind to think that a higher power ordains each moment, following some grand design. Many of us have stepped away from that worldview, but maybe we just had an incomplete perspective. Maybe we have reason to believe again. As we step closer to truth, we might see that our old silhouette of God was simply the negative space of an equally hopeful structure of reality.


Low hanging fruits (LWCW 2020)

Published on September 15, 2020 6:15 PM GMT

During the Less Wrong Community Weekend (Europe), one event has people share low-hanging fruit they have used. I chaired it this year, and defined a low-hanging fruit as something that can be easily bought or done and that improves your life. Here is the list of fruits shared in 2020. All typos are mine. "I" usually refers to the person who shared the tip, not the author of this blog post.

Watering bulbs

If you don't know how much to water your plants, or whether you watered too much or not enough, let watering bulbs do it for you.

Before work time

Reserve some time in the morning, before you head out to work, and invest it in something that is important to you. You do this before the day's events (good or bad) can weigh on you, and with your full physical ability.

Ad block on smartphone

Blokada (https://blokada.org/index.html) is an ad blocker for Android. It reduces noise, distraction, and bandwidth use, and is easy to install. Added bonus: Firefox + uBlock Origin.


Kalimba

Playing music is to listening to live music as live music is to recorded music. If you don't want to spend time learning an instrument, the kalimba is cheap and directly leads to beautiful music.

Better sleeping

No device at night

Set all devices to lock at sleeping time.


Get a smart lightbulb and set it to slowly dim/become red for the half hour before bed - this makes going to bed at the right time the default action, makes me feel tired, and significantly decreases the willpower it takes.

End the day with a paper book to avoid looking at screen.

Get two to ten minutes of sun in the morning, or, failing that, strong light.


Take melatonin.


Schedule things in the morning so that you have an incentive to sleep early.

Keep your phone 3 meters from the bed, to force yourself to get out of bed to turn it off.

Day / night separation

Ensure you can't see your bed from your workspace (and vice versa), to feel the separation between work space and personal space.

Neater writing

Switching to a fountain pen can force you to write slower and therefore neater. Pilot makes very cheap, good-writing fountain pens (the Pilot Varsity) that are disposable just like regular ballpoints; they're around $20 US for a pack of 12.

Note taking

OneNote

Use the software OneNote to keep track of notes about everything.

Recalling facts about friends

You can use a spaced repetition system to recall facts about friends (who is friends with whom, where they moved to, what their current job is...)

Writing a name down helps to remember it (for some people at least).

Use Facebook events to remember who went to an event and whom you met, and use them to take notes (avoid taking notes from FetLife events).

Spaced repetition also helps to recall birthdays of friends/family!
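A minimal sketch of how such a system could schedule reviews, using Leitner-style boxes; the interval lengths are arbitrary choices:

```python
import datetime

# Leitner-style scheduling: each fact sits in a box, and the box number
# determines how many days until the next review. Intervals are arbitrary.
INTERVALS_DAYS = {1: 1, 2: 3, 3: 7, 4: 21, 5: 60}

def review(card, remembered, today):
    """Promote a remembered card, demote a forgotten one, and set its due date."""
    if remembered:
        card["box"] = min(card["box"] + 1, 5)
    else:
        card["box"] = 1  # forgotten cards start over
    card["due"] = today + datetime.timedelta(days=INTERVALS_DAYS[card["box"]])
    return card

today = datetime.date(2020, 9, 15)
card = {"fact": "Alice moved to Berlin", "box": 1, "due": today}
card = review(card, remembered=True, today=today)
print(card["box"], card["due"])  # 2 2020-09-18
```

Existing tools like Anki implement this kind of scheduling for you; the sketch just shows the core idea.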

Relate knowledge

Create your own wiki (e.g. with MediaWiki) to keep notes and link them together, so that you can revisit them when you want. Pre-commit two hours each month to see if you want to improve the wiki, add links, or add explanations.


Bullet journaling: there's a great guide on Reddit at /r/bulletjournal.


Carrying a small A5- or A6-sized notebook with you can be very useful.

Idea catcher

Have an idea catcher: some place to write ideas down so you don't forget them and don't have to keep them in mind while you do something else.

Password manager

Use a password manager to store private information you need to keep both secure and accessible, such as account numbers, previous addresses, photos, and files.

Save time while writing

If there are Unicode symbols you expect to use often (e.g. math symbols, foreign letters), save them as shortcuts so they can be accessed quickly and put in any message.

Using autocorrect allows you to shorten text you write often; e.g. @@ is automatically replaced by your email address, sigma by σ.

You can also do this on Windows with AutoHotkey, and there are similar scripts for Linux/Mac.

Even lower hanging: I find the US International keyboard layout helpful.

Water stone

Start with a combination stone: a low grit gets you a better knife, and cooking becomes easier because your food is easier to cut. Sharpening also helps you relax, and a water stone allows more flexibility/granularity than standard knife sharpening.


Set up email inbox rules to sort messages from senders you consider low-priority or work-related (if you use a combined inbox) into their own folder, so you don't have to look at them all the time. Also: if using Outlook Web App, there's an option to have a text message sent to your phone as a result of an inbox rule. I have a rule that texts me if my boss's boss (2 levels up) sends a direct email with my name in the To: line.


Use more GIFs in chat conversations to add silliness and joy to the discussion.


As the light bulbs in your house burn out, replace them with LEDs; they use less power and can be softer too. You can buy LEDs with a sun-like spectrum and the same intensity.

Sex life

Keep a list of desires/fantasies, so that if a new partner asks what new thing you'd want to try, you don't get lost thinking about it.

Back pain

Leaning back against the back of the chair a few times a day is relaxing. Most office chairs allow you to do it.

Instead of getting a gaming chair, spend the same money on a lightly used executive office chair like a Herman Miller or a Steelcase; your back will thank you. Also: if you have a dealer that sells those fancy chairs near you, you can go try them to find out what size fits you best before you look for a used one.

Lost wallet

Keep a list of phone numbers in your wallet so that whoever finds it can call you.

Extra lists

Neel Nanda shared his personal list of low hanging fruit.


AI Safety Discussion Day

Published on September 15, 2020 2:40 PM GMT

Monday, September 21

4pm - 9pm UTC

See here for more info: https://docs.google.com/document/d/1J5sTtquNud-XINMipo_r9tJZB_c4L6R0SuSiqqLJ-IM/edit


Gems from the Wiki: Paranoid Debating

Published on September 15, 2020 3:51 AM GMT

During the LessWrong 1.0 Wiki Import we (the LessWrong team) discovered a number of great articles that most of the LessWrong team hadn't read before. Since we expect many others to also not have read these, we are creating a series of the best posts from the Wiki to help give those hidden gems some more time to shine.

Most of the work for this post was done by freyley and JenniferRM who I've added as coauthors to this post. Wiki edits were also made by all of the following: BJR, PeerInfinity, Admin, PotatoDumplings, Vladimir Nesov, Zack M. Davis, Freyley and Grognor. Thank you all for your contributions!

Paranoid Debating is a variant of The Aumann Game where one player purposefully subverts the group estimate. As in The Aumann Game, the activity consists of a group jointly producing a confidence interval for an unknown but verifiable quantity, which is then scored for accuracy and calibration. One individual is designated the spokesperson, who is responsible for choosing the final estimate. However, before the activity begins, one individual is secretly assigned the role of misleading the other members. The deceiver is scored higher the worse the final estimate is. The activity is intended to teach accurate estimation, proper agreement techniques, and recognition of deception.

A typical subject for the game might be "How much maize is produced in Mexico annually?".

  • Select player roles. In person, each player receives or selects a card from a pack of role cards. For 4 players, create the pack by combining 3 black cards with 1 red card. For 4-6 players there should be 1 red card and enough black cards for one card per person; for 7-9 players, 2 red cards. Some variants include a role named the Advocate, which you can designate one of the black cards to represent.
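The dealing rules above can be sketched in a few lines; the function below simply follows the card counts given for 4-9 players:

```python
import random

def deal_roles(num_players, rng=random):
    """Deal Paranoid Debating role cards: 4-6 players get 1 deceiver
    (red card), 7-9 players get 2; everyone else gets a black card."""
    if not 4 <= num_players <= 9:
        raise ValueError("the rules above cover 4-9 players")
    num_red = 1 if num_players <= 6 else 2
    cards = ["red"] * num_red + ["black"] * (num_players - num_red)
    rng.shuffle(cards)
    return cards

print(deal_roles(5))  # e.g. ['black', 'red', 'black', 'black', 'black']
```

To add the Advocate role, mark one of the black cards before shuffling, as the variants below describe.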

Simplest variant

  • Each player receives a role. No advocate.
  • A question is asked.
  • Players discuss for 20 minutes, then write down their individual response on a card.
  • The answer is researched.
  • Scores are assigned.

Advocate variant, #1

  • Each player receives a role. One advocate in the deck. The player who receives the Advocate displays it to the group.
  • A question is asked.
  • Players discuss for 20 minutes, attempting to convince the advocate. The advocate writes down their response on a card. This is the group's answer.
  • The answer is researched, scores are assigned.

Advocate variant, #2

  • Each player receives a role. One advocate in the deck. No player may display their card.
  • A question is asked.
  • Players discuss for 20 minutes. Anyone may say anything. At the end, the advocate writes down what they think the group's response is on a card, and the group is scored for this.
  • Answer researched, scores assigned.

Variation-by-argument variant

  • Each player receives a role. No advocate. No player may display their card.
  • A question is asked.
  • Players have 2-5 minutes to write down their initial, individual estimate.
  • Players discuss for 20 minutes. Anyone may say anything. At the end, players write their revised estimates on their card.
  • Players are scored based on their delta -- the more you go toward the correct answer from your initial estimate, the more points.

Southern California Variant #1

At the February 2011 Southern California LW Meetup we tried playing the game. For questions we bought a game of Wits & Wagers (which has trivia questions with numerical answers) and looked at the cards to find questions that were about substantive topics where Fermi estimates seemed useful. The speaker/advocate was chosen on a rotating basis so that everyone gets at least one chance to play that role, and cards are dealt from a deck of playing cards to everyone else. Red cards mean you're trying to make the group deliver a bad answer. Black cards mean you're trying to make the group deliver a good answer. This makes the number of people to be suspicious of itself an unknown parameter and leads to funny outcomes and interesting coordination problems. Scoring used the experimental scoring code that is intended to assign the most credit to small error bars around high confidence correct answers.
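As an illustrative stand-in for that experimental scoring code (not the project's actual implementation), the standard Gneiting-Raftery interval score captures the same idea: narrow intervals around the true answer score best, and misses are penalized in proportion to how badly the interval missed:

```python
def interval_score(low, high, truth, alpha=0.1):
    """Gneiting-Raftery interval score (lower is better) for a central
    (1 - alpha) confidence interval: the interval's width, plus a
    penalty proportional to how far the truth falls outside it."""
    score = high - low
    if truth < low:
        score += (2 / alpha) * (low - truth)
    elif truth > high:
        score += (2 / alpha) * (truth - high)
    return score

# Tight and right beats wide and right beats tight and wrong:
print(interval_score(45_000, 55_000, 48_500))   # 10000
print(interval_score(10_000, 100_000, 48_500))  # 90000
print(interval_score(60_000, 70_000, 48_500))   # 10000 + 20*11500 = 240000
```

A deceiver's score could then simply be the negative of the group's score, rewarding them for dragging the final interval away from the truth.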


It's really easy to ask a question that is then very difficult to answer later. For example, the question "How many miles of railroad are there in Africa?" is somewhat difficult to answer. Walking through the CIA World Fact Book one country at a time, we arrived at an answer in the range of 48,000-49,000. However, in cross-checking that information, we discovered that in Uganda, there are only 125 miles of active railroad, but 1200km listed in the Fact Book. It seems likely, therefore, that the total estimate includes some non-active miles of railroad, and is thus too high. This section is here to list good and bad questions and resources to get questions from or answer questions unusually easily. If listing an answer, please make the text of the answer white so people can use it if they want.


A not-so-trivial inconvenience to playing the game is figuring out how to score it properly.

To make this easier there is now a tentative file format for representing a game of paranoid debate and a python script for scoring games represented in this format. If you'd like to download or edit this software check out this github project. Please note that the game format and the code are very likely to evolve to remove bugs and support whatever sort of play turns out to be the most fun and/or educational.



Book Review: Working With Contracts

Published on September 14, 2020 11:22 PM GMT

Contracts is one of those areas that I always figured I ought to study, at least enough to pick up the basics, but never seemed either interesting or important enough to reach the front of my queue. On top of that, there’s a lot of different angles from which to approach the subject: the law-school-style Contracts 101 class covers the legal principles governing contracts, the economists’ version abstracts away the practical specifics and talks about contracts in game-theoretic terms, more business-oriented books often focus on negotiation, etc.

“Working With Contracts: What Law School Doesn’t Teach You” is about the practical skills needed for working with contracts on an everyday basis - specifically the sort of skills usually picked up on the job by young lawyers. It talks about things like what to look for when reviewing a contract, how to organize contracts, why lawyers use weird words like “heretofore”, various gotchas to watch out for, etc. It assumes minimal background knowledge, but also includes lots of technical nuts and bolts. In short, it’s the perfect book for someone who wants a technical understanding of real-world contract practice.

This post will review interesting things I learned from the book.

Background Knowledge

First, some very brief background info, which the book itself mostly assumes.

Legally, in order to count as a “contract”, we need four main pieces:

  • Offer: someone offers a deal
  • Acceptance: someone else accepts it
  • Consideration: both parties gain something from the deal; it’s not a gift
  • Mutual understanding: both parties agree on what the deal is and the fact that they’ve agreed to it

A Contracts 101 class has all sorts of details and gotchas related to these. Notice that “signature on a piece of paper” is not on that list; e.g. oral contracts are entirely enforceable, it’s just harder to prove their existence in court. Even implicit contracts are enforceable - e.g. when you order food from a restaurant, you implicitly agree to pay for it, and that’s a legally-enforceable contract. That said, we’ll focus here on explicit written contracts.

Once formed, a contract acts as custom, private law between the parties. Enforcement of this law goes through civil courts - i.e. if someone breaches the contract, then the counterparty can sue them for damages. Note the “for damages” in that sentence; if a counterparty breaches a contract in a way that doesn’t harm you (relative to not breaching), then you probably won’t be able to sue them.  (Potentially interesting exercise for any lawyers in the audience: figure out a realistic contractual equivalent of Newcomb’s problem, where someone agrees to one-box on behalf of someone else but then two-boxes, and claims in court that their decision to two-box benefited the counterparty rather than harming them. I’d bet there’s case law on something equivalent to this.)

Note that this is all specific to American law, as is the book. In particular, other countries tend to more often require specific wording, ceremonial actions, and the like in order to make a contract (or component of a contract) enforceable.

What Do Contracts Do?

The “functional” components of a contract can be organized into two main categories: representations and covenants. A representation says that something has happened or is true; a covenant says that something will happen or will be true.

Some example representations:

  • ABC Corp signs a statement that they have no pending lawsuits against them.
  • Bob signs a statement that the house he’s selling contains no lead-based paint or asbestos insulation.
  • Carol signs a statement that the forms she provided for a mortgage application are accurate and complete.
  • Title Corp signs a statement that there are no outstanding mortgages on a piece of property.

Nominally, each of these is a promise that something is true. However, that’s not quite how they work functionally. Functionally, if a counterparty acts based on the assumption that the statement is true and is harmed as a result, then they can sue for damages. In other words, when providing a representation, we provide insurance against any damages which result from the representation being false. Bob may not even have checked that the house he’s selling contains no asbestos, and that’s fine - if he’s willing to insure the counterparty against any asbestos-related risk.

This idea of insurance becomes important in contract negotiations - there’s a big difference between e.g. “no environmental problems” and “no environmental problems to the best of their knowledge”. The former insures against any environmental problems, while the latter insures against any environmental problems which the signer knew about at time of signing. One puts the duty/risk of finding/fixing unknown problems on the signer, while the other puts it on the counterparty.

The other key thing to notice about representations is that they’re as of the signing date. When Bob states that his house contains no asbestos, that does not insure against the house previously containing asbestos or containing asbestos in the future. It only needs to be true as of that one moment in time. This becomes relevant in complex multi-stage contracts, where there’s an initial agreement subject to a bunch of conditions and reviews, and the final closing comes later after all that review is done. For instance, in a mortgage there’s an initial agreement subject to the borrower providing lots of forms (credit check, proof of income, proof of insurance, etc…), and the final contract is closed after all that is reviewed. In these situations, the borrower usually makes some representations early on, and then has to “bring down” the representations at closing - i.e. assert that they’re still true.

While representations deal with past and present, covenants deal with the future. They’re the classic idea of contract provisions: precommitments to do something. Some examples:

  • ABC Corp agrees to not sell the machinery they’re leasing.
  • Bob agrees to not use any lead-based paint on the house he’s buying.
  • Carol agrees to maintain minimum levels of insurance on the house she’s mortgaging.
  • Monitoring Corp agrees to alert Bank if there is any change in the credit rating of Company.

These work basically like you’d expect.

Representations and covenants often run in parallel: a representation that X is true will have a corresponding covenant to make X continue to be true in the future. For instance:

  • ABC Corp states that they do not currently have any liens on their main plant, and agrees to not create any (i.e. they won’t borrow any money with the plant as collateral).
  • Carol states that she currently has some level of insurance coverage on her house, and agrees to maintain that level of coverage.

This is mainly for contracts which will be performed over a long time, especially debt contracts. One-off contracts (like a purchase/sale) tend to have relatively few covenants; most of their substance is in the representations.

Parallels to Software Development

Representations and covenants seem pretty straightforward, at least conceptually. One is insurance against some fact being false, the other is a precommitment.

The technical complexity of contracts comes from the interplay between two elements. First:

The goal of a contract is to describe with precision the substance of the meeting of two minds, in language that will be interpreted by each subsequent reader in exactly the same way.

In other words, we want no ambiguity, since any ambiguity could later be used by one of the parties to “cheat” their way out of the contract. This creates a headache very familiar to software developers: like programs, contracts mean exactly what they say. There is no “do what I mean” button; we can’t write something ambiguous and rely on the system to figure out what we meant.

Second: we don’t have perfect knowledge of the future. When making a precommitment in a contract, that precommitment is going to operate fairly mechanically in whatever the future environment looks like. Just like a function written in code may encounter a vast space of unusual inputs in the wild, a precommitment in a contract may interact with a vast space of unusual conditions in the wild. And since we don’t know in advance which conditions will be encountered, the person writing the code/contract needs to consider the whole possible range. They need to figure out, in advance, what weird corner cases could arise.

Put those two pieces together, and the picture should feel very familiar to software developers.

The result is that a lawyer’s job ends up involving a lot of the same pieces as a software engineer’s job. A client/manager says “here’s what we want”, the lawyer/programmer says “ummm I don’t think you really want that, because <problem> happens if <circumstance>”, and they go back-and-forth for a while trying to better define what the client/manager really wants. An example from the book pictures a lawyer reviewing a contract with a client (simplified slightly by me):

Lawyer: This is a covenant that restricts your business from incurring debt…

Client: That’s fine, we don’t plan to use any bank financing.

Lawyer: Well, the definition of “debt” used is very broad. For instance, it includes payment plans on any equipment you buy…

Client: Well, we can add some room for that.

Lawyer: How much room do you need?

Client: Based on our current needs, less than $1M at any given time.

Lawyer: But if that new plant you were talking about gets off the ground, won’t you need to buy a bunch of new equipment for it?

Client: Good point, we’d better ask for $5M…

This could go on for a while.

Despite the parallels, lawyers are not very good software engineers, in general. The most common solution to the sorts of problems above is to throw a patch on it, via two kinds of exceptions:

  • Carveouts: action X is generally forbidden, except for special case Y.
  • Baskets: action X is generally forbidden, except in amounts below some limit (e.g. the $5M limit in the example above)

Over the course of negotiations, patches are layered on top of patches. An example from the book:

Little Corp may not transfer any Shares during the term of this Agreement, except for (i) transfers at any time to its Affiliates (including, without limitation, Micro Corp) other than Medium Corp, and (ii) so long as an Event of Default attributable to Big Corp shall have occurred and be continuing, transfers to any Person (including, for the avoidance of doubt, Medium Corp).

This mess is the contractual equivalent of a series of if-statements nested within if-statements. This is, apparently, standard practice for lawyers.
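To make the nesting concrete, here is a hypothetical sketch of that provision rendered as actual code. All the names (`transfer_permitted`, `is_affiliate`, and so on) are invented for illustration; the contract defines the corresponding terms in prose.

```python
# Hypothetical sketch: the Little Corp share-transfer provision,
# rendered as the nested conditionals it effectively encodes.

def transfer_permitted(recipient, is_affiliate, big_corp_default_continuing):
    """May Little Corp transfer Shares to `recipient` under this Agreement?"""
    # Carveout (i): transfers at any time to Affiliates, other than Medium Corp.
    if is_affiliate and recipient != "Medium Corp":
        return True
    # Carveout (ii): so long as an Event of Default attributable to Big Corp
    # has occurred and is continuing, transfers to any Person
    # (including Medium Corp).
    if big_corp_default_continuing:
        return True
    # General rule: no transfers during the term of the Agreement.
    return False
```

Reading the prose version requires mentally reconstructing exactly this control flow, which is part of why layered carveouts are hard to audit.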

(Another complaint: in a complex contract, it would not be hard to include provisions alongside the table of contents which nullify provisions that appear in the wrong section. Then people reviewing the contract later wouldn’t have to read the whole thing in order to make sure they didn’t miss anything relevant to their use-case; it would be the contract equivalent of variable scope. My mother’s a lawyer in real estate and wills, so I asked her why lawyers don’t do this. Her possibly-tongue-in-cheek answer: it might put lawyers out of business. Kidding aside, the bar association engages in some pretty incestuous rent-seeking, but judges have been pushing for decades to make contracts and other legal documents more legible to non-lawyers.)

The “Do What I Mean” Button

A contract writer’s job is much easier than a programmer’s job in one key respect: a contract will ultimately be interpreted by humans. That means we can say the equivalent of “look, you know what I mean, just do that”, if we expect that a court will actually know what we mean. 

This gives rise to a bunch of standard tricks for invoking the do-what-I-mean button. We’ll talk about three big ones: materiality, reasonableness, and consistency with “ordinary business”/”past practice”.


Materiality

Roughly speaking, materiality means ignoring small things. For instance, compare:

  • “Borrower shall not default in its obligations under any contract”, vs
  • “Borrower shall not default in its obligations under any material contract”

The first would be breached if e.g. the borrower forgot to update their payment information on their $10 monthly GitHub subscription, and the payment was late. The second would ignore small things like that.

In general, materiality is relative to the size of the business. A $100k oversight would be quite material to most small businesses, but immaterial to AT&T. It’s also relative to the contract - if that $100k oversight is directly relevant to a $300k contract, then it’s material, even if the $300k contract itself is small change to AT&T.

Where’s the cutoff line? That’s for courts to decide, if and when it matters. That’s how pushing the do-what-I-mean button works; you have to rely on the courts to make a sensible decision.

One particularly common usage of materiality: “material adverse change/effect”. Rather than saying “X has no pending lawsuits”, we say “X has no pending lawsuits whose loss would entail a material adverse effect”. Rather than saying “Borrower will notify Lender of any change in their business forecasts”, we say “Borrower will notify Lender of any material adverse change in their business forecasts”. This way a lender or buyer finds out about problems which actually matter, without being inundated with lots of minor details.


Reasonableness

Reasonableness is exactly what it sounds like. It’s saying something that has some obvious loophole to abuse, then giving a stern look and saying “don’t go pulling any bullshit”. Example: “Company shall reimburse X for all of X’s out-of-pocket expenses arising from...” vs “Company shall reimburse X for all of X’s reasonable out-of-pocket expenses arising from…”

Some patterns where reasonableness shows up:

  • Reasonable expectations, e.g. “Borrower shall notify Lender of any changes which could reasonably be expected to have a material adverse effect…”
  • Consent not to be unreasonably withheld, e.g. “ABC Corp may not X without consent of XYZ Corp, such consent not to be unreasonably withheld.”
  • Reasonable efforts, e.g. “Borrower shall obtain X from their insurer.” vs “Borrower shall exert reasonable effort to obtain X from their insurer.”

What would each of these do without the reasonableness clause? In the first case, the borrower could claim that they didn’t expect Obvious Bad Thing to impact their business. In the second case, XYZ Corp could withhold consent for some case they obviously don’t care about in order to extract further concessions from ABC Corp. In the third case, an insurer could simply refuse to provide X, and the borrower wouldn’t be able to do anything about it.

Behaving Normally

Sometimes a lender or prospective buyer wants to say “what you normally do is fine, so do that and don’t go crazy”. Two (similar) standards for this: “in the ordinary course of business” and “consistent with past practice”.

Typical examples:

  • “Borrower will not incur any <debt of specific type> except in the ordinary course of business.”
  • “ABC Corp will not make any payments to <subsidiary> except in a manner consistent with past practice.”

In general, this is a pretty good way to let business continue as usual without having to go into all the tiny details of what business-as-usual involves, while still ensuring that e.g. a borrowing company doesn’t sell all their assets, distribute the funds as a dividend to a parent company, and then declare bankruptcy.

Remedial Provisions

In general, if a contract is breached, the counterparty can sue for damages. If you want anything else to happen as the result of a breach, then it needs to be included in the contract. In particular, common things triggered by a breach include:

  • Termination: counterparty gains the right to terminate the contract
  • Acceleration: loaned money must be paid back immediately
  • Indemnification: counterparty must be paid for any breach-related damages

The last is somewhat redundant with the court system, but by including it explicitly, the contract can also specify how to calculate damages, how damages are to be paid, caps or exceptions to liability, etc. Rather than leaving such matters to the whims of a court, the contract can specify them.

Termination and acceleration are particularly relevant from a negotiation standpoint - the former for one-shot contracts like sales, and the latter for long-term contracts like debt.

The earlier stages of a complex sale (e.g. a merger/acquisition of a company) involve an agreement to sell subject to a long list of conditions being satisfied - i.e. the “due diligence” conditions. If any of those conditions are not met, then the buyer gains the right to terminate the contract - i.e. walk away from the deal. But these things can take months; the last acquisition I saw took around a year. During that time, the buyer may change their mind for reasons entirely unrelated to the seller - e.g. market prices for the seller’s assets may change. The seller wants to prevent the buyer from walking away in a case like that.

This means that the buyer has incentive to ask for very complicated and/or very subjective conditions, to give themselves the opportunity to walk away whenever they want. For instance, if a buyer manages to get a condition which requires “X which is satisfactory in Buyer’s sole discretion”, then the buyer effectively gains a blanket option to walk away from the deal; they can always just claim that some inane detail of X is unsatisfactory. (This is a good example where reasonableness can fix the problem.) In particular, if market conditions change, then the buyer may use that option to negotiate more concessions, like a lower purchase price.

Acceleration has a similar effect in debt deals. Nobody ever wants to accelerate debt; it’s a surefire way to end up in bankruptcy court. When a contract breach gives a lender the option to accelerate, what actually happens is that they use that option as leverage to negotiate a new deal. They’ll want a higher interest rate, or a claim on more of the borrower’s assets, or the like.

Takeaway: just because a contract specifies a particular penalty for breach does not mean that the penalty actually happens. Often, the penalty is really used as an option by one party to renegotiate the contract, and provides leverage for such a negotiation.


Summary

Contracts are a lot like computer programs: they’re taken very literally, and they could potentially encounter a wide variety of corner cases in the wild. Together, those two pieces make a contract writer’s job quite similar to a programmer’s job: a client/manager will tell you what they think they want, and then you go back-and-forth trying to formulate what they really want.

Compared to (good) software developers, lawyers do not seem to be very good at this; they tend to throw patches on top of patches, creating more corner cases rather than fewer. They don’t seem to have even realized that enforced scope and modularity are things which one could use in a contract; consequently, every contract must be read in its entirety by anyone relying on it. That puts a sharp limit on the scale of today’s contracts.

Unlike programmers, lawyers do have a “do what I mean” button, although its use comes with a cost; it means leaving interpretation to the whims of a court. For many “simple” things, that cost is relatively minor - so contracts can ignore “immaterial” problems, or require “reasonable” behavior, or stipulate consistency with “past practice” and “the course of ordinary business”.

Functionally, contracts provide insurance against stated facts being false, and they provide precommitments for the future. They can also stipulate nominal penalties for breach of contract, though in practice these penalties often serve as options to renegotiate (with leverage) rather than actually being used.


A case study in simulacra levels and the Four Children of the Seder

September 15, 2020 - 01:31
Published on September 14, 2020 10:31 PM GMT

This was originally going to be a comment on Zvi's excellent post, The Four Children of the Seder as the Simulacra Levels, but it got too long and I thought it warranted its own post.

My cousin's kid is having a tough time lately. He's stealing trinkets, destroying things around the house, and according to his parents he "lies all the time." His mom will grill him over whether he's lying or not - asking him again and again whether he's brushed his teeth, until he breaks down and admits that he didn't.

It's not clear that she has evidence in cases like this that he was lying. I suspect that the experience of being grilled is so uncomfortable that the kid finds it easier to make a false confession and brush his teeth a second time than to stand up for himself. I also guess that some of his stealing and destroying habits come from acting out on frustration with authority figures. It's a way of practicing deception, provoking reactions, and testing adults. Because he doesn't see a way to gain the trust and respect of adults, he's trying to figure out how to trick them most effectively.

Why are his parents behaving this way? It is because they have become far less concerned with object-level reality - whether or not he's brushed his teeth - than with the question of whether their child is a liar. The kid understands that everything they ask him to do is a test of his honesty. It's a symbol. Brushing his teeth isn't to prevent cavities. It's a trial of his character.

So his parents are speaking to him on the level of simplicity. He may have started wise, but is becoming wicked as his parents draw him deeper and deeper into a world of symbolism.

This highlights one of the paradoxes of the levels. Whether or not the kid lied about brushing his teeth is an object-level truth. And if you asked his parents why they care, they'd tell you "because we don't want him to get cavities."

A relationship that's on a higher simulacrum level is often still connected to level one. The higher levels accumulate, rather than replacing the lower levels. Brushing his teeth is about cavities, but it's also about whether you can trick your parents, and it's also about whether or not your child is a liar.

Our family is concerned about this, and we're operating on level four. We understand that bringing this up with the parents is a delicate issue, because we don't want to imply that they're bad parents. And we primarily struggle with "how to ask" them about the situation. To us, the question of whether or not the kid brushes his teeth is almost irrelevant. We're not trying to get anything out of them or control their behavior.

We're peering into level three, trying to understand the symbolism around everybody's behaviors, and how our word choice, tone of voice, body language, and the context of the discussion might fit into the symbolism of the discussion as interpreted by the parents.

Fortunately, we have slightly more clarity about how to deal with this than the Rabbis seem to, though not much. Our best ideas so far:

  1. Talking with each other about what's going on, and really taking our time before engaging with his parents. Then talking with the parents to start understanding their worldview. Peering from level 4 deeper into level 3.
  2. Suggesting that they agree on family therapy. This way, they'd have a single, credible, shared authority figure - a therapist - rather than a patchwork of advice, books, and their own opinions. We hope that the therapist can help them escape level 3 and get to level 2, so that they can stop brooding on this question of "is our son a good-for-nothing liar" and start asking "how are our words and actions influencing our son's behavior, and how can we influence him in ways that we like better?"
  3. Getting them to focus more on verifying their son's behavior through evidence rather than grilling him, and giving the kid tasks that focus on directly engaging with reality. We have him help cook using sharp knives, we teach him the names of plants in the garden, and direct him to observe nature closely and learn rules that count for something: the patterns on a spider's back, the shape of a weed's roots, the rules of chess. And we try to give him opportunities to "teach" others about what he learns - telling his sister that you can eat nasturtium petals, for example. Rewarding him for his engagement with reality.

In general, being lost in higher simulacra levels seems to involve a breakdown of trust that basic care, forgiveness, and acceptance is available; a fragmentation of the group's wisdom and perspective; and stronger incentives being attached to the higher simulacra levels than the lower levels.

This suggests to me in particular that we have gone deeply astray with our obsession with people's character. The drive to figure out "what kind of a person" somebody is, or "what they think of our character," leads us to experience simple activities as tests of our character. We experience requests, advice, feedback, and just simple factual claims as part of the test, not attempts to steer an object-level outcome. We become highly self-conscious, extremely concerned about how every aspect of our selves might be interpreted.

This goes on and on. Even people who ostensibly want to fight this can get caught up in it themselves. Saying "I'm only intolerant of intolerance" creates the perception that you're constantly engaged in testing the people around you for having a character of intolerance. Nobody will be able to rest easy unless they commit, one way or another, to just not caring what you think about them.

And of course, when this gets done on a massive scale, you get "I'm only intolerant of enforced tolerance." You tolerate the most objectively reprehensible behavior, not because you think it's OK, but in order to show just how far you're willing to go to push back against the other side.

What might be the way forward?

If I'm right, and the levels are layers, then we have to scrape them away. The way that starts is by establishing trust - first within our own side, and only then with the other side. We need to make sure we can credibly show that we've got enough unity amongst ourselves not to turn a reconciliation attempt into an attack, and that we bring wisdom to the table.

With trust established, we try to bring in an agreed-upon authority. This could be a shared set of values or concepts, a group of people with the credibility to serve as a reconciliation figure, or a process that creates space for the disputants to figure out what they actually want from life, not from their enemies.

Having a sense of shared authority and process, we look for any opportunity to reward people who are operating on level 1 and displaying a conscious rejection of levels 2-4. Bring facts to the table? Applause. Read that book rather than assuming you understand it from the title? Applause. Criticize the fallacies of your own side? Applause.

Going forward, I will try to bring up the idea with my friends (all American liberals, like me) that a lot of the "other side" might be trying to react to a perceived authoritarianism of the left by ostentatiously embracing what we find repugnant. I want to see if we can form enough agreement around that idea that it would become imaginable for us to try to interface with the other side and build trust there as well.


Most PDs are Stag Hunts; Most Stag Hunts are Battle of the Sexes

September 15, 2020 - 01:13
Published on September 14, 2020 10:13 PM GMT

I previously claimed that most apparent Prisoner's Dilemmas are actually Stag Hunts. I now claim that they're Battle of the Sexes in practice. I conclude with some applications to infohazards and AI strategy.

In a comment on The Schelling Choice is "Rabbit", not "Stag" I said:

In the book The Stag Hunt, Skyrms similarly says that lots of people use Prisoner's Dilemma to talk about social coordination, and he thinks people should often use Stag Hunt instead.

I think this is right. Most problems which initially seem like Prisoner's Dilemma are actually Stag Hunt, because there are potential enforcement mechanisms available. The problems discussed in Meditations on Moloch are mostly Stag Hunt problems, not Prisoner's Dilemma problems -- Scott even talks about enforcement, when he describes the dystopia where everyone has to kill anyone who doesn't enforce the terrible social norms (including the norm of enforcing).

This might initially sound like good news. Defection in Prisoner's Dilemma is an inevitable conclusion under common decision-theoretic assumptions. Trying to escape multipolar traps with exotic decision theories might seem hopeless. On the other hand, rabbit in Stag Hunt is not an inevitable conclusion, by any means.

Unfortunately, in reality, hunting stag is actually quite difficult. ("The Schelling choice is Rabbit, not Stag... and that really sucks!")

Inspired by Zvi's recent sequence on Moloch, I wanted to expand on this. These issues are important, since they determine how we think about collective action problems / tragedy of the commons / multipolar traps / Moloch / all the other synonyms for the same thing.

My current claim is that most Prisoner's Dilemmas are actually Battle of the Sexes. But let's first review the relevance of Stag Hunt.

Your PD Is Probably a Stag Hunt

There are several reasons why an apparent Prisoner's Dilemma may be more of a Stag Hunt.

  • The game is actually an iterated game.
  • Reputation networks could punish defectors and reward cooperators.
  • There are enforceable contracts.
  • Players know quite a bit about how other players think (in the extreme case, players can view each other's source code).

Each of these formal models creates a situation where players can get into a cooperative equilibrium. The challenge is that you can't unilaterally decide everyone should be in the cooperative equilibrium. If you want good outcomes for yourself, you have to account for what everyone else probably does. If you think everyone is likely to be in a bad equilibrium where people punish each other for cooperating, then aligning with that equilibrium might be the best you can do! This is like hunting rabbit.

Exercise: is there a situation in your life, or within spitting distance, which seems like a Prisoner's Dilemma to you, where everyone is stuck hurting each other due to bad incentives? Is it an iterated situation? Could there be reputation networks which weed out bad actors? Could contracts or contract-like mechanisms be used to encourage good behavior?

So, why do we perceive so many situations to be Prisoner's Dilemma-like rather than Stag Hunt-like? Why does Moloch sound more like "each individual is incentivized to make things worse for everyone else" than like "everyone is stuck in a bad equilibrium"?

Sarah Constantin writes:

A friend of mine speculated that, in the decades that humanity has lived under the threat of nuclear war, we’ve developed the assumption that we’re living in a world of one-shot Prisoner’s Dilemmas rather than repeated games, and lost some of the social technology associated with repeated games. Game theorists do, of course, know about iterated games and there’s some fascinating research in evolutionary game theory, but the original formalization of game theory was for the application of nuclear war, and the 101-level framing that most educated laymen hear is often that one-shot is the prototypical case and repeated games are hard to reason about without computer simulations.

To use board-game terminology, the game may be a Prisoner's Dilemma, but the metagame can use enforcement techniques. Accounting for enforcement techniques, the game is more like a Stag Hunt, where defecting is "rabbit" and cooperating is "stag".

Battle of the Sexes

But this is a bit informal. You don't separately choose how to metagame and how to game; really, your iterated strategy determines what you do in individual games.

So it's more accurate to just think of the iterated game. There are a bunch of iterated strategies which you can choose from.

The key difference between the single-shot game and the iterated game is that cooperative strategies, such as Tit for Tat (among others), are available. These strategies have the property that (1) they are equilibria -- if you know the other player is playing Tit for Tat, there's no reason for you not to; (2) if both players use them, they end up cooperating.

A key feature of the Tit for Tat strategy is that if you do end up playing against a pure defector, you do almost as well as you could possibly do with them. This doesn't sound very much like a Stag Hunt. It begins to sound like a Stag Hunt in which you can change your mind and go hunt rabbit if the other person doesn't show up to hunt stag with you.
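That property can be checked directly with a minimal iterated Prisoner's Dilemma simulation. This is a sketch under the conventional payoff assumptions matching the numbers later in this post (mutual cooperation 2, mutual defection 1, temptation 3, sucker 0); the function names are mine.

```python
# Minimal iterated Prisoner's Dilemma. Assumed payoffs:
# (C,C)=(2,2), (D,D)=(1,1), (C,D)=(0,3).
PAYOFFS = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
           ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    # Cooperate first; afterwards, copy the opponent's last move.
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds):
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the opponent's history.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        history_a.append(move_a)
        history_b.append(move_b)
        score_a += pay_a
        score_b += pay_b
    return score_a, score_b
```

Over 100 rounds, two Tit for Tat players score (200, 200), while Tit for Tat against a pure defector scores (99, 102): Tit for Tat loses only the single sucker's-payoff round, nearly matching the 100 it would have earned by defecting throughout.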

Sounds great, right? We can just play one of these cooperative strategies.

The problem is, there are many possible self-enforcing equilibria. Each player can threaten the other player with a Grim Trigger strategy: they defect forever the moment some specified condition isn't met. This can be used to extort the other player for more than just the mutual-cooperation payoff. Here's an illustration of possible outcomes, with the enforceable frequencies in the white area:

The entire white area consists of enforceable equilibria: players could use a grim-trigger strategy to make each other cooperate with very close to the desired frequency, because what they're getting is still better than mutual defection, even if it is far from fair, or far from the Pareto frontier.

Alice could be extorting Bob by cooperating 2/3rds of the time, with a grim-trigger threat of never cooperating at all. Alice would then get an average payoff of 2⅓, while Bob would get an average payoff of 1⅓.

In the artificial setting of Prisoner's Dilemma, it's easy to say that Cooperate, Cooperate is the "fair" solution, and an equilibrium like I just described is "Alice exploiting Bob". However, real games are not so symmetric, and so it will not be so obvious what "fair" is. The purple squiggle highlights the Pareto frontier -- the space of outcomes which are "efficient" in the sense that no alternative is purely better for everybody. These outcomes may not all be fair, but they all have the advantage that no "money is left on the table" -- any "improvement" we could propose for those outcomes makes things worse for at least one person.

Notice that I've also colored areas where Bob and Alice are doing worse than payoff 1. Bob can't enforce Alice's cooperation while defecting more than half the time; Alice would just defect. And vice versa. All of the points within the shaded regions have this property. So not all Pareto-optimal solutions can be enforced.

Any point in the white region can be enforced, however. Each player could be watching the statistics of the other player's cooperation, prepared to pull a grim-trigger if the statistics ever stray too far from the target point. This includes so-called mutual blackmail equilibria, in which both players cooperate with probability slightly better than zero (while threatening to never cooperate at all if the other player detectably diverges from that frequency). This idea -- that 'almost any' outcome can be enforced -- is known as the Folk Theorem in game theory.
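Under the same conventional payoff assumptions (mutual cooperation 2, mutual defection 1, temptation 3, sucker 0), here is a sketch of that calculation, treating each player as cooperating independently with a fixed frequency. A target point is grim-trigger enforceable iff both players still expect more than the mutual-defection payoff of 1.

```python
def expected_payoffs(p_alice, p_bob):
    """Expected per-round payoffs when Alice cooperates with probability
    p_alice and Bob with probability p_bob (independent mixing).
    Assumed payoffs: (C,C)=(2,2), (D,D)=(1,1), (C,D)=(0,3)."""
    alice = (p_alice * p_bob * 2 + p_alice * (1 - p_bob) * 0
             + (1 - p_alice) * p_bob * 3 + (1 - p_alice) * (1 - p_bob) * 1)
    bob = (p_alice * p_bob * 2 + p_alice * (1 - p_bob) * 3
           + (1 - p_alice) * p_bob * 0 + (1 - p_alice) * (1 - p_bob) * 1)
    return alice, bob

def enforceable(p_alice, p_bob):
    # Grim-trigger enforceable iff both players beat the
    # mutual-defection payoff of 1.
    alice, bob = expected_payoffs(p_alice, p_bob)
    return alice > 1 and bob > 1
```

The extortion point from earlier checks out: `expected_payoffs(2/3, 1)` gives Alice 2⅓ and Bob 1⅓, and `enforceable(2/3, 1)` is true. Meanwhile Bob defecting more than half the time against a fully cooperative Alice (e.g. `enforceable(1, 0.4)`) leaves Alice below 1, so that point can't be enforced.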

The Battle of the Sexes part is that (particularly with grim-trigger enforcement) everyone has to choose the same equilibrium to enforce; otherwise everyone is stuck playing defect. You'd rather be in even a bad mutual-blackmail type equilibrium, as opposed to selecting incompatible points to enforce. Just like, in Battle of the Sexes, you'd prefer to meet together at any venue rather than end up at different places.

Furthermore, I would claim that most apparent Stag Hunts which you encounter in real life are actually battle-of-the-sexes, in the sense that there are many different stags to hunt and it isn't immediately clear which one should be hunted. Each stag will be differently appealing to different people, so it's difficult to establish common knowledge about which one is worth going after together.

Exercise: what stags aren't you hunting with the people around you?

Taking Pareto Improvements

Fortunately, Grim Trigger is not the only enforcement mechanism which can be used to build an equilibrium. Grim Trigger creates a crisis in which you've got to guess which equilibrium you're in very quickly, to avoid angering the other player; and no experimentation is allowed. There are much more forgiving strategies (and contrite ones, too, which helps in a different way).

Actually, even using Grim Trigger to enforce things, why would you punish the other player for doing something better for you? There's no motive for punishing the other player for raising their cooperation frequency.

In a scenario where you don't know which Grim Trigger the other player is using, but you don't think they'll punish you for cooperating more than the target, a natural response is for both players to just cooperate a bunch.

So, it can be very valuable to use enforcement mechanisms which allow for Pareto improvements.

Taking Pareto improvements is about moving from the middle to the boundary:

(I've indicated the directions for Pareto improvements starting from the origin in yellow, as well as what happens in other directions; also, I drew a bunch of example Pareto improvements as black arrows to illustrate how Pareto improvements are awesome. Some of the black arrows might not be perfectly within the range of Pareto improvements, sorry about that.)

However, there's also an argument against taking Pareto improvements. If you accept any Pareto improvements, you can be exploited in the sense mentioned earlier -- you'll accept any situation, so long as it's not worse for you than where you started. So you will take some pretty poor deals. Notice that one Pareto improvement can prevent a different one -- for example, if you move to (1/2, 1), then you can't move to (1,1/2) via Pareto improvement. So you could always reject a Pareto improvement because you're holding out for a better deal. (This is the Battle of the Sexes aspect of the situation -- there are Pareto-optimal outcomes which are better or worse for different people, so, it's hard to agree on which improvement to take.)

That's where Cooperation between Agents with Different Notions of Fairness comes in. The idea in that post is that you don't take just any Pareto improvement -- you have standards of fairness -- but you don't just completely defect for less-than-perfectly-fair deals, either. What this means is that two such agents with incompatible notions of fairness can't get all the way to the Pareto frontier, but the closer their notions of fairness are to each other, the closer they can get. And, if the notions of fairness are compatible, they can get all the way.
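A toy version of that idea (my own simplification for illustration, not the exact mechanism from that post): reject sub-fair offers with just enough probability that the proposer gains nothing by lowballing you.

```python
def acceptance_probability(offer, fair_share, total=1.0):
    """Probabilistically reject sub-fair offers so that the proposer
    cannot profit by offering less than `fair_share`: choose p so that
    p * (total - offer) == total - fair_share whenever offer < fair_share."""
    if offer >= fair_share:
        return 1.0
    return (total - fair_share) / (total - offer)

# The proposer's expected payoff is flat at (total - fair_share) for
# any lowball offer, so there is no incentive to be unfair:
p = acceptance_probability(0.2, fair_share=0.5)
assert abs(p * (1.0 - 0.2) - 0.5) < 1e-9
```

Two agents whose fairness notions agree lose nothing under this rule; the further apart their notions are, the more expected value leaks out of the interaction, matching the qualitative claim above.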

Lessons in Slaying Moloch

0. I didn't even address this in this essay, but it's worth mentioning: not all conflicts are zero-sum. In the introduction to the 1980 edition of The Strategy of Conflict, Thomas Schelling discusses the reception of the book. He recalls that a prominent political theorist "exclaimed how much this book had done for his thinking, and as he talked with enthusiasm I tried to guess which of my sophisticated ideas in which chapters had made so much difference to him. It turned out it wasn't any particular idea in any particular chapter. Until he read this book, he had simply not comprehended that an inherently non-zero-sum conflict could exist."

1. In situations such as iterated games, there's no in-principle pull toward defection. Prisoner's Dilemma seems paradoxical when we first learn of it (at least, it seemed so to me) because we are not accustomed to such a harsh divide between individual incentives and the common good. But perhaps, as Sarah Constantin speculated in Don't Shoot the Messenger, modern game theory and economics have conditioned us to expect this conflict due to their emphasis on single-shot interactions. As a result, Moloch comes to sound like an inevitable gravity, pulling everything downwards. This is not necessarily the case.

2. Instead, most collective action problems are bargaining problems. If a solution can be agreed upon, we can generally use weak enforcement mechanisms (social norms) or strong enforcement (centralized governmental enforcement) to carry it out. But, agreeing about the solution may not be easy. The more parties involved, the more difficult.

3. Try to keep a path open toward better solutions. Since wide adoption of a particular solution can be such an important problem, there's a tendency to treat alternative solutions as the enemy. This bars the way to further progress. (One could loosely characterize this as the difference between religious doctrine and democratic law; religious doctrine trades away the ability to improve in favor of the more powerful consensus-reaching technology of immutable universal law. But of course this oversimplifies things somewhat.) Keeping a path open for improvements is hard, partly because it can create exploitability. But it keeps us from getting stuck in a poor equilibrium.


Comparing Utilities

Published on September 14, 2020 8:56 PM GMT

(This is a basic point about utility theory which many will already be familiar with. I draw some non-obvious conclusions which may be of interest to you even if you think you know this from the title -- but the main point is to communicate the basics. I'm posting it to the alignment forum because I've heard misunderstandings of this from some in the AI alignment research community.)

I will first give the basic argument that the utility quantities of different agents aren't directly comparable, and a few important consequences of this. I'll then spend the rest of the post discussing what to do when you need to compare utility functions.

Utilities aren't comparable.

Utility isn't an ordinary quantity. A utility function is a device for expressing the preferences of an agent.

Suppose we have a notion of outcome.* We could try to represent the agent's preferences between outcomes as an ordering relation: if we have outcomes A, B, and C, then one possible preference would be A<B<C.

However, a mere ordering does not tell us how the agent would decide between gambles, i.e., situations giving A, B, and C with some probability.

With just three outcomes, there is only one thing we need to know: is B closer to A or C, and by how much?

We want to construct a utility function U() which represents the preferences. Let's say we set U(A)=0 and U(C)=1. Then, if the agent is indifferent between B and the 50/50 gamble G between A and C, we can represent B=G as U(B)=1/2. If not, we would look for a different gamble which does equal B, and then set B's utility to the expected value of that gamble. By assigning real-numbered values to each outcome, we can fully represent an agent's preferences over gambles. (Assuming the VNM axioms hold, that is.)
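A minimal sketch of this construction (the indifference point for B is a made-up example):

```python
# Fix the scale with two arbitrary anchor values.
U = {'A': 0.0, 'C': 1.0}

# Suppose (hypothetically) the agent is indifferent between B and a
# 50/50 gamble over A and C; then B's utility is that gamble's
# expected utility.
U['B'] = 0.5 * U['A'] + 0.5 * U['C']

def expected_utility(gamble, U):
    """gamble: dict mapping each outcome to its probability."""
    return sum(p * U[o] for o, p in gamble.items())

# The agent prefers a 60% chance of C (else A) to getting B outright:
g = {'A': 0.4, 'C': 0.6}
assert expected_utility(g, U) > U['B']
```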

But the initial choices U(A)=0 and U(C)=1 were arbitrary! We could have chosen any numbers so long as U(A)<U(C), reflecting the preference A<C. In general, a valid representation of our preferences U() can be modified into an equally valid U'() by adding/subtracting arbitrary numbers, or multiplying/dividing by positive numbers.

So it's just as valid to say someone's expected utility in a given situation is 5 or -40, provided you shift everything else around appropriately.
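A quick check of this invariance (with arbitrary constants a and b): rescaling by a positive multiplicative constant and shifting by any additive constant leaves every preference between gambles unchanged.

```python
U1 = {'A': 0.0, 'B': 0.5, 'C': 1.0}
a, b = 10.0, -40.0                       # any a > 0 and any b
U2 = {o: a * u + b for o, u in U1.items()}

def prefers(g1, g2, U):
    """True iff g1 has strictly higher expected utility than g2 under U."""
    eu = lambda g: sum(p * U[o] for o, p in g.items())
    return eu(g1) > eu(g2)

g1, g2 = {'C': 0.6, 'A': 0.4}, {'B': 1.0}
# Affine rescaling with positive a never changes any preference:
assert prefers(g1, g2, U1) == prefers(g1, g2, U2)
```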

Writing ≈ to mean that two utility functions represent the same preferences, what we have in general is: U1(x)≈U2(x) if and only if U1(x)=aU2(x)+b for some a>0. (I'll call a the multiplicative constant and b the additive constant.)

This means that we can't directly compare the utility of two different agents. Notions of fairness should not directly say "everyone should have the same expected utility". Utilitarian ethics cannot directly maximize the sum of everyone's utility. Both of these operations should be thought of as type errors.

Some non-obvious consequences.

The game-theory term "zero sum" is a misnomer. You shouldn't directly think about the sum of the utilities.

In mechanism design, exchangeable utility is a useful assumption which is often needed in order to get nice results. The idea is that agents can give utils to each other, perhaps to compensate for unfair outcomes. This is kind of like assuming there's money which can be exchanged between agents. However, the non-comparability of utility should make this seem really weird. (There are also other disanalogies with money; for example, utility is closer to logarithmic in money, not linear.)

This could (should?) also make you suspicious of talk of "average utilitarianism" and "total utilitarianism". However, beware: only one kind of "utilitarianism" holds that the term "utility" in decision theory means the same thing as "utility" in ethics: namely, preference utilitarianism. Other kinds of utilitarianism can distinguish between these two types of utility. (For example, one can be a hedonic utilitarian without thinking that what everyone wants is happiness, if one isn't a preference utilitarian.)

Similarly, for preference utilitarians, talk of utility monsters becomes questionable. A utility monster is, supposedly, someone who gets much more utility out of resources than everyone else. For a hedonic utilitarian, it would be someone who experiences much deeper sadness and much higher heights of happiness. This person supposedly merits more resources than other people.

For a preference utilitarian, incomparability of utility means we can't simply posit such a utility monster. It's meaningless a priori to say that one person simply has much stronger preferences than another (in the utility function sense).

All that being said, we can actually compare utilities, sum them, exchange utility between agents, define utility monsters, and so on. We just need more information.

Comparing utilities.

The incomparability of utility functions doesn't mean we can't trade off between the utilities of different people.

I've heard the non-comparability of utility functions summarized as the thesis that we can't say anything meaningful about the relative value of one person's suffering vs another person's convenience. Not so! Rather, the point is just that we need more assumptions in order to say anything. The utility functions alone aren't enough.

Pareto-Optimality: The Minimal Standard

Comparing utility functions suggests putting them all onto one scale, such that we can trade off between them -- "this dollar does more good for Alice than it does for Bob". We formalize this by imagining that we have to decide policy for the whole group of people we're considering (e.g., the whole world). We consider a social choice function which would make those decisions on behalf of everyone. Supposing it is VNM rational, its decisions must be comprehensible in terms of a utility function, too. So the problem reduces to combining a bunch of individual utility functions, to get one big one.

So, how do we go about combining the preferences of many agents into one?

The first and most important concept is the Pareto improvement: our social choice function should endorse changes which benefit someone and harm no one. An option which allows no such improvements is said to be Pareto-optimal.

We might also want to consider strict Pareto improvements: a change which benefits everyone. (An option which allows no strict Pareto improvements is weakly Pareto-optimal.) Strict Pareto improvements can be more relevant in a bargaining context, where you need to give everyone something in order to get them on board with a proposal -- otherwise they may judge the improvement as unfairly favoring others. However, in a bargaining context, individuals may refuse even a strict Pareto improvement due to fairness considerations.
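The two notions can be written down directly; a sketch, representing an option as a tuple of utilities, one per person:

```python
def is_pareto_improvement(old, new, strict=False):
    """old, new: tuples of utilities, one entry per person.
    Weak (default): at least one person gains and nobody loses.
    Strict: everyone gains."""
    if strict:
        return all(n > o for o, n in zip(old, new))
    return (all(n >= o for o, n in zip(old, new))
            and any(n > o for o, n in zip(old, new)))

assert is_pareto_improvement((1, 1), (2, 1))                 # weak only
assert not is_pareto_improvement((1, 1), (2, 1), strict=True)
assert is_pareto_improvement((1, 1), (2, 2), strict=True)
```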

In either case, a version of Harsanyi's utilitarianism theorem implies that the utility of our social choice function can be understood as some linear combination of the individual utility functions.

So, Pareto-optimal social choice functions can always be understood by:

  1. Choosing a scale for everyone's utility function -- i.e., setting the multiplicative constant. (If the social choice function is only weakly Pareto-optimal, some of the multiplicative constants might turn out to be zero, totally cancelling out someone's involvement. Otherwise, they can all be positive.)
  2. Adding all of them together.

(Note that the additive constant doesn't matter -- shifting a person's utility function up or down doesn't change what decisions will be endorsed by the sum. However, it will matter for some other ways to combine utility functions.)

This is nice, because we can always combine everything linearly! We just have to set things to the right scale and then sum everything up.
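In code, the recipe is just a weighted sum; the weights and outcome scores below are hypothetical:

```python
def social_utility(weights, utilities):
    """Harsanyi-style aggregation: a fixed positive weight per person,
    then a weighted sum of their (already-scaled) utilities.
    utilities: dict mapping outcome -> tuple of individual utilities."""
    return {o: sum(w * u for w, u in zip(weights, us))
            for o, us in utilities.items()}

# Hypothetical two-person example: outcomes scored as (Alice, Bob).
utilities = {'x': (1.0, 0.0), 'y': (0.0, 1.0), 'z': (0.6, 0.6)}
weights = (1.0, 1.0)
s = social_utility(weights, utilities)
best = max(s, key=s.get)
# With equal weights, the compromise outcome 'z' wins (1.2 vs 1.0).
assert best == 'z'
```

Shifting anyone's utilities by an additive constant raises every outcome's score by the same amount, which is why only the multiplicative constants matter here.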

However, it's far from the end of the story. How do we choose multiplicative constants for everybody?

Variance Normalization: Not Too Exploitable?

We could set the constants any way we want... totally subjective estimates of the worth of a person, random lots, etc. But we do typically want to represent some notion of fairness. We said in the beginning that the problem was that a utility function U(x) has many equivalent representations aU(x)+b. We can address this as a problem of normalization: we want to take a U and put it into a canonical form, getting rid of the choice between equivalent representations.

One way of thinking about this is strategy-proofness. A utilitarian collective should not be vulnerable to members strategically claiming that their preferences are stronger (larger a), or that they should get more because they're worse off than everyone (smaller b -- although, remember that we haven't talked about any setup which actually cares about that, yet).

Warm-Up: Range Normalization

Unfortunately, some obvious ways to normalize utility functions are not going to be strategy-proof.

One of the simplest normalization techniques is to squish everything into a specified range, such as [0,1]:

This is analogous to range voting: everyone reports their preferences for different outcomes on a fixed scale, and these all get summed together in order to make decisions.
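As a concrete sketch (the function names and example utilities here are mine, purely for illustration), range normalization rescales each agent's utilities so their worst outcome maps to 0 and their best to 1, then sums the rescaled values:

```python
# Range normalization: squish each agent's utilities into [0, 1], then pick
# the outcome maximizing the sum. Assumes each agent is not indifferent
# between all outcomes (so max > min and we never divide by zero).

def range_normalize(utilities):
    """Map a dict of outcome -> utility onto the [0, 1] range."""
    lo, hi = min(utilities.values()), max(utilities.values())
    return {o: (u - lo) / (hi - lo) for o, u in utilities.items()}

def social_choice(agents):
    """Pick the outcome maximizing the sum of range-normalized utilities."""
    normed = [range_normalize(a) for a in agents]
    outcomes = normed[0].keys()
    return max(outcomes, key=lambda o: sum(n[o] for n in normed))

agents = [{"a": 0.0, "b": 5.0, "c": 10.0},
          {"a": 4.0, "b": 3.0, "c": 0.0}]
print(social_choice(agents))  # "b", the compromise outcome
```

Note that the collective decision depends entirely on the reported shape of each utility function within its [0, 1] range, which is exactly the lever a strategic agent can pull.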

If you're an agent in a collective which uses range normalization, then you may want to strategically mis-report your preferences. In the example shown, the agent has a big hump around outcomes they like, and a small hump on a secondary "just OK" outcome. The agent might want to get rid of the second hump, forcing the group outcome into the more favored region.

I believe that in the extreme, the optimal strategy for range voting is to choose some utility threshold. Anything below that threshold goes to zero, feigning maximal disapproval of the outcome. Anything above the threshold goes to one, feigning maximal approval. In other words, under strategic voting, range voting becomes approval voting (range voting where the only options are zero and one).

If it's not possible to mis-report your preferences, then the incentive becomes to self-modify to literally have these extreme preferences. This could perhaps have a real-life analogue in political outrage and black-and-white thinking. If we use this normalization scheme, that's the closest you can get to being a utility monster.

Variance Normalization

We'd like to avoid any incentive to misrepresent/modify your utility function. Is there a way to achieve that?

Owen Cotton-Barratt discusses different normalization techniques in illuminating detail, and argues for variance normalization: divide utility functions by their standard deviation, making the variance one. (Geometric reasons for normalizing variance to aggregate preferences, O Cotton-Barratt, 2013.) Variance normalization is strategy-proof under the assumption that everyone participating in an election shares beliefs about how probable the different outcomes are! (Note that variance of utility is only well-defined under some assumption about probability of outcome.) That's pretty good. It's probably the best we can get, in terms of strategy-proofness of voting. Will MacAskill also argues for variance normalization in the context of normative uncertainty (Normative Uncertainty, Will MacAskill, 2014).

Intuitively, variance normalization directly addresses the issue we encountered with range normalization: an individual attempts to make their preferences "loud" by extremizing everything to 0 or 1. This increases variance, so, is directly punished by variance normalization.
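Here is a minimal sketch of that mechanism (the utilities and probabilities are invented for illustration): after dividing by the standard deviation, every report has variance one, so extremizing your utilities no longer makes your voice any "louder" overall.

```python
# Variance normalization under shared outcome probabilities: subtract the
# mean, divide by the standard deviation, so the normalized variance is 1.

def variance_normalize(utilities, probs):
    mean = sum(p * u for p, u in zip(probs, utilities))
    var = sum(p * (u - mean) ** 2 for p, u in zip(probs, utilities))
    return [(u - mean) / var ** 0.5 for u in utilities]

probs = [1 / 3, 1 / 3, 1 / 3]  # shared beliefs about the three outcomes

honest = variance_normalize([0.0, 0.4, 1.0], probs)
shouted = variance_normalize([0.0, 0.0, 1.0], probs)  # extremized report

def variance(us):
    mean = sum(p * u for p, u in zip(probs, us))
    return sum(p * (u - mean) ** 2 for p, u in zip(probs, us))

print(variance(honest), variance(shouted))  # both come out to 1
```

This only shows that the overall scale of a "shout" is cancelled; the shape of the report still changes, which is why the strategy-proofness result needs the shared-beliefs assumption rather than following from the normalization alone.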

However, Jameson Quinn, LessWrong's resident voting theory expert, has warned me rather strongly about variance normalization.

  1. The assumption of shared beliefs about election outcomes is far from true in practice. Jameson Quinn tells me that, in fact, the strategic voting incentivized by quadratic voting is particularly bad amongst normalization techniques.
  2. Strategy-proofness isn't, after all, the final arbiter of the quality of a voting method. The final arbiter should be something like the utilitarian quality of an election's outcome. This question gets a bit weird and recursive in the current context, where I'm using elections as an analogy to ask how we should define utilitarian outcomes. But the point still, to some extent, stands.

I didn't understand the full justification behind his point, but I came away thinking that range normalization was probably better in practice. After all, it reduces to approval voting, which is actually a pretty good form of voting. But if you want to do the best we can with the state of voting theory, Jameson Quinn suggested 3-2-1 voting. (I don't think 3-2-1 voting gives us any nice theory about how to combine utility functions, though, so it isn't so useful for our purposes.)

Open Question: Is there a variant of variance normalization which takes differing beliefs into account, to achieve strategy-proofness (IE honest reporting of utility)?

Anyway, so much for normalization techniques. These techniques ignore the broader context. They attempt to be fair and even-handed in the way we choose the multiplicative and additive constants. But we could also explicitly try to be fair and even-handed in the way we choose between Pareto-optimal outcomes, as with this next technique.

Nash Bargaining Solution

It's important to remember that the Nash bargaining solution is a solution to the Nash bargaining problem, which isn't quite our problem here. But I'm going to gloss over that. Just imagine that we're setting the social choice function through a massive negotiation, so that we can apply bargaining theory.

Nash offers a very simple solution, which I'll get to in a minute. But first, a few words on how this solution is derived. Nash provides two separate justifications for his solution. The first is a game-theoretic derivation of the solution as an especially robust Nash equilibrium. I won't detail that here; I quite recommend his original paper (The Bargaining Problem, 1950); but, just keep in mind that there is at least some reason to expect selfishly rational agents to hit upon this particular solution. The second, unrelated justification is an axiomatic one:

  1. Invariance to equivalent utility functions. This is the same motivation I gave when discussing normalization.
  2. Pareto optimality. We've already discussed this as well.
  3. Independence of Irrelevant Alternatives (IIA). This says that we shouldn't change the outcome of bargaining by removing options which won't ultimately get chosen anyway. This isn't even technically one of the VNM axioms, but it essentially is -- the VNM axioms are posed for binary preferences (a > b). IIA is the assumption we need to break down multi-choice preferences to binary choices. We can justify IIA with a kind of money pump.
  4. Symmetry. This says that the outcome doesn't depend on the order of the bargainers; we don't prefer Player 1 in case of a tie, or anything like that.

Nash proved that the only way to meet these four criteria is to maximize the product of gains from cooperation. More formally, choose the outcome x which maximizes:

∏ᵢ (Uᵢ(x) − Uᵢ(d))

The d here is a "status quo" outcome. You can think of this as what happens if the bargaining fails. This is sometimes called a "threat point", since strategic players should carefully set what they do if negotiation fails so as to maximize their bargaining position. However, you might also want to rule that out, forcing d to be a Nash equilibrium in the hypothetical game where there is no bargaining opportunity. As such, d is also known as the best alternative to negotiated agreement (BATNA), or sometimes the "disagreement point" (since it's what players get if they can't agree). We can think of subtracting out U(d) as just a way of adjusting the additive constant, in which case we really are just maximizing the product of utilities. (The BATNA point is always (0,0) after we subtract out things that way.)
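As a tiny sketch (the outcome set and payoffs are invented for illustration), maximizing the Nash product over a handful of candidate splits picks out the even one:

```python
# Nash bargaining: choose the outcome x maximizing the product of gains
# over the disagreement point d, i.e. prod_i (U_i(x) - U_i(d)).

outcomes = {"split_50_50": (0.5, 0.5),
            "split_70_30": (0.7, 0.3),
            "split_30_70": (0.3, 0.7)}
d = (0.0, 0.0)  # disagreement point: the dollar is destroyed

def nash_product(payoffs):
    return (payoffs[0] - d[0]) * (payoffs[1] - d[1])

best = max(outcomes, key=lambda o: nash_product(outcomes[o]))
print(best)  # split_50_50 (product 0.25, beating 0.21 for the lopsided splits)
```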

The Nash solution differs significantly from the other solutions considered so far.

  1. Maximize the product?? Didn't Harsanyi's theorem guarantee we only need to worry about sums?
  2. This is the first proposal where the additive constants matter. Indeed, now the multiplicative constants are the ones that don't matter!
  3. Why wouldn't any utility-normalization approach satisfy those four axioms?

Last question first: how do normalization approaches violate the Nash axioms?

Well, both range normalization and variance normalization violate IIA! If you remove one of the possible outcomes, the normalization may change. This makes the social choice function display inconsistent preferences across different scenarios. (But how bad is that, really?)

As for why we can get away with maximizing the product, rather than the sum:

The Pareto-optimality of Nash's approach guarantees that it can be seen as maximizing a linear function of the individual utilities. So Harsanyi's theorem is still satisfied. However, Nash's solution points to a very specific outcome, which Harsanyi doesn't do for us.

Imagine you and I are trying to split a dollar. If we can't agree on how to split it, then we'll end up destroying it (ripping it during a desperate attempt to wrestle it from each other's hands, obviously). Thankfully, John Nash is standing by, and we each agree to respect his judgement. No matter which of us claims to value the dollar more, Nash will allocate 50 cents to each of us.

Harsanyi happens to see this exchange, and explains that Nash has chosen a social choice function which normalized our utility functions to be equal to each other. That's the only way Harsanyi can explain the choice made by Nash -- the value of the dollar was precisely tied between you and me, so a 50-50 split was as good as any other outcome. Harsanyi's justification is indeed consistent with the observation. But why, then, did Nash choose 50-50 precisely? 49-51 would have had exactly the same collective utility, as would 40-60, or any other split!

Hence, Nash's principle is far more useful than Harsanyi's, even though Harsanyi can justify any rational outcome retrospectively.

However, Nash does rely somewhat on that pesky IIA assumption, whose importance is perhaps not so clear. Let's try getting rid of that.


Although the Nash bargaining solution is the most famous, there are other proposed solutions to Nash's bargaining problem. I want to mention just one more, Kalai-Smorodinsky (I'll call it KS).

KS throws out IIA as irrelevant. After all, the set of alternatives will affect bargaining. Even in the Nash solution, the set of alternatives may have an influence by changing the BATNA! So perhaps this assumption isn't so important.

KS instead adds a monotonicity assumption: being in a better position should never make me worse off after bargaining.

Here's an illustration, due to Daniel Demski, of a case where Nash bargaining fails monotonicity:

I'm not that sure monotonicity really should be an axiom, but it does kind of suck to be in an apparently better position and end up worse off for it. Maybe we could relate this to strategy-proofness? A little? Not sure about that.

Let's look at the formula for KS bargaining. 

Suppose there are a couple of dollars on the ground: one which you'll walk by first, and one which I'll walk by. If you pick up your dollar, you can keep it. If I pick up my dollar, I can keep mine. But also, if you don't pick up yours, then I'll eventually walk by it and can pick it up. So we get the following:

(The box is filled in because we can also use mixed strategies to get values intermediate between any pure strategies.)

Obviously in the real world we just both pick up our dollars. But, let's suppose we bargain about it, just for fun.

The way KS works is, you look at the maximum one player can get (you can get $1), and the maximum the other player could get (I can get $2). Then, although we can't usually jointly achieve those payoffs (I can't get $2 at the same time as you get $1), KS bargaining insists we achieve the same ratio (I should get twice as much as you). In this case, that means I get $1.33, while you get $0.67. We can visualize this as drawing a bounding box around the feasible solutions, and drawing a diagonal line. Here's the Nash and KS solutions side by side:

As in Daniel's illustrations, we can visualize maximizing the product as drawing the largest hyperbola we can that still touches the orange shape. (Orange dotted line.) This suggests that we each get $1; exactly the same solution as Nash would give for splitting $2. (The black dotted line illustrates how we'd continue the feasible region to represent a dollar-splitting game, getting the full triangle rather than a chopped off portion.) Nash doesn't care that one of us can do better than the other; it just looks for the most equal division of funds possible, since that's how we maximize the product.

KS, on the other hand, cares what the max possible is for both of us. It therefore suggests that you give up some of your dollar to me.
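Parametrizing the game's Pareto frontier makes the two solutions easy to compute directly (the one-parameter model is my simplification of the picture above): you get y dollars and I get 2 − y, for y in [0, 1], with d = (0, 0).

```python
# Nash vs KS on the dollar-collecting game's Pareto frontier.
ys = [i / 10000 for i in range(10001)]  # y = your payoff, mine is 2 - y

# Nash: maximize the product of gains over d = (0, 0).
nash_y = max(ys, key=lambda y: y * (2 - y))

# KS: equalize each player's gain as a fraction of their maximum achievable
# gain (1 for you, 2 for me), i.e. find where (2 - y) = 2 * y.
ks_y = min(ys, key=lambda y: abs((2 - y) - 2 * y))

print(nash_y, 2 - nash_y)  # (1.0, 1.0): Nash lets each keep their own dollar
print(ks_y, 2 - ks_y)      # about (0.67, 1.33): KS shifts a third of yours to me
```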

I suspect most readers will not find the KS solution to be more intuitively appealing?

Note that the KS monotonicity property does NOT imply the desirable-sounding property "if there are more opportunities for good outcomes, everyone gets more or is at least not worse off." (I mention this mainly because I initially misinterpreted KS's monotonicity property this way.) In my dollar-collecting example, KS bargaining makes you worse off simply because there's an opportunity for me to take your dollar if you don't. 

Like Nash bargaining, KS bargaining ignores multiplicative constants on utility functions, and can be seen as normalizing additive constants by treating d as (0,0). (Note that, in the illustration, I assumed d is chosen as (minimal achievable for one player, minimal achievable for the other). This need not be the case in general.)

A peculiar aspect of KS bargaining is that it doesn't really give us an obvious quantity to maximize, unlike Nash or Harsanyi. It only describes the optimal point. This seems far less practical, for realistic decision-making.

OK, so, should we use bargaining solutions to compare utilities?

My intuition is that, because of the need to choose the BATNA point d, bargaining solutions end up rewarding destructive threats in a disturbing way. For example, suppose that we are playing the dollar-splitting game again, except that I can costlessly destroy $20 of your money, so d now involves both the destruction of the $1, and the destruction of $20. Nash bargaining now hands the entire dollar to me, because you are "up $20" in that deal, so the fairest possible outcome is to give me the $1. KS bargaining splits things up a little, but I still get most of the dollar.
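Working the threat example through numerically (this parametrization is my own sketch of the scenario, not a computation from the post): your share of the dollar is s, and if we disagree, both the dollar and your $20 are destroyed, so the disagreement point is (−20, 0).

```python
# Splitting $1 where I can costlessly destroy $20 of yours on disagreement.
splits = [i / 10000 for i in range(10001)]  # s = your share of the dollar

def gains(s):
    return (s - (-20), (1 - s) - 0)  # (your gain, my gain) over d = (-20, 0)

# Nash: maximize the product of gains; (s + 20)(1 - s) is decreasing on [0, 1].
nash_s = max(splits, key=lambda s: gains(s)[0] * gains(s)[1])

# KS: equalize gains as fractions of max achievable gains (21 for you, 1 for me).
ks_s = min(splits, key=lambda s: abs(gains(s)[0] * 1 - gains(s)[1] * 21))

print(nash_s)  # 0.0: Nash hands me the entire dollar
print(ks_s)    # ~0.045: KS leaves you about a nickel
```

The exact KS split solves s + 20 = 21(1 − s), i.e. s = 1/22, so the threat really does swallow almost everything under both solutions.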

If utilitarians were to trade off utilities that way in the real world, it would benefit powerful people, especially those willing to exploit their power to make credible threats. If X can take everything away from Y, then Nash bargaining sees everything Y has as already counting toward "gains from trade".

As I mentioned before, sometimes people try to define BATNAs in a way which excludes these kinds of threats. However, I see this as ripe for strategic utility-spoofing (IE, lying about your preferences, or self-modifying to have more advantageous preferences).

So, this might favor normalization approaches.

On the other hand, Nash and KS both do way better in the split-the-dollar game than any normalization technique, because they can optimize for fairness of outcome, rather than just fairness of multiplicative constants chosen to compare utility functions with.

Is there any approach which combines the advantages of bargaining and normalization??

Animals, etc.

An essay on utility comparison would be incomplete without at least mentioning the problem of animals, plants, and so on.

  • Option one: some cutoff for "moral patients" is defined, such that a utilitarian only considers preferences of agents who exceed the cutoff.
  • Option two: some more continuous notion is selected, such that we care more about some organisms than others.

Option two tends to be more appealing to me, despite the non-egalitarian implications (e.g., if animals differ on this spectrum, then humans could have some variation as well).

As already discussed, bargaining approaches do seem to have this feature: animals would tend to get less consideration, because they've got less "bargaining power" (they can do less harm to humans than humans can do to them). However, this has a distasteful might-makes-right flavor to it.

This also brings to the forefront the question of how we view something as an agent. Something like a plant might have quite deterministic ways of reacting to environmental stimulus. Can we view it as making choices, and thus, as having preferences? Perhaps "to some degree" -- if such a degree could be defined, numerically, it could factor into utility comparisons, giving a formal way of valuing plants and animals somewhat, but "not too much".

Altruistic agents.

Another puzzling case, which I think needs to be handled carefully, is accounting for the preferences of altruistic agents.

Let's proceed with a simplistic model where agents have "personal preferences" (preferences which just have to do with themselves, in some sense) and "cofrences" (co-preferences; preferences having to do with other agents).

Here's an agent named Sandy:

Sandy:
  • Personal preferences: Candy +.1, Pizza +.2, Rainbows +10, Kittens -20
  • Cofrences: Alice +.1, Bob -.2, Cathy +.3, Dennis +.4

The cofrences represent coefficients on other agent's utility functions. Sandy's preferences are supposed to be understood as a utility function representing Sandy's personal preferences, plus a weighted sum of the utility functions of Alice, Bob, Cathy, and Dennis. (Note that the weights can, hypothetically, be negative -- for example, screw Bob.)

The first problem is that utility functions are not comparable, so we have to say more before we can understand what "weighted sum" is supposed to mean. But suppose we've chosen some utility normalization technique. There are still other problems.

Notice that we can't totally define Sandy's utility function until we've defined Alice's, Bob's, Cathy's, and Dennis'. But any of those four might have cofrences which involve Sandy, as well!

Suppose we have Avery and Briar, two lovers who "only care about each other" -- their only preference is a cofrence, which places 1.0 value on the other's utility function. We could ascribe any values at all to them, so long as they're both the same!

With some technical assumptions (something along the lines of: your cofrences always sum to less than 1), we can ensure a unique fixed point, eliminating any ambiguity from the interpretation of cofrences. However, I'm skeptical of just taking the fixed point here.

Suppose we have five siblings: Primus, Secundus, Tertius, Quartus, et Quintus. All of them value each other at .1, except Primus, who values all siblings at .2.

If we simply take the fixed point, Primus is going to get the short end of the stick all the time: because Primus cares about everyone else more, everyone else cares about Primus' personal preferences less than anyone else's.
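Under one sketch formalization (mine, not the post's) where sibling i's utility is Uᵢ = Pᵢ + Σⱼ cᵢⱼUⱼ, the total weight the collective sum Σᵢ Uᵢ places on sibling j's personal preferences solves wⱼ = 1 + Σᵢ cᵢⱼwᵢ. A quick fixed-point iteration confirms Primus gets the short end:

```python
# Fixed-point weights on each sibling's personal preferences.
n = 5
# c[i][j]: how much sibling i values sibling j. Primus is index 0 and
# values each sibling at 0.2; everyone else values each other at 0.1.
c = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            c[i][j] = 0.2 if i == 0 else 0.1

w = [1.0] * n
for _ in range(200):  # converges: each agent's cofrences sum to < 1
    w = [1 + sum(c[i][j] * w[i] for i in range(n)) for j in range(n)]

print([round(x, 3) for x in w])  # Primus's personal preferences weigh least
```

With these numbers, Primus's personal preferences carry total weight 55/31 ≈ 1.77, versus 60/31 ≈ 1.94 for each other sibling: caring more about others literally costs Primus influence at the fixed point.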

Simply put, I don't think more altruistic individuals should be punished! In this setup, the "utility monster" is the perfectly selfish individual. Altruists will be scrambling to help this person while the selfish person does nothing in return.

A different way to do things is to interpret cofrences as integrating only the personal preferences of the other person. So Sandy wants to help Alice, Cathy, and Dennis (and harm Bob), but does not automatically extend that to wanting to help any of their friends (or harm Bob's friends).

This is a little weird, but gives us a more intuitive outcome in the case of the five siblings: Primus will more often be voluntarily helpful to the other siblings, but the other siblings won't be prejudiced against the personal preferences of Primus when weighing between their various siblings.

I realize altruism isn't exactly supposed to be like a bargain struck between selfish agents. But if I think of utilitarianism like a coalition of all agents, then I don't want it to punish the (selfish component of) the most altruistic members. It seems like utilitarianism should have better incentives than that?

(Try to take this section as more of a problem statement and less of a solution. Note that the concept of cofrence can include, more generally, preferences such as "I want to be better off than other people" or "I don't want my utility to be too different from other people's in either direction".)

Utility monsters.

Returning to some of the points I raised in the "non-obvious consequences" section -- now we can see how "utility monsters" are/aren't a concern.

On my analysis, a utility monster is just an agent who, according to your metric for comparing utility functions, has a very large influence on the social choice function.

This might be a bug, in which case you should reconsider how you are comparing utilities. But, since you've hopefully chosen your approach carefully, it could also not be a bug. In that case, you'd want to bite the bullet fully, defending the claim that such an agent should receive "disproportionate" consideration. Presumably this claim could be backed up, on the strength of your argument for the utility-comparison approach.

Average utilitarianism vs total utilitarianism. 

Now that we have given some options for utility comparison, can we use them to make sense of the distinction between average utilitarianism and total utilitarianism?

No. Utility comparison doesn't really help us there.

The average vs total debate is a debate about population ethics. Harsanyi's utilitarianism theorem and related approaches let us think about altruistic policies for a fixed set of agents. They don't tell us how to think about a set which changes over time, as new agents come into existence.

Allowing the set to vary over time like this feels similar to allowing a single agent to change its utility function. There is no rule against this. An agent can prefer to have different preferences than it does. A collective of agents can prefer to extend its altruism to new agents who come into existence.

However, I see no reason why population ethics needs to be simple. We can have relatively complex preferences here. So, I don't find paradoxes such as the Repugnant Conclusion to be especially concerning. To me there's just this complicated question about what everyone collectively wants for the future.

One of the basic questions about utilitarianism shouldn't be "average vs total?". To me, this is a type error. It seems to me, more basic questions for a (preference) utilitarian are:

  • How do you combine individual preferences into a collective utility function?
    • How do you compare utilities between people (and animals, etc)?
      • Do you care about an "objective" solution to this, or do you see it as a subjective aspect of altruistic preferences, which can be set in an unprincipled way?
      • Do you range-normalize?
      • Do you variance-normalize?
      • Do you care about strategy-proofness?
      • How do you evaluate the bargaining framing? Is it relevant, or irrelevant?
      • Do you care about Nash's axioms?
      • Do you care about monotonicity?
      • What distinguishes humans from animals and plants, and how do you use it in utility comparison? Intelligence? Agenticness? Power? Bargaining position?
    • How do you handle cofrences?


*: Agents need not have a concept of outcome, in which case they don't really have a utility function (because utility functions are functions of outcomes). However, this does not significantly impact any of the points made in this post.


If Starship works, how much would it cost to create a system of rotatable space mirrors that reduces temperatures on earth by 1° C?

September 14, 2020 - 21:46
Published on September 14, 2020 6:46 PM GMT

There are many proposed geoengineering solutions that could reduce temperatures on earth. Unfortunately, a lot of them have a mix of side effects and lock-in effects where the changes in temperature come years after deployment which creates risk of unintended consequences.

If we had a constellation of space mirrors that could be rotated as desired to let in less or more sunlight, how much would it cost to bring up enough of them to reduce temperatures on earth by an average of 1° C? Let's say that Elon's promise of a Starship launch that brings up 100,000 kg for $1,000,000 works out; what would it cost to produce and deploy those mirrors?
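Not an answer, but a back-of-the-envelope sketch of the launch-cost term alone. Every physical parameter here is my own assumption (the 1%-per-degree figure in particular is a rough placeholder, and real proposals differ on geometry, e.g. orbit vs L1); only the $10/kg figure comes from the question.

```python
# Fermi sketch: launch cost only. ASSUMED values are flagged; production,
# deployment, station-keeping, and rotation hardware are ignored entirely.

earth_cross_section = 1.28e14    # m^2, roughly pi * (6.37e6 m)^2
blocked_fraction = 0.01          # ASSUMED: block ~1% of sunlight per 1 deg C

mirror_area = blocked_fraction * earth_cross_section   # ~1.3e12 m^2
areal_density = 0.010            # kg/m^2, ASSUMED solar-sail-class material

total_mass_kg = mirror_area * areal_density            # ~1.3e10 kg
launch_cost_per_kg = 1_000_000 / 100_000               # $10/kg, from the post
launch_cost = total_mass_kg * launch_cost_per_kg

print(f"mass: {total_mass_kg:.2e} kg, launch cost: ${launch_cost:.2e}")
```

Under these (optimistic) assumptions, launch alone comes out on the order of $100 billion; materials, manufacturing, and keeping the mirrors pointed would add substantially on top.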


Outcome Terminology?

September 14, 2020 - 21:04
Published on September 14, 2020 6:04 PM GMT

I'm writing a post about S-risks, and I need access to some clean, established terminology/background material for discussing AI-based long-term outcomes for humanity.

My current (very limited) vocabulary can be summarized with the following categories: 

  1. Outcomes which are roughly maximally bad: Hyperexistential risk/S-risk/Unfriendly AI/Existential risk
  2. Outcomes which are nontrivially worse than paperclipping-equivalents but better than approximate minimization of human utility: Hyperexistential risk/S-risk/Unfriendly AI/Existential risk
  3. Outcomes which are produced by agents essentially orthogonal to human values: Paperclipping/Unfriendly AI/Existential risk
  4. Outcomes which are nontrivially better than paperclipping but worse than Friendly AI: ???
  5. Outcomes which are roughly maximally good: Friendly AI

The problems are manifold: 

  • I haven't read any discussion which specifically addresses parts 1 or 2. I have read general discussion of parts 1 and 2 combined under the names of "Outcomes worse than death", "Hyperexistential risk", "S-risk", etc.
  • My current terminology overlaps too strongly to use to uniquely identify outcomes 1 and 2.
  • I have no terminology or background information for outcome 4.

I've done a small amount of investigation and determined less brainpower would be wasted by just asking for links.


On Niceness: Looking for Positive Externalities

September 14, 2020 - 21:03
Published on September 14, 2020 6:03 PM GMT

One of the most useful concepts I’ve learned from economics is the idea of an externality: the consequences of your actions on other people. This is important because, intuitively, humans are self-centred, and it’s easy to not notice the effects your actions have on others. And it almost never feels as visceral as the costs and benefits to yourself. The canonical examples are coordination problems, like climate change. Taking a plane flight has strong benefits to me, but costs everyone on Earth a little bit, a negative externality. And a lot of the problems in the world today boil down to coordination problems where our actions have negative externalities.

But, for this post, I don’t care about any of that. The important part is that externalities introduce a bias. And once you’ve noticed a bias, something that is preventing you from taking the best actions, you can correct for it! And a much more interesting bias is a bias away from positive externalities.

With my Effective Altruism hat on, the obvious positive externalities are the good your actions can do for the countless unknown strangers in need. And this is an extremely important way to correct for this bias. But for this post I want to put my ineffective altruism hat on, and talk about something more fun! The local positive externalities - being nice to the people around you. Where by niceness, I don’t mean nonsense like virtue signalling, I mean taking actions that make the people around you happier, and making their lives better.

I think we have systematic biases against being nice to our friends and those close to us, because being nice is, fundamentally, a positive externality. Being nice to people is obviously great. I think it’s intrinsically good to help the people I care about. And there’s a lot of selfish benefits to me! People are more likely to do you favours, people like you more, it’s fun to help people, you have a better reputation, etc.

Yet, in practice, most people approach niceness in a very intuitive way. Doing nice things when the idea occurs to them, in a very local, unplanned way. But, as with all things that matter in life, niceness can be optimised for. A really significant life upgrade for me was realising this, and trying to introduce a deliberate bias in favour of niceness. If I ever have anything I care about, I try to figure out how I can achieve it while also being nice to the people around me. And this is such a strong systematic bias that often this helps me achieve my original goal better! And anything that can help me find win-win situations is valuable, and to be cherished and cultivated.

Further, I think it’s important to notice the strongest biases I have against niceness. One of the most glaring, is that humans (and especially me) are loss averse. There are many actions I can take which give high upside for somebody else, with small downside risk. Eg, recommending that somebody apply for a job, or talk to a specific person - this could be amazing, and worst case it mildly annoys them. But it’s easy to fixate on this worst case scenario, and avoid ever taking action. And I think this bias systematically holds you back from being as good a friend as you can be.

And I think niceness often emerges from your self-image. It’s easy to say “I’m not the kind of person who’s nice to other people - it feels weak and sappy”. And if your self-image holds you back from win-win situations, this is dumb and should be changed. My most effective path to this has been to get excited about niceness, and to make it a habit. Finding as many ways as I can to shape my self-image around it has made me more sensitive to opportunities for niceness.

This is all far easier said than done, so to hopefully provide some inspiration, here are a few of the ways I’ve applied this in practice:

  • Gratitude:
    • Gratitude and appreciation are awesome. Gratitude journals are pretty clearly shown to systematically increase happiness. By dwelling on what I value about my friends, I feel happier, and better appreciate great things about my life.
    • Further, hearing appreciation feels awesome! By expressing gratitude to people, I make myself feel better, and make them feel better. Yet people so rarely do this.
      • Note: Gratitude =/= flattery. It’s really important that it’s sincere, not performative
    • Techniques that have increased the amount of gratitude I feel and express:
      • Practice Noticing appreciation. And then complimenting somebody in the moment whenever I notice myself feeling positively towards them
        • This is great - it means I’m notably more pleasant to be around (based on social feedback), and makes me notice positive feelings much more
      • A stage in my weekly review: Go through all the interactions I had this week, and every time I notice a feeling of excitement or “I’m really glad that happened”, send that person a message thanking them, and explaining what I valued about it
      • Buying a box of 40 Christmas cards, making a list of my 40 closest friends, and writing them a card about what they mean to me, what I respect about them and how they’ve made my life better.
        • I think we rarely do things like this - longterm reflection on why we care about people, because there’s no real social convention that creates an obvious time to do it. And this is super dumb! When things are awesome win-wins, you should make your own social conventions, rather than avoiding them because they’re a bit weird, and there’s no obvious time to do it.
    • I think there’s also a skill of giving good compliments - the main thing to optimise for is signalling sincerity rather than ulterior motives
Be as specific as possible - if the other person is a bit insecure, it’s easy for them to deny a vague compliment, much harder to deny a specific one
      • Make it clear that you don’t want anything from them, and don’t put them in an uncomfortable situation
I find it useful to have a next action queued up whenever I compliment somebody - it’s awkward figuring out how to react gracefully, and this takes that pressure off them
        • I like to give compliments eg at the end of an interaction, or in passing, and then leave shortly afterwards. Makes it clearer that it was for the sake of giving a compliment
      • Try to compliment things you think they’d value. Things people are underconfident about, and things they clearly put effort into are good sources.
        • Remember - the goal is to make them feel good, not to make yourself feel good. That’s just a convenient side-effect
    • While I’m on the topic, if you want an easy way to practice niceness, I find compliments extremely satisfying and motivating ;) And I try to ensure that there’s 0 downside risk to giving me compliments!
      • Especially specific compliments: about specific ideas that were insightful or useful in posts, and any specific ways these have changed how you thought or acted!
  • Teaching
    • I care a lot about learning and understanding complex ideas, and converting tacit knowledge into clear and precise concepts
    • One of the most successful ways to do this is by explaining it to other people!
      • This forces me to put things into words
      • This highlights the parts I don’t understand
      • Ideally, the student can ask insightful questions and help clarify my understanding
      • By putting complex details into a form I can convey, I have to extract out the most important parts, because it’s super annoying to just dictate course notes at somebody
    • This is also valuable, because this trains my skill of good communication and explanation - I’ve gotten dramatically better at this over time, and I currently consider it one of my key employable skills
    • This can be made actionable: If I’m learning something new, I find someone who’d be interested in the ideas, and arrange to teach it to them
      • Eg, a great way to revise a course is to teach it from scratch to a friend
      • This feels a bit weird to suggest, but people respond really well!
      • This even works with a peer doing the same courses as you - you each focus on different halves of a course, or two different courses, and teach your half to the other
  • Publishing resources
    • I am a very big fan of publishing resources that I’ve made
      • Putting things online is amazing - my talks each took on the order of 15 hours to write and plan, and total watch time is on the order of 10 times that. There’s amazing leverage
    • Given that I’ve already made the resource, this is basically a free win - others can benefit, I can get feedback, I feel happy that I’m helping people
      • It is way more satisfying to have made a set of notes that I think is genuinely good quality and something others value, than it is to just have a random PDF sitting on my hard drive that I’ll never look at again
    • Further - knowing that I’m going to, say, publish my notes holds me to a higher standard. It feels like I’m teaching the ideas to somebody else, I notice holes more, and I feel more motivated to find clearer explanations
      • At the cost of taking more time and effort!
  • Organising events
    • Committing to an event, like giving a talk, is an amazing motivator. I feel beholden to make it to a good standard, and this makes me a lot more focused and creative.
      • And, by making the event as awesome as possible, I get a lot of satisfaction out of making it exactly to my standards of what a good event should be - the feeling of autonomy.
    • I personally am pretty extroverted and get joy out of feeling like the centre of attention - organising events is an excellent way to satisfy this in a way that also adds value to others
    • On this note - I’ll be giving a remote talk on Machine Learning intuitions at 3:30pm GMT+1 on Friday 3rd July - all welcome!
  • Social initiative
    • I really value my friends, and especially spending quality one-on-one time together.
    • But it’s easy for this to just not happen, when there’s nothing there to prompt spontaneity, or to prompt me to organise something. And so there are a lot of people in my life who I value, but I never get round to speaking to - it never feels urgent. Eg people who live in other countries, and who I don’t run into by chance.
      • This especially holds during social distancing! Everyone is distant.
    • The high-level point here is that taking the social initiative is a form of emotional labour. It has benefits to both of you, but it’s hard, and it takes organisation and effort.
      • Fortunately, as with most hard things, this can be systematised!
        • Underlying point: The goal of niceness isn’t to be virtuous inside my head, it’s to make other people’s lives better. If I can achieve this without trying as hard, that’s amazing.
      • So I currently have a spreadsheet tracking all the people I value, and who I know enjoy spending time with me, and with reminders to regularly reach out to catch up. I’ve made it a habit to regularly check this spreadsheet and reach out, and I use calendly.com to take care of all of the scheduling with no mental effort from me.
      • This is a great win-win - I incur the emotional labour on myself of taking the social initiative, but by systematising it, it doesn’t actually take that much effort!
    • This applies similarly to meeting new people - it’s easy to meet somebody cool and then never stay in touch. And reaching out and suggesting meeting again is emotional labour. But friendships are a major mutually beneficial trade.
      • Well over half of my current strong friendships wouldn’t have happened if I didn’t make an effort to reconnect with people I met once and liked.
      • There’s much higher upside than downside with somebody new - a strong friendship can add value for the rest of our lives, an annoying message or mediocre meeting has a small, one-off cost. But my intuitions are very, very bad at realising this.
    • This applies all the more so to organising social events - I quite enjoy hosting low-effort parties, where I just invite a range of friends to my room one evening, with no further planning required. This is pretty relaxed for me, creates a pleasant evening, and gives people an event to come along to
      • Alas, this is much harder during social distancing, though I am a big fan of gather.town
    • Caveat: This one comes with more downside risk than most of my recommendations, and it’s important to be aware of this. I think the upside obviously outweighs this, but it’s good to minimise downside risk.
      • Give people outs, and make it clear that saying no, or ignoring messages is fine - I find it useful to send people a calendly.com link, because that leaves all of the agency with them.
      • Judging how much other people like me, and trying to only take the initiative with people where things feel mutual.
    • Caveat: I am a big fan of systems, and spreadsheets, but this is clearly not for everyone. I hope the high-level point stands, beyond the specific details of how I implement these ideas.
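For the spreadsheet-averse, the reach-out system described above can be sketched in a few lines of Python. The names, dates, and cadences here are hypothetical placeholders - the real version is just a spreadsheet plus calendly links:

```python
from datetime import date, timedelta

# Hypothetical entries - who I last caught up with, and how often I'd like to.
friends = {
    "Alice": {"last_contact": date(2020, 4, 1), "cadence_days": 30},
    "Bob":   {"last_contact": date(2020, 6, 10), "cadence_days": 60},
}

def overdue(friends, today):
    """Return the people whose last catch-up is older than their desired cadence."""
    return [name for name, info in friends.items()
            if today - info["last_contact"] > timedelta(days=info["cadence_days"])]

print(overdue(friends, date(2020, 6, 25)))  # ['Alice']
```

Checking this regularly, and messaging whoever it surfaces, captures most of the value of the spreadsheet habit.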
  • Recommendations
    • When I learn an interesting idea, or read an article, it takes 0 effort to think through friends who might enjoy it, and pass it on
      • In general - filtering for good content is hard, but I know my friends well, and can guess what they might enjoy
      • Even if I’m not sure they’d like it, it’s useful to pass things on - this helps me build better models of friends, and recommend better things in future!
      • This benefits me - I can hear more thoughts and perspectives on interesting ideas!
      • And this sets a norm that invites reciprocation!
    • This applies all the more so to bigger things - jobs worth applying to, other people they should talk to
      • There’s amazing upside risk of introducing somebody to somebody else, and incredibly low effort - I think this is plausibly some of the highest impact things I’ll ever do for improving my friends’ lives
    • In practice, I have a mental reflex where every time I see something interesting, I ask “who do I know who might enjoy/gain value from this?”
    • I find this hard to implement - I’m very conscious of bothering others. A useful hack: Mentally frame it as offering them an opportunity, which they are free to take or leave. Receiving opportunities has (essentially) 0 downside.
  • Overcoming the bystander effect
    • Bystander apathy is a really common and insidious effect - there is something that everyone wants to happen, but nobody wants to be the one to do it.
    • Often this happens to such a degree that the benefit just to me is enough to justify the effort.
    • Related to the idea of Actually Doing Things, I have found it useful to develop the reflex of noticing bystander apathy in my environment, and actively doing the thing. And this happens all of the time.
      • Eg, ask a question when there’s a confusing point in a talk
      • Eg, give somebody the bit of uncomfortable but vital feedback
      • Eg, notice tiny tragedies of the commons, like an empty jug of water that nobody wants to refill, and just do it.
      • Eg, notice when everyone feels uncomfortable being the first to, say, dance at a party, and just do it.

The theme of upside vs downside risk has kept recurring - this is a very important thing to bear in mind when trying to improve other people’s lives. Your goal is not to do what you think is best, it’s to help others. This includes respecting their preferences, and respecting their autonomy. It’s key that you listen to feedback, be open to the possibility that your actions are systematically unhelpful, and work to build better models of your friends and their preferences. In an ideal world I’d only take the actions that are net good, and avoid all of the ones that are net bad, but in a limited information world this is impossible. And empirically, actually trying far outweighs not trying at all. But you still want to get as net good as possible!

A final point: I think niceness often emerges from your self-image. It’s easy to say “I’m not the kind of person who’s nice to other people - it feels weak and sappy”. And if your self-image holds you back from win-win situations, this is dumb and should be changed. My most effective path to this has been to get excited about niceness, and to make it a habit. Finding as many ways as possible to shape my life around it has made me more sensitive to opportunities for niceness, and made it easier to get over the resistance and to take action. It’s easy to agonise about whether any specific action is a good idea.

So, if any of those ideas resonated with you, but you feel some resistance - it doesn’t feel perfect, there is some way this could go wrong, it feels a bit weird, etc - don’t ask yourself “is this specific action a good idea”. Ask yourself “will taking this action bring me closer to the kind of person I want to be?”

And if you need an extra incentive, a very accessible nice action would be telling me about anything you’ve done as a result of this post!


[Link] Five Years and One Week of Less Wrong

September 14, 2020 - 19:49
Published on September 14, 2020 4:49 PM GMT

This is a link post for Five Years and One Week of Less Wrong. I was surprised to see that it was never cross-posted to LW in the first place. I wanted it to be here so that I could put it under the new Intellectual Progress via LessWrong tag.

Some excerpts:

I wrote a post a while ago called Read History Of Philosophy Backwards. I theorized that as old ways of thinking got replaced by newer ways, eventually people forgot the old ways even existed or were even coherent positions people could hold. So instead of reading Hobbes to tell you that people can form governments for their common advantage – which you already know – read him to tell you that there was a time when no one believed this was true and governments were natural structures ordained by God.

It makes sense that over five hundred years, with births and deaths and so on, people would forget they ever held strange and incomprehensible positions. It’s more surprising that it would happen within the course of a single person’s philosophical development. But this is what I keep hearing from people in the Less Wrong community.

“I re-read the Sequences”, they tell me, “and everything in them seems so obvious. But I have this intense memory of considering them revelatory at the time.”

This is my memory as well.


So I thought it would be an interesting project, suitable for the lofty milestone of five years plus one week, to go back and try to figure out how far we have progressed without noticing that we were progressing.


It was around the switch to Less Wrong that someone first brought up the word “akrasia” (I think it was me, but I’m not sure). I remember there being a time when I was very confused and scandalized by the idea that people might engage in actions other than those rationally entailed by their beliefs. This seems really silly now, but at the time I remember the response was mostly positive and people upvoted me a lot and said things like “Huh, yeah, I guess people might engage in actions other than those rationally entailed by their beliefs! Weird! We should worry about this more!” For a while, we were really confused about this, and a really popular solution (WHICH I ALWAYS HATED) was to try to imagine the mind as being made up of multiple agents trying to strike a bargain. Like, your conscious mind was an agent, your unconscious mind was an agent, your sex drive was an agent, and so on. Ciphergoth was the first person to help us get out of this by bringing up hyperbolic discounting (there was a time Less Wrong didn’t know about hyperbolic discounting!)


It wasn’t until well into the Less Wrong era that our community started to become aware of the problems with the scientific process. This wasn’t because we were behind the times but because the field was quite new; Ioannides didn’t publish his landmark paper until 2005, and it languished in specialized circles until the Atlantic picked it up in 2010. But as early as December 2009, Allan Crossman working off a comment of Eliezer’s wrote Parapsychology: The Control Group For Science.


It continues to puzzle me that there was a time when I didn’t know what a Schelling point was. I imagine myself just sort of wandering through life, not having any idea what was going on or why.


I’ll end with something that recently encouraged me a lot. Sometimes I talk to Will Newsome, or Steve Rayhawk, or Jennifer RM, or people like that in the general category of “we all know they are very smart but they have no ability to communicate their insights to others”. They say inscrutable things, and I nod and pretend to understand because it’s less painful than asking them to explain and sitting through an equally inscrutable explanation. And recently, the things that Will and Steve and Jennifer were saying a couple of years ago have started making perfect sense to me. The things they’re saying now still sound like nonsense, but now I can be optimistic that in a few years I’ll pick up those too.


Free Money at PredictIt: 2020 General Election

September 14, 2020 - 17:40
Published on September 14, 2020 2:40 PM GMT

Previously: Free Money at PredictIt?

It’s time for another look at PredictIt. Is there free money? What are our best options for free money? 

The short answer is that there is free money if and only if you have available capital at PredictIt. 

There is no free money if your plan is to deposit to make the wager and then withdraw. That hits you with a 5% withdrawal fee, wiping out your profits. 

Let’s look at the major markets first, then scour for minor ones.

As with the last such post, despite the fact that we can’t discuss these prices without discussing the potential for a stolen election, let’s be clear: No advocacy for or against any candidate or party in the comments. Any such comments will be deleted reign-of-terror style. That’s not what this is about.

General Election 

Prices are where you would sell if you traded right away by hitting the bid.

Joe Biden 58

Donald Trump 44

Kamala Harris 3

Hillary Clinton 2

Mike Pence 1

That adds up to 108. You pay 10% on winnings. If you take the relative prices here at face value, you’d pay roughly 5.9 cents in fees, leaving a profit of 2.1 cents without need to tie up capital. 

You also get a freeroll to win all bets if none of those five win. That’s probably under a 1% shot but every little bit helps. It’s good to win the weird outcomes, and win them big. Note that many people are betting on candidates, including at other sites, and paying >100% combined for Biden and Trump. Not only does the house always win if they choose to balance their books, the house can sweep.
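The fee arithmetic above can be checked directly. A sketch in Python: the prices and the 10% fee on winnings come from the post, and the market’s own normalized prices stand in for true win probabilities:

```python
# Selling Yes on all five candidates at the listed prices (in cents).
prices = {"Biden": 58, "Trump": 44, "Harris": 3, "Clinton": 2, "Pence": 1}

total = sum(prices.values())   # 108 cents collected
gross_profit = total - 100     # at most one Yes settles at 100

# PredictIt takes 10% of winnings. If candidate w wins, the winnings are the
# premiums kept on everyone else: total - prices[w]. Weight by the market's
# own normalized probabilities to estimate the expected fee.
expected_fee = sum((p / total) * 0.10 * (total - p) for p in prices.values())

net_profit = gross_profit - expected_fee
print(round(expected_fee, 1), round(net_profit, 1))  # 5.9 2.1
```

This reproduces the roughly 5.9 cents in fees and 2.1 cents of profit quoted above.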

The labor and cognitive costs of doing this arb if you’re not already set up aren’t worth it as such on their own, but it’s good to note this is there. If you derive satisfaction from taking free money, I approve.

I’m not taking this because I already took it during the primary, and thus can’t take it again. 

What do I think of the baseline claim here, that considered as a two-way race Biden is 57% to win, or BetFair’s 54% for that same question? Note that the secondary candidates favor the Democrats in both cases, so the two-way races are more like 59% and 55% respectively for the Democratic side.

That depends on what exactly you mean by ‘win the election.’ The rules say ‘the winner of the presidential election’ but that really doesn’t clear it up this year. Whereas the BetFair rules say ‘next president.’ Which if we interpret literally makes Mike Pence at 175:1 a screaming buy! If Trump dies in office or resigns or otherwise leaves before ending his term, then Pence is the ‘next president’ without winning the election, and that’s definitely a >1% chance. It also makes it even better to sell the Democratic candidates, since they can win the election and lose the wager. Always read the rules carefully!

The real question for PredictIt is what happens in a disputed election where both Trump and Biden claim victory. Is it who ends up serving the term as president? Is it something else? I’d want to know.

The only way the odds here make sense is putting a substantial chance on an outright stolen or fraudulent election. That’s not electoral college versus popular vote split. That’s not ordinary standard vote suppression. Nor is vigorous litigation of close elections enough either. This would need to be Stalin-level asking of who counts the votes. Hacking voting machines, destroying or ignoring uncounted ballots, declaring victory in spite of the vote count and sending your own electors and stuff like that. 

It’s hard to properly price that. Without it, the 538 model’s current 76% for Biden doesn’t sound unreasonable to me. Is there more or less than a one in six chance of Trump successfully outright stealing the election? 

They don’t offer a market on that one. It’s the missing market, the most interesting one of all in many ways. I am a skeptic that the chances are anywhere near that high. Don’t get me wrong. The chances are still way way way too high, especially since there’s also the scenarios where he tries and fails and that’s no picnic either. I’m taking these scenarios seriously enough that I’m staying in Warwick with fully stocked up emergency supplies on election night, and only after a concession planning my return to New York City. 

It makes sense to take the arbitrage here rather than back a side. If you want to back a side, you can do so elsewhere at better prices.

Compare to the pure two-way market, which is 61-43, for the same implied price of 59% for democrats, but without an opportunity for Free Money. This is another way of illustrating that the true free money here is on the secondary candidates, especially Clinton. Selling Biden and Trump is only a way to then free up capital.

Presidential Popular Vote Margin of Victory

Again, prices are where you can sell these on demand.

Dem 10.5%+ 15

Dem 9-10.5% 10

Dem 7.5-9% 12

Dem 6-7.5% 12

Dem 4.5-6% 12

Dem 3-4.5% 11

Dem 1.5-3% 9

Dem 0-1.5% 7

GOP 0-1.5% 7

GOP 1.5-3% 4

GOP 3-4.5% 3

GOP 4.5-6% 2

GOP 6-7.5% 2

GOP 7.5-9% 1

GOP 9-10.5% 1

GOP 10.5%+ 3

That adds up to 111, so you can definitely take some free money if you’re interested, or you can try to be selective. Note that this curve centers nicely around Biden by about 6.5%. 

This roughly agrees with Trump’s 20% in 2020 Predictions | Will Trump win the popular vote in 2020?

If you look at the distribution, Democrats by 10.5%+ seems cheap. Following the slope on the other side, it’s clear this should be priced much higher. Presumably people think this is because there’s no way, there’s too much partisanship. I can’t agree. The bottom could easily fall out in any number of ways. That doesn’t mean that 15% is cheap, but I’m skeptical that this general level of variance and this median are right, and there’s only a 15% chance of a blowout. 538 agrees, and sees a 30% chance that Biden wins by 10 or more, whereas here you can get 9%+ for only 25%. 

Note also that a true Biden blowout isn’t that likely to be ‘brought back’ by fraud. If Biden wins by this much, the election can’t be stolen, so doing brazen things to reduce the apparent margin also seems not worthwhile. 

The GOP by 10.5%+ also sticks out. Why is that so big? To me this one makes relative sense, again on the outright fraud principle. There are two ways to steal an election. You can pull an (alleged) 1960, and not steal one more vote than you have to. Or you can pull a Stalin or Hussein, and claim a huge landslide not caring that no one believes you. I wouldn’t go out of my way to sell that possibility. If anything, the GOP by 6-9% seems decidedly less likely than 10.5%+, to me. Something very strange would have to happen to get things to move that much, so at that point, the broader range is a lot more attractive to me.

Thus, given how crazy this market could get later, and given I already tied up my funds, I’m not going to take the arbitrage here, at least not yet. I might take it later, but for now I want to reserve the right to make a better play. 

Next question is, what does this market really say about Biden’s chances?

In a fair election, 538 thinks Biden has an 11% chance of winning the popular vote but losing the electoral college. If Biden was worse off generically, that number would only go slightly higher, say 13%. However, if the GOP decides to steal only in selected states, and only a 1960-style amount, or do things like halt counting before mail ballots are counted, that number could go much higher.  

This chart has a 55% chance (after normalization) for Biden to win by 4.5% or more, and an 18% chance of winning by 1.5% to 4.5%. I think it’s reasonable to say he wins almost all the times in the first bucket, and about half the time in the second bucket, so it’s implying 64% or so chance of victory. That’s substantially different from the presidential market, so either the two have diverged, or a bunch of probability mass is in ‘Trump brazenly steals only the tipping point states,’ or the evaluation here by the Federal Election Commission could go to Biden while Trump ‘wins the election’ anyway through theft. 
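As a sketch, the 64% figure can be reproduced from the listed prices. The all-of-the-first-bucket and half-of-the-second-bucket weights are the assumptions stated above:

```python
# Margin-of-victory prices from the table, Dem 10.5%+ down through GOP 10.5%+.
dem = [15, 10, 12, 12, 12, 11, 9, 7]   # Dem 10.5%+ ... Dem 0-1.5%
gop = [7, 4, 3, 2, 2, 1, 1, 3]         # GOP 0-1.5% ... GOP 10.5%+
total = sum(dem) + sum(gop)            # 111 - the overround

p_dem_margin_4_5_plus = sum(dem[:5]) / total    # Dem popular vote margin >= 4.5%
p_dem_margin_1_5_to_4_5 = sum(dem[5:7]) / total

# Biden wins ~all of the first bucket and ~half of the second.
implied_win = p_dem_margin_4_5_plus + 0.5 * p_dem_margin_1_5_to_4_5
print(round(p_dem_margin_4_5_plus, 2), round(implied_win, 2))  # 0.55 0.64
```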

There are a lot of ways to play this.

Electoral College Margin of Victory

GOP 280+ 4

GOP 210-279 3

GOP 150-209 5

GOP 100-149 9

GOP 60-99 10

GOP 30-59 7

GOP 10-29 5

GOP 0-9 4

Dems 1-9 2

Dems 10-29 4

Dems 30-59 6

Dems 60-99 9

Dems 100-149 16

Dems 150-209 12

Dems 210-279 9

Dems 280+ 7

That adds to 110, which is again an opportunity for free money without tying up capital. It’s also, again, another chance to get money down on a side while expressing an additional opinion. The reason not to take this is in case you want to save it for later.

The Democratic wins add to 63, the Republican wins to 47. Thus, this predicts a 57-43 distribution, slightly better for the Republican side. 

The 538 model’s best prediction is Democrats by 122, which is right in the middle of the highest probability group, but not at the median of the market distribution.

Note that these groupings are not the same size. If we normalize for that, we see a very broad distribution of outcomes all seeming similarly likely. 
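As an illustration of that normalization, here is a sketch for the Democratic buckets with defined widths (the open-ended 280+ bucket is skipped, since its width isn’t defined):

```python
# Price (cents) per electoral vote of bucket width, Democratic side.
dem_buckets = {(1, 9): 2, (10, 29): 4, (30, 59): 6, (60, 99): 9,
               (100, 149): 16, (150, 209): 12, (210, 279): 9}

per_ev = {span: price / (span[1] - span[0] + 1)
          for span, price in dem_buckets.items()}

for (lo, hi), density in sorted(per_ev.items()):
    print(f"Dems {lo}-{hi}: {density:.2f}")
```

Apart from the 100-149 bucket, which stands out, the per-vote densities come out broadly flat - consistent with 538’s central estimate landing in that highest-probability group.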

Timing of the Election Call

Hats off to PredictIt for the definition. If CNN and Fox News both call the election for the same person, that’s a pretty good proxy for it being over. That doesn’t mean Trump might not dispute it anyway (or even Biden) but it’s as good a definition as was available.

As always, I only include prices you can sell at. Assume it costs one extra to buy instead of sell.

November 3 21

November 4 31

November 5 10

November 6-7 7

November 8-9 5

November 10-16 7

November 17-23 5

November 24-30 5

December 1-14 7

After December 14 11

That adds to 109. Again, enjoy your cash.

That is one scary chart. There’s less than a 50% chance that the election will be resolved even a full day after the election. There’s a 10+% chance it won’t be solved over a month later, which likely means it’s going to congress or worse – and again, Trump could well fight on even after Fox News gives up. 
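A sketch checking those readings against the listed prices, normalizing by the 109 total:

```python
# Call-timing prices, November 3 through "After December 14".
prices = [21, 31, 10, 7, 5, 7, 5, 5, 7, 11]
total = sum(prices)                            # 109

p_election_night = prices[0] / total           # called on November 3
p_within_a_day = (prices[0] + prices[1]) / total
p_after_dec_14 = prices[-1] / total

print(round(p_election_night, 2), round(p_within_a_day, 2), round(p_after_dec_14, 2))
```

The within-a-day number comes out just under 50%, and the after-December-14 tail just over 10%, matching the reading above.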

I don’t know that much about when various votes are likely to come in from mail ballots, but neither does anyone else. The one seeming ‘error’ here is that November 8-9 is 5% for a 2-day window, then you have a 7% chance for a 7-day window, then the next 7-day windows are 5% each. Unless there’s a specific reason, that’s a very weird distribution. 

A good question to ask is, how does this line up with the margin of victory?

If the margin of victory is >6% on either side, I’d assume the networks would be able to make the call on November 3-4. That’s a 55% chance according to that market. And there’s a decent chance they can make the call with a smaller margin. If the GOP is legitimately winning the popular vote outright but less than 6%, that’s arguably a >10% chance, and realistically Biden almost certainly is toasty enough that CNN should give up on November 4, although they plausibly hold out past midnight on election night. 

Consider that this market has only a 19% normalized chance (buy costs 22%) that there’s a call on election night, but there’s a higher chance than that for Biden to win by more than 9%. If he does that, I have a hard time believing there’s no call by midnight.

Thus, my gut reaction is that this market is not totally crazy, but it’s somewhat too skeptical of resolving things in the first two days. If anything, I’d think it’s too optimistic then about resolving things quickly or on November 5, provided they’re still in the air on the 4th, but there’s a reasonable argument for ‘once Biden takes the vote lead in the tipping point state it is over, but Fox News won’t call the race until he does take that lead, and it takes a few days to count enough mail in ballots.’ But 10% for that one day, or about 20% of the time given we get that far, does seem like a lot. 

Thus, if I was thinking about what to sell and what to keep, my inclination would be to keep the nightmare scenarios and the quick resolutions, and sell the stuff in between. 

What are some other juicy targets?

Will Biden drop out by 11/1?

You can buy the ‘No’ at 90%. That’s crazy. He’s not going anywhere. 

One way to know it’s crazy is that time is going by, and the number isn’t being discounted. When I got into the No at 10% a week ago, I got the same price. In late August, the price was the same. On July 2 it was 9%! 

I made a good trade. But there was a much better trade available. Which was to buy Biden dropping out in early July, then sell it now for the same price or higher. 

Between now and then, Biden has looked relatively healthy, he has established and maintained a solid polling lead, and has had no scandals that could possibly push him towards dropping out. There’s no way his chances of dropping are more than half what they were in early July, let alone higher. 

2020 Election Predictions | State with smallest MOV in 2020?

I won’t list the odds except to note that they add to 111, and you get some bonus states as a freeroll. Nothing in the relative rankings looks crazy to me. My guess is the value is on selling the states trading high and the ones trading super low, and keeping the ones in the middle, if you don’t want to just take free money.

Will the winner of the popular vote also win the Electoral College?

This market has a whopping 29% chance that the winner of the electoral college didn’t win the popular vote. That would add up to a 49% chance for Trump to win. This seems like the best way to bet on Biden in a two-way. 


No. She won’t. Yet somehow they won’t give up thinking she will, and keep not discounting this much for all the time that goes by. I bet the no on this a long, long time ago, early in the primary, and I’m only 1% to the good right now. There’s a lesson in that. You can still buy the No at 95%, and it’s cheap at 99% if you don’t care about tying up capital.

Will Nancy Pelosi become Acting US President on January 20?

This is people living out their fantasies. I’m not sure if it’s outright impossible, but I do know this is not an 8% chance. Wow.

2020 Election Predictions | Will a woman be president?

Think Kamala Harris at 3% is overpriced? Here you can sell her at 6%. Technically you also have to sell all other women, but who is second in probability? Jo Jorgensen? 

Will Michelle Obama run for president in 2020?

Get it through your thick skulls, everyone, that Michelle Obama hates politics and will never run. Yes, she would win, so I get why you dream, and yes everyone says they won’t run until they do, but no she won’t ever run. You can only get 3% on this at this point, so probably not worth the capital. Again, people dreaming. Still, it’s more plausible than Hillary Clinton.

2020 Election Predictions | Will Donald Trump drop out before November 1?

It’s 4%, same as a few weeks ago. Not budging. Not worth touching, but another dumb line.

2020 Election Predictions | Will John Kasich run in 2020?

Still can sell 1% if you’d like! Let no one say interest on your money is dead. Same with Paul Ryan.

2020 Election Predictions | 2020 Iowa Winner Elected President

Another 1% still available, because people want to free up capital and/or forget that this only pays out if the Democratic winner wins, so it’s actually 0% to be Yes.

There are also a bunch of states available. I’m not going to go into them all, but there are some good overpriced long shots to sell if you’d like.

Impeachment Predictions | Will the Senate convict Donald Trump in his first term?

4% is still available. That’s on top of 11% for him to resign in his first term. Much better than his 13% to not complete his first term, since that’s only two of the ways he can leave office. 

There are a lot of Senate races and House races. I’m not up on the situations enough to know the right stuff, and I want to get this out right after writing so the prices don’t change. 

Similarly, I see a bunch of value in the US Government section and the World section, but no free money. 

Adding it all up, it looks like there are a decent number of full-arbitrage markets, plus a large number of ways to earn decent capital returns by betting on a No. The issue, as always, is that either you withdraw the money after the election, or you have to keep it there. I’ve been rolling it, leaving the money on the site, but I only have a few thousand. That leaves me short of being able to sell everything I want to sell at a moment like this. But it was never about actually maximizing every dollar as such. It’s about using the whole thing as a learning exercise. 

Have fun, everyone, but don’t take these market prices too seriously. 


What happens if you drink acetone?

September 14, 2020 - 17:22
Published on September 14, 2020 2:22 PM GMT

Question: Should you drink acetone?

Answer: No.

But, out of interest, what if you did? This question is asked repeatedly on the web, with many answers smugly stating that even tiny amounts of acetone will instantly kill you, you idiot. But they provide no evidence.

Fact #1: Acetone bottles are scary looking

Certainly, this doesn’t look like something you'd want to put in your body:

Fact #2: Your body naturally produces and disposes of acetone.

Acetone naturally occurs in plants. Your liver produces acetone when metabolizing fat. If you fast, have diabetes, or exercise very hard, you produce more acetone. If you follow a ketogenic diet, you produce more. (Acetone is a “ketone”!) Small amounts of acetone are naturally present in your blood and urine, the latter being how you get rid of it.

Fact #3: Diabetes can cause your breath to smell like acetone.

Insulin is needed to break down glucose and provide energy to cells. Diabetics have trouble either producing or using insulin. Thus, their bodies may burn fat instead. Burning lots of fat produces lots of acetone, enough to impact the breath. (This is a serious problem if it occurs.)

Fact #4: Drinking acetone will make you not think so good no more.

Fisher Scientific’s MSDS gives the following effects for acetone:

Ingestion: May cause gastrointestinal irritation with nausea, vomiting and diarrhea. May cause systemic toxicity with acidosis. May cause central nervous system depression, characterized by excitement, followed by headache, dizziness, drowsiness, and nausea. Advanced stages may cause collapse, unconsciousness, coma and possible death due to respiratory failure.

Sounds serious! Except, oh wait, I made a “mistake”. That was the list of effects for ethanol. Here are the effects for acetone:

Ingestion: May cause irritation of the digestive tract. May cause central nervous system depression, characterized by excitement, followed by headache, dizziness, drowsiness, and nausea. Advanced stages may cause collapse, unconsciousness, coma and possible death due to respiratory failure. Aspiration of material into the lungs may cause chemical pneumonitis, which may be fatal.

Remind you of anything?

Fact #5: Acetone is probably marginally more toxic than ethanol.

In animals, the Oral LD50 for acetone ranges from 3 g/kg in mice to 5.8 g/kg in rats. For ethanol it is around 7.3 g/kg for both mice and rats.
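For a sense of scale, here’s that comparison in code. This is a rough, illustrative calculation using only the figures quoted above; LD50 values vary by species and don’t scale cleanly to humans.

```python
# Rough comparison of the oral LD50 figures quoted above (grams per kg of
# body weight; a higher LD50 means less acutely toxic). Illustration only:
# LD50s vary by species and cross-species scaling is not straightforward.

LD50_G_PER_KG = {
    "acetone (mouse)": 3.0,
    "acetone (rat)":   5.8,
    "ethanol (both)":  7.3,
}

ratio_worst = LD50_G_PER_KG["ethanol (both)"] / LD50_G_PER_KG["acetone (mouse)"]
ratio_best  = LD50_G_PER_KG["ethanol (both)"] / LD50_G_PER_KG["acetone (rat)"]

print(f"ethanol is {ratio_best:.1f}x to {ratio_worst:.1f}x less acutely "
      f"toxic than acetone, by these numbers")
```

So “marginally more toxic” here means roughly a factor of 1.3 to 2.4 in rodent LD50, not an order of magnitude.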

Fact #6: Acetone is Generally Recognized as Safe (GRAS) by the FDA.

For better or worse, food manufacturers can put acetone in food and sell it to you without testing for safety. This seems to be common with spice oleoresins (concentrated forms of spices).

Fact #7: Some insane internet people drank acetone and didn’t die.

In a thread on bluelight, Psychedelic Jay reports:

So far 1 ml of pure acetone in 10 ml of water. Effects: Slight sedation, easy going sense of euphoria, very similar but smoother than ethanol intoxication. Heart rate increased by 6-10 beats a minute… Blood pressure exactly the same…

While pino says:

So one night, I took 20ml strongly diluted, a dose which shouldn’t kill you. The taste was masked by mixing it with fruit juice, which made it actually pleasantly to sip. Slightly fruity. In about half an hour, a pleasant warm sedation spread over my body. It felt like a clean alcohol intoxication. Nothing to strong, but very relaxing. I guess it took me for an hour of 10. There is no hangover.

Both of these are consistent with the idea that acetone has effects that are similar to alcohol. All the other comments in that thread, of course, say “what, are you crazy?”.

Fact #8: You shouldn’t drink acetone.

There’s no reason to do so. It’s (presumably) disgusting. It’s very flammable. The effects haven't been studied nearly as much as alcohol's. And I could be wrong about all of this.

But suppose acetone had exactly the same effects as ethanol. Yes, that would mean that "acetone is as safe as alcohol". But it would also mean that "alcohol is as dangerous as acetone". That’s probably the wiser interpretation.


My computational framework for the brain

September 14, 2020 - 17:19
Published on September 14, 2020 2:19 PM GMT

By now I've written a bunch of blog posts on brain architecture and algorithms, not in any particular order and generally interspersed with long digressions into Artificial General Intelligence. Here I want to summarize my key ideas in one place, to create a slightly better entry point, and something I can refer back to in certain future posts that I'm planning. If you've read every single one of my previous posts (hi mom!), there's not much new here.

In this post, I'm trying to paint a picture. I'm not really trying to justify it, let alone prove it. The justification ultimately has to be: All the pieces are biologically, computationally, and evolutionarily plausible, and the pieces work together to explain absolutely everything known about human psychology and neuroscience. (I believe it! Try me!) Needless to say, I could be wrong in both the big picture and the details (or missing big things). If so, writing this out will hopefully make my wrongness easier to discover!

Pretty much everything I say here and its opposite can be found in the cognitive neuroscience literature. (It's a controversial field!) I make no pretense to originality (with one exception noted below), but can't be bothered to put in actual references. My previous posts have a bit more background, or just ask me if you're interested. :-P

So let's start in on the 7 guiding principles for how I think about the brain:

1. Two subsystems: "Neocortex" and "Subcortex"

This is the starting point. I think it's absolutely critical. The brain consists of two subsystems. The neocortex is the home of "human intelligence" as we would recognize it—our beliefs, goals, ability to plan and learn and understand, every aspect of our conscious awareness, etc. etc. (All mammals have a neocortex; birds and lizards have a homologous and functionally-equivalent structure called the "pallium".) Some other parts of the brain (thalamus, hippocampus, basal ganglia) help the neocortex do its calculations, and I lump them into the neocortex subsystem. I'll use the term subcortex for the rest of the brain (midbrain, amygdala, etc.).

  • Aside: Is this the triune brain theory? No. Triune brain theory is, from what I gather, a collection of ideas about brain evolution and function, most of which are wrong. One aspect of triune brain theory is putting a lot of emphasis on the distinction between neocortical calculations and subcortical calculations. I like that part. I'm keeping that part, and I'm improving it by expanding the neocortex club to also include the thalamus, hippocampus, lizard pallium, etc., and then I'm ignoring everything else about triune brain theory.
2. Cortical uniformity

I claim that the neocortex is, to a first approximation, architecturally uniform, i.e. all parts of it are running the same generic learning algorithm in a massively-parallelized way.

The two caveats to cortical uniformity (spelled out in more detail at that link) are:

  • There are sorta "hyperparameters" on the generic learning algorithm which are set differently in different parts of the neocortex—for example, different regions have different densities of each neuron type, different thresholds for making new connections (which also depend on age), etc. This is not at all surprising; all learning algorithms inevitably have tradeoffs whose optimal settings depend on the domain that they're learning (no free lunch).
    • As one of many examples of how even "generic" learning algorithms benefit from domain-specific hyperparameters, if you've seen a pattern "A then B then C" recur 10 times in a row, you will start unconsciously expecting AB to be followed by C. But "should" you expect AB to be followed by C after seeing ABC only 2 times? Or what if you've seen the pattern ABC recur 72 times in a row, but then saw AB(not C) twice? What "should" a learning algorithm expect in those cases? The answer depends on the domain—how regular vs random are the environmental patterns you're learning? How stable are they over time? The answer is presumably different for low-level visual patterns vs motor control patterns etc.
  • There is a gross wiring diagram hardcoded in the genome—i.e., a set of connections among different neocortical regions, and between the neocortex and other parts of the brain. These connections later get refined and edited during learning. These speed the learning process by bringing together information streams with learnable relationships—for example the wiring diagram seeds strong connections between toe-related motor output areas and toe-related proprioceptive (body position sense) input areas. We can learn relations between information streams without any help from the innate wiring diagram, by routing information around the cortex in more convoluted ways—see the Ian Waterman example here—but it's slower, and may consume conscious attention.
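As a concrete (and purely illustrative) way to see the hyperparameter tradeoff in the ABC example: treat "how much evidence before I trust a pattern?" as a pseudocount in additive smoothing. Nothing here is a claim about actual cortical circuitry; the function and the numbers are mine.

```python
# Illustration only (not a claim about cortical mechanics): the "how fast
# should I trust a pattern?" tradeoff as an additive-smoothing pseudocount.
# A large pseudocount suits a noisy, irregular domain; a small one suits a
# regular, stable domain where 2 observations already count for a lot.

def p_c_given_ab(times_c, times_not_c, pseudocount):
    """Smoothed probability that C follows AB, given past observations."""
    return (times_c + pseudocount) / (times_c + times_not_c + 2 * pseudocount)

# Seen ABC twice, never AB(not C):
print(p_c_given_ab(2, 0, pseudocount=0.1))   # trusting domain: ~0.95
print(p_c_given_ab(2, 0, pseudocount=10.0))  # conservative domain: ~0.55

# Seen ABC 72 times, then AB(not C) twice:
print(p_c_given_ab(72, 2, pseudocount=1.0))  # ~0.96
```

The "right" pseudocount differs by domain, which is exactly the sense in which even a generic learning algorithm benefits from domain-specific hyperparameters.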
3. Blank-slate neocortex

(...But not blank-slate subcortex! More on that below.)

I claim that the neocortex starts out as a "blank slate": Just like an ML model with random weights, the neocortex cannot make any correct predictions or do anything useful until it learns to do so from previous inputs, outputs, and rewards.

(By the way, I am not saying that the neocortex's algorithm is similar to today's ML algorithms. There's more than one blank-slate learning algorithm! See image.)

A "blank slate" learning algorithm, as I'm using the term, is one that learns information "from scratch"—an example would be a Machine Learning model that starts with random weights and then proceeds with gradient descent. When you imagine it, you should not imagine an empty void that gets filled with data. You should imagine a machine that learns more and better patterns over time, and writes those patterns into a memory bank—and "blank slate" just means that the memory bank starts out empty. There are many such machines, and they will learn different patterns and therefore do different things. See next section, and see also the discussion of hyperparameters in the previous section.

Why do I think that the neocortex starts from a blank slate? Two types of reasons:

  • Details of how I think the neocortical algorithm works: This is the main reason for me.
    • For example, as I mentioned here, there's a theory I like that says that all feedforward signals (I'll define that in the next section) in the neocortex—which includes all signals coming into the neocortex from outside it, plus many cortex-to-cortex signals—are re-encoded into the data format that the neocortex can best process—i.e. a set of sparse codes, with low overlap, uniform distribution, and some other nice properties—and this re-encoding is done by a pseudorandom process! If that's right, it would seem to categorically rule out anything but a blank-slate starting point.
    • More broadly, we know the algorithm can learn new concepts, and new relationships between concepts, without having any of those concepts baked in by evolution—e.g. learning about rocket engine components. So why not consider the possibility that that's all it does, from the very beginning? I can see vaguely how that would work, why that would be biologically plausible and evolutionarily adaptive, and I can't currently see any other way that the algorithm can work.
  • Absence of evidence to the contrary: I have a post Human Instincts, Symbol Grounding, and the Blank-Slate Neocortex where I went through a list of universal human instincts, and didn't see anything inconsistent with a blank-slate neocortex. The subcortex—which is absolutely not a blank slate—plays a big role in most of those. (More on this in a later section.) Likewise I've read about the capabilities of newborn humans and other animals, and still don't see any problem. I accept all challenges; try me!
4. What is the neocortical algorithm?

4.1. "Analysis by synthesis" + "Planning by probabilistic inference"

"Analysis by synthesis" means that the neocortex searches through a space of generative models for a model that predicts its upcoming inputs (both external inputs, like vision, and internal inputs, like proprioception and reward). "Planning by probabilistic inference" (term from here) means that we treat our own actions as probabilistic variables to be modeled, just like everything else. In other words, the neocortex's output lines (motor outputs, hormone outputs, etc.) are the same type of signal as any generative model prediction, and processed in the same way.

Here's how those come together. As discussed in Predictive Coding = RL + SL + Bayes + MPC, and shown in this figure below:

  • The neocortex favors generative models that have been making correct predictions, and discards generative models that have been making predictions that are contradicted by input data (or by other favored generative models).
  • And, the neocortex favors generative models which predict future reward, and discards generative models that predict future negative reward.

This combination allows both good epistemics (ever-better understanding of the world), and good strategy (planning towards goals) in the same algorithm. This combination also has some epistemic and strategic failure modes—e.g. a propensity to wishful thinking—but in a way that seems compatible with human psychology & behavior, which is likewise not perfectly optimal, if you haven't noticed. Again, see the link above for further discussion.

Criteria by which generative models rise to prominence in the neocortex; see Predictive Coding = RL + SL + Bayes + MPC for detailed discussion.
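Here's a toy sketch of those two selection criteria. The update rule, weights, and numbers are invented for illustration; this is not the brain's actual algorithm.

```python
# Toy sketch of the two selection pressures above: a generative model gains
# credit for making correct predictions and for predicting future reward,
# and loses credit otherwise. Update rule and weights are invented.

def update_score(score, predicted, observed, predicted_reward,
                 lr=0.3, reward_weight=0.5):
    """Nudge a model's credit up or down after one round of predictions."""
    accuracy = 1.0 if predicted == observed else -1.0
    return score + lr * (accuracy + reward_weight * predicted_reward)

# A model that predicts correctly, and also predicts mild future reward,
# rises to prominence:
score = 0.0
for _ in range(5):
    score = update_score(score, predicted="B", observed="B", predicted_reward=0.2)

# The wishful-thinking failure mode: a model whose prediction is wrong can
# still break even, if it predicts enough reward.
wishful = update_score(0.0, predicted="B", observed="X", predicted_reward=2.0)
```

The last line is the epistemic failure mode mentioned above: reward-seeking and accuracy-seeking are blended into one score, so sufficiently rosy predictions can partly offset being wrong.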
  • Aside: Is this the same as Predictive Coding / Free-Energy Principle? Sorta. I've read a fair amount of "mainstream" predictive coding (Karl Friston, Andy Clark, etc.), and there are a few things about it that I like, including the emphasis on generative models predicting upcoming inputs, and the idea of treating neocortical outputs as just another kind of generative model prediction. It also has a lot of other stuff that I disagree with (or don't understand). My account differs from theirs mainly by (1) emphasizing multiple simultaneous generative models that compete & cooperate (cf. "society of mind", multiagent models of mind, etc.), rather than "a" (singular) prior, and (2) restricting discussion to the neocortex subsystem, rather than trying to explain the brain as a whole. In both cases, this may be partly a difference of emphasis & intuitions, rather than fundamental. But I think the core difference is that predictive coding / FEP takes some processes to be foundational principles, whereas I think that those same things do happen, but that they're emergent behaviors that come out of the algorithm under certain conditions. For example, in Predictive Coding & Motor Control I talk about the predictive-coding story that proprioceptive predictions are literally exactly the same as motor outputs. Well, I don't think they're exactly the same. But I do think that proprioceptive predictions and motor outputs are the same in some cases (but not others), in some parts of the neocortex (but not others), and after (but not before) the learning algorithm has been running a while. So I kinda wind up in a similar place as predictive coding, in some respects.

4.2. Compositional generative models

Each of the generative models consists of predictions that other generative models are on or off, and/or predictions that input channels (coming from outside the neocortex—vision, hunger, reward, etc.) are on or off. ("It's symbols all the way down.") All the predictions are attached to confidence values, and both the predictions and confidence values are, in general, functions of time (or of other parameters—I'm glossing over some details). The generative models are compositional, because if two of them make disjoint and/or consistent predictions, you can create a new model that simply predicts that both of those two component models are active simultaneously. For example, we can snap together a "purple" generative model and a "jar" generative model to get a "purple jar" generative model. They are also compositional in other ways—for example, you can time-sequence them, by making a generative model that says "Generative model X happens and then Generative model Y happens".
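To make "snapping together" concrete, here is one toy encoding (an illustrative data structure of mine, not a claim about neural implementation): a generative model as a dictionary of predictions with confidences, composed by merging when the predictions don't clash.

```python
# Toy encoding of compositionality: a generative model is a dict mapping
# predicted signals to confidences; two models compose by merging, provided
# their predictions are disjoint or consistent. (Illustrative only.)

def compose(model_a, model_b):
    """Snap two generative models together into a combined model."""
    merged = dict(model_a)
    for signal, confidence in model_b.items():
        if signal in merged and merged[signal] != confidence:
            raise ValueError(f"models make conflicting predictions about {signal}")
        merged[signal] = confidence
    return merged

purple = {"color_input=purple": 0.9}
jar = {"shape_input=jar_contour": 0.8, "model:is-container": 0.95}

purple_jar = compose(purple, jar)  # the composite "purple jar" model
```

A real version would need graded consistency rather than exact equality, and time-sequencing would need predictions to be functions of time, but the disjoint-or-consistent merge is the core of the compositionality claim.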

PGM-type message-passing: Among other things, the search process for the best set of simultaneously-active generative models involves something at least vaguely analogous to message-passing (belief propagation) in a probabilistic graphical model. Dileep George's vision model is a well-fleshed-out example.

Hierarchies are part of the story but not everything: Hierarchies are a special case of compositional generative models. A generative model for an image of "8" makes strong predictions that there are two "circle" generative models positioned on top of each other. The "circle" generative model, in turn, makes strong predictions that certain contours and textures are present in the visual input stream.

However, not all relations are hierarchical. The "is-a-bird" model makes a medium-strength prediction that the "is-flying" model is active, and the "is-flying" model makes a medium-strength prediction that the "is-a-bird" model is active. Neither is hierarchically above the other.

As another example, the brain has a visual processing hierarchy, but as I understand it, studies show that the brain has loads of connections that don't respect the hierarchy.

Feedforward and feedback signals: There are two important types of signals in the neocortex.

A "feedback" signal is a generative model prediction, attached to a confidence level, which includes all the following:

  • "I predict that neocortical input line #2433 will be active, with probability 0.6".
  • "I predict that generative model #95738 will be active, with probability 0.4".
  • "I predict that neocortical output line #185492 will be active, with probability 0.98"—and this one is a self-fulfilling prophecy, as the feedback signal is also the output line!

A "feedforward" signal is an announcement that a certain signal is, in fact, active right now, which includes all the following:

  • "Neocortical input line #2433 is currently active!"
  • "Generative model #95738 is currently active!"
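In code, the two signal types might be encoded like this (my illustrative encoding of the examples above, not anything from the neuroscience literature):

```python
# Illustrative encoding of the two signal types: a feedback signal is a
# prediction with an attached confidence; a feedforward signal is a bare
# announcement that a line or model is active right now.

from dataclasses import dataclass

@dataclass
class Feedback:
    target: str         # e.g. "input#2433", "model#95738", "output#185492"
    probability: float  # confidence of the prediction

@dataclass
class Feedforward:
    source: str         # which input line or model is active right now

# The three feedback examples above:
predictions = [
    Feedback("input#2433", 0.6),
    Feedback("model#95738", 0.4),
    Feedback("output#185492", 0.98),  # self-fulfilling: this IS the output
]

# The two feedforward examples above:
events = [Feedforward("input#2433"), Feedforward("model#95738")]
```

Note the asymmetry: only `Feedback` carries a probability, which matches the point that predictions are hedged while feedforward announcements are simply facts about current activity.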

There are about 10× more feedback connections than feedforward connections in the neocortex, I guess for algorithmic reasons I don't currently understand.

In a hierarchy, the top-down signals are feedback, and the bottom-up signals are feedforward.

The terminology here is a bit unfortunate. In a motor output hierarchy, we think of information flowing "forward" from high-level motion plan to low-level muscle control signals, but that's the feedback direction. The forward/back terminology works better for sensory input hierarchies. Some people say "top-down" and "bottom-up" instead of "feedback" and "feedforward" respectively, which is nice and intuitive for both input and output hierarchies. But then that terminology gets confusing when we talk about non-hierarchical connections. Oh well.

(I'll also note here that "mainstream" predictive coding discussions sometimes talk about feedback signals being associated with confidence intervals for analog feedforward signals, rather than confidence levels for binary feedforward signals. I changed it on purpose. I like my version better.)

5. The subcortex steers the neocortex towards biologically-adaptive behaviors.

The blank-slate neocortex can learn to predict input patterns, but it needs guidance to do biologically adaptive things. So one of the jobs of the subcortex is to try to "steer" the neocortex, and the subcortex's main tool for this task is its ability to send rewards to the neocortex at the appropriate times. Everything that humans reliably and adaptively do with their intelligence, from liking food to making friends, depends on the various reward-determining calculations hardwired into the subcortex.

6. The neocortex is a black box from the perspective of the subcortex. So steering the neocortex is tricky!

Only the neocortex subsystem has an intelligent world-model. Imagine you just lost a big bet, and now you can't pay back your debt to the loan shark. That's bad. The subcortex needs to send negative rewards to the neocortex. But how can it know? How can the subcortex have any idea what's going on? It has no concept of a "bet", or "debt", or "payment" or "loan shark".

This is a very general problem. I think there are two basic ingredients in the solution.

Here's a diagram to refer to, based on the one I put in Inner Alignment in the Brain:

Schematic illustration of some aspects of the relationship between subcortex & neocortex. See also my previous post Inner Alignment in the Brain for more on this.


6.1 The subcortex can learn what's going on in the world via its own, parallel, sensory-processing system.

Thus, for example, we have the well-known visual processing system in our visual cortex, and we have the lesser-known visual processing system in our midbrain (superior colliculus). Ditto for touch, smell, proprioception, nociception, etc.

While they have similar inputs, these two sensory processing systems could not be more different!! The neocortex fits its inputs into a huge, open-ended predictive world-model, but the subcortex instead has a small and hardwired "ontology" consisting of evolutionarily-relevant inputs that it can recognize like faces, human speech sounds, spiders, snakes, looking down from a great height, various tastes and smells, stimuli that call for flinching, stimuli that one should orient towards, etc. etc., and these hardwired recognition circuits are connected to hardwired responses.

For example, babies learn to recognize faces quickly and reliably in part because the midbrain sensory processing system knows what a face looks like, and when it sees one, it will saccade to it, and thus the neocortex will spend disproportionate time building predictive models of faces.

...Or better yet, instead of saccading to faces itself, the subcortex can reward the neocortex each time it detects that it is looking at a face! Then the neocortex will go off looking for faces, using its neocortex-superpowers to learn arbitrary patterns of sensory inputs and motor outputs that tend to result in looking at people's faces. 

6.2 The subcortex can see the neocortex's outputs—which include not only prediction but imagination, memory, and empathetic simulations of other people.

For example, if the neocortex never predicts or imagines any reward, then the subcortex can guess that the neocortex has a grim assessment of its prospects for the future—I'll discuss that particular example much more in an upcoming post on depression.

To squeeze more information out of the neocortex, the subcortex can also "teach" the neocortex to reveal when it is thinking of one of the situations in the subcortex's small hardwired ontology (faces, spiders, sweet tastes, etc.—see above). For example, if the subcortex rewards the neocortex for cringing in advance of pain, then the neocortex will learn to favor pain-prediction generative models that also send out cringe-motor-commands. And thus, eventually, it will also start sending weak cringe-motor-commands when imagining future pain, or when empathically simulating someone in pain—and the subcortex can detect that, and issue hardwired responses in turn.
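The steering logic in this section can be caricatured in a few lines. This is a deliberately crude sketch of mine: the ontology entries and reward values are invented, and the real circuitry is of course far richer.

```python
# Crude sketch of the "steering" loop above: the subcortex can't parse the
# neocortex's world-model, but it CAN watch for items in its small hardwired
# ontology (a face in view, a cringe motor command, ...) and respond with
# hardwired rewards. Entries and values are invented for illustration.

HARDWIRED_ONTOLOGY = {"face_in_view": 1.0, "cringe_command": 0.5}

def subcortex_reward(observable_events):
    """Sum hardwired rewards for whichever recognizable events occurred."""
    return sum(HARDWIRED_ONTOLOGY.get(e, 0.0) for e in observable_events)

# The neocortex's opaque internal activity ("model#12345") earns nothing;
# only events the subcortex can recognize are rewarded.
print(subcortex_reward(["face_in_view"]))                  # 1.0
print(subcortex_reward(["cringe_command", "model#12345"])) # 0.5
```

The key property is that the reward function is defined entirely over the subcortex's tiny fixed vocabulary, which is why the subcortex must first "teach" the neocortex to express its internal states (imagined pain, empathized pain) in that vocabulary.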

See Inner Alignment in the Brain for more examples & discussion of all this stuff about steering.

Unlike most of the other stuff here, I haven't seen anything in the literature that takes "how does the subcortex steer the neocortex?" to be a problem that needs to be solved, let alone that solves it. (Let me know if you have!) ...Whereas I see it as The Most Important And Time-Sensitive Problem In All Of Neuroscience—because if we build neocortex-like AI algorithms, we will need to know how to steer them towards safe and beneficial behaviors!

7. The subcortical algorithms remain largely unknown

I think much less is known about the algorithms of the subcortex (midbrain, amygdala, etc.) than about the algorithms of the neocortex. There are a couple issues:

  • The subcortex's algorithms are more complicated than the neocortex's algorithms: As described above, I think the neocortex has more-or-less one generic learning algorithm. Sure, it consists of many interlocking parts, but it has an overall logic. The subcortex, by contrast, has circuitry for detecting and flinching away from an incoming projectile, circuitry for detecting spiders in the visual field, circuitry for (somehow) implementing lots of different social instincts, etc. etc. I doubt all these things strongly overlap each other, though I don't know that for sure. That makes it harder to figure out what's going on.
    • I don't think the algorithms are "complicated" in the sense of "mysterious and sophisticated". Unlike the neocortex, I don't think these algorithms are doing anything where a machine learning expert couldn't sit down and implement something functionally equivalent in PyTorch right now. I think they are complicated in that they have a complicated specification (this kind of input produces that kind of output, and this other kind of input produces this other kind of output, etc. etc. etc.), and this specification is what we need to work out.
  • Fewer people are working on subcortical algorithms than the neocortex's algorithms: The neocortex is the center of human intelligence and cognition. So very exciting! So very monetizable! By contrast, the midbrain seems far less exciting and far less practically useful. Also, the neocortex is nearest the skull, and thus accessible to some experimental techniques (e.g. EEG, MEG, ECoG) that don't work on deeper structures. This is especially limiting when studying live humans, I think.

As mentioned above, I am very unhappy about this state of affairs. For the project of building safe and beneficial artificial general intelligence, I feel strongly that it would be better if we reverse-engineered subcortical algorithms first, and neocortical algorithms second.


Well, my brief summary wasn't all that brief after all! Congratulations on making it this far! I'm very open to questions, discussion, and criticism. I've already revised my views on all these topics numerous times, and expect to do so again. :-)