
LessWrong.com News

A community blog devoted to refining the art of rationality

Modeling Risks From Learned Optimization

12 October 2021, 23:54
Published on October 12, 2021 8:54 PM GMT

This post is part 5 in our sequence on Modeling Transformative AI Risk. We are building a model to understand debates around existential risks from advanced AI. The model is made with Analytica software, and consists of nodes (representing key hypotheses and cruxes) and edges (representing the relationships between these cruxes), with final output corresponding to the likelihood of various potential failure scenarios. You can read more about the motivation for our project and how the model works in the Introduction post. The previous post in the sequence, Takeoff Speeds and Discontinuities, described the different potential characteristics of a transition from high-level machine intelligence [1] to superintelligent AI.

We are interested in feedback on this post, especially in places where the model does not capture your views or fails to include an uncertainty that you think could be an important crux. Similarly, if an explanation seems confused or confusing, flagging this is useful – both to help us clarify, and to ensure it doesn’t reflect an actual disagreement.

This post explains how risks from learned optimization are incorporated in our model. The relevant part of the model is mostly based on the Risks from Learned Optimization sequence and paper (henceforth RLO). Although we considered responses and alternate perspectives to RLO in our research, these perspectives are not currently modeled explicitly.

For those not familiar with the topic, a mesa-optimizer is a learned algorithm that is itself an optimizer. According to RLO, inner alignment is the problem of aligning the objective of a mesa-optimizer with the objective of its base optimizer (which may be specified by the programmer). A contrived example supposes we want an algorithm that finds the shortest path through any maze. In the training data, all mazes have doors that are red, including the exit. Inner misalignment arises if we get an algorithm that efficiently searches for the next red door: the capabilities are robust because the search algorithm is general and efficient, but the objective is not robust because it finds red doors rather than the exit.
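
To make the red-door example concrete, here is a toy sketch in Python. It is a hypothetical illustration, not part of RLO or of our Analytica model: a maze is reduced to the list of doors an agent can reach, and, as a simplification, the exit is the only red door in each training maze. Both agents score perfectly on the training distribution, but only the robustly aligned one keeps scoring after the distribution shifts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Door:
    colour: str
    is_exit: bool


# Hypothetical abstraction: a maze is just the list of doors the agent can reach.
Maze = List[Door]


def base_objective(door: Door) -> float:
    """What the programmer wants: 1.0 if the agent ends at the exit, else 0.0."""
    return 1.0 if door.is_exit else 0.0


def robust_agent(maze: Maze) -> Door:
    """Mesa-objective matches the base objective: search for the exit."""
    return next(d for d in maze if d.is_exit)


def red_door_agent(maze: Maze) -> Door:
    """Pseudo-aligned mesa-objective: search for a red door."""
    return next(d for d in maze if d.colour == "red")


def mean_score(agent: Callable[[Maze], Door], mazes: List[Maze]) -> float:
    """Average base-objective score of an agent over a set of mazes."""
    return sum(base_objective(agent(m)) for m in mazes) / len(mazes)


# Training distribution (simplified): the exit is always the red door.
train_mazes = [
    [Door("grey", False), Door("red", True)],
    [Door("red", True), Door("grey", False), Door("grey", False)],
]
# Shifted distribution: the exit is blue, and a non-exit door is red.
deploy_mazes = [
    [Door("red", False), Door("blue", True)],
]

for name, agent in [("robust", robust_agent), ("red-door", red_door_agent)]:
    print(name, mean_score(agent, train_mazes), mean_score(agent, deploy_mazes))
# robust 1.0 1.0
# red-door 1.0 0.0
```

The red-door agent's search still works after the shift; what fails to transfer is the objective it is searching for.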

The relevant part of our model is contained in the Mesa-Optimization module:


The output of the Mesa-Optimization module is an input to the Incorrigibility module. The logic is that inner misalignment is one way that high-level machine intelligence (HLMI) could become incorrigible, which in turn counts strongly against being able to Correct course as we go in the development of HLMI.

Module Overview

Overview of the Mesa-Optimization module. We recommend reading top-to-bottom, left-to-right.

The top-level logic of the Mesa-Optimization module is: HLMI has an inner alignment failure if

  1. The HLMI contains a mesa-optimizer, AND
  2. Given (1), the mesa-optimizer is pseudo-aligned, i.e. it acts aligned in the training setting but its objective is not robust to other settings, AND
  3. Given (2), the pseudo-alignment is not sufficient for intent alignment, i.e. is not safe enough to make HLMI corrigible, AND
  4. Given (3), we fail to stop deployment of the unsafe system.
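
Because each condition is conditioned on the previous one, the module's output is the product of the four probabilities (the chain rule). A minimal sketch with point estimates; the function name and the numbers are hypothetical placeholders rather than the model's actual inputs:

```python
from math import prod


def p_inner_alignment_failure(
    p_mesa: float,         # P(1): HLMI contains a mesa-optimizer
    p_pseudo: float,       # P(2 | 1): the mesa-optimizer is pseudo-aligned
    p_unsafe: float,       # P(3 | 2): pseudo-alignment is not safe enough
    p_not_stopped: float,  # P(4 | 3): we fail to stop deployment
) -> float:
    """Chain rule: P(1 and 2 and 3 and 4) = P(1) * P(2|1) * P(3|2) * P(4|3)."""
    factors = [p_mesa, p_pseudo, p_unsafe, p_not_stopped]
    if any(not 0.0 <= p <= 1.0 for p in factors):
        raise ValueError("all inputs must be probabilities in [0, 1]")
    return prod(factors)


# Placeholder inputs, chosen only to show the arithmetic.
print(p_inner_alignment_failure(0.5, 0.9, 0.5, 0.4))  # ≈ 0.09
```

Since the result can never exceed the smallest factor, a low probability at any single step caps the overall risk.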

In the following sections, we explain how each of these steps is modeled.

HLMI contains a mesa-optimizer

The output of this section is HLMI contains a mesa-optimizer, which depends on three nodes. The left node, HLMI is trained with a base optimizer, means that a training algorithm optimized a distinct learned algorithm, and that learned algorithm forms all or part of the HLMI system. A crux here is what Type of HLMI you expect, which comes out of the pathways discussed in the Paths to HLMI post. For instance, HLMI via current deep learning methods or evolutionary methods will involve a base optimizer, but this is not true of other pathways such as whole-brain emulation.

The middle node, Argument for mesa-optimization, represents the argument from first principles for why mesa-optimization would occur in HLMI. It is mainly based on the post Conditions for Mesa-Optimization. This is broken down further into three nodes. Mesa-optimization is more efficient than highly tuned algorithms distils some claims about the advantages of mesa-optimization compared to systems without mesa-optimization, including that it offers better generalisation through search. You could reject that claim on the grounds that sample efficiency will be high enough, such that HLMI consists of a bunch of algorithms that are highly tuned to different domains. You could also argue that some types of HLMI are more prone to be highly tuned than others. For instance, evolutionary algorithms might be more likely to mesa-optimize than machine learning, and machine learning might be more likely to mesa-optimize than hybrid ML-symbolic algorithms.

The middle node, Training task is generic, resolves as positive if the learned algorithm is not directly optimized for domain-specific tasks. This scenario is characteristic of pre-training in modern machine learning. For example, the task of predicting the next word in a large text dataset is generic, because the dataset could contain all kinds of content that is relevant to many different domain-specific tasks. In contrast, one could use a more domain-specific dataset (e.g. math word problems) or objective function (e.g. reward for the quality of a news article summary). This crux is important if mesa-optimizers generalise better than other kinds of algorithms, because then a generic training task would tend to select for mesa-optimizers more. GPT-3 gives credence to this idea, because it demonstrates strong few-shot learning performance by simply learning to predict the next word in a text. However, it's uncertain whether GPT-3 meets the definition of a mesa-optimizer.

Finally, Inductive biases in base optimizer lumps together some factors about inductive bias in the HLMI’s architecture and training algorithm which affect the chance of mesa-optimization. These include the extent to which mesa-optimization exists in the space of possible models (i.e. algorithmic range), and the ease with which the base optimizer finds a mesa-optimizer (i.e. reachability). Inductive bias is a big factor in some people's beliefs about inner alignment, but there is disagreement about just how important it is and how it works (see e.g. Inductive biases stick around, including comments).

Analogies for mesa-optimization

Finally, evidence for the node HLMI contains a mesa-optimizer is also drawn from history and analogies. We modeled this evidence in a submodule, shown below. The submodule is structured as a Naive Bayes classifier: it models the likelihood of the evidence, given the hypotheses that an HLMI system does or does not contain a mesa-optimizer. The likelihood updates the following prior: if you only knew the definition of mesa-optimizer, and hadn't considered any specific cases, arguments or evidence for it, what is the probability of an optimizing system containing a mesa-optimizer?

We considered three domains for the evidence: machine learning today, firms with respect to economies, and humans with respect to natural selection. A few more specific nodes are included under Examples from ML today because there has been some interesting discussion and disagreement in this area: 

  • The "spontaneous emergence of learning algorithms" from reinforcement learning algorithms has been cited as evidence for mesa-optimization, but this may not be informative.
  • Meta-learning is a popular area of ML research, but since it applies a deliberate and outer-loop optimization, it doesn't clearly affect the likelihood of spontaneous mesa-optimization.
  • GPT-3 does few-shot learning in order to perform novel tasks. This post argues that GPT-3 is incentivised to mesa-optimize, or may already be doing so.
  • It is unclear and debated how much AlphaZero marginally benefits from Monte Carlo Tree Search, which is a form of mechanistic optimization, compared to just increasing the model size. In turn it is unclear how much evidence AlphaZero provides for getting better generalisation through search, which is argued as an advantage of mesa-optimization.
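
The Naive Bayes structure can be sketched in odds form. Each piece of evidence E contributes a likelihood ratio P(E | mesa-optimizer) / P(E | no mesa-optimizer), and under the naive assumption that the pieces of evidence are conditionally independent, the posterior odds are the prior odds times the product of these ratios. The evidence labels below follow the three domains above, but the function and every number are hypothetical placeholders, not values from our model.

```python
from math import prod
from typing import Dict, Tuple


def naive_bayes_posterior(prior: float, evidence: Dict[str, Tuple[float, float]]) -> float:
    """
    prior: P(H), e.g. P(an optimizing system contains a mesa-optimizer)
           before considering any specific cases, arguments or evidence.
    evidence: name -> (P(E | H), P(E | not H)).
    Assumes the pieces of evidence are conditionally independent given H.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = prod(p_h / p_not_h for p_h, p_not_h in evidence.values())
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical placeholder likelihoods for the three domains.
evidence = {
    "ML today": (0.75, 0.5),                        # ratio 1.5
    "Firms in economies": (0.5, 0.5),               # ratio 1.0 (uninformative)
    "Humans under natural selection": (0.5, 0.25),  # ratio 2.0
}
print(naive_bayes_posterior(0.5, evidence))  # → 0.75
```

A piece of evidence judged uninformative has a ratio of 1 and leaves the odds unchanged, which is roughly how a case like meta-learning would enter the submodule.
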

The mesa-optimizer is pseudo-aligned

The possibility of pseudo-alignment depends on just two nodes in our model. The structure is simple mainly because pseudo-alignment of any kind seems much more likely than robust alignment, so we haven't noticed much debate on this point. In general there are many more ways to perform well on the training distribution which are not robustly aligned with the objective, i.e. they would perform significantly worse on that objective under some realistic shift in the distribution. And in practice today, this kind of robustness is a major challenge in ML (Concrete Problems in AI Safety section 7 gives an overview, though it was published back in 2016).

The dependency on the left is a module, Analogies for pseudo-alignment, which is structured identically to Analogies for mesa-optimization (i.e. with a Naive Bayes classifier, and so on), but the competing hypotheses are pseudo-alignment and robust alignment, and the analogies are simply "ML systems today", "Firm", and "Human".

The second node influencing pseudo-alignment is Mesa-optimizer uses modeling. The concept of "modeling" vs. "internalization" introduced in the RLO paper (section 4.4) is relevant to pseudo-alignment. Internalization implies robust alignment, whereas modeling means the mesa-optimizer is pointing to something in its input data and/or its model of the world in order to act aligned. We explain this node and the implications of "modeling" in more detail in the next section.

Pseudo-alignment is not safe enough

At the top level of this subsection, we include three reasons bearing on whether pseudo-alignment is safe enough, or whether it instead counts as a failure of inner alignment. Firstly, there is a crux of whether corrigibility has a broad basin of attraction. This refers to Paul Christiano's claim that "A sufficiently corrigible agent will tend to become more corrigible and benign over time. Corrigibility marks out a broad basin of attraction towards acceptable outcomes." If Christiano’s claim about corrigibility is true, this increases the overall chance that a pseudo-aligned algorithm becomes safe enough before it locks in a path to catastrophe. 

A second crux for the safety of pseudo-alignment is how malign we expect a mesa-objective to be by default (by malign, we just mean non-benign, or harmful in effect; it doesn't have to be inherently malicious). It's uncertain what mesa-objectives are generally like, because they arise internally from the base optimizer, and there is currently scant empirical evidence of mesa-optimization. It's reasonable to expect a proxy objective to be much closer to the base objective than to a random objective, so the danger partly depends on the base objective. Perhaps mesa-objectives will just be weird in benign ways, e.g. being irrational, or very local. On the other hand, it seems that simple, unbounded, coherent objectives would tend to have lower description length than objectives lacking those attributes, so those objectives may be favoured by simplicity bias.

Before moving on to the third reason, deceptive alignment, let's review the concept of modeling vs. internalization introduced in the RLO paper (section 4.4). Modeling is where a mesa-optimizer acts aligned with a base objective (in training) by representing the base objective within its world model (using relevant information from its input data) and then deliberately acting aligned with this representation. By understanding the base objective, the mesa-optimizer can behave more aligned, whether or not its own objective is different. (In this case, “modeling” is similar to a trial employee working hard because their mental model of the company indicates that the company wants them to work hard.) The alternative to modeling, by which a mesa-optimizer can also come to act aligned with a base objective, is internalization: the base optimizer adjusts the mesa-objective to be more and more aligned until convergence. (In this case, internalization is similar to a trial employee working hard because they intrinsically want to work hard.)

Modeling-based mesa-optimizers are thought to end up deceptively or corrigibly aligned as a result of training. Deceptive alignment means the learned algorithm optimizes for the base objective during training for instrumental reasons (namely, to deceive the learning algorithm or programmers into thinking it’s aligned). Returning to the example of our trial employee, a deceptively aligned trial employee might just want a salary and to slack off, so he will work hard while on trial to be offered the job, and then, once offered the job, only work the minimal amount needed to not get fired. So a deceptively aligned algorithm deviates once deviating becomes favourable, e.g. when it figures out that training has ended, removing the threat of being penalised.

The main alternative to a deceptive mesa-optimizer that we consider is a corrigible mesa-optimizer. A corrigible mesa-optimizer optimizes for a pointer to the part of its world model that includes the base objective (both in training and deployment). The corrigibility comes from the fact that the base optimizer can modify this pointer to be more accurate. Here, our trial employee cares about being "the perfect employee" both in trial and once offered a job, but they are uncertain how to measure that. So they might observe employees that have succeeded (the pointer) and learn to imitate that. Note that this is subtly different from the internalizing employee. The internalizing employee doesn't care about the company in itself, but values working hard for its own sake (and thus happens to be aligned with the company), whereas the corrigibly modeling employee wants to do whatever they believe the company wants of them, which is to work hard. 

Coming back to our model, the third reason that pseudo-alignment might not be safe enough is if the Mesa-optimizer is deceptively aligned.

We broke down deceptive alignment into two main conditions. First, a precondition that the mesa-optimizer is using modeling. The key question influencing this precondition is how easy it is for a mesa-optimizer to model rather than internalize. We represent this as the likelihood ratio of eventually settling on modeling versus internalization. One argument for modeling being easier is that, within the training data for the HLMI, there will likely be a huge amount of rich input data relevant for determining the base objective, and it will therefore be easier for the HLMI to capture this objective function by referring to the relevant parts of this data, than it will be to build a representation of the objective from scratch. However, this argument depends on just how difficult it is to construct representations that are useful for modeling the objective, which in turn depends both on the nature of the data and the objective. We can also look to analogies here, such as imprinting in animals. For example, rather than have a detailed representation of a mother goose, a gosling imprints on some initial stimulus in its environment, whether that be an actual goose or, say, a human's boots, and then treats the imprinted object as its mother (e.g. it follows the object around).

Feeding into the ease of modeling vs. internalization node is Reduction factor due to R&D for objective robustness (objective robustness is terminology from Clarifying inner alignment terminology). This node quantifies the effect that safety research has (as a constant multiplier) on reducing the Ease of modeling vs. internalizing the base objective, assuming that modeling is dangerous and better to avoid.
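
Since Ease of modeling vs. internalizing the base objective is a likelihood ratio and the R&D node acts on it as a constant multiplier, the arithmetic can be sketched as follows. Two assumptions are ours: the reduction factor divides the ratio (so a factor above 1 makes modeling less likely), and modeling and internalization are treated as the only two outcomes. The function name is hypothetical.

```python
def p_uses_modeling(ease_ratio: float, reduction_factor: float = 1.0) -> float:
    """
    ease_ratio: odds of eventually settling on modeling vs. internalization
                (4.0 means modeling is 4 times as likely).
    reduction_factor: constant divisor from objective-robustness R&D;
                      1.0 means the research has no effect.
    Returns P(modeling), assuming modeling and internalization are exhaustive.
    """
    if ease_ratio < 0.0 or reduction_factor <= 0.0:
        raise ValueError("ease_ratio must be >= 0 and reduction_factor > 0")
    odds = ease_ratio / reduction_factor
    return odds / (1.0 + odds)


print(p_uses_modeling(4.0))       # odds 4:1 → 0.8
print(p_uses_modeling(4.0, 2.0))  # odds 2:1 → ≈ 0.667
```

Working in odds keeps the multiplier meaningful at any starting value: halving the odds never pushes the probability below zero, which a direct subtraction from a probability could.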

The second condition for deceptive alignment is simply: given that a mesa-optimizer uses modeling, it is deceptive. This possibility is broken down into a submodule of nodes, explained in the next section.

Deceptive alignment

The inside of the "Modeling-based mesa-optimizer is deceptive" submodule. 

There are three factors feeding into the Modeling-based mesa-optimizer is deceptive node. First is Relative number of ways to get a deceptive rather than corrigible mesa-optimizer. Think of this as a base rate for deceptive (rather than corrigible) mesa-optimization. There is a counting argument for deception here: since deception is instrumentally motivated, it is compatible with almost any proxy goal, whereas for corrigibility, the base optimizer has to find a very specific pointer to the base objective.

The second factor is the Ease of finding deceptive (rather than corrigible) mesa-optimizers. Similar to reachability, this node is represented by the relative likelihood that the base optimizer outputs a deceptive mesa-optimizer at some point in training. 

The post Does SGD Produce Deceptive Alignment? details some arguments for and against deceptive algorithms being easier to find than corrigibly aligned ones (specifically for stochastic gradient descent, but much of it is more general). Becoming deceptive requires caring about future instances, so that there is an incentive to deceive now for later gain. Becoming deceptive also requires reasoning about the training process, so that the mesa-optimizer expects the threat of modification to go away and for deception to pay off. On the other hand, becoming corrigible requires a robust pointer to a good model of the base objective; otherwise the mesa-optimizer will just optimize for whatever proxy it has. Furthermore, even if there is information related to the base objective in the input data, the mesa-optimizer will need to have uncertainty about precisely what objective it is pointing to. The conditions for deception are claimed to be easier to meet [2]. However, there seems to be a lot of uncertainty here, which might hinge on the nature of the base objective and how the base optimizer works.

Going one level down from the Ease of finding deceptive (rather than corrigible) mesa-optimizers node, we have Modeling-based mesa-optimizer cares about future instances. This means the mesa-optimizer is motivated by what happens in future instances of training or in deployment (instead of just its current round of training). Caring about future instances is a necessary condition for deceptive alignment, but not for corrigible alignment: if an algorithm does not consider future instances, then it will not see the benefit of defecting later in time, and therefore has no instrumental reason to deceive. The node for Reduction factor due to R&D for myopic cognition refers to the idea that methods to induce myopic cognition, such as approval-directed agents, could reduce the chance of modeling-based mesa-optimizers caring about future instances (myopic cognition is discussed in this post).

The third top-level node here is Relative likelihood that deceptive vs. corrigible mesa-optimizer persists, that is, the likelihood that once a mesa-optimizer becomes deceptive in one instance, it survives and remains deceptive indefinitely (relative to the likelihood for a corrigible mesa-optimizer). Deceptive algorithms may be favoured due to their ability and incentive to adapt to the base objective, because they can reason about the objective and they care about future instances. However, protecting a proxy objective from modification may be difficult. This too seems like an important uncertainty that depends on empirical facts about the base optimizer, e.g. how stochastic gradient descent works with modern neural nets.
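
The post does not fix how these three factors combine. One simple reading, which is our assumption rather than the model's documented formula, treats each as a deceptive-to-corrigible ratio and multiplies them into odds, in the same way as the other likelihood-ratio nodes. Deception and corrigibility are treated as the only outcomes, following the framing above. Names and numbers are hypothetical.

```python
def p_deceptive_given_modeling(
    ways_ratio: float,         # relative number of ways: deceptive vs. corrigible (base rate)
    ease_ratio: float,         # relative likelihood the base optimizer finds a deceptive one
    persistence_ratio: float,  # relative likelihood a deceptive one persists
) -> float:
    """Hypothetical combination: product of the three ratios as odds of deception."""
    ratios = (ways_ratio, ease_ratio, persistence_ratio)
    if any(r < 0.0 for r in ratios):
        raise ValueError("ratios must be non-negative")
    odds = ways_ratio * ease_ratio * persistence_ratio
    return odds / (1.0 + odds)


# Placeholders: the counting argument favours deception (10:1), but suppose
# deceptive mesa-optimizers are harder to find (0.5) and to keep (0.4).
print(p_deceptive_given_modeling(10.0, 0.5, 0.4))  # odds 2:1 → ≈ 0.667
```

Under this reading, myopia research would enter through the ease ratio, since a mesa-optimizer that does not care about future instances has no instrumental reason to deceive.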

Finally, we include two areas of research that could reduce the likelihood of deceptive alignment. One is research to improve objective robustness, such as Relaxed adversarial training for inner alignment, where one of the aims is avoiding deception. A key part of making relaxed adversarial training work is transparency tools to guide the training process. This comment argues that this is more helpful than inspecting the algorithm after deception may have already occurred. The other is research into transparency itself, which may help to prevent deceptive alignment in other ways too, so it is kept as a separate node.

We fail to stop deployment

The last part of the Mesa-Optimization module is about whether the overseers of an HLMI project will pull back before it's too late, given there is an HLMI with an unsafe mesa-optimizer that hasn’t been deployed yet (or trained to sufficiently advanced capabilities to “break out”). They could pull back in such a situation only if they are aware that it's unsafe AND they coordinate to stop the threat. That logic is inverted to an OR in the module, to fit with the top-level output of HLMI has inner alignment failure.

Mesa-optimizer is deceptively aligned is one of the strongest factors in whether overseers know that a mesa-optimizer is unsafely pseudo-aligned, because deception works against overseers figuring this out. Meanwhile, R&D for transparency & interpretability may help to detect unsafe pseudo-aligned algorithms. On the other hand, deceptive algorithms may just exploit weaknesses in the transparency tools, so it is even possible for the reduction factor to be less than 1 (i.e. it increases the chance that we fail to see the danger).
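
The inversion from AND to OR is De Morgan's law: failing to stop deployment is NOT (aware AND coordinate), which equals (NOT aware) OR (NOT coordinate). The sketch below checks that identity and turns it into a probability. The transparency adjustment is our hypothetical reading of the reduction factor, applied as a divisor on the odds of overseers remaining unaware, so a factor below 1 raises the chance of missing the danger. The function names are ours as well.

```python
from itertools import product

# De Morgan's law, checked over all truth assignments.
for aware, coordinate in product([False, True], repeat=2):
    assert (not (aware and coordinate)) == ((not aware) or (not coordinate))


def p_unaware_with_transparency(p_unaware: float, reduction_factor: float) -> float:
    """Divide the odds of overseers staying unaware by the transparency reduction factor."""
    if reduction_factor <= 0.0:
        raise ValueError("reduction_factor must be positive")
    if p_unaware >= 1.0:
        return 1.0  # certain unawareness: the odds are infinite, so no finite factor helps
    odds = p_unaware / (1.0 - p_unaware) / reduction_factor
    return odds / (1.0 + odds)


def p_fail_to_stop(p_unaware: float, p_no_coordination_given_aware: float) -> float:
    """P(NOT aware OR NOT coordinate) = 1 - P(aware) * P(coordinate | aware)."""
    p_aware = 1.0 - p_unaware
    return 1.0 - p_aware * (1.0 - p_no_coordination_given_aware)


# Placeholders: a 50% chance of not noticing, and a 20% chance of failing to
# coordinate even when aware.
for factor in (2.0, 1.0, 0.5):  # helpful, neutral, and counterproductive transparency
    p_unaware = p_unaware_with_transparency(0.5, factor)
    print(factor, round(p_fail_to_stop(p_unaware, 0.2), 3))
```

As the reduction factor drops below 1, the printed failure probability rises, matching the possibility that deceptive algorithms exploit weaknesses in the transparency tools.
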


In this post, we have examined important cruxes and uncertainties about risks from learned optimization, and how they relate to each other. Our model is mostly based on the Risks from Learned Optimization sequence, and considers whether mesa-optimization will occur in HLMI at all, whether pseudo-alignment occurs and is dangerous, and whether the mesa-optimizer will be deployed or "break out" of a controlled environment. Some uncertainties we have identified relate to the nature of HLMI, connecting back to Paths to HLMI and to analogies with other domains. Other uncertainties relate to the training task, e.g. how generic the task is, and whether the input data is large and rich enough to incentivise modeling the objective. Other uncertainties are broadly related to inductive bias, e.g. whether the base optimizer tends to produce mesa-optimizers with harmful objectives.

We are interested in any feedback you might have, including how this post affected your understanding of the topic and the uncertainties involved, your opinions about the uncertainties, and important points that our model may not capture.

The next post in this series will look at the effects of AI Safety research agendas.


[1] We define HLMI as machines that are capable, either individually or collectively, of performing almost all economically-relevant information-processing tasks that are performed by humans, or quickly (relative to humans) learning to perform such tasks. We are using the term “high-level machine intelligence” here instead of the related terms “human-level machine intelligence”, “artificial general intelligence”, or “transformative AI”, since these other terms are often seen as baking in assumptions about either the nature of intelligence or advanced AI that are not universally accepted.

[2] Parts of this argument are more spelled out in the FLI podcast with Evan Hubinger (search the transcript for "which is deception versus corrigibility").


Thanks to the rest of the MTAIR Project team, as well as the following individuals, for valuable feedback and discussion that contributed to this post: Evan Hubinger, Chris van Merwijk, and Rohin Shah.


Book Review: Feeling Great by David Burns

12 October 2021, 23:15
Published on October 12, 2021 4:32 PM GMT

I watched Paw Patrol: The Movie because I have a four-year-old I love.  In it, the police puppy Chase is scared of jumping over a gaping chasm between two buildings, so his owner, the boy Ryder, encourages him by telling him he's the bravest dog he knows, and that type of thing.  In the movie, this bit of cheerleading works: the puppy makes the leap and, with his high opinion of himself restored, performs extraordinary feats.  

What David Burns argues in Feeling Great, which I find so radical, is that in this situation:

  1. The cheerleading wouldn't really work.  For one thing, the puppy knows that's a ridiculous claim.  How is he the single bravest dog the boy Ryder has ever met?  Define brave.  Bravest every moment continuously with no other dog cutting in with a bit of bravery of its own ever?  Really the bravest compared to even the other puppies on the Paw Patrol team (Marshall the firefighter puppy, Skye the pilot puppy, et al.)?  Would Ryder say that were the other puppies within earshot?  If not, how can Chase take Ryder's cheerleading seriously?  Cheerleading with affirmations, whether from others or as self-dialogue, in the form of "You're/I'm so brave, beautiful, smart, etc." doesn't really work because it feels phony, since we can always find counterexamples.
  2. The cheerleading sets the police puppy Chase up for a turbulent inner life, in which how he feels is determined by his fluctuations on the bravery scale, whether compared with other dogs or even with his own evaluations of his bravery.  On days he feels/acts less brave, ought he feel bad about himself?  Or ought he maintain an unwaveringly exact amount of self-bravery evaluation? 

One of Feeling Great's philosophical points, separate from the new way of doing therapy (already reviewed by Steven Byrnes), is to embrace, by way of Ludwig Wittgenstein, that there is no such thing as a brave puppy: the words brave puppy are just sounds.  A real puppy is not worried about whether he is brave, but is just enjoying life chasing squirrels or saving humans.  Those actions can be well or ill done, but a puppy cannot be well or ill performed/to be.  Many problems of psychology are essentially linguistic problems.  

Another of Feeling Great's revolutionary and creative philosophical points is the argument to accept ourselves as ordinary, not extraordinary.  And that paradoxically, by giving up the need to be special, life becomes special.  His message is the opposite of self-help/self-improvement.  

David Burns on his cat Obie and himself:

"Clearly, Obie was not special.  He was just an ordinary, homeless, desperate cat who appeared at our kitchen door on the verge of death, hoping for some food.  And although he became a healthy, proud, and gorgeous boy, he was not a pure-bred and couldn't win any cat shows.  

And I'm not special either.  I'm just an old fart now.  But when I was with my buddy, Obie -- just hanging out and not doing much -- that was the greatest experience in the world.  Obie taught me that when you no longer need to be "special," life becomes special."

David Burns dedicated Feeling Great to his cat Obie, his "best friend and teacher."  

On his student at Stanford Medical School, "Dr. Matthew May when he was a psychiatric resident.  Matt was exceptionally skillful and a joy to work with":

"One night, we were driving back to my house from a supervision session when we came to a stop sign.  Matt looked at me with a very sincere look in his eyes and said, "Dr. Burns, I just want you to know that every day I'm trying really hard to become a better person."  

I gave him an equally sincere look and said, "Matt, I really hope you get over that pretty soon!"

He suddenly got it and broke into laughter.  That was his moment of enlightenment."

"I hope you also get it now or perhaps soon.  Because when your ego dies -- and you discover that you are not "special" and that you no longer need to be special -- life can become pretty incredible."

David Burns advocates "the painful acceptance of the fact that you're not actually special and the fantastically liberating discovery that you don't need to be."

Back to the police puppy Chase on that ledge, doubt-wracked.  I think Feeling Great's message to him would be that how to jump over the chasm to save his owner is a real problem, but that compounding it with self-questions of whether he is brave or not, special or not, how brave, how special, adds linguistic problems that create psychological problems.  Embracing being just an ordinary dog would shed his feelings of "inadequacy, guilt, shame, inferiority, and worthlessness."  As Burns tells the reader, you "will lose nothing but your suffering and your 'self,' and you will gain the world."

David Burns's original book Feeling Good (1980) was cognitive behavior therapy.  His new Feeling Great (2020) is philosophy, plus cognitive behavior therapy.  Instead of finding what's wrong with ourselves and improving on it, he teaches us to see those exact flaws as what's best about us, and then to go a step further: not only to accept ourselves, but to accept ourselves as just ordinary.   

It isn't laying down the Law of Jante, prescribing that we shouldn't believe that we are special.  Rather, David Burns writes of sitting with open hands instead of proselytizing.  He isn't interested in telling people how to live their lives, but in helping them when they come to him with a problem.  Basing our "selves" on being brave/special/etc. has such a great cost that if, say, Chase the police puppy were to choose to give up that volatile basis for how he feels about himself, David Burns can show him the way out of the woods.  But if Chase wants to go on trying to be brave/special, then more power to him, and David Burns wishes him well.  

I find these ideas heady, provocative, and deeply influential.  And I can't help but write them into my novel (this is my moment to shill), a serial wuxia adventure on Substack.  


Book Review: Philosophical Investigations by Wittgenstein

12 October 2021, 23:14
Published on October 12, 2021 8:14 PM GMT

Ludwig Wittgenstein is often considered one of the greatest philosophers of the 20th century and is certainly one of its most fascinating figures. Here are some facts to illustrate:

  • Wittgenstein was a son of the second-wealthiest family in Austria-Hungary. He gave away his fortune soon after he inherited it, apocryphally to his siblings, because it would do them no harm as they were already wealthy.
  • After finishing the Tractatus, he quit philosophy to become a school teacher, on the basis that he'd already dissolved all of its problems. Nearly ten years later he returned and frantically set about refuting his previous position. His positions changed so much that readers are often advised to treat Early Wittgenstein and Later Wittgenstein as two separate figures.
  • Wittgenstein fought in WW1 and was decorated for his bravery, holding his post under heavy shelling. He somehow managed to write the Tractatus during this period.
  • In another apocryphal story, Karl Popper was invited to give a talk at the Cambridge University Moral Sciences Club, which Wittgenstein was chairing. Wittgenstein and Popper grew increasingly argumentative, and Wittgenstein started gesturing with a poker to accentuate his points. Eventually Wittgenstein demanded that Popper provide an example of a moral rule, to which Popper replied, "Not to threaten visiting lecturers with pokers," leading Wittgenstein to storm out in frustration.

I enjoyed reading this book, as Wittgenstein is eminently quotable and writes in crystal-clear language. However, the book is incredibly challenging due to its lack of structure, his preference for being implicit rather than explicit, and the difficulty of understanding how any particular sentence fits into the overall picture.

Here are some reasons why I consider this book of relevance to this community:

  • Wittgenstein introduced the notion of words capturing a family resemblance - which roughly corresponds to Eliezer's description of words as clusters in thingspace.
  • Eliezer has written about the importance of dissolving the question, whilst Wittgenstein argued that most problems in philosophy were merely linguistic confusions and needed to be dissolved. I don't know if Wittgenstein was the originator of this concept, but he seems to have made it more prominent.
  • Wittgenstein introduced the concept of language-games, which I consider to be one of the best metaphors for understanding how language works.
  • Wittgenstein argued for the importance of "meaning as use" which I see as effectively the linguistic equivalent of revealed preferences from economics and as prefiguring Conceptual Engineering.
  • In my opinion, Wittgenstein provides some deep insights into logic.

Please note that despite my best attempt to separate out what Wittgenstein was arguing and how I interpreted him, there will inevitably be areas where I end up reinterpreting without meaning to do so. I especially want to emphasise that the relevance sections I've added are just intended to provide one interpretation of how these ideas are relevant.


Given that a large part of the Philosophical Investigations is aimed at the position he defends in the Tractatus, it makes sense to begin with a brief overview of the Tractatus's arguments (I haven't read that book, so I am writing this based on secondary material). He even notes in the preface to the Investigations that the later work will be hard to understand except by contrast with the Tractatus.

Wittgenstein attempts to give an account of language - in particular, of what can and can't be said. He is trying to persuade us that there are certain things that we can't speak precisely about, and that we should therefore give up on trying to find a philosophical account of these ("Whereof one cannot speak, thereof one must be silent").

The account of truth he provides is known as the Picture Theory of Meaning. It asserts that a proposition is true when the picture representing that proposition corresponds to reality. A picture consists of elements standing in relations to one another, where the elements represent real-world objects and their arrangement represents the state of affairs in which those objects stand. In particular, Wittgenstein claims that we can't sensibly speak of things that can't be represented by these kinds of pictures.

This is not to imply that everything that can't be spoken of is unimportant or non-existent. In fact, he even suggests that these might be the most important things of all:

There are, indeed, things that cannot be put into words. They make themselves manifest. They are what is mystical.

Picture theory is normally taken to assume a Correspondence Theory of Truth, which claims that a proposition is true when it corresponds in some way to how the external world is. One way to make this theory clearer is to consider its most prominent alternative, Coherentism. This claims that insofar as there is a truth to be aspired to, the best we can do is to ensure that our propositions are coherent, rather than trying to make them correspond to some kind of external reality.

Wittgenstein's Picture Theory adopts a kind of logical atomism which Wikipedia describes as follows:

The world consists of ultimate logical "facts" (or "atoms") that cannot be broken down any further, each of which can be understood independently of other facts.

(See the Stanford Encyclopedia of Philosophy for a more technical description.)

These atomic propositions assert the existence of atomic states of affairs, which are combinations of simple objects. Wittgenstein doesn't provide examples of these simple objects or states of affairs, nor does he define precisely what they are. A. C. Grayling suggests that, having provided the general framework, Wittgenstein might have seen what was left as mere details to be filled in by others. For example, at one point in time we might have been tempted to make these statements about atoms, but now we'd perhaps be tempted to make them about quantum wavefunctions. Arguably, these are mere details rather than the core of the theory, and scientists can fill in the appropriate simple objects depending on the latest scientific theories.

Perhaps the biggest flaw of this account is that Picture Theory, as Wittgenstein has characterised it, cannot represent itself. This paradox is noted in the metaphor known as Wittgenstein's Ladder:

My propositions serve as elucidations in the following way: anyone who understands me eventually recognizes them as nonsensical, when he has used them—as steps—to climb beyond them. (He must, so to speak, throw away the ladder after he has climbed up it.). He must transcend these propositions, and then he will see the world aright

The standard reading of this is to take him as having utilised nonsense to demonstrate an ineffable truth that nevertheless remains after the nonsense of the surface-level statements has been exposed. That is, the sentences are there as something to be worked through, rather than as assertions to be taken literally.

While this is far from a perfect analogy, the Sokal Hoax involved Alan Sokal submitting nonsense to a journal of cultural studies in order to critique postmodernism. For many people, this constituted an effective critique of postmodernism, despite the fact that he didn't address the issue head on. Nor did it matter that the submission was nonsense; in fact, this was core to how the critique operated.

(In contrast, the resolute reading holds that when Wittgenstein asserts that his propositions are nonsense, he means precisely that: the Tractatus is just plain nonsense, rather than some kind of elevated nonsense designed to reveal deep truths, and the end result is not some ineffable truth but the realisation that the project Wittgenstein embarked upon was inevitably destined to fail.)

While there is more I could say about this, I'll leave it here as the Tractatus isn't the focus of this post. I will, however, briefly note that I have posited my own solution to the flaw that is picture theory's inability to picture itself.

Philosophical Investigations

Augustine model of language

The Philosophical Investigations starts with a passage from Augustine:

"When they (my elders) named some object, and accordingly moved towards something, I saw this and I grasped that the thing was called by the sound they uttered when they meant to point it out. Their intention was shewn by their bodily movements, as it were the natural language of all peoples: the expression of the face, the play of the eyes, the movement of other parts of the body, and the tone of voice which expresses our state of mind in seeking, having, rejecting, or avoiding something. Thus, as I heard words repeatedly used in their proper places in various sentences, I gradually learnt to understand what objects they signified; and after I had trained my mouth to form these signs, I used them to express my own desires."

Wittgenstein summarises this passage as providing the following picture of language:

The individual words in language name objects—sentences are combinations of such names. In this picture of language we find the roots of the following idea: Every word has a meaning.

Wittgenstein's Picture Theory adopted this kind of approach, in that elementary propositions make claims about the relations of elementary objects. Wittgenstein now explicitly rejects this theory of language, his first critique being that Augustine doesn't "speak of there being any difference between kinds of word", explaining:

If you describe the learning of language in this way you are, I believe, thinking primarily of nouns like "table", "chair", "bread", and of people's names, and only secondarily of the names of certain actions and properties; and of the remaining kinds of word as something that will take care of itself.

Wittgenstein later suggests that we are fooled into believing that words all function in roughly the same manner by their "uniform appearance". He uses the analogy of a toolbox: even though we might class all its contents as tools, the actual tools might be completely unalike. He offers another example: the handles in a locomotive cab, which may all look alike but function completely differently depending on what they control.

He writes:

When we say: "Every word in language signifies something" we have so far said nothing whatever; unless we have explained exactly what distinction we wish to make.

We will cover this in more detail in the section on expressions.

Wittgenstein introduces the term "Ostensive Teaching of Words" to describe teaching by saying a word whilst simultaneously pointing or otherwise directing the attention of the learner to the relevant object.

Wittgenstein criticises the claim that we learn language through ostensive teaching on the basis that it is only capable of teaching us to associate objects with words, whereas we need training in order to learn how we're actually supposed to respond to words. For example, imagine a parent shows a child a picture of someone striking another person and says the word "punch". If the child responds by punching the parent because they believe that's what they are supposed to do, the parent may very well respond with a timeout or a spanking to teach the child that they performed an unwanted action. Even if we say that the naive child understood the word "punch", it's clear that they didn't understand the meaning of the utterance.

Another criticism he makes is that it is very difficult to teach words like "there" and "this" ostensively. Perhaps you could point to an object, then use your other finger to point alternately at your first finger and at the object, but this is awkward and no-one does it.

Relevance: The Augustine model of language and ostensive teaching incline us towards a model where the meaning of words is represented by propositions, as propositions focus on objects as the bearers of properties or as the subjects or agents of actions. Wittgenstein undermines these models of teaching and language in favour of the importance of training in learning and a model of language as use.


Wittgenstein considers an example where a builder shouts "Slab!" to get an assistant to bring them a slab. We might be tempted to say that "Slab!" is a shortening of "Get me a slab", but Wittgenstein points out that it makes equally as much sense to say "Get me a slab" is really a lengthening of "Slab!".

He acknowledges that it might make sense to say that "Slab!" means "Bring me a slab" in contrast to statements such as "Get him a slab" or "Find a slab!", but he rejects going so far as to claim that the definition of "Slab!" is something along these lines. Part of his argument is to deny that when we say "Slab!" we necessarily form a full expression like "Bring me a slab" in our minds. Indeed, if we make extensive use of this expression, I would expect "Slab!" to become a single unit of thought.

For Wittgenstein this isn't just a curious fact about what goes through our heads when we invoke expressions like "Slab!". Rather, it is a point against any theory that claims that such expressions really represent something else, including the theory that the something else is a proposition.

We'll provide further support for this position through another example. Suppose I tell you, "Leave!". Picture theory might reinterpret this as a logical proposition identifying the worlds where you leave. But surely you're not meant merely to picture leaving, but to actually leave. An alternative characterisation of picture theory would interpret this as a picture of my mind in which I have some kind of desire for you to leave. But people can say things even when they don't desire them. Perhaps I really want you to stay, but I know it would be best for you to leave. Or perhaps I don't know what I want, and I told you to leave in response to being overwhelmed.

Later, Wittgenstein writes that people who are confused about language will be inclined to ask: "'What is a question?' Is it the statement that I do not know such-and-such, or the statement that I wish the other person would tell me . . . ? Or is it the description of my mental state of uncertainty?"

All three examples appear to lack a straightforward, non-reductive translation into propositions. Instead, these expressions seem to represent actions, and they are better understood as attempts to achieve a result.

Relevance: These considerations further lay the ground for Wittgenstein's theory of language as use by demonstrating the limits of modelling language as propositions.

Language as Use

Reframing all language in terms of logical propositions is particularly tempting to those with rationalist inclinations, but we already saw in the last section that it isn't as straightforward as we might imagine. 

We noted that if someone says "Leave!", they don't want you to just form a picture of yourself leaving in your head, but to actually leave. That is, we can conceive of an agent with a merely propositional understanding: they can form a picture of the desired action, but without knowing what they are supposed to do with it. For example, are they supposed to leave, pretend to leave, say they are thinking of leaving, or do the opposite and not leave?

Compare this with the following quote from the Investigations:

Imagine a picture representing a boxer in a particular stance. Now, this picture can be used to tell someone how he should stand, should hold himself; or how he should not hold himself; or how a particular man did stand in such-and-such a place; and so on

Perhaps my example seems kind of silly, but I'm trying to illustrate a deeper point. Wittgenstein asks whether this kind of propositional understanding is the meaning of expressions, and for him it is clear that it is not. While being able to bring appropriate images to mind undoubtedly assists in achieving our purposes, it seems that we use words because we want to actually achieve something in the world. Further, this example illustrates one reason why we might want to follow Wittgenstein and conceive of learning language as training, rather than ostensive teaching.

It's worth noting that Wittgenstein doesn't ideologically insist that language is use in all cases, just that it is use in most cases. We haven't covered the concept of language-games yet, but I would argue that the natural development of Wittgenstein's ideas would be to allow meaning as use to play a larger or a smaller role depending on the particular language-game in question.

Relevance: Rationalists have a tendency to insist on using words rather literally. This certainly has advantages - sticking to this rule reduces our ability to frame situations in accordance with our biases and can help train precise thought. On the other hand, if we accept Wittgenstein's argument that language is primarily about use there's a sense in which this would be attempting to use language in a way contrary to its nature. This isn't an essentialist claim - it's simply a claim that it's often easier to go with the flow such as moving things downhill rather than uphill or driving the "right" way down a street. So this would be an argument for using language in a more "normie" way.

Additionally, I see language as use as broadly analogous to the economic concept of revealed preferences. When economists talk about revealed preferences, for the most part they aren't actually claiming that doing something necessarily means that there's a sense in which we prefer it, but rather that, given our tendency to be unreflective and to deceive ourselves and others, in many cases it is better to look at what we do rather than what we say. Language as use seems to be making a similar move in terms of language.

Arguably, if we attempt to understand language by asking people what words mean, they will just concoct some kind of hand-wavy explanation that provides a misleading picture of how the word is actually used. One strong piece of evidence for this is how difficult it is to define words we use all the time.

We will also link this idea to conceptual engineering, but first we have to cover family resemblances and language-games.

Family Resemblances

Wittgenstein uses this term to describe phenomena that "have no one thing in common... but... are related to one another in many different ways". For example, we can imagine all members of a family appearing similar despite there being no single feature that every member has. That is, most of them might have the same eyes, the same hair and the same facial structure, but for each such feature there is at least one family member who is the exception to the rule.

This broadly corresponds to Eliezer Yudkowsky's description of The Cluster Structure of Thingspace. Eliezer's definition is more amenable to mathematical formalisation, but there are slight differences between sets defined by having N out of M attributes and sets formed via some kind of distance metric.
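Both ideas can be made concrete with a small sketch. The toy "family" below (made-up names and features, purely hypothetical) has no single feature shared by every member, and the two functions compare two illustrative membership rules: "at least N of M features" and "within some distance of a prototype". Note that with purely binary features and an unweighted Hamming distance the two rules coincide (having at least N of M features is the same as being within M − N mismatches of the all-features prototype), so this sketch assumes graded feature scores, which is where they can come apart.

```python
import math
from functools import reduce

# Hypothetical toy family: each member shows three of the four family features.
# Names and features are illustrative only, not taken from the book.
family = {
    "Anna": {"eyes", "hair", "chin"},
    "Ben": {"eyes", "hair", "nose"},
    "Cara": {"eyes", "chin", "nose"},
    "Dan": {"hair", "chin", "nose"},
}


def common_features(members):
    """Return the features that every member has (the 'one thing in common')."""
    return reduce(set.intersection, members.values())


def n_of_m_member(scores, n, threshold=0.5):
    """Membership rule: at least n feature scores reach the threshold."""
    return sum(score >= threshold for score in scores) >= n


def distance_member(scores, prototype, radius):
    """Membership rule: within `radius` (Euclidean distance) of a prototype."""
    return math.dist(scores, prototype) <= radius


if __name__ == "__main__":
    print(common_features(family))  # set(): no feature is shared by everyone
    prototype = [1.0, 1.0, 1.0, 1.0]  # graded scores for eyes, hair, chin, nose
    candidates = {
        "three strong features": [1.0, 1.0, 1.0, 0.0],
        "everything middling": [0.49, 0.49, 0.49, 0.49],
        "two features": [1.0, 1.0, 0.0, 0.0],
    }
    for name, scores in candidates.items():
        print(name, n_of_m_member(scores, n=3), distance_member(scores, prototype, radius=1.05))
```

Here the "everything middling" candidate sits close to the prototype overall yet clears none of the individual thresholds, so the distance rule admits it while the 3-of-4 rule rejects it. The threshold of 0.5 and radius of 1.05 are arbitrary choices made for illustration.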

If you are worried that the concept of family resemblances is kind of vague, I would suggest that Wittgenstein has a strong defence for not having specified them more precisely. This defence is that family resemblances are themselves family resemblances, which by their nature are not very amenable to precise definition.

Note that Wittgenstein doesn't claim concepts are always family resemblances; for example, he acknowledges that we can provide "rigid limits" for what falls within a particular class. Instead, the point is that our regular use of language often does not draw boundaries nearly so cleanly.

Even though both Wittgenstein and rationalists embrace similar ideas here, there is a stark difference in behaviour. Rationalists understand that there is often no set of necessary and sufficient conditions that captures precisely what we mean by a word, yet often adopt such definitions anyway for pragmatic reasons. Wittgenstein, on the other hand, responds by mostly eschewing explicit definitions and instead leans towards listing examples (see Intension and Extension in Logic and Semantics).

Relevance: Understanding this concept helps avoid pointless linguistic disputes or falling into dysfunctional versions of conceptual analysis.


Language-games are another one of the most famous concepts that Wittgenstein introduced. Since Wittgenstein considers these to be a family resemblance, he doesn't provide an explicit definition, but instead explains by way of examples:

Review the multiplicity of language-games in the following examples, and in others:

  • Giving orders, and obeying them
  • Describing the appearance of an object, or giving its measurements
  • Constructing an object from a description (a drawing)
  • Reporting an event
  • Speculating about an event
  • Forming and testing a hypothesis
  • Presenting the results of an experiment in tables and diagrams
  • Making up a story; and reading it
  • Play-acting
  • Singing catches
  • Guessing riddles
  • Making a joke; telling it
  • Solving a problem in practical arithmetic
  • Translating from one language into another
  • Asking, thanking, cursing, greeting, praying.

Roughly, I interpret a language-game as a particular activity or purpose for which language is used and which follows its own distinctive rules. I think a given use of language can validly be counted as part of a number of different language-games, depending on how we want to frame it.

The Stanford Encyclopedia of Philosophy describes language-games as follows:

Language-games are, first, a part of a broader context termed by Wittgenstein a form of life. Secondly, the concept of language-games points at the rule-governed character of language. This does not entail strict and definite systems of rules for each and every language-game, but points to the conventional nature of this sort of human activity. Still, just as we cannot give a final, essential definition of ‘game’, so we cannot find “what is common to all these activities and what makes them into language or parts of language”

Regarding the first point, Wittgenstein seems to believe that it is impossible to understand a language-game outside of the context in which it is practised, writing: "If a lion could speak, we could not understand him".

Consider the biblical injunction to "turn the other cheek". This is typically understood as suggesting that the most virtuous path is not fighting back against (perhaps figurative) violence, but instead allowing further injury.

On the other hand, Walter Wink has interpreted this as advocating nonviolent resistance to attempts by social superiors to humiliate those beneath them. One way to humiliate someone was to give them a backhand slap on the cheek. However, by turning the other cheek, the victim made it impossible to be backhand slapped with the right hand, as their nose would block that cheek. Cultural tradition forbade the use of the left hand except for unclean tasks, and a punch would be seen as a sign of equality.

Regardless of whether or not this theory turns out to be correct, it illustrates how difficult it can be to understand a statement outside of its cultural context, and the seeming impossibility of severing certain expressions from their form of life.

Regarding the notion of language-games having rules, he doesn't mean that everything is precisely specified; indeed, that would be contrary to his concept of family resemblances. Nor does he mean that these rules cannot have exceptions; exceptions often need rules in order to subvert them. His point is merely that without rules, language wouldn't be usable for communication.

We've already covered our inability to give a 'final, essential definition' in the section on family resemblances.

Another aspect of the metaphor of language-games that resonates with me is the idea that we can also create new language-games. When a new situation arises, a new language-game can be created to handle it.

Relevance: We've covered a lot of different aspects of language-games, but the most important aspect of it for me is as an exhortation to take seriously the diversity of different ways that language can be used and to be very skeptical about any attempts at universal statements.

For example:

  • If we say all language is an attempt to communicate, what about purposefully silly uses of language that are supposed to make us laugh?
  • If we emphasise the importance of denotations in language and make connotations secondary, what about circumstances where someone tells us something we already know? For example, saying "I am the leader" to someone who already knows that you are the leader in order to remind them to show you respect.
  • If we try to define questions in terms of an attempt to get someone to answer, then what about rhetorical questions?

Wittgenstein as Anti-Philosopher

In both the Tractatus and the Investigations, Wittgenstein was highly critical of philosophy. In the Tractatus, his critique was that philosophers attempted to debate or argue for theses that didn't correspond to a picture of the world. In the Philosophical Investigations, Wittgenstein's critique is that "language goes on holiday".

That is, philosophy has a tendency to use words outside of the language-games in which they make sense. This results in abstract questions which aren't grounded in anything and don't have any consequences for anything outside of philosophy. Wittgenstein suggests that the following words are abused by philosophy: "knowledge", "being", "object", "I", "proposition", "name".

For Wittgenstein, the consequence is that the vast majority of philosophical questions aren't real questions, but simply nonsense. These questions aren't resolved (i.e. answered), but dissolved. He sees his task as "to show the fly the way out of the fly bottle"; that is, to help philosophers escape problems that they created for themselves.

As an example, many philosophers have deeply pondered the question, "What is Being?". For Wittgenstein, this question wouldn't make sense. Wittgenstein suggests that what we mean by Being is "existence and non-existence of connexions between elements". For example, if I say that there happens to be a complete set of playing cards on my desk, I mean that there are elements representing one of each kind of card which are connected by all being located next to each other. And if I say that there happens to be a suitable candidate, I may mean that there's a single person who has the skill, determination and experience to handle the role; that is, these attributes are connected by all being contained in the one person.

For Wittgenstein, it doesn't really make sense to talk about individual elements existing in any non-trivial sense. He writes: "If this thing did not exist, we could not use it in our language-game". In other words, saying that a simple element exists says nothing beyond the fact that this element is part of our language-game. Asking what it means for a simple element to exist is confused, because the notion of existence doesn't have any non-trivial meaning outside of checking whether there is a thing for which the claimed connections hold.

This is further clarified by the following analogy: Wittgenstein suggests that it doesn't really make sense to say that the "standard meter in Paris" either is or isn't one meter long. The former is a tautology, so it doesn't say anything of significance, and the latter is false.

I don't find his characterisation of Being as necessarily concerning connections completely persuasive. On the other hand, I think he has a point. When we ask questions like "What is Being?", it seems quite reasonable to suggest that we are using the word in a way that is different from everyday life. For example, if we ask "What does it mean for an animal to be an orangutan?", we probably don't care whether the orangutan is ultimately built up from atoms or quantum wavefunctions; we want to know how to differentiate it from a gorilla.

Here the language-game consists of trying to differentiate one kind of object from another. If we try to extend this to differentiate objects that are from objects that aren't, it's unclear that this makes sense, as perhaps anything we could compare to another thing is something that is. We could try extending the definition from another language-game instead, but we shouldn't be surprised if we encounter the same issue.

Relevance: The general lesson I take from this is to be very careful about taking a term from within a language-game and trying to use it outside of it. When we ask what these words mean in the abstract we risk having caused the definition to become ungrounded.

Take for example the Sleeping Beauty Problem. This has led to endless debate between the thirders and halvers, but perhaps the issue is that probability is a concept defined for situations where an event can only ever be counted exactly once. Once we leave that region of problem-space it's hardly surprising that the concept of probability starts to come apart. Merely asking, "What is probability?" in the abstract isn't very helpful. Instead we have to pay attention to the language-games that people might want to play in these unusual situations.
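The counting point can be made concrete with a small simulation, assuming the standard setup: a fair coin, with Beauty woken once on heads and twice on tails and no memory carried between awakenings. The function name and parameters are my own. The same simulated runs give roughly 1/2 or roughly 1/3 depending on whether each experiment or each awakening is counted once, which illustrates why the question is ambiguous rather than settling which count is the right one.

```python
import random


def sleeping_beauty(trials, seed=0):
    """Simulate the standard Sleeping Beauty setup and count heads in two ways.

    Heads: Beauty is woken once. Tails: she is woken twice.
    Returns (fraction of experiments in which the coin landed heads,
             fraction of awakenings that occur in a heads experiment).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    heads_experiments = 0
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        heads = rng.random() < 0.5
        awakenings = 1 if heads else 2
        total_awakenings += awakenings
        if heads:
            heads_experiments += 1
            heads_awakenings += awakenings
    return heads_experiments / trials, heads_awakenings / total_awakenings


if __name__ == "__main__":
    per_experiment, per_awakening = sleeping_beauty(100_000)
    print(f"counted per experiment: {per_experiment:.3f}")  # close to 1/2
    print(f"counted per awakening:  {per_awakening:.3f}")   # close to 1/3
```

Nothing in the simulation decides which count "probability" should refer to; that choice is exactly what depends on the language-game being played.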


Wittgenstein's concept of knowledge is deeply bound up with his concept of language-games. Wittgenstein thinks it was a mistake for philosophy to attempt to produce absolute knowledge, and that we should only try to produce certainty in the sense that certainty has within our ordinary, everyday language-games.

I can be as certain of someone else's sensations as of any fact. But this does not make the propositions "He is much depressed", "25 x 25 = 625" and "I am sixty years old" into similar instruments. The explanation suggests itself that the certainty is of a different kind.

For Wittgenstein, these are different language-games with different notions of certainty. Different language-games include different methods of proving or attempting to disprove claims, and claims that pass muster according to these standards are classed as "certain". We might complain that this isn't true certainty, as these methods will only reveal the truth under certain assumptions. Wittgenstein addresses these kinds of doubts when he writes "what we do in our language-game always rests on a tacit presupposition" and "doubting has an end". In other words, this is the best that we can do, and seeking absolute certainty as Descartes did is the pursuit of foolish philosophers.

Wittgenstein goes further and questions whether we really doubt:

But that is not to say that we are in doubt because it is possible for us to imagine a doubt. I can easily imagine someone always doubting before he opened his front door whether an abyss did not yawn behind it, and making sure about it before he went through the door (and he might on some occasion prove to be right)—but that does not make me doubt in the same case

And also whether we can doubt:

But, if you are certain, isn't it that you are shutting your eyes in face of doubt?"—They are shut.


If I see someone writhing in pain with evident cause I do not think: all the same, his feelings are hidden from me

In other words, let's be honest here and not pretend to doubt things that we don't doubt: things that we can't help but believe.

Elsewhere he writes: "What has to be accepted, the given, is, so one could say, forms of life." We can compare this to David Hume's injunction to "be a philosopher; but, amidst all your philosophy, be still a man." Hume was a famous skeptic who doubted everything from causation to induction. Nonetheless, there was David Hume the philosopher and David Hume the billiards player. When it was time to play billiards, he simply put aside his skepticism that we had any reason to believe that the balls would move the same way that they had in the past. Wittgenstein's outlook is similar; however, for him the only reason to be a philosopher is to avoid being fooled by philosophy.

We can also see Wittgenstein as making a move similar to Descartes's "I think, therefore I am". Any question or statement can be seen as presupposing the existence of the language-game in which it makes sense. So perhaps being too skeptical ends up being self-defeating?

One of his key epistemological claims is that it only makes sense to talk about knowledge when there is some method of falsification or verification. For Wittgenstein, knowledge refers to beliefs that have passed some kind of test, with the appropriate kind of test depending on the language-game the belief comes from. If there is no test that we can apply, then we can't play any language-games related to knowledge, at least in any non-trivial sense. He uses this argument to try to demonstrate that we can't have knowledge of private sensations.

Wittgenstein presents these methods of falsification or verification as being intersubjective rather than objective. That is, the important element is that other people should be able to tell whether or not we are playing the language-game correctly. Without this ability, we wouldn't be able to inculcate people in this language-game.

My main objection to this definition of knowledge is that it carves the world at strange joints. The existence of some method of verification seems like a somewhat arbitrary criterion, as it's not clear that we should place more belief in a claim that is unlikely under our prior but has passed some weak level of verification than in a claim that seems extremely likely under our prior but has no method of verification beyond its appearing so to us. Perhaps Wittgenstein could respond that his notion of knowledge isn't supposed to guarantee that claims designated as knowledge are more certain than those not so designated. This isn't unreasonable, but it would significantly lower how much is at stake when we discuss whether a particular fact deserves the knowledge designation.

Relevance: Pragmatically, it makes a lot of sense to use different definitions of what does or doesn't count as knowledge depending on the use case and the level of certainty required. Naturally, these vary by language-game. On the other hand, if we adopt such a pragmatic definition, we need to be aware of how it limits the importance of whether something is given this designation.

His claim that we don't doubt certain things is a little black-and-white for me, but I think he is right to criticise the broadly Cartesian model in which we doubt everything and refuse to adopt any belief unless it is proven (I don't want to claim that Descartes actually believed in this model, as opposed to proposing it as a useful thought experiment). Not only does this model fail to get us anywhere (we can't get something from nothing), it fails to acknowledge that there are limits to how much we can doubt. Frank Ramsey demonstrates how an awareness of this can aid our understanding of epistemology when he writes:

We are all convinced by inductive arguments and our conviction is reasonable because the world is so constituted that inductive arguments lead on the whole to true opinions. We are not, therefore, able to help trusting induction, nor if we could help it do we see any reason why we should.


Throughout the book, Wittgenstein seems somewhat ambivalent on the existence of external reality. While reading this book, I spent a lot of time struggling to identify his position. I think a good entry point is to consider his position on internal feelings:

The thing in the box has no place in the language-game at all; not even as a something: for the box might even be empty.—No, one can 'divide through' by the thing in the box; it cancels out, whatever it is.

That is, for the purpose of understanding language, it doesn't really matter whether there is anything outside of the language-game. This is not the same as claiming that things outside the language-game are unimportant. In our summary of the Tractatus, we already observed that he implied these might be the most important things of all.

There are, indeed, things that cannot be put into words. They make themselves manifest. They are what is mystical.

On the other hand, this seems like it might be in tension with his suggestion that disputes between "Idealists, Solipsists and Realists" are merely linguistic. The way I understand this is that all three may prefer to use different language in ways that don't really affect the language-games we play. That is, a realist might say, "Bananas are very satisfying", an idealist might say, "The perception of eating a banana is accompanied by the perception of satisfaction", and a solipsist might say, "My perception of eating a banana is satisfying for me".

Relevance: I also think it's quite reasonable to allow that there might be things we can't know or observe that would nonetheless be important. (I've argued as much in The Universe Doesn't Have to Play Nice.) Unlike Wittgenstein, I'd take this as supporting the notion that the differences between idealism and realism are significant. At the same time, I have sometimes observed people defend a philosophical position by policing grammar. For example, I've seen people who seem to be allergic to any mention of "objective reality", yet make no objection as long as you avoid certain keywords.

Suppose I ask how to grow coconuts. An idealist might assert that there's no such thing as coconuts, only the perception of coconuts. Even if I consider this a plausible metaphysical theory, you'd hardly be surprised if I got annoyed with them for not answering my question. In this case, as far as I am concerned, the possibility of an external reality is irrelevant. It is a nothing that just cancels out.

Internal Experiences

Just as Wittgenstein tends to be somewhat ambivalent about the existence of the external world, he tends to be ambivalent about the existence of internal experiences.

Nonetheless, in response to a demand to admit that there is a difference between "pain-behaviour accompanied by pain and pain-behaviour without any pain" he writes:

Admit it? What greater difference could there be?

To the claim that an inner process must take place:

What gives the impression that we want to deny anything?

Again, his position seems to be that the possible existence of inner processes has no impact on how the relevant language games work.

Wittgenstein famously considers the claim that "only I can know whether I am really in pain", declaring that "in one way this is wrong and in another way it is nonsense". He argues that in the regular, everyday use of the term "know", others can know that we are in pain from how we are acting. And he argues that, on the other hand, if we take pain to be an internal experience, we can't know that we're in pain, as there would be no criterion to distinguish between thinking we're in pain and knowing it.

He bolsters these arguments by reminding us of the inherent slipperiness of memory: even if I think my experience of smelling lavender matches the time I smelled it in the past, I might be misremembering. And I don't really have any criterion for whether two experiences match apart from the fact that they appear that way to me.

When reading The Investigations, I often struggled to figure out why he didn't consider himself a behaviourist. In response to "Are you not really a behaviourist in disguise? Aren't you at bottom really saying that everything except human behaviour is fiction", he responds, "If I do speak of a fiction, then it is of a grammatical fiction."

I understood this as him claiming not that internal sensations are a fiction, but rather that when we appear to be speaking of internal sensation, this is usually only the surface-level appearance. For example, he suggests that "There has just taken place in me the mental process of remembering" can be understood as just another way of saying, "I have just remembered". In other words, he tends towards interpreting talk about internal processes figuratively. He wants to allow this figurative use, but whilst also denying "the yet uncomprehended process in the yet unexplored medium".

Another of his key claims is that inner experiences aren't vital for language to have a particular meaning or for us to have a particular understanding. He has a number of arguments for this. One is that when he introspects, sometimes these experiences are there and sometimes they are not. Another is that if we learned that someone had the same feeling when using "if" and "but", we might think it unusual, but we wouldn't claim they didn't understand how to use the words. I saw these as reasonable objections to attempts to ground meaning in emotions; however, he seems to have been assuming that these processes must be conscious, which seems like a very questionable assumption.

Relevance: Even though "He is in pain" nominally claims that someone is undergoing a particular experience, that need not be the speaker's intent. Someone may, for example, believe that humans are nothing but machines and disbelieve the notion of subjective experience, yet still make that statement so that the person is given painkillers and shuts up. Alternatively, we can imagine someone who isn't really sure whether qualia exist, but who figures that they may as well proceed as though they do, because that is the most natural thing to do.

As another example, suppose someone says, "I'm thirsty". Nominally, this asserts that they are experiencing the sensation of thirst. But perhaps they aren't experiencing that sensation at all and they just want you to pour them a beer. So even if Wittgenstein is wrong to explain "I'm thirsty" by focusing exclusively on its role in the language-game, it also seems to be a mistake to present it exclusively as an expression of a subjective feeling. Perhaps we should take an intermediate position instead.

The Investigations serves as a reminder not to take things too literally, which is a trap that rationalists often fall into, my past self included.

The Private Language Argument

The Stanford Encyclopedia of Philosophy (SEP) describes the private language argument as claiming the impossibility of a language that is "in principle unintelligible to anyone but its originating user".

Attempting to discuss this argument is extraordinarily difficult. As SEP says, "Even among those who accept that there is a reasonably self-contained and straightforward private language argument to be discussed, there has been fundamental and widespread disagreement over its details, its significance and even its intended conclusion, let alone over its soundness. The result is that every reading of the argument (including that which follows) is controversial".

A significant part of the debate seems to concern whether Wittgenstein is claiming that there is something we cannot do (i.e. produce a private language) or whether he is claiming that the concept of "private language" is nonsense. However, I don't see a need to focus too much on this distinction.

SEP also suggests that Wittgenstein might be responding to Bertrand Russell's ‘The Philosophy of Logical Atomism’:

A logically perfect language, if it could be constructed, would not only be intolerably prolix, but, as regards its vocabulary, would be very largely private to one speaker. That is to say, all the names that it would use would be private to that speaker and could not enter into the language of another speaker

… A name, in the narrow logical sense of a word whose meaning is a particular, can only be applied to a particular with which the speaker is acquainted, because you cannot name anything you are not acquainted with.

... We say ‘This is white’. … But if you try to apprehend the proposition that I am expressing when I say ‘This is white’, you cannot do it. If you mean this piece of chalk as a physical object, then you are not using a proper name. It is only when you use ‘this’ quite strictly, to stand for an actual object of sense [i.e., a sense-datum], that it is really a proper name. And in that it has a very odd property for a proper name, namely that it seldom means the same thing two moments running and does not mean the same thing to the speaker and to the hearer.

Bertrand Russell had the ideal of constructing a logically perfect language. It would be built up from simple objects of sense (sense-data) rather than external, physical objects, presumably because of the inherent uncertainty in deriving what actually exists from sense-data.

Note that Bertrand Russell conceded that his position meant other people wouldn't understand the proposition that he meant by "This is white" and that the meaning of that proposition would change from moment to moment.

Compare this to Wittgenstein's beetle-in-a-box thought experiment:

Now someone tells me that he knows what pain is only from his own case!——Suppose everyone had a box with something in it: we call it a "beetle". No one can look into anyone else's box, and everyone says he knows what a beetle is only by looking at his beetle.—Here it would be quite possible for everyone to have something different in his box. One might even imagine such a thing constantly changing.—But suppose the word "beetle" had a use in these people's language?—If so it would not be used as the name of a thing. The thing in the box has no place in the language-game at all; not even as a something: for the box might even be empty.—No, one can 'divide through' by the thing in the box; it cancels out, whatever it is.

There seems to be very little difference between their positions. I suppose, as they say, one man's modus ponens is another man's modus tollens, although perhaps Russell would dispute that the "thing-in-the-box" has no role in the language-game. After all, even if we can't be sure that our sensations are the same as those of other people, searching for a sensation seems to be part of how we understand when to use these terms.

For example, when learning to taste wine I might take a sip, but miss a flavour because I'm distracted by other flavours, and then only notice the flavour I was supposed to note upon taking another sip. If repeated attempts at tasting the flavour failed to detect it, I might wonder whether there was something unusual about my taste receptors or whether I just wasn't paying close enough attention during the tasting. Wittgenstein would undoubtedly describe this process as a language-game without any reference to this private sensation, but I would see this as a reductive account that leaves out a core part of the phenomenon.

Wittgenstein's definition of knowledge is key to his argument that private languages are impossible. If we accept his argument that we can't have knowledge of internal sensations, then anything that a private language could communicate would be of dubious authenticity. I've already explained my skepticism of his epistemology, which leads me to also be skeptical of the private language argument.

Relevance: His private language argument is probably the position he takes that I am most critical of. Nonetheless, I think there is an important insight here. We shouldn't just take statements at face value, but rather we should always keep in mind the language-game that is being played and what function the particular use of language serves. This leads to the insights found in the section on Internal Experiences.

The episode of In Our Time that focused on Wittgenstein contained an interesting reframing of his private language argument. We often see language as something in which we clothe our pre-existing thoughts so as to express them to others. However, participating in language-games with others is a key part of how we acquire the concepts and distinctions we make use of when thinking. We might expect that an individual who grew up isolated from others since childhood would, at best, only be able to produce a severely impoverished form of language.

Link to Linguistic Freedom

I've previously used the term linguistic freedom to describe the freedom that we have to define or redefine words to suit our purposes. This is related to the idea that the Map is Not the Territory as once we have realised that we construct the map, it is natural to assert the freedom to draw the map differently in order to better suit our purposes. At a number of points, Wittgenstein seems to argue for this kind of linguistic freedom:

What about the colour samples that A shews to B: are they part of the language? Well, it is as you please.


But how we group words into kinds will depend on the aim of the classification,—and on our own inclination

Logic and Maths

Wittgenstein criticises the idea that logical statements are true a priori:

We want to say that there can't be any vagueness in logic. The idea now absorbs us, that the ideal 'must' be found in reality. Meanwhile we do not as yet see how it occurs there, nor do we understand the nature of this "must". We think it must be in reality; for we think we already see it there.

In other words, don't just assert that logical statements "must" be true without understanding why you think they must be true.

And towards the end:

But: if anyone believes that certain concepts are absolutely the correct ones, and that having different ones would mean not realizing something that we realize—then let him imagine certain very general facts of nature to be different from what we are used to, and the formation of concepts different from the usual ones will become intelligible to him

Perhaps we lean towards classical logic because we exist at the macro scale, but if we existed at the quantum scale we might adopt notions of logic that allow things to be both true and false.

For the crystalline purity of logic was, of course, not a result of investigation: it was a requirement

We think logic must necessarily be true because we cannot think of any examples where it is false. But of course, if we could have thought of any counter-examples, then we wouldn't have constructed the rules of logic this way.

I think the Liar's Paradox is instructive here. It's very common to think that statements that aren't nonsensical must be true or false. However, the Liar's paradox explodes this. Did this cause people to abandon logic?

Of course not; people just update their scheme by exempting Liar sentences from this requirement. This doesn't look like a priori knowledge, but rather like an ad hoc patch. And perhaps we can't think of any counter-examples to logic because if we could, we would already have adjusted logic to be compatible with what we observe.

Mathematicians do not in general quarrel over the result of a calculation. (This is an important fact.)—If it were otherwise, if for instance one mathematician was convinced that a figure had altered unperceived, or that his or someone else's memory had been deceived, and so on—then our concept of 'mathematical certainty' would not exist.

Link to Conceptual Engineering

Conceptual engineering is the idea that instead of asking what a word means, we should be constructing words that serve particular purposes. It is often contrasted with conceptual analysis, which attempts to find necessary and sufficient conditions for membership in a particular class and, in particular, tries to avoid the existence of any counterexamples.

I suspect that if analytical philosophy had placed more emphasis on Wittgenstein, it wouldn't have fallen into the trap of counterexample philosophy that arose from conceptual analysis. LukeProg describes the problem as follows:

The trouble is that philosophers often take this "what we mean by" question so seriously that thousands of pages of debate concern which definition to use rather than which facts are true and what to anticipate.

While, as far as I know, Wittgenstein doesn't explicitly argue for conceptual engineering, it seems to arise naturally from his philosophy. Firstly, his model of words as family resemblances suggests that it'll be impossible to find necessary and sufficient conditions that avoid all counterexamples. Secondly, his model of language as use suggests that what is important about words is the purpose they serve. Thirdly, his account of language-games and a kind of linguistic freedom suggests that we can collectively define and redefine conventions about how words are used in order to make them more convenient for us. Lastly, Wittgenstein even compares the different functions of words to the different tools in a toolbox.

Should you read this book?

I am a huge fan and would strongly recommend it; however, you should note that the text is deceptively difficult to make your way through and is best read slowly. My reasons for recommending it are as follows:

  • Wittgenstein is eminently quotable so it is much better to hear him in his own words.
  • This text is unlike almost any other philosophical text that you will ever read. Unlike analytical philosophy, it more often makes its arguments implicitly rather than explicitly. However, his thinking is much less fuzzy than that of Continental Philosophers who in many cases are best thought of as writing poetry. And, thank God, Wittgenstein actually writes in clear language!
  • This text by its very nature resists summarisation. Wittgenstein goes off on countless tangents and it would be impossible to cover all of them. So many sentences can be interpreted in a multitude of different ways. And even when he repeats himself, he approaches things from a fresh perspective that adds new insight.


Quantifying Risk

October 12, 2021 - 18:02
Published on October 12, 2021 2:41 PM GMT

Epistemic Status: Providing a jumping-off point for further discussion, not intended to be a final authority by any means.

With COVID, a lot of people in the rationality community have had to fundamentally reassess what kind of risks they're willing to handle. We've used microCOVID extensively, tried to figure out what our chances of severe illness or death are, and decided on what precautions to take relative to that.  

Regardless, it's incredibly difficult to fundamentally assess risk or the tradeoffs of risk. The EPA values a statistical life at $7.4 million, based on surveys asking people how much they valued tiny risks to their lives. So should we measure risk in dollars, trying to quantify risk using the universal currency of utility and figuring out exactly what the potential impact of that risk is? I'd argue against it.

My moral values don't prohibit valuing life in dollars, nor would I want them to, but I think that in general it's pretty socially frowned upon to use monetary values as a way to measure statistical life; moreover, we get into questions of what trying to quantify life using money means, and generally I think people would value their own lives very differently from $7.4 million if they were ever in that situation.

Microdeaths might work better as a statistical measuring tool. But I think that the best way to quantify any risk might be to relate it directly to a risk that people take every day and generally know is pretty dangerous: driving. Driving is already considered a good comparison point; when trying to evaluate COVID risk recently, I've consistently compared my chances of dying from any given activity to my chances of dying on the car ride there. (The car ride usually wins, often by an order of magnitude or so.) So I think the best way to evaluate risk might actually be in miles, where each mile is equivalent to roughly 12 nanodeaths, or a 12/1,000,000,000 chance of dying.

What does this mean for risks like COVID? Well, combining CDC data with the Economist's calculator from March, there was a 0.1% chance of death for an unvaccinated 35-year-old male who got COVID, which did not change substantially with Delta. Vaccination lowered that risk by a further (pessimistically) 92%, so a fully vaccinated 35-year-old male takes on around 80 microdeaths by getting COVID (8% of 0.1% is 0.008%), or roughly 6,667 miles of driving. Thus, a single microCOVID for this fully vaccinated person is "equivalent" in death risk to around 0.0067 miles of driving, or about 35 feet.
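The conversion into microdeaths and miles is easy to check in code. The sketch below recomputes the worked example from the inputs stated in the paragraph: a 0.1% fatality risk, a pessimistic 92% reduction from vaccination, a microCOVID as a one-in-a-million chance of infection, and about 12 nanodeaths per mile. The helper names are hypothetical, and like the paragraph it ignores hospitalization and onward transmission.

```python
# Hypothetical helpers for expressing a probability of death in other units.
NANODEATHS_PER_MILE = 12  # rough per-mile driving fatality risk used in the post

def to_microdeaths(p_death: float) -> float:
    """One microdeath is a one-in-a-million chance of dying."""
    return p_death * 1e6

def to_driving_miles(p_death: float) -> float:
    """Miles of driving that carry the same chance of dying."""
    return p_death / (NANODEATHS_PER_MILE * 1e-9)

# Inputs stated in the post for a 35-year-old male.
p_unvaccinated = 0.001     # 0.1% chance of death if infected
vaccine_reduction = 0.92   # pessimistic reduction from full vaccination
p_vaccinated = p_unvaccinated * (1 - vaccine_reduction)

# One microCOVID is a one-in-a-million chance of catching COVID.
p_per_microcovid = p_vaccinated * 1e-6

print(f"{to_microdeaths(p_vaccinated):.0f} microdeaths")                       # 80 microdeaths
print(f"{to_driving_miles(p_vaccinated):,.0f} miles")                          # 6,667 miles
print(f"{to_driving_miles(p_per_microcovid) * 5280:.0f} feet per microCOVID")  # 35 feet per microCOVID
```

Plugging in a different fatality rate or per-mile risk only means changing the constants.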

We don't have a good understanding of the risks of a lot of activities yet, and this calculation does not take into account things like hospitalization risk and contagion to more vulnerable people, so take any calculation you do with a grain of salt. But hopefully, comparisons like this will make it easier to visualize the actual risk of something, and be more or less careful as necessary.


Building Blocks of Politics: An Overview of Selectorate Theory

October 12, 2021 - 14:27
Published on October 12, 2021 11:27 AM GMT

From 1865 to 1909, Belgium was ruled by a great king. He helped promote the adoption of universal male suffrage and proportional-representation voting. During his rule Belgium rapidly industrialized and had immense economic growth. He gave workers the right to strike. He passed laws protecting women and children. Employment of children under 12, of children under 16 at night, and of women under 21 underground, was forbidden. Workers also gained compensation rights for workplace accidents and got Sundays off. He improved education, built railways and more.

Around the same time, Congo was ruled by an awful dictator. He ruled the country using a mercenary military force, which he used for his own gain. He extracted a fortune out of ivory. He used forced labor to harvest and process rubber. Atrocities such as murder and torture were common. The feet and hands of men, women and children were severed when the quota of rubber was not met. Millions died during his rule.

The catch? They were the same person - King Leopold II of Belgium. He's part of a small club of people that have led more than one country, and might be the only one who did so simultaneously. What made the same person act as a great king in one nation and a terrible dictator in the other? If neither innate benevolence nor malevolence led to his behavior, it has to be something else.

Leopold II, 1900

This post covers Selectorate Theory. We'll come back to the story of Leopold and see how this theory explains it, but first, we have to understand the theory. 

The theory takes a game theoretical approach to political behavior, by which I mean two things. First, that it's built on a mathematical model. And second, that it's agent and strategy based. That means the analysis doesn't happen at the level of countries, which aren't agents, but at the level of individuals, like leaders and voters, and that the behavior of these agents is strategic, and not a product of psychology, personality or ideology.

This abstraction makes this model more generally applicable beyond countries to any hierarchical power structure, such as small local governments, companies, and even small teams and groups, but to keep things simple I'll only talk about it in the context of countries.

I will try to give a comprehensive overview of the theory based on the book The Logic of Political Survival. We'll start with the basic framework, then go through the predictions and implications the authors talk about, then I'll mention further implications I think the theory has.

I won't go over the statistical evidence for the theory, except for a brief comment at the end, or over the mathematical model itself[1] - the post is long enough without it - but I might do that in future posts.

To give some background: Selectorate Theory was developed by Bruce Bueno de Mesquita, Alastair Smith, Randolph M. Siverson, and James D. Morrow.

They introduced it in The Logic of Political Survival and later the first two authors wrote a more public oriented version in The Dictator's Handbook.

I want to thank Bruce, the first author, for reading this article before publication. I sent him a question, not even sure I would get a response, and mentioned the article, saying I'd be happy to send it to him. He responded in just two hours and agreed.

Also thanks to Shimon Ravid, Nir Aloni, and Daniel Segal for beta reading this article.

The Basic Framework

The theory is based on the idea that the primary goal of leaders is to remain in power, or put simply, to survive, and that the behavior of organizations can be predicted through the optimal survival strategy for the leader, which depends on various properties of that organization.

To do all that, we need to make some assumptions and build a simple model of a country and the people and groups in it. The theory doesn't use abstract terms like "democracy" and "dictatorship" to define nations; instead, it tries to derive them from other properties.

The Groups

Every country has a leader, and usually also a challenger for leadership. The other residents of the country are split into three nested groups: the Selectorate; the Winning Coalition, which is a subset of the Selectorate; and the Disenfranchised, who are everyone not in the Selectorate.
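One way to make the nesting concrete is a small data structure holding the group sizes. This is a hypothetical sketch: apart from w and s, the names (such as `residents` for the whole population, N) are assumptions, and the leader and challenger are not counted separately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Polity:
    """Group sizes in a stylized selectorate model (illustrative sketch).

    Nesting: winning coalition (w) within selectorate (s) within residents (N).
    """
    residents: int    # N: everyone living in the country (hypothetical name)
    selectorate: int  # s: those with some say over who leads
    coalition: int    # w: those whose support the leader needs to survive

    def __post_init__(self) -> None:
        # Reject sizes that break the nesting 1 <= w <= s <= N.
        if not 1 <= self.coalition <= self.selectorate <= self.residents:
            raise ValueError("group sizes must satisfy 1 <= w <= s <= N")

    @property
    def disenfranchised(self) -> int:
        # Everyone outside the selectorate has no say over the leadership.
        return self.residents - self.selectorate

# Hypothetical sizes, chosen only for illustration.
country = Polity(residents=1_000_000, selectorate=600_000, coalition=300_001)
print(country.disenfranchised)  # 400000
```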

The Leader

The leader or leadership is the one who can make policy decisions. This means tax policy and spending policy, which is the allocation of tax revenue between public goods and private goods.

These two assumptions about the leader are the basis for the whole theory:

  • No ruler rules alone. Every leader has to satisfy at least some people in order to rule. If they don't satisfy them, they'll be deposed.
  • The leader's goal is to gain as much influence/power/money as they can, and to keep it for as long as they can. This may sound cynical. And it might be, somewhat. But it also makes sense. Holding office is required to achieve the leader's personal goals - whether these goals are selfish or altruistic. To some people holding office isn't that important, but these people don't usually become leaders, and if they do, they don't stay long.

The leader's desire to survive stays constant, but the most effective survival strategy changes depending on the size of the other groups and other facts about the nation.


All residents engage in economic activity, pay taxes, and benefit from public goods. The size of the population determines the cost of providing public goods and increases how much tax can be collected. Residents may be included in or excluded from the selectorate. Those excluded are called the disenfranchised.

The Winning Coalition 

The Winning Coalition are the essentials, the keys to power: the people the leader has to satisfy to survive. The leader does that by rewarding them with private goods. The size of the Coalition (w) is one of the two most important characteristics of a nation.

When the coalition is small, the leader can give private rewards to each person in the coalition. The more the coalition grows, the more expensive it becomes to produce private rewards for all coalition members, so the leader starts producing public goods instead.

This creates an interesting dynamic. When the coalition is sufficiently small, making it smaller is in a coalition member's interest (as long as they aren't the one getting ejected, of course), since it lets them demand more from the leader. As the coalition gets larger, there comes a point where it's better for the coalition to expand: each member already gets so little in private goods that they can all benefit more from the leader producing more public goods and fewer private goods.
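The switch from private to public goods can be illustrated with a deliberately simple linear model. This is an assumption for illustration, not the book's model: private goods are divided among the w coalition members, while public goods cost a fixed, hypothetical `public_cost` per unit of value delivered to everyone, however large w is. Under these assumptions the leader prefers private goods exactly while w < public_cost.

```python
def per_member_value(budget: float, w: int, public_cost: float) -> dict[str, float]:
    """Value each coalition member gets if the whole budget buys one kind of good.

    Hypothetical linear model: private goods are divided among the w coalition
    members, while each unit of public-good value reaches everyone at a fixed
    cost of `public_cost`, however large w is.
    """
    return {"private": budget / w, "public": budget / public_cost}

def cheaper_way_to_reward(budget: float, w: int, public_cost: float) -> str:
    """Pick the good that gives coalition members more value for the same spending."""
    values = per_member_value(budget, w, public_cost)
    return "private" if values["private"] > values["public"] else "public"

for w in (10, 100, 1_000, 10_000):
    print(w, cheaper_way_to_reward(budget=1_000_000, w=w, public_cost=500))
# 10 private
# 100 private
# 1000 public
# 10000 public
```

A fuller model would let the leader mix both kinds of goods; the sketch only shows why a larger w tilts spending towards public goods.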

We'll see how small winning coalitions create autocracies and monarchies, and large coalitions create democracies.

The Selectorate 

The Selectorate are those who can influence who gets to be the leader (say, by voting). The size of the Selectorate (s) is the other most important characteristic of a nation. They do not get private rewards from the leader, but still benefit from public goods. The base rate probability of being included in the coalition for any selectorate member is w/s.

The selectorate wants the winning coalition to expand, since then more money will be spent on public goods, and it increases their own chance of getting into the coalition. They don't want the selectorate to expand as that decreases their chance of inclusion in the coalition - though this effect gets weaker as the coalition grows and more public goods are produced.
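The base-rate inclusion probability w/s, and why selectors favour a larger coalition but a smaller selectorate, can be shown with a few hypothetical numbers. The function name is an assumption for illustration.

```python
def inclusion_probability(w: int, s: int) -> float:
    """Base-rate chance w/s that a given selectorate member is in the winning coalition."""
    if not 0 < w <= s:
        raise ValueError("need 0 < w <= s")
    return w / s

# Hypothetical sizes, varying one group at a time.
print(inclusion_probability(1_000, 100_000))  # 0.01  (baseline)
print(inclusion_probability(2_000, 100_000))  # 0.02  (larger coalition: better odds)
print(inclusion_probability(1_000, 200_000))  # 0.005 (larger selectorate: worse odds)
```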

In the real world, common characteristics societies use to divide people in and out of the selectorate include birthplace, lineage, skills, beliefs, knowledge, wealth, sex and age. In the Coups and Revolutions section, we'll see how military ability matters especially.

The disenfranchised 

The disenfranchised are those who don't have any influence over who gets to be leader. They too do not get private goods, but still benefit from public goods. The disenfranchised want the coalition to expand for the same reason as the selectorate. They also want the selectorate to expand so they may be included, but have no established way of making that happen - other than violence and asking nicely.

The Challenger 

The Challenger is a person that challenges the current leadership in order to replace it. The challenger has a commitment problem - they have to get support from at least some members of the current coalition to win, but even if they promise to those who defect to their side that they will get more rewards than they currently do, they can't guarantee that, or even guarantee that they'll remain in the winning coalition at all.

The challenger can be anyone, but challengers from within the current coalition have an inherent advantage: by defecting, they automatically take one supporter, themselves, away from the current incumbent.

The challenger usually has interests similar to those of the current incumbent (apart from who gets to be leader, of course), since they wish to replace the leader and get the same benefits, or more. For example, if the winning coalition grows, the country the challenger is trying to take over now has a larger coalition, which makes it less valuable.


Every game theory model has to state what agents desire and get value from, and every model of a country needs to model some basic economics.

In this model the things people value are:

  1. The untaxed portion of their economic activity
  2. Leisure
  3. Public goods
  4. Private goods (only available for coalition members)

And the leader values:

  1. Above all else - staying in office. If the leader fails to stay in office nothing else matters.
  2. Provided they remain in office, tax revenue not spent on public or private goods.

In the mathematical model these are precisely defined utility functions with diminishing returns and temporal discounting. If you don't know what that means, you can ignore it.
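For readers who want a concrete picture, here is a minimal sketch of what "diminishing returns" and "temporal discounting" mean. The log utility and the discount factor `delta` are illustrative assumptions, not the book's actual functional forms.

```python
import math


def period_utility(consumption: float) -> float:
    """Utility from one period's consumption.

    Assumed log form: each extra unit adds less utility than the last
    (diminishing returns). Requires consumption > 0.
    """
    return math.log(consumption)


def discounted_utility(stream: list[float], delta: float = 0.9) -> float:
    """Total utility of a stream of per-period consumption.

    Period t is weighted by delta**t (0 < delta < 1), so future
    benefits count for less than present ones (temporal discounting).
    """
    return sum(delta**t * period_utility(c) for t, c in enumerate(stream))


# Going from 1 to 2 units adds more utility than going from 2 to 3.
gain_first = period_utility(2) - period_utility(1)
gain_second = period_utility(3) - period_utility(2)
print(gain_first > gain_second)  # True

# The same consumption is worth less the later it arrives.
print(discounted_utility([5, 1]) > discounted_utility([1, 5]))  # True
```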

Economic Activity and Leisure

Residents split their time between economically productive activities, which we'll shorten to work, and economically unproductive activities, which we'll shorten to leisure.

More specifically, work refers to activities that can be taxed, and leisure to activities that can't.


The leader decides on the tax rate, and collects the revenue. 

The theory defines the tax rate as the percentage of total economic products the government extracts from the residents. No complex tax policies here - any such policy is simplified to that definition for analysis. But as we'll see, the theory does make predictions about more complex tax policies.

When the tax rate is 0%, residents split their time equally between work and leisure. As the tax rate increases they work less, until at 100% they spend all their time on leisure.

As people work more their income increases, which further increases the money available for taxation. Together this creates a tension between tax rates and GDP (the sum of what is produced by the economic activity of residents).

  • High tax rate > Less productive economic activity > Lower GDP overall
  • Low tax rate > More productive economic activity > Higher GDP overall

The tax revenue is a percentage of the GDP, so the leader always wants to find the tax rate that will create the most revenue.
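The tension between tax rate and revenue can be sketched numerically. Purely for illustration, this assumes the share of time spent working falls linearly from one half at a 0% tax rate to zero at 100%, and that GDP is proportional to time worked with a made-up `productivity` parameter; the book's actual functions are more involved. Under these assumptions revenue peaks at a 50% tax rate.

```python
def work_share(tax_rate: float) -> float:
    """Fraction of time spent working (assumed linear in the tax rate).

    Half of the time at a 0% tax, none at a 100% tax.
    """
    return 0.5 * (1.0 - tax_rate)


def gdp(tax_rate: float, population: int, productivity: float = 1.0) -> float:
    """Total output: everyone's working time times an assumed productivity."""
    return population * productivity * work_share(tax_rate)


def revenue(tax_rate: float, population: int, productivity: float = 1.0) -> float:
    """Tax revenue is the taxed fraction of GDP."""
    return tax_rate * gdp(tax_rate, population, productivity)


def revenue_maximizing_rate(population: int, steps: int = 100) -> float:
    """Grid-search the tax rate (0%..100%) that yields the most revenue."""
    rates = [i / steps for i in range(steps + 1)]
    return max(rates, key=lambda t: revenue(t, population))


print(gdp(0.0, 100), gdp(1.0, 100))    # 50.0 0.0
print(revenue_maximizing_rate(1_000))  # 0.5
```

A lower rate leaves more GDP but takes a smaller slice of it; a higher rate takes a bigger slice of a shrinking GDP.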


The leader splits their tax revenue between private goods and public goods. Whatever isn't spent on those is the surplus, with which the leader can do whatever they want - engage in kleptocracy and keep it to themselves, invest it in some pet project, or keep it as a cushion against future political rivals.

Goods are assumed to be "normal", such that more is always better.

The optimal spending strategy for the leader requires finding how much needs to be spent on the coalition in total, and how that should be split between private and public goods.

Private goods only benefit coalition members. The pool of private goods is divided between the members of the coalition, making the value of private goods shrink as the coalition size increases.

Public goods are indivisible and non-excludable - they benefit everyone and have to be provided to everyone. Think roads, defense, education, sewage, the grid and communications. The price of public goods rises with the size of the population.

No single good needs to be purely private or purely public - the theory simply deals with how much is spent on each type. Almost any public good will also have private benefits. If in the real world one of the things I listed as a public good is excluded or divided, it just becomes partially private.
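A rough way to see why coalition size matters for the private/public mix: to give each coalition member one extra unit of value, private goods cost one unit per member, while public goods must be provided to the whole population. This sketch assumes linear costs and a hypothetical per-resident cost factor, `public_cost_per_resident`, for public goods; it ignores diminishing returns and the book's actual functional forms.

```python
def private_cost(coalition_size: int) -> float:
    """Cost of giving every coalition member one unit of value privately."""
    return float(coalition_size)


def public_cost(population: int, public_cost_per_resident: float) -> float:
    """Cost of one unit of public goods, which must reach everyone.

    Assumption: the cost grows linearly with population, scaled by a
    hypothetical per-resident factor (public goods are partly shared,
    so the factor can be well below 1).
    """
    return population * public_cost_per_resident


def cheaper_channel(coalition_size: int, population: int,
                    public_cost_per_resident: float = 0.001) -> str:
    """Which kind of good rewards the coalition more cheaply?"""
    if private_cost(coalition_size) < public_cost(population, public_cost_per_resident):
        return "private"
    return "public"


print(cheaper_channel(100, 1_000_000))      # private
print(cheaper_channel(500_000, 1_000_000))  # public
```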

Loyalty and Replaceability

The loyalty norm refers to how loyal the coalition members are to the leader. It is measured as the size of the winning coalition divided by the size of the selectorate (w/s), which is also the base-rate probability that a selectorate member will be part of a winning coalition.

A strong loyalty norm happens when coalition members are easy for the leader/challenger to replace. A weak loyalty norm happens when it's hard. The larger the selectorate is compared to the coalition, the more replacement options there are, which makes it easier to replace coalition members.

  • Selectorate size close to coalition size > Large w/s ratio (closer to 1) > Hard to replace members > Weak loyalty norm.
  • Selectorate size much larger than coalition size > Small w/s ratio (closer to 0) > Easy to replace members > Strong loyalty norm.
The three rough clusters of political systems (the leader's preferences go from left to right) 
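Since the loyalty norm is just a ratio, it is easy to compute. A minimal sketch; the example sizes are made up for illustration.

```python
def loyalty_ratio(coalition_size: int, selectorate_size: int) -> float:
    """w/s: also the base-rate chance a selector ends up in a winning coalition.

    Smaller values mean members are easier to replace (strong loyalty norm);
    values near 1 mean they are hard to replace (weak loyalty norm).
    """
    if not 0 < coalition_size <= selectorate_size:
        raise ValueError("need 0 < coalition_size <= selectorate_size")
    return coalition_size / selectorate_size


def stronger_loyalty(system_a: tuple[int, int], system_b: tuple[int, int]) -> str:
    """Return which of two (w, s) systems has the stronger loyalty norm."""
    ra, rb = loyalty_ratio(*system_a), loyalty_ratio(*system_b)
    if ra == rb:
        return "equal"
    return "a" if ra < rb else "b"


# A rigged-election autocracy vs a universal-suffrage democracy (made-up sizes).
autocracy = (1_000, 10_000_000)
democracy = (5_000_000, 10_000_000)
print(loyalty_ratio(*autocracy))               # 0.0001
print(loyalty_ratio(*democracy))               # 0.5
print(stronger_loyalty(autocracy, democracy))  # a
```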

A weak loyalty norm means members of the coalition are more likely to defect to the challenger (since their probability of being included in the challenger's coalition is higher), and the leader must spend more to keep them loyal. A strong loyalty norm means low chances of defection and less required spending. Needless to say, leaders like strong loyalty from their supporters.

This creates two competing effects on the coalition's welfare. On one hand, expanding the coalition reduces the amount of private rewards each member gets; on the other hand, if the selectorate size is kept constant, it weakens the loyalty norm and so increases the total amount spent on the coalition.

The following graph shows the relationship between the size of the coalition and these two effects.

The Logic of Political Survival, figure 3.2, reproduced

Whether the coalition prefers to shrink or expand depends on where they are on this graph.

Coalition members prefer a weak loyalty norm. On the left side of the graph they want to weaken it only by shrinking the selectorate, since expanding the coalition would hurt their welfare; on the right side of the graph both options are good for them.

Shrinking the coalition without shrinking the selectorate strengthens the loyalty norm, but if the coalition is small enough, the larger share of private goods each member receives compensates for it.


Affinity represents the idea that there's some bond between leaders and followers independent of policy that can be used to anticipate each other's future loyalty. All else being equal, people prefer to support leaders they have affinity for, but they won't support a leader with worse policy due to affinity. It is used in the mathematical model only for tie breaking, and isn't necessary for any of the main conclusions of the theory.

Leaders include in the coalition those they have the most affinity for. But, affinity has to be learned, and can never be known perfectly. Affinity is learned by staying in power. Challengers can have some knowledge of affinities before coming into power, but they'll always learn more once they're in power, and will remove and add coalition members as they do.

This asymmetric knowledge of affinity creates the Incumbency Advantage, expanded upon later.


The deposition rule defines the circumstances under which an incumbent leader is deposed. In the book they use a deposition rule called a constructive vote of no confidence, which simply means a coalition of size w is both necessary and sufficient to stay in office (though not sufficient to gain office in the first place). For the challenger to win, they must both have enough supporters to form a coalition of size w, and get enough people to defect from the current leader that the leader is left with fewer than w supporters. In other words, if fewer than w members of the incumbent's coalition support the incumbent and at least w members of the challenger's coalition support the challenger, the incumbent is deposed and replaced by the challenger. Otherwise the incumbent stays. Hence the number of people whose choice actually matters is never greater than 2w.
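The rule as described reduces to a single condition. A minimal sketch, where the support counts are simply the number of selectors backing each side; it says nothing about how those counts come about.

```python
def is_deposed(incumbent_support: int, challenger_support: int, w: int) -> bool:
    """Constructive vote of no confidence, as described above.

    The incumbent falls only if they drop below w supporters AND the
    challenger has assembled at least w supporters of their own.
    """
    return incumbent_support < w and challenger_support >= w


print(is_deposed(incumbent_support=90, challenger_support=100, w=100))   # True
print(is_deposed(incumbent_support=90, challenger_support=80, w=100))    # False: no viable alternative
print(is_deposed(incumbent_support=100, challenger_support=150, w=100))  # False: incumbent keeps w
```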

(The authors say that other deposition rules are plausible but produce similar results, so they focus only on this one. We'll take them at their word for now and do the same, since checking this for ourselves would require reproducing the mathematical model.)

Coalition members who are only in the incumbent's coalition will always prefer to support the incumbent, and likewise those only in the challenger's coalition will always support the challenger. Hence the outcome depends on those who are in both coalitions, and for them the challenger has to compete with the incumbent by offering a better deal.

The incumbent, to stay in office, has to at least match the challenger's offer. So the incumbent's strategy is to maximize the surplus after offering their supporters at least as much as the challenger's best possible offer.
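A stylized version of the incumbent's problem: spend just enough to match the challenger's best per-member offer for every coalition member, and keep the rest as surplus. This assumes equal, linear per-member rewards and treats the challenger's best offer as known; in the book that offer comes out of the model rather than being given. The function name is illustrative.

```python
from typing import Optional


def incumbent_surplus(revenue: float, challenger_best_offer: float,
                      coalition_size: int) -> Optional[float]:
    """Surplus left after matching the challenger's best per-member offer.

    Returns None when revenue can't cover the matching offers, i.e. the
    incumbent cannot outbid the challenger for every member.
    """
    required = challenger_best_offer * coalition_size
    if required > revenue:
        return None
    return revenue - required


print(incumbent_surplus(revenue=1_000.0, challenger_best_offer=3.0, coalition_size=100))  # 700.0
print(incumbent_surplus(revenue=1_000.0, challenger_best_offer=3.0, coalition_size=400))  # None
```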

Incumbency Advantage

The incumbent has the advantage, since they have better knowledge of affinities and can promise inclusion in the coalition and private goods more credibly, while the challenger cannot credibly promise to keep supporters in their coalition. The Incumbency Advantage is inversely related to coalition size, as the larger the coalition the less private goods matter.

The better the selectors know their affinity with the challenger, the smaller the incumbency advantage, and the more willing they'll be to defect. The incumbent counters this by oversizing their coalition, so they can punish defectors and still retain power.

Oversizing shifts the risk of defecting from the risk of not being included in the challenger's coalition to the risk of being excluded from the incumbent's coalition, or a mix of the two.

With the model in mind, we can see how the interaction of all these interests and incentives imply and predict various political behaviors.

Scope and Limitations

But before I get into the implications and predictions of the theory, I want to lay out the scope of the theory and its limitations.

  • The model doesn't distinguish between one ruler having all authority to set policy and a large group of legislators all capable of setting policy. For the purposes of the theory, they're treated as an individual and their inner group dynamics aren't addressed. This might sound like a big shortcoming, but I think the theory does exceptionally well even with democracies, considering that it abstracts the decision making process so much. Also unaddressed are questions of separation of powers and checks and balances.
  • The theory assumes no limitation on the implementation of policy. It has implications for how inefficiencies are addressed and how strategies are implemented, insofar as they can be described as goods, but not for what the strategies themselves are.
  • The model treats goods very abstractly. It does not deal with the question of which goods are prioritized (beyond public and private).
  • The model also assumes all members of all groups to be identical (except for affinity). There are no differences in competence. Particular interests (beyond what is covered above in economics) like protecting the environment, advancing science, or buying lots of yachts are not represented. Leaders don't represent people who share their opinions, but those who share their interests (and are in the coalition).
  • The theory naturally lends itself to being fractal - meaning every group might have subgroups with a similar structure, where the leader of the subgroup is an individual from the super-group. For example, a member of a country's selectorate or winning coalition might be the mayor of a town. With that said, the analysis in the book focuses on one level at a time, and doesn't consider interplay between levels (though see bloc voting later in the article, which comes close to that).

That said, the insights from this theory still have implications for all these questions when they're explored on their own.


With all those limitations in mind, the authors still extrapolate the implications of the model to a vast array of subjects, giving many concrete predictions. In this section I'll try to give a comprehensive overview of these implications and predictions.

Form of Government

The three general clusters of polities produce the characteristics of various regime types we're familiar with.

  • Large winning coalition systems resemble democracies - The leader requires a large supporter base, near or totally universal suffrage is common, plenty of public goods and relatively few private goods are provided, taxes are lower and economic productivity is higher.
  • Small-coalition, small-selectorate systems resemble monarchies and military juntas - The leader requires a small supporter base chosen from a small group such as aristocrats, priests and military figures, few public goods and many private goods are provided, and people aren't rich. Examples: the old English monarchy, the Saudi Arabian monarchy, the Argentine junta of 1976-1983.
  • Small-coalition, large-selectorate systems resemble autocracies - The leader requires a small supporter base chosen from a vast pool of people who otherwise usually participate only in rigged elections, the amount of public goods is tiny and the amount of private goods is large but smaller than in monarchies, the leader extracts the vast majority of people's wealth, and people are extremely poor. Examples: the Soviet Union, North Korea, Maoist China.

Smaller variations in size can account for variations within these regimes. It's hard to say which of two democracies is more democratic, or what makes it so, but if we can estimate the winning coalition of both, it's easy to say which is larger and what we should expect based on that. Not all democracies are the same, and neither are monarchies and autocracies - some more extreme and some milder.

I will sometimes use these regime types instead of specifying coalition and selectorate sizes, but remember that what they stand for are coalition and selectorate sizes. I do it mostly because it makes for less awkward phrasing, but also to reinforce the connection.

Transitional Democracies

When autocracies transition to democracies and expand the selectorate faster than they expand the coalition, the loyalty norm strengthens (w/s falls), which mimics the structure of a more autocratic system where the coalition is smaller relative to the selectorate. In such cases transitional democracies will temporarily exhibit more autocratic behavior like kleptocracy and willingness to start wars. This shouldn't happen in transitional democracies that expand the coalition first, at the same rate as the selectorate, or faster than it.
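The mechanism is easy to check with numbers. The sizes and growth factors below are made up; the point is only that w/s falls (loyalty strengthens) whenever the selectorate grows faster than the coalition.

```python
def loyalty_ratio(coalition_size: float, selectorate_size: float) -> float:
    """w/s: lower means a stronger loyalty norm."""
    return coalition_size / selectorate_size


def ratio_after_transition(w: float, s: float, w_growth: float, s_growth: float) -> float:
    """w/s after the coalition and selectorate grow by the given factors."""
    return loyalty_ratio(w * w_growth, s * s_growth)


before = loyalty_ratio(1_000, 100_000)  # 0.01
fast_selectorate = ratio_after_transition(1_000, 100_000, w_growth=2, s_growth=10)
coalition_first = ratio_after_transition(1_000, 100_000, w_growth=20, s_growth=10)

print(fast_selectorate < before)  # True: more autocratic incentives for a while
print(coalition_first > before)   # True: loyalty norm weakens, as in a democracy
```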

Presidential VS Parliamentary Democracy

In presidential systems the leader is usually elected directly by the people. In parliamentary systems the people choose a group of legislators, who then choose the leader. As we'll see in the next section, this means a leader requires fewer votes to be elected, leading to a smaller coalition. The US, which has a presidential system but also has indirect elections through the electoral college, is an exception.

Federalism and Localism

The authors predict that corruption will "rise as one moves down the ladder from the central governments to state or provincial governments and on down to city, town, and village governments. Each successive layer relies on a smaller coalition and so provides more incentive to turn to private rewards rather than public goods as the means of maintaining loyalty. That incentive may be partially offset by the central government's incentives to protect the rule of law, one of the central public goods it can be expected to provide."

Federalism should let people enjoy both the benefits of large states and those of small states.

Correlated Support, Bloc Voting and Indirect Election

The basic model assumes selectors are independent - the choice of one selector doesn't influence the choice of another. But of course that's not the case in reality. If we relax this assumption we can see how correlation in selector support effectively reduces coalition size.

The more people one person can influence in their choice of support (that is, the more people whose support correlates with theirs), the more valuable that person is as a member of the coalition, and the less valuable the people they influence are. This applies to influential writers, speakers, celebrities, prominent community figures, owners of media outlets and so on.

Bloc voting is when a group votes similarly, usually based on the directives of one person.

In such a case that person becomes highly valuable as a member of the coalition, since their support is effectively equivalent to the size of the group that follows them. The leader would want them in the coalition, but not their followers.
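A sketch of why bloc leaders are so valuable: if each bloc votes as its leader directs, the leader of the country only needs to reward enough bloc leaders to cover w votes. The bloc sizes are hypothetical, and recruiting the largest blocs first is a simple greedy assumption, not a claim about how real leaders choose.

```python
def people_to_reward(w: int, bloc_sizes: list[int]) -> int:
    """How many bloc leaders must be rewarded to secure w votes.

    Assumes each bloc (its leader included) votes as one, and that the
    leader recruits the largest blocs first.
    """
    votes = 0
    rewarded = 0
    for size in sorted(bloc_sizes, reverse=True):
        if votes >= w:
            break
        votes += size
        rewarded += 1
    if votes < w:
        raise ValueError("not enough votes in all blocs combined")
    return rewarded


# 1,000 votes needed: three bloc leaders suffice instead of 1,000 individuals.
print(people_to_reward(1_000, [200, 300, 400, 300, 150]))  # 3
# Fully independent voters: every single voter must be rewarded.
print(people_to_reward(3, [1, 1, 1, 1]))  # 3
```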

Whether the followers benefit from the bloc leader being in the coalition depends on whether that leader shares their rewards with them, which, just as with the leader of a country, depends on the structure of that group (see the note about fractality in Scope and Limitations).

When leaders can't reward people directly for their support, as in democracies where the vote is anonymous, they may still be able to reward groups. For example, in Israel each ballot box is counted independently, and the results from each ballot box and each town are then made publicly available. You can see the last election results here. This makes it very easy for politicians to invest more in places that support them and ignore those that don't.

Bloc voting can be institutionalized through indirect election, where instead of directly choosing the leader, citizens choose electors who choose the leader for them.

The US has the electoral college. In Israel the prime minister is chosen by the Knesset. In neither case are the electors completely free to support whoever they want: US electors can have limitations set on them by the states, and Knesset members need to be careful not to displease their supporters. Still, in both cases indirect election reduces the influence of the people on the final outcome.

(The Dictator's Handbook splits the selectorate in two to make this distinction between those who can potentially influence, which they call the "nominal selectorate" or the "Interchangeables", and those who actually choose, which they call the "real selectorate" or the "Influentials". Though this distinction is useful for bloc voting and indirect elections, it's not consequential elsewhere, so I chose not to use it.)

Selectorate theory suggests that leaders have an interest in increasing things that cause vote correlation, such as ethnic, racial, religious, linguistic and other social divides. Residents, on the other hand, benefit from increased independence of votes.

Term Limits and the Verge of Deposition

Leaders who expect to be deposed the next time they're challenged have nothing to lose, but much to gain if they can manage to hold on to power. Therefore they'll be more willing to do reckless things to survive, like starting a diversionary war.

A term limit creates two opposing effects. 

  1. It reduces the incumbent's advantage, because they can't supply private goods beyond the end of their term. This forces the leader to work harder to please their supporters.
  2. It removes any reselection incentives by decoupling policy performance and survival, making the leaders stop working for the state, and turn kleptocratic.

The second effect comes from having nothing to lose, but since there's also little to gain from reckless actions, the leader is more likely to turn to kleptocracy to make the best of the remaining time than to do something that will keep them in power. Civic minded leaders may use this freedom to take actions that the public would like but the coalition wouldn't.

Consequences for kleptocracy that persist after leaving office can reduce the second effect.

The effect of term limits on spending by incumbents. The Logic of Political Survival, figure 7.3, reproduced and simplified. 

Enforcing term limits requires the winning coalition to remove the incumbent. Since in small coalitions the value of inclusion and risk of exclusion are higher, the members don't want to enforce the limit and risk exclusion. This is why autocracies rarely have them, democracies often do, and some autocracies (mainly those with rigged-elections) have fake, unenforced term limits.

Political survival

The theory predicts that leaders in autocracies survive longer in office than in democracies, with monarchies in between.

This stems from the incumbency advantage in guaranteeing inclusion in the coalition and promising private goods. As the coalition expands and public goods become more important, the incumbency advantage diminishes.

  • Small coalition > competition is over the provision of private goods > The incumbent has a big advantage
  • Large coalition > competition is over the best public goods policy > The incumbent has a smaller advantage

In the early period in office the new leader hasn't yet learned affinities or sorted out their coalition, and therefore lacks the incumbency advantage. Competitors will want to depose them as quickly as possible to take advantage of that. The early phase is therefore the most dangerous, but if the new leader survives it, they can persist for a very long time. This affects small-coalition systems more than large ones, since their leaders depend more on private goods. The result is higher variability in tenure in small-coalition systems than in large-coalition systems.

Autocracies often have leaders like Stalin and Gaddafi who ruled for 39 and 42 years respectively. But they also more often have leaders like Bachir Gemayel who only survived two weeks in office before being assassinated. In democracies most elected leaders serve their full term, and are voted out after one or several terms if they don't hit a term limit.

Former leaders are dangerous to current leaders, as they're similar to a challenger with very good mutual knowledge of affinities with their supporters. The more important political survival is to the leader, the more incentive new incumbents have to permanently get rid of (say, by killing) the deposed leader. This leads us to expect that deposed incumbents are most likely to be killed or exiled in small coalition systems, and even more so when the selectorate is large. This can be seen as another reason leaders will want to keep power, though that's not included in the model.


When leaders are terminally ill, coalition members know that soon they will stop receiving private goods. This breaks their loyalty and drives them to defect. It might even become a competition of who defects more quickly to the new leader.

This makes small coalition leaders hide their health status from their supporters. This effect diminishes with coalition size as private goods become less important.

One way leaders can mitigate it is by having credible heirs who will take their place but keep the same coalition. Then coalition members have less to worry about not being included in the next coalition, and are happier to stay loyal.


Everything leaders do, they do with the purpose of keeping and gaining power. Therefore the effects their policies have after they leave office matter to them only insofar as the coalition knows about them: policies that will have a good effect in the future are good for the leader only if the coalition knows, and policies that will have a bad effect in the future are bad for the leader only if the coalition knows.

As we saw in the last section, democratic leaders stay in office for much shorter periods than autocratic leaders, so although autocratic leaders provide far fewer public goods, we can expect them to invest far more in the long term.

Even though we can expect regular public goods like rule of law, education, and infrastructure to be much better in democracies, we should expect that trend to disappear for long-term goods like green technology, carbon capture, AI safety, pandemic preparedness, and so on. Green technology is a slight exception on that list, as it is a long-term good heavily valued by most democratic coalitions.

Autocrats invest more in the long term, but for themselves and their friends, not the public.

Term limits should make this even more extreme, as leaders cannot even hope to stay in office longer than usual or to return after a time.


We can model competence as an ability to produce more goods from the same pool of resources. You can think about it as competent leaders paying less for goods, or as competent leaders simply having more resources - the math is the same.

Competent leaders and challengers are able to offer more goods than their opponent, and so find it easier to attain and retain office.

If the challenger's competence is known, the leader will take it into account in their spending strategy - spending more against a competent challenger and less against an incompetent one. To the extent that the challenger's competence is unknown, the leader has to bet on how much to spend to be confident of exceeding the challenger's spending ability.

Since competent leaders spend less, they have more surplus revenue to use however they like.
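Treating competence as a multiplier on how many goods each unit of revenue buys (one of the two equivalent readings above), a minimal sketch with made-up numbers; `competence` is a hypothetical parameter, not a quantity defined in the book.

```python
def cost_of_goods(goods_required: float, competence: float) -> float:
    """Revenue needed to deliver a given amount of goods.

    A competence of 2.0 means each unit of revenue buys two units of goods.
    """
    if competence <= 0:
        raise ValueError("competence must be positive")
    return goods_required / competence


def surplus(revenue: float, goods_required: float, competence: float) -> float:
    """What is left for the leader after buying the goods needed to stay in office."""
    return revenue - cost_of_goods(goods_required, competence)


# Same revenue, same goods needed to match the challenger:
print(surplus(100.0, 80.0, competence=1.0))  # 20.0
print(surplus(100.0, 80.0, competence=2.0))  # 60.0
```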

Over time, all systems would select for competence. But the selection pressure is much higher in large coalitions than small coalitions.

Economic Effects

Taxes

There are three constraints on tax rates:

  1. High taxes diminish how much people work. This tends to be the limiting factor in autocracies.
  2. The coalition is affected by taxes, so it has to be compensated. This tends to be the limiting factor in democracies.
  3. Tax collection isn't free, it requires resources and people to collect them.

As the coalition shrinks and the selectorate expands, autocracies tend to extract as much resources as they can from residents to give large rewards to the coalition and keep large amounts to themselves.

In small coalition systems, the coalition is compensated with private goods, which in the real world could also take the form of tax exemptions. When the coalition is large, the leader cannot compensate them as much, since public goods cost more, and has to lower the tax rate.

Low taxes can also be considered a public good; accordingly, tax rates are inversely correlated with coalition size.

The theory predicts that as the winning coalition shrinks, taxes grow, and as the coalition grows taxes shrink.

  • Small coalition > High tax rate
  • Large coalition > Low tax rate

The lower tax rate in democracies is offset by the higher economic activity.

Though not part of the model, in the real world collecting taxes isn't free, and we can expect that the higher the taxes the more people would try to evade them and the more collecting them would cost. This can also offset the lower tax rate in democracies, and act as another limit for autocracies.

But you're probably thinking, "I live in a democratic country and I pay high taxes, what's up?". Indeed, many people pay high taxes in democracies - which seems counter to what the theory suggests - but it's part of a progressive tax system. There isn't one tax rate for everyone like the abstraction in the theory. In some places, under a certain income you don't pay income tax at all. And there are various extra benefits for things like getting married and having children.  

On the other hand, autocracies often don't report correct tax rates or extract resources from citizens in roundabout ways, like forcing them to sell produce to the government, which the government then sells internationally at a much higher price. Autocrats may even raise tax rates beyond the point that maximizes revenue as a form of oppression.

We can also expect that the more competent the government is at providing public goods, the more large coalitions will approve of higher taxes, though still not nearly as high as in autocracies.

The result should be that autocracies extract more resources in total from residents than democracies. We should also expect autocracies to tax the poor the most and the rich the least, while expecting democracies to do the opposite.

Economic Activity, Leisure and Black Markets

Per-capita income is directly related to coalition size.

This graph shows the functional relationship between coalition size, tax rates and economic activity predicted by selectorate theory.

The Logic of Political Survival, figure 3.1, reproduced and simplified.

Everyone would like not to pay taxes while their fellow citizens continue to do so (well, at least everyone that doesn't assign much importance to notions of fairness). As taxes grow people are more tempted not to pay them, and instead engage in the black market.

Leaders never want people to avoid paying taxes. But, they might offer that as a private reward to coalition members, either in the form of tax exemptions in the law, or through selective enforcement of black market laws.

The theory predicts that as the coalition shrinks, people will engage more in black market activities, and leaders will enforce anti-black-market laws more selectively.

Spending and Welfare

As loyalty decreases, the proportion of revenue spent goes up (and surplus goes down), and as the coalition expands more of that spending goes towards public goods. 

Some things the authors consider public goods, and which are therefore expected to increase with coalition size:

Protection of property rights, protection of human rights, national security, rule of law, free trade, transparency, low taxes, education, better balanced markets, healthcare and social security.

In general, anything considered a public good by the coalition is expected to increase with coalition size.

Economic growth is predicted to increase with coalition size, since evidence shows it's related to some of the things considered public goods.

The authors also predict that "the total value of private goods will be higher in the initial period of incumbency - the transition period from one leader to another - than in subsequent years and that the overall size of the winning coalition will shrink after the transition period."


The authors suggest 3 reasons for corruption, all of which are much worse in small coalition systems, and exacerbated by strong loyalty norms:

  1. Complacency: As far as reducing corruption can be considered a public good, small coalition leaders have no interest in pursuing it and instead prefer to be complacent.
  2. Sponsored Corruption: Allowing corruption can be a private benefit given to supporters.
  3. Kleptocracy: The stealing of wealth from the state directly by the leader.

I think there's a fourth mode of corruption that is more common in large coalitions, is consistent with the theory and explains why democracies still feel so full of corruption. I explain it in the Gifts section under Further Implications.

Additional Sources of Revenue

In the basic model the only source of revenue for the leader is tax revenue. But it's easy to see what effects an extra revenue stream for the leader would have on the country. We'll explore three possible sources: natural resources, debt and foreign aid.

Natural Resources

An abundance of natural resources can create another income stream for the leader and reduce the leader's dependence on the economic activity of citizens.

In small coalition systems, this allows the leader to raise taxes even further.

In large coalition systems, it allows the leader to lower taxes even further.

National Debt

The basic model assumes that spending can be lower than the tax revenue, but cannot be higher. Later in the book the authors check what happens if that assumption is removed and spending is allowed to grow beyond revenue.

In that case, debt acts like another source of revenue and increases kleptocracy.

Foreign Aid

Monetary foreign aid is usually given mostly to small coalition systems, where residents are poor and are in need of it. But if the resources are given to the leader to distribute to the population, the leader is expected to take much of it to themselves.

This gets worse when the leader is in crisis too. If the leader lacks resources to provide private rewards to their coalition, they will have an even greater incentive to distribute foreign aid money away from the public and, in this case, toward the coalition.

If a body wants to give foreign aid and wants the leader to make political reforms in the favor of the public, they have to condition the aid on the reform. Otherwise, aid given before a reform helps the leader fund rewards for his coalition, and is more likely to prevent these reforms rather than incentivize them.

Selectorate theory suggests that to be effective at improving the lives of residents, foreign aid should be conditional on prior political reforms, especially ones that hurt political survival. The aid should be transferred to independent organizations and administered by them, without interference by the recipient government. Evaluation the success of aid should focus on outcomes, and not just how much aid was given. More aid should be given to those who demonstrate effective use of it.

But wait, what reason does the model give leaders for providing foreign aid to other countries in the first place? Foreign aid is part of foreign policy, which is discussed later, and can influence the policy of the receiving country. That influence can be a public good if it aligns with the interests of the donor's citizens.

Immigration and Emigration

When people feel that the system doesn't work in their favor, they have three options:

  1. Exit: Leave the country to a more favorable place.
  2. Voice: Try to change the system.
  3. Loyalty: Stay loyal and wait for better times.

This section will focus on the first option, and the next section will focus on the second.

In this model, the reason for emigrating is to increase your access to public goods and, if you're lucky, private goods. So emigration is intuitively expected to flow from poor polities to rich ones, and from small-coalition systems to large-coalition systems.

The disenfranchised, and to a lesser extent selectorate members, are the most likely to take this option. Coalition members already benefit from their position and are unlikely to be better off elsewhere.

Polities are affected by emigration. Every emigrant is one less person who can be taxed. In non-proportional systems, every selectorate member who emigrates also shrinks the selectorate, which weakens the loyalty norm. Emigration especially harms small-coalition leaders, who benefit from kleptocracy, so they are likely to prevent it. We see that in autocracies like North Korea and the Soviet Union.

Receiving polities are also affected by immigration. Immigrants increase the population size and the price of public goods. If they are enfranchised, they expand the selectorate and, in a proportional system, the coalition as well, increasing spending on public goods. If they are not enfranchised, the population grows but the coalition shrinks relative to it, making the leader spend less on public goods.

Polities may make immigration easier or harder, making themselves more or less attractive to emigrants. Large-coalition states that make immigration easier hurt the leaders of small-coalition systems by making it easier for their subjects to leave.

Potential emigrants have to weigh their decision against how difficult emigration is, and how rich, public good oriented and welcoming their target nation is.

Since there are many countries, the barriers to immigration are easier to overcome than the barriers to emigration.
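The emigrant's weighing can be made concrete with a toy calculation. This is a sketch, not part of the book's formal model: the `Polity` fields, the additive payoff and the barrier costs are all hypothetical. It also shows why many destinations make immigration barriers easier to overcome than emigration barriers: the emigrant can pick the best of many entry barriers, but every route out passes through the single exit barrier of their home polity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Polity:
    """A toy polity as a potential emigrant sees it (all fields hypothetical)."""
    name: str
    public_goods: float   # value of the public goods any resident receives
    private_goods: float  # expected private goods: chance of inclusion times share
    exit_barrier: float   # cost the polity imposes on residents who leave
    entry_barrier: float  # cost the polity imposes on newcomers


def gain_from_moving(home: Polity, target: Polity) -> float:
    """Net gain from emigrating from home to target (negative means stay)."""
    staying = home.public_goods + home.private_goods
    moving = (target.public_goods + target.private_goods
              - home.exit_barrier - target.entry_barrier)
    return moving - staying


def choose_destination(home: Polity, targets: List[Polity]) -> Optional[Polity]:
    """Best target if moving pays off at all, else None.

    The emigrant shops among all targets for the best deal, but whichever
    target they pick, they still pay the single exit barrier of home.
    """
    best = max(targets, key=lambda t: gain_from_moving(home, t), default=None)
    if best is None or gain_from_moving(home, best) <= 0:
        return None
    return best


home = Polity("autocracy", 1.0, 0.2, exit_barrier=2.0, entry_barrier=0.0)
open_democracy = Polity("open democracy", 5.0, 0.1, exit_barrier=0.0, entry_barrier=1.0)
closed_democracy = Polity("closed democracy", 5.0, 0.1, exit_barrier=0.0, entry_barrier=4.0)
print(choose_destination(home, [open_democracy, closed_democracy]).name)  # open democracy
```

Raising `exit_barrier` on the home polity to 4.0 makes staying the best option no matter how welcoming the targets are, which is the lever small-coalition leaders pull when they restrict emigration.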

I find Switzerland an interesting case study for immigration policy. It's very hard to gain Swiss citizenship; in fact, nearly 25% of Switzerland's residents aren't citizens, or, in the language of the theory, are disenfranchised. But these are mostly foreigners who came knowing they wouldn't get citizenship. Moreover, most of them aren't refugees fleeing some terrible country, but well-off people from democratic countries where they're either in the winning coalition or have a high chance of getting into it (though in democracies that matters less).

Reading Martin Sustrik's post on the Swiss political system, I intuit that they have a large minimum size for the winning coalition. Most of that coalition doesn't want to expand the selectorate and bring more people in, and yet due to its size, Switzerland produces so many public goods that people prefer being disenfranchised in Switzerland to being enfranchised in their home country.

Coups and Revolutions

If migrating isn't a good option, people can try to alter the system. There are several ways to go about that, from passing laws and constitutional amendments up to assassinations, coups, revolutions and civil wars.

Protests and Revolts

Selectorate members with a small chance of entering the coalition might seek to expand the coalition in hope they'll be included, or just throw out the current members and hope to replace them ("Seize the means of production").

The disenfranchised have no chance of entering the coalition as long as they remain disenfranchised, and need a more fundamental change.

These groups are the most likely to rebel against small-coalition, large-selectorate systems. 

The winning coalition is expected to oppose these attempts, as they have different interests. But they also have their own way of changing the system:

Coups and Purges

The leader and coalition may also take action to change the system. Remember, leaders want to shrink the coalition and expand the selectorate. The coalition wants to shrink the selectorate, and to either shrink or expand the coalition. We'll call the act of shrinking the selectorate or the coalition by removing some of its members purging.

Whether the coalition prefers to shrink or expand depends on where it is on the welfare graph. When the coalition is at the lowest point of the graph, where both expanding and shrinking the coalition increase their welfare, its members are conflicted about which direction to go in. Some may support reduction while others support expansion.

The Logic of Political Survival, figure 3.2, reproduced

Purging the Selectorate

Given the chance, after a coup for instance, coalition members are glad to purge the selectorate, as it weakens the loyalty norm and forces the leader to spend more. Though the total spending increases, this doesn't benefit the selectorate and disenfranchised much, as most of that spending is directed towards the coalition.

Purging the Coalition

The leader is always happy to purge the coalition. For the coalition it's more complicated. 

A coalition member on the left side of the welfare function can benefit from that as long as they're not the ones purged, as they will get a larger share of the rewards. But, if the coalition shrinks while the selectorate doesn't, the loyalty norm is strengthened and the total amount spent on the coalition goes down. Which effect dominates determines whether coalition members benefit from their fellow members being purged or not.
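The two competing effects can be illustrated with a toy payoff. Assume, hypothetically (the book's functional forms differ), that the leader spends `budget * (W / S) ** k` on the coalition, so spending rises as the loyalty norm W/S weakens, and that members split it equally. Each survivor then gets `budget * W ** (k - 1) / S ** k`, so the made-up exponent `k` decides which effect dominates.

```python
def per_member_private_goods(W: int, S: int, budget: float = 100.0, k: float = 0.5) -> float:
    """Private goods received by each of the W coalition members (toy model).

    Hypothetical assumption: the leader spends budget * (W / S) ** k on the
    coalition, so spending rises as the loyalty norm W/S weakens, and k
    controls how strongly. The spending is split equally among members.
    """
    if not 0 < W <= S:
        raise ValueError("need 0 < W <= S")
    total = budget * (W / S) ** k
    return total / W


def survivor_gain(W: int, S: int, purged: int, k: float) -> float:
    """Change in a surviving member's private goods when `purged` members are removed."""
    return per_member_private_goods(W - purged, S, k=k) - per_member_private_goods(W, S, k=k)


# k < 1: the larger share dominates, so survivors gain from the purge.
print(survivor_gain(W=10, S=100, purged=5, k=0.5) > 0)  # True
# k > 1: the stronger loyalty norm dominates, so survivors lose.
print(survivor_gain(W=10, S=100, purged=5, k=1.5) < 0)  # True
```

In this toy, shrinking `S` raises every survivor's payoff for any positive `k`, which matches the coalition's appetite for purging the selectorate.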

Purging the Selectorate and the Coalition

This is the optimal purge for non-purged, small-coalition members. They can get the benefits of both types of purges. Their share of private goods grows, and if the selectorate was reduced more than the coalition, such that the loyalty norm weakens, total spending also goes up.

Expanding the Coalition

On the right side of the welfare function, even non-purged coalition members never benefit from purges. Instead they benefit from expanding the coalition. But the leader would still like to purge the coalition, so they have conflicting interests.

This further implies that once a coalition is far enough right on the welfare curve, its members cannot possibly gain from shrinking it. So the larger a coalition, the more stable the system should be.

Expanding the Selectorate

Leaders always want to expand the selectorate relative to the coalition, and the coalition always wants to stop them. If the selectorate and the coalition expand proportionally, the coalition is fine with that.

Purging the Selectorate and expanding the Coalition

This is the ideal case for a coalition on the right side of the welfare graph, and the worst case for a leader. 

Civil Wars and Revolutions

Going beyond protests and coups, the authors expand the model to talk about civil wars and revolutions.

The goal of revolution in this model is to either take control of part of the nation (creating a new one), or replace the existing selectorate with another (including the leader, of course). The American Revolutionary War is an example of the first, and the French Revolution is an example of the second.

The model suggests that revolutionaries would be motivated by the prospect of overthrowing the current system so they, the excluded, become the included. The revolution attempt is modeled as a civil war between the disenfranchised (the excluded), and those in the selectorate who chose to oppose them, where each side tries to rally people/strength to their side, and whoever has more wins.

Those in the selectorate can either join, oppose or ignore the revolutionaries. The selectorate has two advantages over the disenfranchised. 

Those in power have an incentive to monopolize military ability so they can defeat a revolution, so they either train only those in the selectorate, or induct those who are skilled into the selectorate. If the military were instead disenfranchised, it would simply overthrow the system. So the first advantage of the selectorate is military ability. Formally, this is represented by a multiplier on their strength. Various things can change the value of this multiplier, like the technology available, but that's outside the scope of the model.

The second advantage is a greater ability to mobilize, due to an asymmetry of motivation. The disenfranchised can benefit from the revolution if it succeeds and they become selectors, but stand the risk of oppression and death if it fails. Passivity is safe for the disenfranchised, but not for the selectorate. If the revolution succeeds the selectorate will lose their current privileges. But like the disenfranchised, fighting is dangerous for them and it might deter them from fighting back.

The revolutionary leader promises a new alternative system. The disenfranchised calculate the expected benefits and costs of joining the revolution, and join if it's worth it. The better the promised system is relative to the current one, the easier it will be to recruit. The leader's ability to promise private goods in the new system solves the free-rider (collective action) problem that would arise if they could only offer more public goods.

Selectors make the same calculation, and decide based on it whether to fight back. The better the promised system relative to the current one, the less inclined they will be to fight back. The worse it is, the more they'll be willing.
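The recruitment logic can be sketched as a toy simulation. Everything here is an assumption for illustration, not the book's model: the chance of success is a fixed belief rather than an outcome of mobilization, staying passive is worth nothing, selectors never join the revolution, and the hypothetical `military_multiplier` stands in for the selectorate's military advantage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resident:
    enfranchised: bool     # True for selectors, False for the disenfranchised
    current_value: float   # what the resident gets from the current system
    promised_value: float  # what the revolutionary leader promises them


def joins(r: Resident, p_success: float, penalty: float, fight_risk: float) -> bool:
    """A disenfranchised resident joins if the expected gain beats punishment and risk.

    Staying passive is assumed to be worth 0 relative to the status quo.
    """
    gain = r.promised_value - r.current_value
    return p_success * gain - (1 - p_success) * penalty - fight_risk > 0


def defends(r: Resident, fight_risk: float) -> bool:
    """A selector fights back if what they stand to lose outweighs the risk of fighting."""
    return r.current_value - r.promised_value > fight_risk


def revolution_wins(people: List[Resident], p_success: float, penalty: float,
                    fight_risk: float, military_multiplier: float = 2.0) -> bool:
    """Whichever side has more strength wins; selectors' strength gets the multiplier."""
    rebels = sum(1 for r in people
                 if not r.enfranchised and joins(r, p_success, penalty, fight_risk))
    defenders = sum(1 for r in people
                    if r.enfranchised and defends(r, fight_risk))
    return rebels > military_multiplier * defenders


# Small coalition, small selectorate: many badly-off outsiders, a few privileged selectors.
autocracy = [Resident(False, 1.0, 5.0)] * 100 + [Resident(True, 10.0, 2.0)] * 10
# Large coalition: outsiders already enjoy public goods, selectors would lose from the change.
democracy = [Resident(False, 4.0, 5.0)] * 20 + [Resident(True, 6.0, 5.0)] * 100
print(revolution_wins(autocracy, p_success=0.5, penalty=2.0, fight_risk=0.5))  # True
print(revolution_wins(democracy, p_success=0.5, penalty=2.0, fight_risk=0.5))  # False
```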

This makes large-coalition systems immune to revolution. A new system with a large winning coalition isn't better for the current selectorate, so a revolutionary leader can't improve their situation. If the revolutionary leader instead promises a smaller coalition, the current selectorate would be replaced by the revolutionary's supporters, and the current coalition members would lose their chance of getting private goods and receive fewer public goods, so they would fight back. The revolutionary will also have trouble recruiting, as even the disenfranchised benefit from the high level of public goods, and they're probably a much smaller group than the selectorate, making it impossible to recruit enough supporters to defeat the defenders.

Small-coalition, small-selectorate systems are vulnerable to revolution. There are far more disenfranchised than selectors, and their motivation to revolt is high. The selectorate, and especially the winning coalition, benefits greatly from such a system and will fiercely defend it.

In small-coalition, large-selectorate systems, there are fewer disenfranchised, but the selectorate may have it almost as bad as they do, and will not be willing to defend the system; selectors might even join the revolution. Given the competing effects of selectorate size, the authors are uncertain which selectorate size makes revolutions more common, and make no prediction. But they do predict that such systems are less likely to survive a revolutionary movement, so their leaders focus their efforts on suppressing the ability to recruit and organize for a revolution.

This leads to an expected difference in the members of the military in small and large coalition systems. Small coalition systems have to include the military in the selectorate, or else it would lead a revolution. Large coalitions don't need to worry about revolution and so can professionalize the army and include people outside the selectorate.

Outcomes of revolution

I wrote that the revolutionary leader promises a new, better system, with a larger selectorate and usually a larger coalition, and indeed, the theory suggests that the leader is sincere when they make this promise. But once the revolution is successful and the revolutionary becomes a leader, their incentives become that of a leader, and suddenly a big coalition is not in their favor.

The model predicts that if unconstrained, leaders will choose small-coalition, large-selectorate systems. Yet some revolutionaries like Nelson Mandela and George Washington greatly expanded the coalition after they won. Therefore, if a revolution results in an expansion of the winning coalition, it must be due to constraints.

One form of constraint is a non-decisive win. Mandela's revolution did not win decisively, and it had to form a coalition agreement with the former power. The rules were not the decision of a single person.

Another form of constraint is the absence of a single definitive leader of the revolution. In America, the revolution was a joint victory of the thirteen colonies.

Large-coalition systems are expected to see little severe anti-government action by residents. But in the absence of deterrence, selectorate theory predicts that small-coalition, large-selectorate systems will have the most, and the most intense, domestic resistance. That's why their leaders turn to:

Oppression

To prevent coups, revolts, and other forms of challenge, leaders can turn to oppressing their population. We'll see when leaders oppress, whom they oppress the most, and how they go about it.

Every opposer compares the benefits of success to the risks of failure. Oppression deters opposers by increasing the cost of failure. To be effective, leaders scale oppression with the expected gains from successful opposition, so that the expected cost of failure matches or exceeds those gains.
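The deterrence condition can be written as one inequality: an opposer acts when `p * gain > (1 - p) * punishment`, where `p` is their chance of success. This is a minimal sketch under that assumption, with hypothetical names; it also shows why a leader who is sure to lose cannot deter anyone, since there will be no one left to carry out the punishment.

```python
import math


def opposes(p_success: float, gain: float, punishment: float) -> bool:
    """An opposer acts when the expected gain from success beats the expected punishment."""
    return p_success * gain > (1 - p_success) * punishment


def min_deterrent_punishment(p_success: float, gain: float) -> float:
    """Smallest punishment at which opposing no longer pays.

    Solves p * gain = (1 - p) * punishment. If the opposer is sure to win
    (p = 1), the leader will not be around to punish, so no threat deters.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability")
    if p_success == 1.0:
        return math.inf
    return p_success * gain / (1.0 - p_success)


# Same odds, but ten times the prize for winning: ten times the oppression needed.
small_coalition = min_deterrent_punishment(p_success=0.2, gain=100.0)
large_coalition = min_deterrent_punishment(p_success=0.2, gain=10.0)
print(small_coalition > large_coalition)  # True
```

Because the required punishment grows linearly with `gain`, the leaders whose office is most valuable to challengers, small-coalition leaders, need the harshest oppression.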

Leaders use oppression to stay in power. The motivation to stay in power is a function of the value of holding office and the risk of losing it. Small-coalition leaders get the most value out of office, and are also the most likely to be punished when deposed. Large-coalition leaders get the least out of office, but are allowed to walk away with what they got. The incentive to oppress opposition increases with the motivation to stay in office. Therefore, large-coalition leaders have little motivation to oppress opposition, while small-coalition leaders will attempt to hold office by any means possible.

Oppressing Challengers

The greater the inequality between the welfare of the leader and the welfare of the coalition and selectorate, the more tempting it is to challenge the incumbent. To counteract that the leader will intensify oppression on challengers. 

In large coalitions the disparity between leader welfare and coalition/selectorate welfare is small, and thus oppression of challengers is also small.

A larger selectorate (a stronger loyalty norm) also increases the welfare disparity, and with it the oppression of challengers.

Oppressing Defectors

Since challengers from the winning coalition have an advantage over other challengers, leaders more fiercely oppress their own supporters who lead challenges.

Leaders also oppress anyone who supports challengers, and especially their own supporters, as they're the most influential. 

Absent oppression, any selector not in the incumbent's coalition will join the challenger's coalition, as that's their only chance of entering a coalition. Oppression discourages that. The extent of this type of oppression grows with the benefit of inclusion in the coalition. Put another way, "a leader has the greatest incentive to oppress selectors when the selectors stand to gain the most from unseating them", which is when the coalition is small.

Oppressing The Disenfranchised

Disenfranchised have an incentive to revolt when public goods provision is low. Small coalition leaders have a great incentive to oppress them.

Finding Oppressors

Just as no ruler rules alone, no oppressor oppresses alone. Those who carry out the leader's oppression are more willing to do what it takes when they benefit from the leader's rule.

Coalition members are an obvious choice. They're willing to oppress any source of opposition. This explains why the military and secret police are key members of the coalition in autocracies. 

Selectorate members may be willing to oppress the disenfranchised if they benefit from the current system, even if they don't benefit from the current leader. This happens in small-coalition, small-selectorate systems, where the loyalty norm is weak and selectors have a good chance of being included in the coalition.

Coalition members have a conflict of interest in punishing challengers from within the coalition, as they benefit from the existence of credible challengers to the leader. The leader provides private goods to their supporters so they don't defect. If oppression removes all possible challengers, the leader no longer has to provide anything. Leaders can solve this dilemma for the coalition by hiring selectorate members to punish insider challengers. This could be the selectorate member's way to get into the coalition.

Large-coalition leaders should find it hard to recruit people willing to oppress their fellow citizens, as the benefits of inclusion are small. Those who lose power in large-coalition systems can also count on a chance of getting back into it, due to the higher turnover rate.

Credible Oppression

Like any punishment, oppression depends on the credibility of the oppressor's threat to punish the oppressed. In particular, credible oppression requires the following:

  1. The leader is capable of retaining power and the opposition may fail. Leaders who lose power cannot punish those who opposed them, so threats are less effective when opposers believe they can succeed. Small coalition leaders are better at retaining power, and so are more credible oppressors.
  2. Oppression has to be connected to opposition. Random oppression doesn't deter opposition, but it does increase the motivation for it.
War and Peace

The authors open the sixth chapter with an excerpt from Sun Tzu's The Art of War and an excerpt from a speech by Caspar Weinberger on the Weinberger Doctrine, to illustrate the differences between the approaches to war of small-coalition and large-coalition systems. The full section is worth reading, but is too long to include here.

The authors set out to explain the phenomenon of the democratic peace, the observation that democracies do not fight wars with one another, and more specifically these empirical tendencies:

  1. Democracies are not immune from fighting wars with non-democracies.
  2. Democracies tend to win a disproportionate share of the wars they fight.
  3. Democratic dyads choose more peaceful dispute settlement processes than other pairings do.
  4. In wars they initiate, democracies pay a smaller price in terms of human life and fight shorter wars than nondemocratic states.
  5. Transitional democracies appear to fight one another.
  6. Larger democracies seem more constrained to avoid war than are smaller democracies.

To see the consequences of selectorate theory on war, we have to expand the model to a dyadic model, where we have two polities, and set the rules of engagement between them.

In this model, when leaders enter a dispute with leaders of other polities, they each either decide to fight or negotiate a settlement. If either chooses to fight, they both choose how much of their available resources to commit to the war effort. Like anything else, any amount spent on defense is an amount not spent on other things. Who wins is a function of regular defense spending (a public good) and war effort spending (which comes out of the private goods budget).

Residents receive payoffs according to the dispute's outcome (whether through war or negotiation), and, if they're coalition members, the resources not consumed in the war effort. Then the selectors in each state decide whether to retain or replace the current leader.

The size of the coalition changes war strategies by changing which type of good the coalition focuses more on, and therefore which one the leader does as well. In a small coalition the leader is best off saving resources for the coalition rather than spending them on war. A defeat, unless specified otherwise, affects everyone equally - it doesn't affect the leader and coalition more than other members.

To clarify: it's not the outcomes themselves that are better or worse depending on coalition size; rather, increased effort at winning decreases the ability to give private rewards, which is more detrimental to survival the smaller the coalition is.

So like in the case of taxes, it's easy for the leader to compensate small coalitions for defeat, and difficult for large coalition leaders. Therefore, large coalition leaders try harder to win wars, and avoid them in the first place if they don't think they can win.
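A toy version of the war-effort trade-off shows why effort rises with coalition size. It assumes, hypothetically, a ratio-form contest (a side's chance of winning equals its share of total effort), that victory is worth `victory_value` to every coalition member as a public good, and that whatever is not spent on the war is split equally among the `W` coalition members as private goods. The book uses its own functional forms; only the direction of the effect is meant to carry over.

```python
import math


def win_probability(effort: float, enemy_effort: float) -> float:
    """Hypothetical ratio-form contest: each side wins in proportion to its effort."""
    total = effort + enemy_effort
    return 0.5 if total == 0 else effort / total


def coalition_payoff(effort: float, W: float, budget: float,
                     enemy_effort: float, victory_value: float) -> float:
    """A coalition member's payoff: victory as a public good plus an equal share of what is left."""
    return victory_value * win_probability(effort, enemy_effort) + (budget - effort) / W


def optimal_war_effort(W: float, budget: float, enemy_effort: float, victory_value: float) -> float:
    """Effort that maximizes coalition_payoff, clipped to [0, budget].

    Setting the derivative victory_value * enemy_effort / (effort + enemy_effort) ** 2 - 1 / W
    to zero gives effort = sqrt(victory_value * enemy_effort * W) - enemy_effort.
    The payoff is concave in effort, so clipping gives the constrained optimum.
    """
    if W <= 0 or enemy_effort <= 0:
        raise ValueError("W and enemy_effort must be positive")
    unconstrained = math.sqrt(victory_value * enemy_effort * W) - enemy_effort
    return min(max(unconstrained, 0.0), budget)


for W in (1, 10, 1000):
    e = optimal_war_effort(W, budget=100.0, enemy_effort=10.0, victory_value=10.0)
    print(W, round(e, 1), round(win_probability(e, 10.0), 2))
```

Running it shows both the effort and the chance of winning rising with `W`; the one-member coalition spends nothing at all, because each unit of effort is a full unit taken from its own private goods.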

Further, this means that large-coalition leaders are more likely to win wars. And since two large-coalition leaders both anticipate that each would try hard if they went to war, they'd rather resolve conflicts peacefully.

Small coalition leaders try less hard, but still sometimes fight wars because the cost of losing is smaller for them. 

There is an exception, though: leaders will always try hard if they worry that defeat will directly cause them to lose their position. For example, in WW2 both small-coalition and large-coalition leaders were nearly certain to lose their position or their lives upon defeat, so they either tried hard or surrendered their independence to survive.

At the other extreme are wars that require little resources to win, which both large and small coalition leaders may be happy to initiate and invest the little it takes to win. Colonial expansion can fit this category.

  • Small coalition > Higher focus on private goods > Less available resources to spend on war > Higher chance to be reelected upon defeat > low motivation to win wars > Willing to fight unlikely-to-win and likely-to-win wars > Less likely to win wars they start
  • Large coalition > Lower focus on private goods > More available resources to spend on war > Lower chance to be reelected upon defeat > high motivation to win wars > Reluctant to fight unlikely-to-win wars, but willing to fight likely-to-win wars > More likely to win wars they start

Since democracies are happy to take on easy wars, how aggressive they are is not inherent, but depends on the situation.

If we assume that lower casualties act like a public good (since the smaller the coalition, the fewer of the casualties are coalition members or their children), then we can also expect democracies to care more about the lives of their soldiers and to have lower casualties. The same goes for winning quickly.

We'll compare disputes between three pairings of polities.

Autocrat VS Autocrat

Neither tries hard if there's a war. Each attacks if it believes that on average it can get more from conflict than from negotiations. To paraphrase the authors, because the war's outcome is not critical to their survival, the decision to fight is more easily influenced by secondary factors not assessed in the model, like uncertainty, rally-round-the-flag effects, and the personal whims of leaders.

Autocrat VS Democrat

Though autocrats are willing to fight, they are reluctant to attack democracies if they anticipate that the democracies will reciprocate with force: since democrats try hard, autocrats know they're likely to lose. However, since democrats are reluctant to fight wars they're unlikely to win, they're more likely to offer concessions when they aren't confident enough of winning. This gives autocrats a strategy: create disputes with democrats and make demands of them when they know the democrats won't be confident enough of winning to fight, and take advantage of the resulting concessions.

Therefore autocrats are expected to start many disputes with democrats, but few of them will escalate to violence.

Democrats are more likely to initiate wars with autocrats than with democrats, but still only if they're likely to win. Autocrats are likely to fight back and not offer concessions, since the price of losing is smaller for them.

Israel, where I live, is a great example of this. How could such a small country win wars time after time against several countries much larger than it is, even when they attack together? Some attribute it to Jewish ingenuity, some to Arab disorganization. This model gives a different perspective.

Though small, Israel is a democratic country, and all of Israel's opponents are somewhere along the monarchy-autocracy line. So Israel tries hard, perhaps even harder than other countries would due to the worry that loss wouldn't just be some loss of independence or an economic blow, but an existential danger, both to the citizens and, perhaps more importantly, the leader.

On the other hand, its opponents don't try hard, and can only spend so much on war before displeasing their small coalitions.

Honestly, this paints a bleak picture for me, as it suggests that Israel may have an incentive to keep these countries autocratic. On the one hand, a democratic Egypt or Syria is (according to the model) less likely to attack Israel. On the other hand, if they do attack the war would be far more devastating than any war Israel previously had, and it's far more likely it will lose. And the size difference would make them more likely to attack than if the countries had a similar size.

Israel's other front is against regimes that are weaker, and even much weaker, than her. Terrorist organizations like Hamas and Hezbollah aren't a credible existential threat to Israel. Israel could conceivably attack Hamas tomorrow and land a decisive victory. It doesn't, because the cost is high and uncertain, and it would not be a popular move.

Hamas knows that, so they make relatively small attacks against the citizens of Israel (something a dictatorship would barely care about, but a democracy cares about a lot), and get concessions from Israel. When Israel does retaliate, it's not enough to deter an autocratic leadership.

Democrat VS Democrat

A democrat will initiate war against another democrat only if they're sufficiently sure they'll win, or that their opponent will offer concessions instead of fighting back. A democrat will only fight back if they believe they have sufficiently high chances of winning, otherwise they will concede.

Foreign Policy

In the last sections, we haven't given much thought to why a leader would choose to go to war, except as one of two ways to resolve a dispute. Nor did we consider what leaders intend to do after they win. In this section, we'll explore war aims and how the winning state shapes the outcome of the war in the losing state. We'll need a few more assumptions to do that.

Foreign policy regards actions leaders take to get an advantage in international competition against other nations, in order to survive domestically. The model assumes foreign policy efforts are a public good.

The winner in war either wants to obtain resources from the defeated state, force policy changes, or force structural changes (the makeup of the selectorate and the coalition).

These are treated as regular goods and are split by the leader between public goods, private goods, and personal benefit. War aims are a mix between private and public goods that depends on the size of the coalition and the selectorate - Small coalitions drive leaders to seek private goods in war, and large coalitions drive them to seek public goods. 

The postwar settlement process is modeled as a struggle in which whoever spends more relative to the other gets more. Foreign policy spending is determined by coalition and selectorate size.
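The postwar contest can be sketched with a simple ratio rule, in which each side gets its share of the two sides' total spending. The rule and the names are hypothetical stand-ins for the book's formal contest; they only capture "whoever spends more relative to the other gets more".

```python
from typing import Tuple


def settlement_shares(spend_a: float, spend_b: float) -> Tuple[float, float]:
    """Split of the disputed stakes when each side gets its share of total spending.

    Hypothetical ratio rule; with no spending on either side the stakes are split evenly.
    """
    if spend_a < 0 or spend_b < 0:
        raise ValueError("spending cannot be negative")
    total = spend_a + spend_b
    if total == 0:
        return 0.5, 0.5
    return spend_a / total, spend_b / total


# A winner spending three times as much as the loser keeps three quarters of the stakes.
print(settlement_shares(3.0, 1.0))  # (0.75, 0.25)
```

Only relative spending matters under this rule: doubling both sides' spending leaves the split unchanged.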

Commitment and Compliance

The settlement has to be maintained somehow, and various things can make it more and less difficult.

We'll split settlements into ones that require active compliance from the loser and ones that only require passive compliance. The UN's agreement with Iraq after the Gulf War, under which Iraq would allow inspections of its disarmament, required active compliance. Territorial changes only require passive compliance, as the defeated state has to actively challenge the winner to get back territory. Active compliance is harder to enforce than passive compliance, and leaders take that into account when forming their war aims.

Further, it's likely the defeated leader would like to go back to their previous policy if they could, leading to a commitment problem for the loser and an enforcement problem for the winner. Even if the loser wanted to follow the agreement and could credibly demonstrate that, internal pressures can stand in their way. If a challenger proposes a more attractive policy that includes breaking the agreement, it'll be hard for the leader to survive without also breaking the agreement, especially in large-coalition systems.

If, however, the new policies are in the interests of the citizens and the coalition is large, the commitment and compliance problem is reduced.

Installing a Puppet

To mitigate the commitment and compliance problem, the winner can replace the losing leader and install a puppet.

Like any other leader, a puppet still has their own interests and faces domestic pressures. If the winning leader loses the ability to remove the puppet from their position, the puppet will stop being loyal. Still, installing a puppet increases the chances of compliance, though it requires further military investment to achieve a total victory. Winners who install a puppet are incentivized to also install a small-coalition, large-selectorate regime in the defeated state, since in these regimes leaders have the most power and survive longest.

Large coalition leaders are most likely to install puppets since they spend most on foreign policy.

Structural Changes

If the winner chooses to make structural changes in the defeated state (changing the sizes of the coalition and selectorate, as well as who's included), they will make them smaller if their own interests differ from those of the defeated state's residents, and larger if their interests align.

If we look at the US's history of modifying other countries, we can see Iran as an example of pushing a country in an autocratic direction, and West Germany and Japan as examples of pushing countries that had similar interests to them after the war in a more democratic direction. It's important to note that in Germany and Japan's case the move toward democracy wasn't instant, but instead took several years during which both countries were managed from the outside. So the move to democracy can be very slow and costly.

Making the state more autocratic can help a puppet leader rule, so such structural changes often come together with the installation of puppets, while making the defeated state more democratic is unlikely to come together with installing a puppet.

This further strengthens the bleak image of Israel's relation with our neighbors. If Israel just has it easier when her neighbors are autocratic, our foreign policy efforts are likely to keep enforcing it, even if citizens like me hope our neighbors will get to have better lives under better regimes.


Territorial Expansion

Taking territory from the loser extends the conflict beyond the war itself, since the defeated leader can attempt to get the territory back, so leaders aim for territorial expansion only if they benefit from it. Territory can be valuable in two ways:

  1. Strategic value comes from strategic territory that helps the state to win wars.
  2. Resource value comes from resource-rich territory.

Autocratic leaders benefit more from resources than democratic leaders as they get to keep more to themselves. This also means that democratic leaders would be more willing to return resource-rich territory. Territorial expansion shifts resources from the loser to the winner, weakening the former and strengthening the latter. 

Strategic territory increases the ability of the leader to provide the public good of security, and the ability to defend other gains from war. Autocratic leaders may value strategic territory for the reduction in resource requirement in defense, but democratic leaders value it much more.

  • Small coalition > Leader gets more value from resources and has less need to defend citizens > Prefers resource-rich land to strategic land > Less willing to give back resource-rich land > More likely overall to seek territorial expansion
  • Large coalition > Leader gets less value from resources and has greater need to defend citizens > Prefers strategic land to resource-rich land > More willing to give back resource-rich land > Less likely overall to seek territorial expansion

As usual, the size of the selectorate has a small "autocratic" effect on large coalition systems, and a more pronounced effect on small coalition systems. 

Further Implications

Satisfy != Benefit

Implicit in the theory, but not made explicit by the authors, is that the leader has to satisfy their supporters, but that does not necessarily mean doing what's good for them. If a leader can make people believe policy x (which is better for the leader) is better than policy y (which is better for the people), the leader can do x, get the personal gains, and not lose support. This is part of why journalism is important, and why weak journalism fosters bad policy. In small coalition systems, it's easy to simply censor information and suppress the press. Large coalitions won't put up with that, but a flood of irrelevant information can do the job just as well, without reducing satisfaction. Charismatic leaders give less and get more.

Voting Methods 

The selectorate theory implies that the most important thing about the way a leader is chosen is how much of the population they have to satisfy in order to get and stay in office. So voting methods that reward being approved by a supermajority of the population should result in better policy for the people. I say reward instead of require because systems that require a supermajority can become weaker and less stable. The authors give the extreme example of 18th-century Poland, which gave every legislator veto power, so foreign powers could easily block any decision by bribing just one person. Also see Abram Demski's Thoughts on Voting Methods, which discusses voting methods and support levels.

Gerrymandering can also be used to manipulate the voting system into giving some people less voting power than others, thus making the coalition smaller.

Voting Age

The theory says that in large coalition systems, the fewer disenfranchised people there are, the better. In modern democracies, the only people who are usually disenfranchised (apart from non-citizen immigrants) are children, which suggests a motive for lowering the voting age (see also). It also argues against any limitation on voting rights, such as intelligence tests or maximum age limits.


In a small coalition system, especially one with a large selectorate, the leader is almost always the richest, most powerful person in the country. This is of course because they can steal more from the state, but also because they stay in office longer and can extract far more resources from the population. Unless you become part of the coalition, any riches you acquire that draw the attention of the state can, and probably will, be taken from you.

This also means that usually no one under the leader can bribe them, only foreign parties.

In large coalition systems, this is very different. The leader can only steal so much from the state, only stays for so long, and cannot extract resources as they please. Combined with the prosperity large coalitions bring, this creates a situation where the leader is rarely the richest member in their country.

This opens the opportunity to bribe the leader with gifts (and with promises of future gifts, lest the bribes be discovered too early).

In other words, where in small coalitions the leader trades goods to the coalition in exchange for policy (army policy, police policy, production policy, etc.), in large coalitions the relationship is flipped and the leader trades policy to the rich in exchange for goods.

Conclusions

Back to Leopold

I opened this article with the story of Leopold II who was simultaneously king of Belgium and ruler of the Congo Free State. Now, armed with selectorate theory, let's see what explains the difference.

When Leopold was king, Belgium was already a constitutional monarchy, yet he still had considerable influence: like his father, Leopold II was skillful in using his constitutional authority.

The authors estimate his, and his government's, selectorate to be fairly large for the time, at 137,000 out of nearly six million residents.

When Leopold became king, many European countries were gathering colonies and building empires, and Leopold wanted to join the party. 

Finally, after much trying, he acquired a lot of land in Africa. Unlike other colonies, though, his colony wasn't owned by the state; it was his own private property.

Leopold did that by borrowing money from the Belgian government and creating his own private army.

In 1878 he sent a company led by explorer Henry Stanley, disguised as a scientific and philanthropic expedition, to establish the colony in Congo. "Representatives of 14 European countries and the United States recognized Leopold as sovereign of most of the area to which he and Stanley had laid claim", which was 74 times the size of Belgium.

So while in Belgium he was constrained by a large winning coalition, in Congo he had almost no constraints: he only had to reward his private military. Further, the income from ivory and rubber acted as an extra revenue source, much like natural resources. This let him provide more goods in Belgium without raising taxes. He earned the name "The Builder King" for the many buildings and urban projects he constructed (and many private ones too, of course).

Eventually, evidence got out that he was growing rich on the back of slave labor and atrocities, and he was forced to cede control to the Belgian government. And though still bad, Belgian Congo was much better than Leopold's.

The picture we get is of a person who simply followed incentives ruthlessly. In Congo he had something to gain from showing no restraint, so he showed none. In Belgium he had to satisfy a large coalition, so he was what can be considered a good ruler, at least by the standards of the time.

Unfortunately, I think the theory suggests that even a benevolent version of Leopold couldn't have done better for Congo than leaving it alone. He probably didn't have the money to provide them with public goods, and the only way he could take over was by relying on a small coalition in the form of an army.

A Note on Evidence

The authors did vast empirical work, checking every prediction they could against real-world data, and I ignored it completely. This is because I wanted this post to focus on the ideas of the theory, and reviewing the evidence deserves its own full post. Still, it wouldn't do not to comment on it at all.

The biggest hurdle to the empirical study of the theory is estimating the size of the selectorate and the coalition of different countries (with coalition size being much more difficult). For some countries (like Israel) the framework barely feels like it fits.

How they estimated these values and the techniques they used are a story for another time (as is the criticism of how they did it, and their response to that criticism), but using these estimates and techniques they found evidence for even the most complex and specific predictions they make, such as the swoosh-shaped welfare function of the winning coalition. Further, they found the model predicts the data better than other predictors such as government type (e.g., democracy or dictatorship).

The reason I think a review of the evidence wasn't crucial to include in this post is that, on some level, the ideas speak for themselves. For the most part, you can see intuitively if the predictions fit or contradict the world you know. At least for me, they seem to fit fantastically well.

Further, at least at first glance, many of the patterns predicted by the model seem unrelated. A simple model that gives logical explanations for so many such patterns is already doing something right.

To say it differently, if neither the authors nor anyone else did any empirical work on this theory, I would probably still find value in it, as it elegantly explains many important (seemingly) unrelated patterns. Of course, it's important to do empirical work to make sure we're not fooling ourselves, and so I'm very glad the authors did it.

The easier it is to estimate selectorate and coalition size, the more practically useful this theory will be. The authors are aware of that and just recently published a paper on a new measure of coalition size.

What I would like to have is a public, regularly updating index of selectorate and coalition size estimates for all countries.

Further Reading

Should you read the books? I tried to make this post comprehensive enough that for most people it would be a good alternative, so I don't think you need to read them to understand the theory.

Still, I haven't covered all the statistical evidence given in the book or the mathematical model, and with such a large book that touches on so many subjects it's difficult not to miss anything, and impossible to reach the same depth. The books also have far more examples than I could bring here.

So if you still want to learn more, I would recommend the books: The Dictator's Handbook for a more popular book with many examples and stories, and The Logic of Political Survival for a more in-depth, academic version that includes a wider range of topics, the mathematical model (appendixes to chapters 3, 6 and 9) and in-depth statistical analysis (chapters 4-10).

And if you do read them and find something I missed, be sure to leave a comment.

I also recommend CGP Grey's excellent video The Rules for Rulers which is based on The Dictator's Handbook. It doesn't cover nearly as much of the theory as this post did, but it covers the part it does cover much better than I can, and it's a more digestible resource to send to someone else.

Future Plans (You can help)

I hope to write more posts about or inspired by this theory. I already have some plans and ideas, some of which I would like help with. I will replace these with links as I write and publish them.

  • Reviewing the evidence for selectorate theory. I would like to do a followup post reviewing the evidence for the theory, but I'm not very strong on the statistics side. If you want to collaborate on this with me I'd be happy to.
  • A post going deeper into the mathematics of the theory. I tried to recreate the mathematical model in python, and got stuck on a few things I didn't understand. If you want to help me with the math and with interpreting things that aren't clearly explained (I can handle the coding), I'd be really grateful.
  • Explorable Explanation of selectorate theory. This requires recreating the mathematical model in code. It would be a far better explainer than text and images could ever be, and I think the theory is important enough that I would really like such an explanation to exist, but it would also probably take a lot of time and effort, so I don't know if I would do it (even if I had the mathematical model coded up).
  • Term Limits and how to improve them.
  • Estimating the coalition size in Israel. I already started writing it; it's an interesting and difficult exercise, as the political system in Israel doesn't lend itself easily to that notion.

To Conclude

Selectorate Theory gives a strong basis for thinking about politics. It shows that it's viable to analyze politics based on the interests of actors inside the state, specifically, based on the expectation that the prime goal of those in power is to stay in power.

The biggest shortcoming of the theory is the difficulty of estimating coalition size, though the authors are working on it. If this problem is solved, the theory will become much more useful.

But if you can estimate just 2 variables about a nation (coalition and selectorate size), you know how to set your expectations regarding a wide range of things, like human rights, taxes, economic activity, corruption, government spending, foreign policy, war aims and strategy, immigration, and oppression.

Large coalitions lead to more public goods, lower taxes, shorter tenure, less oppression, better civil rights, higher wealth and welfare, less corruption, more freedom to emigrate, and more appeal to immigrants. They use natural resources to the benefit of the public. They try harder in war and don't get into wars they are likely to lose. They are more likely to make concessions in war and to return conquered land. They are more likely to intervene in the affairs of foreign countries by forcing policy, installing puppets, and transforming regimes. Unfortunately, the welfare of the citizens of those nations is not their concern, and they are more likely to make regimes more autocratic unless those citizens share common interests with them.

Small coalitions lead to few goods for the many and many goods for the few. They have higher taxes, worse civil rights, more poverty, longer tenures, more corruption and kleptocracy, and less freedom to emigrate. They use private resources and foreign aid to the benefit of the leader and their small coalition, leaving residents in even poorer states. They don't try hard in war yet are happy to get into them, and are more willing to let their residents be hurt. They are more likely to steal resources from other nations. Their residents are driven to revolt, and they ruthlessly oppress them to prevent it.

Large selectorates reduce total spending by increasing the coalition's replaceability and forcing them to be more loyal. They most prominently affect small coalition systems which become more autocratic.

In the words of the authors, the theory provides "An explanation of when bad policy is good politics and when good policy is bad politics", and more specifically, "For those who depend on a small coalition, good policy is bad politics and bad policy is good politics". As a corollary, for those who depend on a sufficiently large coalition, good policy is good politics and bad policy is bad politics. Succinctly, in small coalitions the interests of the leader and the public diverge, and in large coalitions they align.

Selectorate theory suggests that to increase prosperity, both in our own nations and in foreign nations, we need to increase the coalition size in these countries. We should include the largest proportion of the selectorate we can in the coalition, and include all residents in the selectorate. To achieve that, we should implement direct elections with better voting methods and voter anonymity. We should organize government in ways that lead to larger coalitions, like presidential systems over parliamentary systems. We should employ term limits effectively, and give local authorities more power while making sure they don't become corrupt. And to help those in smaller coalition systems, we should open our borders and make it easy to become a citizen, be careful with our foreign aid, and be wary of some of our own bad foreign policy tendencies.


Niacin as a treatment for covid? (Probably no, but I’m glad we’re checking)

October 12, 2021 - 08:40
Published on October 12, 2021 5:40 AM GMT


This article contains an interview with a doctor who believes NAD+ is the secret to covid’s heavy morbidity and mortality toll. The description was unusually well done for internet crackpottery. This is hard to convey rigorously, but it had a mechanistic quality and the right level of complexity, and it made the right level of promises for a treatment. None of this is to say it’s definitely correct, but it had a much better chance of being correct than your average alt-covid cure scribbled out in crayon. So I did some checks on it.



This post is organized as follows:

  • Description of theory. 
  • Long section defining terms. These are all useful for understanding the claims I check later on, but depending on who you are they may not be helpful, and you may find the contextless infodump kind of a drag. Feel free to skip if it’s not useful to you personally, and know that it’s there if you need it.
  • Deep dive into particular claims the article makes.
  • Does it work?
  • Is it safe?
  • My personal experience with the protocol 
  • Some meta

This is your reminder that my only credential is a BA in biology and I didn’t specialize in anything relevant. It is a sign of civilizational inadequacy that this post exists at all, and you should think really hard and do your own research before putting too much weight on it.

For those of you who would like to skip to the take-home message: science is very hard, I’m glad they’re running larger studies to follow up on all of these because that’s a reasonable thing for a rich society to do, but I’m not super hopeful about this protocol.

The Theory

As described by Dr. Ade Wentze:

There is an extremely widely used coenzyme in your body, NAD. The more active form of this compound, NAD+, is depleted by covid (converted to NADH). In people with a preexisting deficiency or difficulty rebounding after depletion, covid infection results in a persistent NAD+ deficit. This is bad in and of itself, but causes additional problems when your body tries to make up for it by requisitioning all your tryptophan to make more. Tryptophan is also a precursor for serotonin, so this leads to either low serotonin or activation of mast cells to release their serotonin stores, accompanied by histamines (which cause allergies and other issues). 


There is a lot of vocabulary in that theory and in the supporting claims, which I go over here. If you’re reading for conclusions rather than deep understanding I would skip this.


Nicotinamide adenine dinucleotide is a coenzyme that plays an essential role in hundreds of chemical reactions in your cells, including many relating to processing energy and genetic transcription. This is a mixed blessing, as foundations for crackpot theories go: something involved in hundreds of processes across every kind of tissue in your body can cause almost any symptom, which is great because long covid has a lot of symptoms to cover. On the other hand, it can cause almost any symptom, which means it’s hard to disprove, and you should distrust things in proportion to how hard they are to disprove. Alas, sometimes core processes really are impaired and express that impairment in a range of unpredictable ways that vary across people, but such processes are also an easy home for crackpots.

NAD+ has two major components, one of which can be made from either tryptophan or aspartic acid (both amino acids), or by altering niacin.


Like many vitamins, niacin aka vitamin B3 refers to a few different closely related compounds (most commonly nicotinic acid, nicotinamide, nicotinamide riboside, and inositol nicotinate, but there are others) that are almost but not quite interchangeable.

Chemical structures of niacin compounds: (a) nicotinamide; (b) nicotinic acid; (c) nicotinamide adenine dinucleotide (NAD+); (d) nicotinamide adenine dinucleotide phosphate (NADP+) (source)

Niacin is commonly prescribed for treating high cholesterol, although a metareview found it did not reduce overall mortality and may contribute to the development of type-2 diabetes. 

Severe niacin deficiency is called pellagra, and can be caused by either insufficient consumption or problems processing the vitamin. Pellagra is mostly defined as niacin deficiency but can also be caused by tryptophan deficiency, which you may remember is another path to manufacturing NAD+. Pellagra can cause diarrhea, dermatitis, dementia, and death, which are not a great match for acute or long covid. Niacin supplementation treats pellagra, often within a few days.


Sirtuin 1, also known as NAD-dependent deacetylase sirtuin-1, is a protein that regulates the expression of some genes in ways that haven’t yet been made clear to me but seem to be associated with aging (more SIRT1 is associated with better outcomes, although we haven’t broken down cause and effect). As indicated by its name, it’s dependent on NAD+ to operate, which means NAD+ is involved in the regulation of expression of some genes via some mechanism, which means niacin is involved in the regulation of expression of some genes via some mechanism.

SIRT1 is downregulated in cells that have high insulin resistance and inducing its expression increases insulin sensitivity, suggesting the molecule is associated with improving insulin sensitivity.

SIRT1 may be upregulated by selenium.


PARP (poly(ADP-ribose) polymerase) is another many-purposed enzyme, whose activities include DNA repair and killing cells that are beyond repair. PARP requires NAD+ as a coenzyme.

Individual Claims

Groups with low NAD+ suffer more from covid

NAD+ declines with age

NAD+ does definitely decline with age but so does literally everything bad in your body, so I don’t find this very compelling.

Correlation between NAD+ levels and Age in (A) Males (B) Females (source)

Obese people have lower NAD+ levels, leading to worse outcomes

Yes, although obese people tend to do worse on a lot of metrics. However, that paper highlights that SIRT1 seems to be involved in this correlation somehow.

Diabetics have worse NAD+ levels

Yes, although diabetics also have more immune problems generally (definitely Type 2, some pop sites said the same for Type 1 and that’s believable but I didn’t quickly find a paper I liked that backed the claim).

Low selenium is associated with bad outcomes in covid

The post cites Zhang et al, which took advantage of high variations in selenium consumption in China to do a natural experiment. Variations in the population selenium levels do seem insanely correlated with the overall cure rate (defined as not dying). The study took place in February 2020 so neither data collection nor treatment was very good, but damn that is interesting.

Moreover, this study, which came out several months after the blog post was published, took advantage of the same variation and came to the same conclusion, with a much larger sample size and much more reasonable case fatality rate (1.17% in areas with no deficiency to 3.16% in severely deficient areas, P = 0.002). (Note: several authors on that paper are also named Zhang, but I assume that’s because it’s a common name in China).
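To put those two reported case fatality rates side by side, here is a quick back-of-the-envelope calculation. It simply divides and subtracts the two published percentages; it is not an adjusted estimate, ignores confounders and the study's own statistical model, and says nothing about whether selenium causes the difference.

```python
# Crude comparison of the two case fatality rates quoted above (in percent).
# These are the raw reported figures only; no adjustment for age, region,
# or any other confounder is attempted here.
cfr_no_deficiency = 1.17      # areas with no selenium deficiency
cfr_severe_deficiency = 3.16  # severely deficient areas

# How many times higher the fatality rate is in severely deficient areas
relative_risk = cfr_severe_deficiency / cfr_no_deficiency

# The gap expressed in percentage points
absolute_difference = cfr_severe_deficiency - cfr_no_deficiency

print(f"Relative risk: {relative_risk:.2f}x")                                   # Relative risk: 2.70x
print(f"Absolute difference: {absolute_difference:.2f} percentage points")      # Absolute difference: 1.99 percentage points
```

So the raw gap is a bit under three times the fatality rate, or about two extra deaths per hundred cases, which is why the correlation is striking even before anyone tests selenium as an intervention.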

Some pharma company thinks selenium is promising enough to launch a trial for it, although recruitment hasn’t started yet.

The pre-print servers are littered with natural experiments highlighting correlations that failed as interventions, but this is very strong for a correlation.

Niacin just generally seems to help lung damage

That is indeed what their citation says. However, that paper’s only source looked at the effect of niacin on lung damage deliberately induced in hamsters with a chemotherapy drug, and it’s not obvious to me that this translates to damage from infection or immune reaction. There are some other scattered studies in rodents, combining niacin with other substances, none of which looked at damage from infectious disease.

The treatment for NAD+ deficiency is niacin

Their citation backs this up: niacin supplementation increased NAD+ levels in both patients (n=5) and controls (healthy people given the same supplementation, n=8), and arguably increased strength, although with that much variation and such a small sample size I’m not convinced. Martens et al supports this with modest benefits seen in n=24 subjects.

A few minutes’ investigation found some other studies:

  • Dietary niacin deficiency led to NAD+ deficiency in baby rats. This paper works damn hard to hide its sample size but I think it was 10-15 per treatment group.
  • The same author exposed some rats (n=6 per treatment group) to excess oxygen and found that those with a niacin deficient diet had less NAD+ in the lungs and responded less to the damage caused by excess oxygen, but had the same wet/dry ratio as their well-fed friends (wet/dry ratio is a measure of lung health).
  • Ng et al found that in catfish liver NAD increased linearly with dietary niacin supplementation, but health returns like size and mortality dropped off between 6 and 9 mg/kg. They further found that tryptophan supplementation could not make up for a niacin deficiency (in catfish).

Plus niacin is so well established as a treatment for pellagra that no one bothers to cite anything for it, and that does seem to mediate through NAD+.

Nicotinic acid may act as a one-of-a-kind bioenergetic “pump” of inflammatory molecules out of cells

They link to a preprint which has since been taken down, and I could not find it on my own. 

NAD+ problems have been indicated in chronic fatigue syndrome

Everything has been indicated in chronic fatigue syndrome; I’m not looking this up.

Low serotonin -> mast cell activation -> histamine release

Mast cells indeed produce serotonin, in mice. Note that that paper highlights fluoxetine as a way to reverse serotonin deficiency in mast-cell-deficient mice, and since the article was published fluoxetine has shown promise as a covid treatment. However this study says that while serotonin-producing mast cells are common, humans in particular don’t have them while healthy (although it still shows serotonin affecting mast cell movements). This appears to be an area of some controversy.

Mast cells releasing histamine in response to allergens is uncontroversial. Moreover, histamines and serotonin are stored in the same compartments (in mice). Second source (still in mice). 

Some Guy did an informal study based on this theory and it worked

Some guy (Birth name: Gez Mendinger) did indeed report this, and I have to say, for an uncredentialed dude on youtube recommending OTC supplements to treat a nebulously defined disease, this guy looks really credible, and his reasonably good analysis was quite promising. He shared his results with me, and it continued to look promising when I first dug into it with assistance from a statistician, but the deeper we drilled the less promising it looked (details). By the end, the most I could say is “yeah, worth a harder look”, but the history of things that look promising in small, poorly organized studies that wilt under large, well-organized ones is just too dismal to ignore. 

Mouse study shows low NAD+ hurts you via SIRT1

The interview also cites this mouse study featuring a direct NAD+ drip and a slightly different coronavirus. They show improved symptoms but not reduced viral load. They don’t list the sample size anywhere I can find; judging from the low-resolution graph, it looks like 7 mice in the control group and maybe 12 in the treatment group? The exception is the embolism test, which had many more mice.

(apologies for poor image quality, the PDF was crap)

(note: that article was up when I started this post but disappeared before I verified the SIRT1-specific part of the claim)

Quercetin increases NAD+ levels 

Yes, in rats and mice. Specifically, it speeds up the conversion of NADH to NAD+.

Male pattern balding and low vitamin D are both associated with poor covid outcomes and low NAD+.

The balding citation does indeed say that, but it only looked at hospitalized patients so it’s useless. Moreover, balding is associated with a testosterone derivative, and testosterone weakens the immune system. But when I went to find some cites for those, I found that within hospitalized patients, low testosterone was associated with worse outcomes. However these patients were already hospitalized, so the causality could easily go the other way.

Meanwhile I found several folk-wisdom level comments indicating a link between NAD+ and male pattern balding, but nothing rigorous.

Low vitamin D does seem to be associated with poor covid outcomes, maybe, but treatment doesn’t seem to help (at least not if you wait until patients are hospitalized). 

Chang and Kim assert that Vitamin D activates the NAD-SIRT1 pathway in fat cells in vitro, which if it held up elsewhere would be even stronger evidence for the overall theory than this claim attempts. Byers et al found that vitamin D did not protect guinea pigs against the NAD+ depleting effects of mustard gas. This is not a slam dunk.

Covid depletes NAD+ by activating PARP

Curtin et al lay out a theoretical case for using PARP-inhibitors to treat covid-caused ARDS.

Heer et al “we show that SARS-CoV-2 infection strikingly upregulates MARylating PARPs and induces the expression of genes encoding enzymes for salvage NAD synthesis from nicotinamide (NAM) and nicotinamide riboside (NR), while downregulating other NAD biosynthetic pathways” (notably, the forms not used in the protocol), “overexpression of PARP10 is sufficient to depress cellular NAD and that the activities of the transcriptionally induced enzymes PARP7, PARP10, PARP12 and PARP14 are limited by cellular NAD and can be enhanced by pharmacological activation of NAD synthesis”, “MHV induces a severe attack on host cell NAD+ and NADP+.” (MHV being used as a model)

Long covid and Pellagra share a lot of symptoms, including hyponosmia

There are scattered claims that pellagra causes hyponosmia, but you have to look really hard; it doesn’t show up in any of the common descriptions. I checked in Spanish and didn’t find anything either.

Sen (published only last month) suggests that serotonin deficiency causes anosmia and other neuro symptoms in covid. They propose a different method for the depletion (ACE2 is a mechanism for moving serotonin into the cell), but it’s not mutually exclusive with Wentzel’s theory (that NAD+ depletion causes the body to use up tryptophan trying to produce more NAD+).

Your body hijacks tryptophan to make NAD+ at the expense of serotonin

Tryptophan can indeed be used to make NAD (albeit niacin is better) and serotonin. How your body prioritizes under a given set of circumstances is anyone’s guess.

NAD+ and the immune system

Probably at least some of long covid stems from autoimmune issues, as witnessed by the fact that it’s much more common in women and sometimes helped by steroids. The post and paper don’t make any claims on this beyond the effect of NAD+ on mast cells, which are implicated in autoimmune disorders, but out of curiosity I did some quick googling and found that NAD+ downregulates inflammation via CD4 cells (in mice) and by activating SIRT1, the pathway mentioned previously (still in mice).

The Paper

Not that good. It feels associational rather than mechanistic. However, Bordoni et al (published after the cited paper) found covid-19 was associated with diminished SIRT1, but Pinto et al found covid-19 upregulated SIRT1, and cite another study claiming that under conditions of energetic stress (which would imply low NAD+), SIRT1 substitutes for ACE2 (the receptor covid uses to enter the cell). Smith suggests that downregulating SIRT1 is good for fighting covid. So SIRT1, NAD+, and covid are probably related, but the first two are involved in so much that this isn’t damning.

Notably, this paper doesn’t explain why covid would deplete NAD+ more than other infectious diseases, which is an enormous hole.

Does it work? 

The mechanism and empirical data are definitely enough to merit more rigorous follow-up studies (which are in progress) and definitely not slam dunks. But you may need to make a decision before that’s in, so the real question is “should I take this stack if I get sick? Should my parents?”

My tentative answer is: the prescribed stack probably won’t physically hurt you (but see the next section), and it’s fairly cheap, so the limiting factor is probably “what do you have the energy to try”. This is a better thing to try than the interventions whose proof was actively made up or that have been investigated and discarded, but there undoubtedly are or will be equally probable things floating around, and choosing between them will be a matter of taste.

If you do end up giving this a shot, for long or acute covid, I invite you to preregister your complaints and intention with me (a comment here or email elizabeth@acesounderglass.com), so I can create my own little study. If you don’t feel like doing that, I still encourage you to announce the intention somewhere, as a general good practice (I did so here).

So you’re saying it’s safe then?

Anything that does anything is dangerous to you in sufficient dosages. If you’re considering an unverified supplement stack, you should carefully investigate the potential side effects of each substance and consider it in light of what you know of your own health (especially other medications you’re taking). Consider talking to a doctor, if you have a good one.

If any of you are thinking “oh niacin’s a water-soluble vitamin it must be fine”: that’s a pretty good heuristic but it doesn’t hold for niacin in particular.

My experience

As mentioned previously, I acquired lingering progressive chest congestion/inflammation from (probably) my covid vaccine. It’s always possible there was another reason but the timing and symptoms really do not match anything else. 

Since I never had covid (probably), my reaction can’t come from the infection itself, only from my immune response to the vaccine. The theory doesn’t specify a mechanism, so that’s not disqualifying, but the authors do make it sound like it starts as a covid problem, not an immune problem.

I started this supplement stack before doing any deep verification. The original blog post pattern matched to the kind of thing that was worth trying, everything on the list I either knew was generally safe or confirmed with a quick check (my doctor later confirmed my opinion on safety without endorsing the stack for any particular use), and I had a lot of client work to do. Shoemaker’s children go barefoot, and all that.  So by the time I was writing this I had been on the recommended supplement stack (and some other things besides) for 3 weeks, and was beginning to wean down. 

Overall: my chest pain got better, but the timing fits better with attribution to a different intervention. The rash I got while on the stack matches its timing very well. I nonetheless craved the stack after I weaned off, so there’s probably at least one thing in it I need, which hopefully isn’t the same as the thing causing the rash.

[Alert Twitter readers may have questions, since I was previously more positive on the stack. I had a major regression when I got a non-covid cold, and had to go back on the other treatment.]

Interestingly, my tolerance for niacin increased and then plummeted. Originally I could take 250mg (the smallest size I could find in the right form) with only very mild flush, and that got better over time, to the point I tried 500 mg once (a mistake). But around week 3 my flush was getting worse. Lowering the dose helped, but it’s getting worse again, so I’m continuing to titrate down. This is extremely consistent with filling up NAD+ reserves over time, although very far from conclusive.


I was originally much more positive on this treatment/theory. I gave it more credit on Twitter, but that’s nothing compared to the excited messages I sent a few friends after an initial lit review. I wrote several much more positive versions of this post (and the forthcoming study analysis), but there kept being one more thing to check, until I talked my way down to what you see here. Some of my downgrade stemmed from asking better statistical questions, but some of it was just the emotional process of talking myself down from something that initially looked so promising, but ultimately had as many holes as many other things that looked equally promising and failed to pay off. This represents dozens of hours of work from me and my statistician, for the very disappointing result of “fringe treatment probably doesn’t do very much but can’t rule it out”. Reality is infinitely disappointing.

Thanks to Alex Ray and my Patreon Patrons for partially funding this investigation, and Miranda Dixon-Luinenburg for copyediting.


EDT with updating double counts

October 12, 2021 - 07:40
Published on October 12, 2021 4:40 AM GMT

I recently got confused thinking about the following case:

Calculator bet: I am offered the opportunity to bet on a mathematical statement X to which I initially assign 50% probability (perhaps X = 139926 is a quadratic residue modulo 314159). I have access to a calculator that is 99% reliable, i.e. it corrupts the answer 1% of the time at random. The calculator says that X is true. With what probability should I be willing to wager?
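The statement X is something anyone could settle with enough computation; the calculator only makes it cheap. As a minimal sketch of what the calculator is computing, assuming the usual definition (a is a quadratic residue mod n if some x satisfies x² ≡ a mod n), the function below checks the definition by brute force. Because 314159 is prime, it can be cross-checked with Euler’s criterion. The function names are illustrative, not from the post.

```python
def is_qr_bruteforce(a: int, n: int) -> bool:
    """True if some x in [0, n) satisfies x*x ≡ a (mod n)."""
    a %= n
    return any(x * x % n == a for x in range(n))


def is_qr_euler(a: int, p: int) -> bool:
    """Euler's criterion, valid for an odd prime p and a not divisible by p:
    a is a residue iff a^((p-1)/2) ≡ 1 (mod p)."""
    return pow(a, (p - 1) // 2, p) == 1


if __name__ == "__main__":
    a, p = 139926, 314159  # the statement X from the example; 314159 is prime
    print("brute force:", is_qr_bruteforce(a, p))
    print("Euler's criterion:", is_qr_euler(a, p))
```

The brute-force check is slow but follows the definition directly; Euler’s criterion is the fast route a real calculator would plausibly use.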

I think the answer is clearly “99%.” But a naive application of EDT can recommend betting with 99.99% probability. I think this is a mistake, and understanding the mistake helps clarify what it means to be “updateless” and why it’s essentially obligatory for EDT agents. My takeaway is that for an EDT agent, Bayesian updating is a description of the expected utility calculation rather than something the agent should do to form its beliefs before calculating expected utility.

Thanks to Joe Carlsmith and Katja Grace for the conversation that prompted this post. I suspect this point is well-known in the philosophy literature. I’ve seen related issues discussed in the rationalist community, especially in this sequence and this post but found those a bit confusing—in particular, I think I initially glossed over how “SSA” was being used to refer to a view which rejects Bayesian updating on observations (!) in this comment and the linked paper. In general I’ve absorbed the idea that decision theory and anthropics had a weird interaction, but hadn’t noticed that exactly the same weirdness also applied in cases where the number of observers is constant across possible worlds.

Why EDT bets at 99.99% odds (under some conditions)

I’ll make four assumptions:

  • I have impartial values. Perhaps I’m making a wager where I can either make 1 person happy or 99 people happy—I just care about the total amount of happiness, not whether I am responsible for it. I’ll still describe the payoffs of the bets in $, but imagine that utility is a linear function of total $ earned by all copies of me.
  • We live in a very big universe where many copies of me all face the exact same decision. This seems plausible for a variety of reasons; the best one is accepting an interpretation of quantum mechanics without collapse (a popular view).
  • I handle logical uncertainty in the same way I handle empirical uncertainty. You could construct a similar case to the calculator bet using empirical uncertainty, but the correlation across possible copies of me is clearest if I take a logical fact.
  • I form my beliefs E by updating on my observations. Then after updating I consider E[utility|I take action a] and E[utility|I take action a’] and choose the action with higher expected utility.

Under these assumptions, what happens if someone offers me a bet of $1 at 99.9% odds, where I gain $1 if X is true but lose $1000 if X turns out to be false? Intuitively this is a very bad bet, because I “should” only have 99% confidence. But under these assumptions EDT thinks it’s a great deal.

  • To calculate utility, I need to sum up over a bunch of copies of me.
    • Let N be the number of copies of me in the universe who are faced with this exact betting decision.
    • My decision is identical to the other copies of me who also observed their calculator say “X is true”.
    • My decision may also be correlated with copies of me who made a different observation, or with totally different people doing totally different things, but those don’t change the bottom line and I’ll ignore them to keep life simple.
    • So I’ll evaluate the total money earned by people who saw their calculator say “X is true” and whose decision is perfectly correlated with mine.
  • To calculate utility, I calculate the probability of X and then calculate expected utility
    • First I update on the fact that my calculator says X is true. This observation has probability 99% if X is true and 1% if X is false. The prior probability of X was 50%, so the posterior probability is 99%.
    • My utility is the total amount of money made by all N copies of me, averaged over the world where X is true (with 99% weight) and the world where X is false (with 1% weight)
  • So to calculate the utility conditioned on taking the bet, I ask two questions:
    • Suppose that X is true, and I decide to take the bet. What is my utility then?
      If X is true, there are 0.99 N copies of me who all saw their calculator correctly say “X is true.” So I get $0.99 N
    • Suppose that X is false, and I decide to take the bet. What is my utility then?
      If X is false, then there are 0.01N copies of me who saw their calculator incorrectly say “X is true.” So I lose $1000 * 0.01N = $10N.
    • I think that there’s a 99% probability that X is true, so my expected utility is 99% x $0.99N – 1% x $10N = $0.88N.
  • If I don’t take the bet, none of my copies win or lose any money. So we get $0 utility, which is much worse than $0.88N.
  • Therefore I take the bet without thinking twice.
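The double count can be reproduced numerically. This is a minimal sketch, assuming the numbers from the example (prior 50%, calculator 99% reliable, win $1 or lose $1000 per copy) and measuring payoffs in units of N; the function names are illustrative. Weighting by the posterior and then again by the number of copies gives about $0.88N and an implied betting probability of 9801/9802, roughly 99.99%. Weighting the copy counts by the prior instead gives a negative value and an implied probability of exactly 99%.

```python
from fractions import Fraction

PRIOR = Fraction(1, 2)           # initial credence in X
RELIABILITY = Fraction(99, 100)  # calculator reports the truth 99% of the time
WIN, LOSS = 1, 1000              # each copy wins $1 if X, loses $1000 if not X


def posterior(prior, reliability):
    """Bayes' rule: P(X | calculator says 'X is true')."""
    yes_if_true = prior * reliability
    yes_if_false = (1 - prior) * (1 - reliability)
    return yes_if_true / (yes_if_true + yes_if_false)


def copy_weighted_value(p_x, reliability, win, loss):
    """Total payoff, in units of N, to the copies who saw 'X is true' and bet.

    If X is true, a fraction `reliability` of the N copies saw 'X is true';
    if X is false, only a fraction 1 - reliability did.
    """
    return p_x * reliability * win - (1 - p_x) * (1 - reliability) * loss


def implied_probability(p_x, reliability):
    """Probability of X at which the copy-weighted bet breaks even."""
    w_true = p_x * reliability
    w_false = (1 - p_x) * (1 - reliability)
    return w_true / (w_true + w_false)


post = posterior(PRIOR, RELIABILITY)

# Naive EDT: update first, then weight by the number of copies again.
naive = copy_weighted_value(post, RELIABILITY, WIN, LOSS)
# Prior-based: the copy count alone does the work of updating.
from_prior = copy_weighted_value(PRIOR, RELIABILITY, WIN, LOSS)

print("posterior:", post)
print("naive EDT value (x N):", float(naive))
print("prior-based value (x N):", float(from_prior))
print("naive implied probability:", float(implied_probability(post, RELIABILITY)))
print("prior-based implied probability:", float(implied_probability(PRIOR, RELIABILITY)))
```

Exact fractions are used so that the 9801/9802 figure is not blurred by floating-point rounding.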
Failure diagnosis

Intuitively 99.99% is the wrong answer to this question. But it’s important to understand what actually went wrong. After all, intuitions could be mistaken and maybe big universes lead to weird conclusions (I endorse a few of those myself). Moreover, if you’re like me and think the “obvious” argument for EDT is compelling, this case might lead you to suspect something has gone wrong in your reasoning.

The intuitive problem is that we are “updating” on the calculator’s verdict twice:

  • First when we form our beliefs about whether X is true.
  • Second when we ask “If X is true, how many copies of me would have made the current observations, and therefore make a decision correlated with my own?”

The second “update” is pretty much inherent in the nature of EDT—if I care about the aggregate fate of all of the people like me, and if all of their decisions are correlated with mine, then I need to perform a sum over all of them and so I will care twice as much about possible worlds where there are twice as many of them. Rejecting this “update” basically means rejecting EDT.

The first “update” looks solid at first, since Bayesian updating given evidence seems like a really solid epistemic principle. But I claim this is actually where we ran into trouble. In my view there is an excellent simple argument for using EDT to make decisions, but there is no good argument for using beliefs formed by conditioning on your observations as the input into EDT.

This may sound a bit wild, but hear me out. The basic justification for updating is essentially decision-theoretic—either it’s about counting the observers across possible worlds who would have made your observations, or it’s about Dutch book arguments constraining the probabilities with which you should bet. (As an example, see SEP on Bayesian epistemology.) I’ve internalized these arguments enough that it can feel like a primitive bedrock of epistemology, but they really only constrain how you should bet (or maybe what “you” should expect to see next)—they don’t say much about what you should “expect” in any observer-independent sense that would be relevant to a utility calculation for an impartial actor.

If you are an EDT agent, the right way to understand discussions of “updating” is as a description of the calculation done by EDT. Indeed, it’s common to use the word “belief” to refer to the odds at which you’d bet, in which case beliefs are the output of EDT rather than the input. Other epistemological principles do help constrain the input to EDT (e.g. principles about simplicity or parsimony or whatever), but not updating.

This is similar to the way that an EDT agent sees causal relationships: as helpful descriptions of what happens inside normatively correct decision making. Updating and causality may play a critical role in algorithms that implement normatively correct decision making, but they are not inputs into normatively correct decision making. Intuitions and classical arguments about the relevance of these concepts can be understood as what those algorithms feel like from the inside, as agents who have evolved to implement (rather than reason about) correct decision-making.

“Updatelessness” as a feature of preferences

From this perspective, whether to be “updateless” isn’t really a free parameter in EDT—there is only one reasonable theory, which is to use the prior probabilities to evaluate conditional utilities given each possible decision that an agent with your nature and observations could make.

So what are we to make of cases like transparent Newcomb that appear to separate EDT from UDT?

I currently think of this as a question of values or identity (though I think this is dicier than the earlier part of the post). Consider the following pair of cases to illustrate:

  • I am split into two copies A and B who will go on to live separate lives in separate (but otherwise identical) worlds. There is a button in front of each copy. If copy A presses the button, they will lose $1 and copy B will gain $2. If copy B presses the button, nothing happens. In this case, all versions of EDT will press the button. In some sense at this point the two copies must care about each other, since they don’t even know which one they are, and so the $1 of loss and $2 of gain can be compared directly.
  • But now suppose that copy A sees the letter “A” and copy B sees the letter “B.” Now no one cares what I do after seeing “B,” and if I see “A” the entire question is whether I care what happens to the other copy. The “updateless” answer is to care about all the copies of yourself who made different observations. The normal “selfish” answer is to care about only the copy of yourself who has made the same observations.
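A minimal sketch of the two cases, assuming payoffs in dollars and two hypothetical weightings: a “selfish” copy who counts only the copy that shares its observations, and an “updateless” copy who counts both. The names are illustrative. In the first case the copy doesn’t know which copy it is, which this sketch models as 50/50 weights; declining the button is worth 0 in every case, so pressing wins whenever its value is positive.

```python
def payoffs(a_presses: bool) -> dict:
    """Dollar change for each copy. Only copy A's button does anything."""
    return {"A": -1, "B": 2} if a_presses else {"A": 0, "B": 0}


def value(outcome: dict, weights: dict) -> float:
    """Weighted sum of the copies' payoffs."""
    return sum(weights[c] * outcome[c] for c in outcome)


# Case 1: no letters. Each copy is equally likely to be A or B and both run
# the same policy, so caring about "whoever I turn out to be" gives 50/50 weights.
unsure = {"A": 0.5, "B": 0.5}
case1_press = value(payoffs(True), unsure)  # 0.5 * (-1) + 0.5 * 2

# Case 2: the copy has seen "A".
selfish = {"A": 1, "B": 0}     # cares only about the copy that saw "A"
updateless = {"A": 1, "B": 1}  # cares about both copies
case2_selfish = value(payoffs(True), selfish)
case2_updateless = value(payoffs(True), updateless)

print("case 1, press:", case1_press)
print("case 2, selfish press:", case2_selfish)
print("case 2, updateless press:", case2_updateless)
```

The only thing that changes between the selfish and updateless answers is the weight on copy B, which is why the question reads as one of values rather than of decision theory.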

This framing makes it clear and relatively uninteresting why you should modify yourself to be updateless: any pair of agents could benefit from a bilateral commitment to value each other’s welfare. It’s just that A and B start off being the same, and so they happen to be in an exceptionally good position to make such a commitment, and it’s very clear what the “fair” agreement is.

What if the agents aren’t selfish? Say they both just want to maximize happiness?

  • If both agents exist and they are just in separate worlds, then there is no conflict between their values at all, and they always push the button.
  • Suppose that only one agent exists. Then it feels weird, seeing button “B,” to press the button knowing that it causes you to lose $1 in the real, actually-existing world. But in this case I think the problem comes from the sketchy way we’re using the word “exist”—if copy B gets money based on copy A’s decision, then in what sense exactly does copy A “not exist”? What are we to make of the version of copy A who is doing the same reasoning, and is apparently wrong about whether or not they exist? I think these cases are confusing from a misuse of “existence” as a concept rather than updatelessness per se.


Book Review: Free Will

October 11, 2021 - 23:45
Published on October 11, 2021 6:41 PM GMT


Sam Harris' Free Will isn't a conventional philosophy book. Rather, it's a laconic manifesto full of bold and provocative statements urging us to free ourselves from the delusion of free will and abolish the whole concept as misleading and unnecessary. The book quickly shatters the naïve layperson’s intuition in the light of scientific advancements, then briefly explains Harris’ dissatisfaction with compatibilism as a half measure, and finally argues that our morality and our penal and political systems would only benefit from dispelling the illusion of free will.

Or at least that's what the book tries to be. For me, however, it turned out to be something different. While my initial craving for deep arguments in favour of a position I disagree with wasn’t satisfied, I got interesting insights from Harris' attempts at resolving confusion and reinventing existing theories with different aesthetics. Most surprisingly, I got a new perspective on religious tolerance. Predictably, the publication of the book led to a philosophical debate on free will between Sam Harris and Daniel Dennett, which turned out to be larger than the book itself. I’ll touch on it a little in this review as well.

Main thesis

Harris begins his book with a description of a terrible crime. He points out how our perception of this crime can be shifted if we are informed of its underlying causes. But, under scrutiny, these causes go beyond the control of either perpetrator, leaving no room for their personal responsibility. He uses it as a high-stakes example to make his point.

Of course, if we learned that both these men had been suffering from brain tumors that explained their violent behavior, our moral intuitions would shift dramatically. But a neurological disorder appears to be just a special case of physical events giving rise to thoughts and actions. Understanding the neurophysiology of the brain, therefore, would seem to be as exculpatory as finding a tumor in it. How can we make sense of our lives, and hold people accountable for their choices, given the unconscious origins of our conscious minds? 

And therefore he concludes:

Free will is an illusion. Our wills are simply not of our own making. Thoughts and intentions emerge from background causes of which we are unaware and over which we exert no conscious control. We do not have the freedom we think we have.

Calling Sam Harris a hard determinist seems to be an understatement. Not only does he claim that freedom of will is incompatible with determinism or causality, he claims that it's an inherently incoherent concept in any reasonable universe.

It is important to recognize that the case I am building against free will does not depend upon philosophical materialism (the assumption that reality is, at bottom, purely physical). There is no question that (most, if not all) mental events are the product of physical events. The brain is a physical system, entirely beholden to the laws of nature—and there is every reason to believe that changes in its functional state and material structure entirely dictate our thoughts and actions. But even if the human mind were made of soul-stuff, nothing about my argument would change. The unconscious operations of a soul would grant you no more freedom than the unconscious physiology of your brain does.


Free will is actually more than an illusion (or less), in that it cannot be made conceptually coherent. Either our wills are determined by prior causes and we are not responsible for them, or they are the product of chance and we are not responsible for them.

Harris doesn’t let the popular view that free will somehow benefits from randomness or unpredictability slow him down. Later, he mentions the idea of randomly occurring “self-generated” events in the brain as a justification of free will and quickly dispatches it.

If my decision to have a second cup of coffee this morning was due to a random release of neurotransmitters, how could the indeterminacy of the initiating event count as the free exercise of my will? Chance occurrences are by definition ones for which I can claim no responsibility. And if certain of my behaviors are truly the result of chance, they should be surprising even to me. How would neurological ambushes of this kind make me free?


In the limit, Heisenberg’s “self-generated” mental events would preclude the existence of any mind at all.

I think this is a good point, and appreciate that it was mentioned. Too much conventional discourse is focused on arguing whether determinism is compatible with free will, even though indeterminism is much more at odds with it. One may even notice that if the existence of the mind and its decision-making properties requires an ordered universe, this is evidence in favor of compatibilism.

Not Harris, though. While he acknowledges that such qualities as planning for the future, weighing competing desires and conscious awareness are real, and distinguishes voluntary from involuntary decisions, he explicitly states that they have "nothing to do with free will".

Which is a shame. His original claim seemed so bold and strong. But if someone excludes from a definition everything that exists while including inner contradictions, it is no wonder that the concept turns out to be unreal and incoherent. To a degree this can be justified by the fact that Harris is arguing against a naïve layperson’s intuition about free will. But to a more sophisticated reader it can seem bizarre. It is as though Sam Harris were annoyed by the concept of the present, for instance.

I cannot decide what I will next think or intend until a thought or intention arises. What will my next mental state be? I do not know—it just happens. Where is the freedom in that?

Dealing with confusion

I’d say that Sam Harris is much less confused about free will than most. Not only is he aware of his own confusion, to the point that he can write a book about it, he makes an actual attempt to resolve it. Harris does try to reduce free will to the feeling that “arises from our moment-to-moment ignorance of the prior causes of our thoughts and actions”. He even grapples with the concept of could-ness:

However, to say that I could have done otherwise is merely to think the thought “I could have done otherwise” after doing whatever I in fact did. This is an empty affirmation.

This is commendable, but nowhere near enough to produce actual insight into a gears-level model of free will. The quotation is not the referent. He doesn't taboo the word "could", and doesn't try to figure out why this feeling exists and what role it plays in our decision making. Excluding decision-making from the concept doesn't help.

Harris is good at pointing out incoherence in other people's reasoning; however, as represented in the book, his own position doesn't seem very coherent either. In one place he claims that such concepts as counterfactuals or responsibility are meaningless, and in another he uses them himself. When he claimed that "losing a belief in free will has increased his feelings of freedom", I had serious trouble parsing the statement. Could he be trying to speak to the audience in their own language? But the most obvious incoherence is highlighted when Harris argues against compatibilism only to prove its core points later.

Harris condemns compatibilism as "solving the problem of free will by ignoring it": changing the definition of free will to one people don’t actually use. He even compares compatibilism to theology.

Compatibilists have produced a vast literature in an effort to finesse this problem. More than in any other area of academic philosophy, the result resembles theology. (I suspect this is not an accident. The effort has been primarily one of not allowing the laws of nature to strip us of a cherished illusion.)

I think it tells us something important about Harris' reasons for embracing his views. He treats the concept of free will similarly to the concept of God. For him both are confusing, naive intuitions which do not correspond to reality and lead people astray. And if the correct answer to the question of theology is to say that God doesn’t exist, that the whole idea doesn’t actually make any sense, and that we shall all be better off without it, grounding our morality and sense of meaning in the real world instead of in imaginary entities, why wouldn’t the same be true for the question of free will? Isn’t Sam Harris just applying a consistent strategy to deal with apparently mysterious phenomena?

Except when he isn't. As Daniel Dennett mentions in his own review of the book, that’s not the course of action Harris takes regarding the concept of mind. He corrects the naïve definition rather than abolishing it. And it’s not what we do in general. An even better example from the same review is sunsets. Now that we know geocentrism is wrong and the sun doesn’t actually rise and set, we haven’t gotten rid of the concept by calling it an illusion; we’ve changed the definition.

Is it possible to develop a simple consistent policy on what to do when we find out that a definition doesn’t actually make any sense in the light of new evidence? I’m not sure. My intuition is against attempts to reframe “God” as a sense of meaning, compassion and oneness, but it completely supports the compatibilist definition of free will. Is it because I perceive the concept of God to be too contaminated, unlike the concept of free will? But other people's intuitions can differ, which doesn't necessarily make them wrong. If anything, this becomes not a question of fact but of a category border.

And this is a good cause for tolerance. For the last couple of years I had problems talking to religious people. I've noticed that I approached them with a smug feeling of superiority, despite my best efforts to be charitable. "Pretend all you want, but you actually know that you are completely right and they are completely wrong," some voice deep inside me was saying. Trying to persuade myself that it's noble and good to be tolerant, even towards silly ideas, was fruitless. But framing this as a question of a category border really helped me to be genuinely curious. My opponents may be wrong about some things, but understanding their worldview and their way of defining categories can give valuable insights about things that I may have been missing. The fact that I got this insight from Sam Harris' book is both ironic and very appropriate.

Moral and political implications

I've always had an intuition that hard determinist views are usually a result of painful disenchantment with metaphysical libertarianism. Finding out that their naive intuition of free will is incoherent and/or doesn’t correspond to reality, people swing in the opposite direction, claiming that no free will is possible. However, I get a different feeling from Sam Harris. He seems to be entirely satisfied with the absence of free will, and he spends the last part of the book proclaiming how great it is.

My hopes, fears, and neuroses seem less personal and indelible. There is no telling how much I might change in the future. Just as one wouldn’t draw a lasting conclusion about oneself on the basis of a brief experience of indigestion, one needn’t do so on the basis of how one has thought or behaved for vast stretches of time in the past. A creative change of inputs to the system—learning new skills, forming new relationships, adopting new habits of attention—may radically transform one’s life.

Harris is optimistic that abolishing the concept of free will, and therefore metaphysical responsibility and religious sin, is going to dramatically improve the criminal justice system, moving its focus from retribution to correction. He is talking from a consequentialist position here and I share his moral intuition about the utility of such a change. His chapter on political implications mentions that without the illusion of free will, it would be much more obvious how much luck is responsible for personal success and how absurd the conservative "fetish of individualism" is. Such changes would indeed be beneficial, but I'm not sure that abolishing the concept would do the trick. And actually, neither is Harris, at least not completely:

 It must be admitted, however, that the issue of retribution is a tricky one. In a fascinating article in The New Yorker, Jared Diamond writes of the high price we sometimes pay when our desire for vengeance goes unfulfilled.


We are deeply disposed to perceive people as the authors of their actions, to hold them responsible for the wrongs they do us, and to feel that these transgressions must be punished. Often, the only punishment that seems appropriate is for the perpetrator of a crime to suffer or forfeit his life. It remains to be seen how a scientifically informed system of justice might steward these impulses.

I think we can apply the theological metaphor here once again. While for some people their religious beliefs are indeed the reason for their behaviour, for others they are just a rationalization for other, less socially acceptable impulses. The whole religious memeplex is built on existing human intuitions in the first place. And people do not necessarily act on their beliefs. That's why raising the sanity waterline is much more important than attacking religion directly. And that's why it's a bit naive to expect dramatic changes in the penal system due to some philosophical argument, even if, as Harris mentions, the U.S. Supreme Court has indeed called free will a “universal and persistent” foundation for the system of law.


In the end, I was surprised how compatibilist the book turned out to be. Despite all the apparent critique of compatibilism, Harris makes mostly the same points and sometimes even uses the same language. Dennett calls him a compatibilist in everything except name, and I tend to agree. Their argument is a textbook example of disputing definitions, as they seem to agree on every objective matter. If we define C-freedom as the agency and choice-making ability of the mind, which depends on its causal history, and L-freedom as transcending the laws of causality by being the ultimate source of one's actions, both Harris and Dennett agree that humans have C-freedom but not L-freedom, and that this doesn’t lead to fatalism.

In a sense Sam Harris has reinvented compatibilism. He comes to the same conclusions but is rallying under the flag of “Free Will Doesn’t Exist”. And while this approach seems unnecessary to me, I suppose it's a valid one.  I'd say in this case Harris is free to define his terms the way he wants. 

Or rather... not free - if that's what he prefers.  


my new shortsight reminder

October 11, 2021 - 23:06
Published on October 11, 2021 8:06 PM GMT

Some user named Dan L commented on an ACX post:

[...]things can feel more intense now than ever before because more people now care about {things we care about now} than ever before. People used to care less about {things we care about now}, so past times feel like they were less intense. But actually, people used to care *a whole lot* about {things people used to care about}, which we don't care nearly as much about now. So people back then would have felt like things were uniquely intense, and would look at modern disagreements about {things we care about now} and not really get it. This is all very tautological and makes the very dry statistical bias obvious, but requires confronting that things we used to care about aren't the things we care about now, and almost certainly won't be the things we care about in the future. It is not easy to accept that the answer to "who will win this culture war?" will very possibly be "you won't care". Oh, you'll still have an opinion all right, but *this* fire in your belly will fade faster than you thought possible. As it will for nearly everyone else. This too shall pass.

I found this very powerful when I came across it, and I have been trying to keep it in mind ever since.

(See also: https://en.wikipedia.org/wiki/This_too_shall_pass )


Amman, Jordan – ACX Meetups Everywhere 2021

October 11, 2021 - 23:03
Published on October 11, 2021 8:03 PM GMT

This year's ACX Meetups Everywhere event in Amman, Jordan

Location: Aristotle Cafe. – ///flashing.wells.mole

Contact: dnledvs@gmail.com


NVIDIA and Microsoft release 530B parameter transformer model, Megatron-Turing NLG

October 11, 2021 - 18:28
Published on October 11, 2021 3:28 PM GMT

In addition to reporting aggregate metrics on benchmark tasks, we also qualitatively analyzed model outputs and have intriguing findings (Figure 4). We observed that the model can infer basic mathematical operations from context (sample 1), even when the symbols are badly obfuscated (sample 2). While far from claiming numeracy, the model seems to go beyond only memorization for arithmetic.

We also show samples (the last row in Figure 4) from the HANS task where we posed the task containing simple syntactic structures as a question and prompted the model for an answer. Despite the structures being simple, existing natural language inference (NLI) models often have a hard time with such inputs. Fine-tuned models often pick up spurious associations between certain syntactic structures and entailment relations from systemic biases in NLI datasets. MT-NLG performs competitively in such cases without finetuning.

It seems the next big transformer model is here. There's no way to test it out yet, but scaling seems to continue; see the quote.
It is not a mixture-of-experts model, so its parameter count is more meaningful than WuDao's (it also beats GPT-3 on PiQA and LAMBADA).

How big of a deal is that?


On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

October 11, 2021 - 11:20
Published on October 11, 2021 8:20 AM GMT

Crossposted to the EA Forum


Imagine you are tasked with curing a disease which hasn’t appeared yet. Setting aside why you would know about such a disease’s emergence in the future, how would you go about curing it? You can’t directly or indirectly gather data about the disease itself, as it doesn’t exist yet; you can’t try new drugs to see if they work, as the disease doesn’t exist yet; you can’t do anything that is experimental in any way on the disease itself, as… you get the gist. You would be forgiven for thinking that there is nothing you can do about it right now.

AI Alignment looks even more hopeless: it’s about solving a never-before-seen problem on a technology which doesn’t exist yet.

Yet researchers are actually working on it! There are papers, books, unconferences, whole online forums full of alignment research, both conceptual and applied. Some of these work in a purely abstract and theoretical realm, others study our best current analogues of Human Level AI/Transformative AI/AGI, still others iterate on current technologies that seem precursors to these kinds of AI, typically large language models. With so many different approaches, how can we know if we’re making progress or just grasping at straws?

Intuitively, the latter two approaches sound more like how we should produce knowledge: they take their epistemic strategies (ways of producing knowledge) out of Science and Engineering, the two cornerstones of knowledge and technology in the modern world. Yet recall that in alignment, models of the actual problem and/or technology can’t be evaluated experimentally, and one cannot try and iterate on proposed solutions directly. So when we take inspiration from Science and Engineering (and I want people to do that), we must be careful and realize that most of the guarantees and checks we associate with both are simply not present in alignment, for the decidedly pedestrian reason that Human Level AI/Transformative AI/AGI doesn’t exist yet.

I thus claim that:

  • Epistemic strategies from Science and Engineering don’t dominate other strategies in alignment research.
  • Given the hardness of grounding knowledge in alignment, we should leverage every epistemic strategy we can find and get away with.
  • These epistemic strategies should be made explicit and examined. Both the ones taken or adapted from Science and Engineering, and every other one (for example the more theoretical and philosophical strategies favored in conceptual alignment research).

This matters a lot, as it underlies many issues and confusions in how alignment is discussed, taught, created and criticized. Having such a diverse array of epistemic strategies is fascinating, but their implicit nature makes it challenging to communicate with newcomers, outsiders, and even fellow researchers leveraging different strategies. Here is a non-exhaustive list of issues that boil down to epistemic strategy confusion:

  • There is a strong natural bias towards believing that taking epistemic strategies from Science and Engineering automatically leads to work that is valuable for alignment.
    • This leads to a flurry of work (often by well-meaning ML researchers/engineers) that doesn’t tackle or help with alignment, and might push capabilities (in an imagined tradeoff with the assumed alignment benefits of the work).
    • Very important: that doesn’t mean I consider ML-based work useless for alignment. Just that the sort of ML-based work that actually tries to solve the problem tends to be by people who understand that one must be careful when transferring epistemic strategies to alignment.
  • There is a strong natural bias towards disparaging any epistemic strategy that doesn’t align neatly with the main strategies of Science and Engineering (even when the alternative epistemic strategies are actually used by scientists and engineers in practice!)
    • This leads to more and more common confusion about what’s happening on the Alignment Forum and conceptual alignment research more generally. And that confusion can easily turn into the sort of criticism that boils down to “this is stupid and people should stop doing that”.
  • It’s quite hard to evaluate whether alignment research (applied or conceptual) creates the kind of knowledge we’re after, and helps move the ball forward. This comes from both the varieties of epistemic strategies and the lack of the usual guarantees and checks when applying more mainstream ones (every observation and experiment is used through analogies or induction to future regimes).
    • This makes it harder for researchers using different sets of epistemic strategies to talk to each other and give useful feedback to one another.
  • Criticism of some approach or idea often stops at the level of the epistemic strategy being weird.
    • This happens a lot with criticism of Alignment Forum posts for their lack of concreteness, formalism and grounding.
    • It also happens when applied alignment research is rebuked solely because it uses current technology, and the critics have decided that it can’t apply to AGI-like regimes.
  • Teaching alignment without tackling this pluralism of epistemic strategies, or by trying to fit everything into a paradigm using only a handful of those, results, in my experience, in people who know the lingo and some of the concepts, but have trouble contributing, criticizing and teaching the ideas and research they learnt.
    • You can also end up with a sort of dogmatism that alignment can only be done a certain way.

Note that most fields (including many sciences and engineering disciplines) also use weirder epistemic strategies. So do many attempts at predicting and directing the future (think existential risk mitigation in general). My point is not that alignment is a special snowflake, more that it’s both weird enough (in the inability to experiment and iterate directly on the problem) and important enough that elucidating the epistemic strategies we’re using, finding others and integrating them is particularly important.

In the rest of this post, I develop and unfold my arguments in more detail. I start with digging deeper into what I mean by the main epistemic strategies of Science and Engineering, and why they don’t transfer unharmed to alignment. Then I demonstrate the importance of looking at different epistemic strategies, by focusing on examples of alignment results and arguments which make most sense as interpreted through the epistemic lens of Theoretical Computer Science. I use the latter as an inspiration because I adore that field and because it’s a fertile ground for epistemic strategies. I conclude by pointing at the sort of epistemic analyses I feel are needed right now.

Lastly, this post can be seen as a research agenda of sorts, as I’m already doing some of these epistemic analyses, and believe this is the most important use of my time and my nerdiness about weird epistemological tricks. We roll with what evolution’s dice gave us.

Thanks to Logan Smith, Richard Ngo, Remmelt Ellen, Alex Turner, Joe Collman, Ruby, Antonin Broi, Antoine de Scorraille, Florent Berthet, Maxime Riché, Steve Byrnes, John Wentworth and Connor Leahy for discussions and feedback on drafts.

Science and Engineering Walking on Eggshells

What is the standard way of learning how the world works? For centuries now, the answer has been Science.


I like Peter Godfrey-Smith’s description in his glorious Theory and Reality:

Science works by taking theoretical ideas and trying to find ways to expose them to observation. The scientific strategy is to construe ideas, to embed them in surrounding conceptual frameworks, and to develop them, in such a way that this exposure is possible even in the case of the most general and ambitious hypotheses about the universe.

That is, the essence of science is in the trinity of modelling, predicting and testing.

This doesn’t mean there are no epistemological subtleties left in modern science; finding ways of gathering evidence, of forcing the meeting of model and reality, often takes incredibly creative turns. Fundamental Chemistry uses synthesis of molecules never seen in nature to test the edge cases of its models; black holes are not observable directly, but must be inferred by a host of indirect signals like the light released by matter falling into the black hole or gravitational waves emitted when black holes merge; Ancient Rome is probed and explored through a discussion between textual analysis and archeological discoveries.

Yet all of these still amount to instantiating the meta epistemic strategy “say something about the world, then check if the world agrees”. As already pointed out, it doesn’t transfer straightforwardly to alignment because Human Level AI/Transformative AI/AGI doesn’t exist yet.

What I’m not saying is that epistemic strategies from Science are irrelevant to alignment. But because they must be adapted to tell us something about a phenomenon that doesn’t exist yet, they lose their supremacy in the creation of knowledge. They can help us to gather data about what exists now, and to think about the sort of models that are good at describing reality, but checking their relevance to the actual problem/thinking through the analogies requires thinking in more detail about what kind of knowledge we’re creating.

If we instead want more of a problem-solving perspective, tinkering is a staple strategy in engineering, used before we know how to solve the problem reliably. Think about curing cancer or building the internet: you try the best solutions you can think of, see how they fail, correct the issues or find a new approach, and iterate.

Once again, this is made qualitatively different in alignment by the fact that neither the problem nor the source of the problem exist yet. We can try to solve toy versions of the problem, or what we consider analogous situations, but none of our solutions can be actually tested yet. And there is the additional difficulty that Human-level AI/Transformative AI/AGI might be so dangerous that we have only one chance to implement the solution.

So if we want to apply the essence of human technological progress, from agriculture to planes and computer programs, just trying things out, we need to deal with the epistemic subtleties and limits of analogies and toy problems.

An earlier draft presented the conclusion of this section as “Science and Engineering can’t do anything for us—what should we do?” which is not my point. What I’m claiming is that in alignment, the epistemic strategies from both Science and Engineering are not as straightforward to use and leverage as they usually are (yes, I know, there are a lot of subtleties to Science and Engineering anyway). They don’t provide privileged approaches demanding minimal epistemic thinking in most cases; instead we have to be careful how we use them as epistemic strategies. Think of them as tools which are so good that most of the time, people can use them without thinking about all the details of how the tools work, and get mostly the wanted result. My claim here is that these tools need to be applied with significantly more care in alignment, where they lose their “basically works all the time” status.

Acknowledging that point is crucial for understanding why alignment research is so pluralistic in terms of epistemic strategies. Because no such strategy works as broadly as we’re used to for most of Science and Engineering, alignment has to draw from literally every epistemic strategy it can pull, taking inspiration from Science and Engineering, but also philosophy, pre-mortems of complex projects, and a number of other fields.

To show that further, I turn to some main alignment concepts which are often considered confusing and weird, in part because they don’t leverage the most common epistemic strategies. I explain these results by recontextualizing them through the lens of Theoretical Computer Science (TCS).

Epistemic Strategies from TCS

For those who don’t know the field, Theoretical Computer Science focuses on studying computation. It emerged from the work of Turing, Church, Gödel and others in the 30s, on formal models of what we would now call computation: the process of following a step-by-step recipe to solve a problem. TCS cares about what is computable and what isn’t, as well as how much resources are needed for each problem. Probably the most active subfield is Complexity Theory, which cares about how to separate computational problems in classes capturing how many resources (most often time – number of steps) are required for solving them.

What makes TCS most relevant to us is that theoretical computer scientists excel at wringing knowledge out of the most improbable places. They are brilliant at inventing epistemic strategies, and remember, we need every one we can find for solving alignment.

To show you what I mean, let’s look at three main ideas/arguments in alignment (Convergent Subgoals, Goodhart’s Law and the Orthogonality Thesis) through some TCS epistemological strategies.

Convergent Subgoals and Smoothed Analysis

One of the main arguments for AI Risk and statements of a problem in alignment is Nick Bostrom’s Instrumental Convergence Thesis (which also takes inspiration from Steve Omohundro’s Basic AI Drives):

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

That is, actions/plans exist which help with a vast array of different tasks: self-preservation, protecting one’s own goal, acquiring resources... So a Human-level AI/Transformative AI/AGI could take them while still doing what we asked it to do. Convergent subgoals are about showing that behaviors which look like they can only emerge from rebellious robots actually can be pretty useful for obedient (but unaligned in some way) AI.

What kind of argument is that? Bostrom makes a claim about “most goals”—that is, the space of all goals. His claim is that convergent subgoals are so useful that goal-space is almost chock-full of goals incentivizing convergent subgoals.

And more recent explorations of this argument have followed this intuition: Alex Turner et al.’s work on power-seeking formalizes the instrumental convergence thesis in the setting of Markov decision processes (MDP) and reward functions by looking, for every “goal” (a distribution over reward functions), at the set of all its permuted variants (the distribution given by exchanging some states – so the reward labels stay the same, but are not put on the same states). Their main theorems state that given some symmetry properties in the environment, a majority (or a possibly bigger fraction) of the permuted variants of every goal will incentivize convergent subgoals for its optimal policies.
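As a rough illustration of the orbit idea, here is a minimal sketch assuming a hypothetical environment with three terminal states: one action keeps a single outcome reachable, the other keeps two reachable. This is not Turner et al.’s actual formalism (which works with discounted MDPs and a quantity they call POWER); it only shows the counting step. For one reward function with distinct values, we permute which state receives which reward and count how often each action is optimal.

```python
from itertools import permutations

# Hypothetical toy environment: from the start state the agent picks
# "left" (one reachable terminal state) or "right" (two reachable ones).
TERMINALS = ("A", "B", "C")
REACHABLE = {"left": {"A"}, "right": {"B", "C"}}


def optimal_action(reward):
    """Return the action whose reachable set contains the best terminal."""
    return max(REACHABLE, key=lambda a: max(reward[s] for s in REACHABLE[a]))


def orbit_fraction(base_rewards, action):
    """Fraction of the permuted variants of a reward function (its 'orbit'
    under permutations of the terminal states) for which `action` is optimal."""
    orbit = [dict(zip(TERMINALS, perm)) for perm in permutations(base_rewards)]
    hits = sum(optimal_action(r) == action for r in orbit)
    return hits / len(orbit)


print(orbit_fraction((1.0, 0.5, 0.0), "right"))  # 4 of 6 permutations, i.e. 2/3
print(orbit_fraction((1.0, 0.5, 0.0), "left"))   # the remaining 1/3
```

The option-preserving action is optimal for most of the orbit; the variants where it is not are the rarer “peaks”. In this toy, if the right branch reached k terminal states instead of two, the fraction would be k/(k+1).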

So this tells us that goals without convergent subgoals exist, but they tend to be crowded out by ones with such subgoals. Still, it’s very important to realize what neither Bostrom nor Turner is arguing for: they’re not saying that every goal has convergent subgoals. Nor are they claiming to have found the way humans sample goal-space, such that their results imply goals with convergent subgoals must be sampled by humans with high probability. Instead, they show the overwhelmingness of convergent subgoals in some settings, and consider that a strong indicator that avoiding them is hard.

I see a direct analogy with the wonderful idea of smoothed analysis in complexity theory. For a bit of background, complexity theory generally focuses on the worst case time taken by algorithms. That means it mostly cares about which input will take the most time, not about the average time taken over all inputs. The latter is also studied, but it’s nigh impossible to find the distribution of inputs actually used in practice (and some problems studied in complexity theory are never used in practice, so a meaningful distribution is even harder to make sense of). Just like it’s very hard to find the distribution from which goals are sampled in practice.

As a consequence of the focus on worst case complexity, some results in complexity theory clash with experience. Here we focus on the simplex algorithm, used for linear programming: it runs really fast and well in practice, despite having provable exponential worst case complexity. Which in complexity-theory-speak means it shouldn’t be a practical algorithm.

Daniel Spielman and Shang-Hua Teng had a brilliant intuition to resolve this inconsistency: what if the worst case inputs were so rare that just a little perturbation would make them easy again? Imagine a landscape that is mostly flat, with some very high but very steep peaks. Then if you don’t land exactly on the peaks (and they’re so pointy that it’s really hard to get there exactly), you end up on the flat surface.

This intuition yielded smoothed analysis: instead of just computing the worst case complexity, we compute the worst case complexity averaged over some noise on the input. Hence the peaks get averaged with the flatness around them and have a low smoothed time complexity.
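A minimal sketch of that definition on a toy, hypothetical landscape: inputs are the integers 0 to 100, every input costs 1 step except one that costs 1000. Spielman and Teng perturb real-valued inputs with Gaussian noise; here, as a simplifying assumption, the noise is a uniform integer shift of at most `noise_radius`. The smoothed value is the worst case, over all inputs, of the average cost around each input.

```python
# Hypothetical 1-D "running time landscape": flat almost everywhere,
# with one very steep, very narrow peak (a rare worst-case input).
N_INPUTS = 101
PEAK_AT, PEAK_COST, FLAT_COST = 50, 1000.0, 1.0


def running_time(x):
    return PEAK_COST if x == PEAK_AT else FLAT_COST


def worst_case():
    """Classical worst-case complexity: the highest point of the landscape."""
    return max(running_time(x) for x in range(N_INPUTS))


def smoothed(noise_radius):
    """Worst case over inputs of the *average* running time when each input
    is shifted uniformly by -noise_radius..+noise_radius (clamped to range)."""
    def avg_around(x):
        shifts = range(-noise_radius, noise_radius + 1)
        costs = [running_time(min(max(x + d, 0), N_INPUTS - 1)) for d in shifts]
        return sum(costs) / len(costs)
    return max(avg_around(x) for x in range(N_INPUTS))


print(worst_case())  # 1000.0
print(smoothed(5))   # (1000 + 10 * 1) / 11, about 91.8
```

More noise flattens the peak further: with a radius of 50, the smoothed value drops to (1000 + 100) / 101, about 10.9, even though the worst case is still 1000.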

Convergent subgoals, especially in Turner’s formulation, behave in an analogous way: think of the peaks as goals without convergent subgoals; to avoid power-seeking we would ideally reach one of them, but their rarity and intolerance to small perturbations (permutations here) makes it really hard. So the knowledge created is about the shape of the landscape, and leveraging the intuition of smoothed analysis, that tells us something important about the hardness of avoiding convergent subgoals.

Note though that there is one aspect in which Turner’s results and the smoothed analysis of the simplex algorithm are complete opposites: in the former the peaks are what we want (no convergent subgoals) while in the latter they’re what we don’t want (inputs that take exponential time to treat). This inversion doesn’t change the sort of knowledge produced, but it’s an easy source of confusion.

Epistemic analysis isn’t only meant for clarifying and distilling results: it can and should pave the way to some insights on how the arguments could fail. Here, the analogy to smoothed complexity and the landscape picture suggests that Bostrom and Turner’s argument could be interrogated by:

  • Arguing that the sampling method we use in practice to decide tasks and goals for AI targets specifically the peaks.
  • Arguing that the sampling is done in a smaller goal-space for which the peaks are broader.
    • For Turner’s version, one way of doing that might be to not consider the full orbit, but only the close-ish variations of the goals (small permutations instead of all permutations). This amounts to questioning the form of noise over the choice of goals that is used in Turner’s work.
Smoothed Analysis | Convergent Subgoals (Turner et al.)
Possible inputs | Possible goals
Worst-case inputs make steep and rare peaks | Goals without convergent subgoals make steep and rare peaks

Goodhart’s Law and Distributed Impossibility/Hardness Results

Another recurrent concept in alignment thinking is Goodhart’s law. It wasn’t invented by alignment researchers, but Scott Garrabrant and David Manheim proposed a taxonomy of its different forms. Fundamentally, Goodhart’s law tells us that if we optimize a proxy of what we really want (some measure that closely correlates with the wanted quantity in the regime we can observe), strong optimization tends to make the two split apart, meaning we don’t end up with what we really wanted. For example, imagine that everytime you go for a run you put on your running shoes, and you only put on these shoes for running. Putting on your shoes is thus a good proxy for running; but if you decide to optimize the former in order to optimize the latter, you will take most of your time putting on and taking off your shoes instead of running.
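The running-shoes example can be turned into a tiny sketch, with all options and numbers made up for illustration. Within the regime we observed, the option with the most shoe-wearing is also the option with the most running; once the optimizer is allowed to search a wider space of strategies, maximizing the proxy picks an option where no running happens at all.

```python
# Hypothetical ways to spend a free hour, scored as
# (times shoes are put on = the proxy, minutes actually run = the target).
OBSERVED = {                 # the regime in which we learned the proxy
    "rest day": (0, 0),
    "one run": (1, 30),
    "two runs": (2, 60),
}
UNOBSERVED = {               # a strategy only a strong optimizer goes looking for
    "shoe drills": (30, 0),  # put shoes on and take them off every two minutes
}
PROXY, TARGET = 0, 1


def best_by(index, options):
    """Return the name of the option maximizing the chosen coordinate."""
    return max(options, key=lambda name: options[name][index])


everything = {**OBSERVED, **UNOBSERVED}
print(best_by(PROXY, OBSERVED))     # two runs: proxy and target agree
print(best_by(PROXY, everything))   # shoe drills: proxy maxed, 0 minutes run
print(best_by(TARGET, everything))  # two runs
```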

In alignment, Goodhart is used to argue for the hardness of specifying exactly what we want: small discrepancies can lead to massive differences in outcomes.

Yet there is a recurrent problem: Goodhart’s law assumes the existence of some True Objective, of which we’re taking a proxy. Even setting aside the difficulties of defining what we really want at a given time, what if what we really want is not fixed but constantly evolving? What I want nowadays, for example, is different from what I wanted 10 years ago, despite some similarities. How can Goodhart’s law apply to my values and my desires if there is not a fixed target to reach?

Salvation comes from a basic insight when comparing problems: if problem A (running a marathon) is harder than problem B (running 5 km), then showing that the latter is really hard, or even impossible, transfers to the former.

My description above focuses on the notion of one problem being harder than another. TCS formalizes this notion by saying the easier problem is reducible to the harder one: a solution for the harder one lets us build a solution for the easier problem. And that’s the trick: if we show there is no solution for the easier problem, this means that there is no solution for the harder one, or such a solution could be used to solve the easier problem. Same thing with hardness results which are about how difficult it is to solve a problem.

That is, when proving impossibility/hardness, you want to focus on the easiest version of the problem for which the impossibility/hardness still holds.

In the case of Goodhart’s law, this can be used to argue that it applies to moving targets because having True Values or a True Objective makes the problem easier. Hitting a fixed target sounds simpler than hitting a moving or shifting one. If we accept that conclusion, then because Goodhart’s law shows hardness in the former case, it also does in the latter.
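The reduction can be sketched in code under one explicit assumption: that a fixed target is simply a moving target that happens never to move (whether real alignment problems fit this framing is the debatable part). All names below are hypothetical. Given any procedure for the moving-target problem, we get one for the fixed-target problem for free; by contraposition, any impossibility or hardness result for fixed targets carries over to moving ones.

```python
from typing import Callable, List

# A "target" assigns a desired value to each time step t.
Target = Callable[[int], float]
# A "tracker" is any procedure that, given a target and a horizon,
# outputs the value it aims for at each time step.
Tracker = Callable[[Target, int], List[float]]


def fixed_target_tracker(moving_tracker: Tracker, value: float, horizon: int) -> List[float]:
    """Reduce the fixed-target problem to the moving-target problem:
    a fixed target is treated as a moving target that never moves."""
    def constant_target(t: int) -> float:
        return value
    return moving_tracker(constant_target, horizon)


def naive_moving_tracker(target: Target, horizon: int) -> List[float]:
    """Hypothetical stand-in solver that simply reads off the target."""
    return [target(t) for t in range(horizon)]


print(fixed_target_tracker(naive_moving_tracker, 3.0, 4))  # [3.0, 3.0, 3.0, 3.0]
```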

That being said, whether the moving target problem is indeed harder is debatable and debated. My point here is not to claim that this is definitely true, and so that Goodhart’s law necessarily applies. Instead, it’s to focus the discussion on the relative hardness of the two problems, which is what underlies the correctness of the epistemic strategy I just described. So the point of this analysis is that there is another way to decide the usefulness of Goodhart’s law in alignment than debating the existence of True Values.

Problem | Running | Alignment
Easier problem | 5k | Approximating a fixed target (True Values)
Harder problem | Marathon | Approximating a moving target
Orthogonality Thesis and Complexity Barriers

My last example is Bostrom’s Orthogonality Thesis: it states that goals and competence are orthogonal, meaning that they are independent; a certain level of competence doesn’t force a system to have only a small range of goals (with some subtleties that I address below).

That might sound too general to be really useful for alignment, but we need to put it in context. The Orthogonality Thesis is a response to a common argument for alignment-by-default: because a Human-level AI/Transformative AI/AGI would be competent, it should realize what we really meant/wanted, and correct itself as a result. Bostrom points out that there is a difference between understanding and caring. The AI understanding our real intentions doesn’t mean it must act on that knowledge, especially if it is programmed and instructed to follow our initial commands. So our advanced AI might understand that we don’t want it to follow convergent subgoals while maximizing the number of paperclips produced in the world, but what it cares about is the initial goal/command/task of maximizing paperclips, not the more accurate representation of what we really meant.

Put another way, if one wants to prove alignment-by-default, the Orthogonality thesis argues that competence is not enough. As it is used, it’s not so much a result about the real world, but a result about how we can reason about the world. It shows that one class of arguments (competence will lead to human values) isn’t enough.

This is just like some of the weirdest results in complexity theory: the barriers to P vs NP. This problem is one of the biggest and most difficult open questions in complexity theory: settling formally the question of whether the class of problems which can tractably be solved (P for Polynomial time) is equal to the class of problems for which solutions can be tractably checked (NP for Non-deterministic Polynomial time). Intuitively those are different: the former is about creativity, the latter about taste, and we feel that creating something of quality is harder than being able to recognize it. Yet a proof of this result (or its surprising opposite) has evaded complexity theorists for decades.
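The gap between checking and finding can be made concrete with subset sum, a standard NP-complete problem: given a list of integers and a target, is there a subset summing to the target? In the minimal sketch below (function names are hypothetical), verifying a proposed subset takes a single pass over it, while the naive search tries up to 2^n subsets. This only illustrates the intuition; it proves nothing about P vs NP, since it does not rule out a cleverer algorithm.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def verify(nums: Sequence[int], target: int, indices: Tuple[int, ...]) -> bool:
    """Check a proposed certificate in time polynomial in its size."""
    in_range = all(0 <= i < len(nums) for i in indices)
    distinct = len(set(indices)) == len(indices)
    return in_range and distinct and sum(nums[i] for i in indices) == target


def solve(nums: Sequence[int], target: int) -> Optional[Tuple[int, ...]]:
    """Naive search over all subsets: up to 2**len(nums) candidates."""
    for size in range(len(nums) + 1):
        for indices in combinations(range(len(nums)), size):
            if sum(nums[i] for i in indices) == target:
                return indices
    return None


nums = [3, 34, 4, 12, 5, 2]
print(solve(nums, 9))           # (2, 4), since 4 + 5 == 9
print(verify(nums, 9, (2, 4)))  # True
print(verify(nums, 9, (0, 1)))  # False
```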

That being said, recall that theoretical computer scientists are experts at wringing knowledge out of everything, including their inability to prove something. This resulted in the three barriers to P vs NP: three techniques from complexity theory which have been proved to not be enough by themselves for showing P vs NP or its opposite. I won’t go into the technical details here, because the analogy is mostly with the goal of these barriers. They let complexity theorists know quickly if a proof has potential – it must circumvent the barriers somehow.

The Orthogonality thesis plays a similar role in alignment: it’s an easy check for the sort of arguments about alignment-by-default that many people think of when learning about the topic. If they extract alignment purely from the competence of the AI, then the Orthogonality Thesis tells us something is wrong.

What does this mean for criticism of the argument? That what matters when trying to break the Orthogonality Thesis isn’t its literal statement, but whether it still provides a barrier to alignment-by-default. Bostrom himself points out that the Orthogonality Thesis isn’t literally true in some regimes (for example some goals might require a minimum of competence) but that doesn’t affect the barrier nature of the result.

Barriers to P vs NP | Orthogonality Thesis
Proof techniques that are provably not enough to settle the question | Competence by itself isn’t enough to show alignment-by-default

Improving the Epistemic State-of-the-art

Alignment research aims at solving a never-before-seen problem caused by a technology that doesn’t exist yet. This means that the main epistemic strategies from Science and Engineering need to be adapted if used, and lose some of their guarantees and checks. In consequence, I claim we shouldn’t only focus on those, but use all epistemic strategies we can find to produce new knowledge. This is already happening to some extent, causing both progress on many fronts but also difficulties in teaching, communicating and criticizing alignment work.

In this post I focused on epistemological analyses drawing from Theoretical Computer Science, both because of my love for it and because it fits into the background of many conceptual alignment researchers. But many different research directions leverage different epistemic strategies, and those should also be studied and unearthed to facilitate learning and criticism.

More generally, the inherent weirdness of alignment makes it nigh impossible to find one unique methodology of doing it. We need everything we can get, and that implies a more pluralistic epistemology. Which means the epistemology of different research approaches must be considered and studied and made explicit, if we don’t want to be confusing for the rest of the world, and each other too.

I’m planning on focusing on such epistemic analyses in the future, both for the main ideas and concepts we want to teach to new researchers and for the state-of-the-art work that needs to be questioned and criticized productively.


Cup-Stacking Skills (or, Reflexive Involuntary Mental Motions)

October 11, 2021 - 10:16
Published on October 11, 2021 7:16 AM GMT

This essay will require you to watch three short Youtube videos, totaling less than two minutes.

Naming things is hard.  Generally speaking, a thing should be named evocatively, such that people find it memorable and sticky, or precisely, such that people can reconstruct the concept just from its title.

(So, "Moloch," or "trigger-action planning.")

This essay is about "cup-stacking skills."  It's a noun that I use in phrases like "I think you're exhibiting a cup-stacking skill right now" or "I'm slowly trying to unravel this cup-stacking skill" or "I think we should consult Dave; he has the relevant cup-stacking skill."

Unfortunately, that's not a great name.  Most people, encountering the name, will have to memorize both the concept and the label, rather than having to just memorize the concept and have the label stick, or just memorize the label (and be able to rederive the concept from it).

Sorry.  I've made a genuine effort for the past couple of years to find a better name, and failed.  Since I've failed, I need you to watch three Youtube videos.

Here's the first video.

This is me, in my kitchen, cup-stacking.  It's a fun little game-slash-sport in which you stack and unstack cups in a specific pattern, to see how fast you can go.  It's extremely rewarding once you get even a tiny bit good at it; you can feel things going almost-right and the pattern loops onto itself and it's very easy to just chase that feeling of smoothness for hours at a time.  I've probably put between 50 and 100 hours into cup stacking over the past ten years, though at the time of filming I hadn't pulled them out much at all in the past two.

Here's the second video.

This is my partner Logan, cup-stacking for the very first time, after having watched me demonstrate the pattern exactly twice.  They've got a little card for reference on the table with them, so they know each of the three end-states they're shooting for, but otherwise I told them to not worry about process or technique and just generally do their best to imitate what they'd seen in a low-stress sort of way.

The thing about the literal skill of cup-stacking is that (approximately) "anyone can do it." Even as total beginners, most people can follow the directions and hold the pattern in mind and get the cups to stack up in the right shapes.

There's obviously a big difference between someone who's practiced for 50 hours and someone who's practiced for zero.  But it feels like a quantitative difference, not a qualitative one.  There are little bits of the technique that I am doing that Logan doesn't yet know about, and some basic misunderstandings (they're using their hands symmetrically rather than complementarily), but in-the-absolutely-literal-sense-of-the-word essentially, Logan and I are both attempting, and succeeding at, the same task.

The cups go up, and the cups come down.

This is, in my mind, a pretty solid metaphor for most of what would count as "rationalist skill."  Things like checking for truth, recognizing cognitive biases, zeroing in on cruxes, doing intelligent emotional regulation, and employing formalized techniques like TAPs or goal factoring or Gendlin's Focusing.

All of that stuff is wildly popular with a certain class of nerd (in part) because it's accessible.  You can pick up the core of the concept in the course of a five-minute lecture, and test it out in the course of a five-minute timer.  You can start doing it right away, as a total novice, and see it working, in the way you hoped it would work.  And with 50 hours of practice, it goes much more fluidly and reliably and is integrated much better (though still not perfectly).

This is very different from, say, gymnastics, or learning programming from scratch, where many of the learning paths involve spending a lot of time establishing a foundation of background skills and concepts before ever getting to "the good stuff."

Here is the third video.

I have paused for emphasis.

The third video is still technically just a quantitative improvement over the first two. There are some things Chang Keng Ian is doing right that Logan and I aren't (for instance, he's just letting the cups fall out from under his fingers, rather than wasting time and energy reversing the momentum of his hands and putting them down), but overall it's just the same skill, executed better.

But it's so much better that it has become a different thing entirely.  It's a level beyond what we would feel thoroughly justified calling "mastery."  In particular, there's a way in which "make a tower of cups" has ceased to be an action requiring a series of discrete steps, and has instead become something like a single, atomic motion.

This is what I mean by "cup-stacking skill."

How many repetitions did Chang Keng Ian put in, to achieve that level of instantaneity? My own fastest-ever stack took about fourteen seconds, and my slowest about a minute. At 50 to 100 continuous hours of practice, ignoring mistakes and incomplete rounds, that means I've done somewhere in the range of 3000 to 25000 cycles, most likely leaning heavily toward the lower end.
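As a quick check on that range, here is the back-of-envelope calculation as a small Python sketch. It assumes, as the paragraph does, that every practice second went into complete cycles. It pairs the fewest hours with the slowest pace for the lower bound, and the most hours with the fastest pace for the upper bound.

```python
SECONDS_PER_HOUR = 3600


def cycle_range(hours_low, hours_high, fastest_s, slowest_s):
    """Bounds on complete cycles: fewest hours at the slowest pace, and most
    hours at the fastest pace (ignoring mistakes and incomplete rounds)."""
    low = hours_low * SECONDS_PER_HOUR / slowest_s
    high = hours_high * SECONDS_PER_HOUR / fastest_s
    return round(low), round(high)


# 50-100 hours of practice; fastest stack ~14 s, slowest ~1 minute.
low, high = cycle_range(50, 100, fastest_s=14, slowest_s=60)
print(low, high)  # 3000 25714
```

The upper bound comes out a little above 25000, which is consistent with the rounded figure in the text.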

But it gets easier, and as it gets easier it gets faster, and as it gets easier and faster it gets more rewarding and pays off more reliably.  Once Chang Keng Ian was under ten seconds every time, he could easily get in a hundred and fifty cycles per hour without even trying particularly hard.  With this being one of his main interests, done off and on all day and a couple of hours intensively each afternoon, he could put away a thousand-plus repetitions per week, week in and week out.  It wouldn't even have to be a special week—if he was genuinely training hard, at the six- or seven-second level for hours at a time, he might plausibly complete a thousand reps in one day.  Certainly in one weekend.

By the time you have done something a hundred thousand times, it bears almost no resemblance to the fumbling, hesitant motions of a beginner.

In my household, things were—ostensibly—open to debate.

If you could make a convincing argument as to why something ought to happen, it was indeed possible to change my father's mind.  Even on questions infinitely beyond the reach of most suburban middle-class children—say, getting to stay home from school, or to skip all of your chores, or to have ice cream for dinner.

You just had to be able to lay out the case, in cool, dispassionate logic.

I think that, if asked, most people could construct a cool, dispassionate argument for just about anything they wanted.  It might not pass muster with an actual logician, but you could probably cobble together some relevant facts and glue them in place with a couple of broad and reasonable-sounding principles.

You could make a tower of cups, if you tried.  It might be slow work, and the tower might be a little rickety, but you could do it.

I, though—

Over and over and over and over and over and over and over again, I could get what I wanted, if and only if I could frame the argument such that the thing I wanted was obviously the right thing.  The sensible thing, the justified thing, honestly, I swear, it's not even that I want it so much as that it's, like, objectively indicated by the present state of affairs—wouldn't you agree, Dad?

Perhaps not literally a hundred thousand times, but certainly several orders of magnitude more often than the average human, I have practiced the skill of noticing precisely which perspective makes my position inarguably correct, and persuading my father to adopt that perspective.

What others can do on purpose—what the well-practiced can do quickly—I can do in less than the blink of an eye.

In fact, "do" isn't even the right word.  It's not an act of deliberate intent.  It just happens to me.

Suddenly, a tower of cups appears.

One of my colleagues is unsettled by frames.  Theories, models, stories, philosophies: anything that attempts to coherently explain everything about a given thing.

She can't help it.  She's come into contact with their falsehood, their inadequacy, too many times to count.  Too many times, she's been told that things are a certain way, and felt a note of quiet disagreement, and seen that note of quiet disagreement borne out, in the end.

She became a frame-breaker.  A story-unraveler.  An anti-modeler, often unwilling to endorse even the words that had just come out of her own mouth, seconds earlier. They were just an approximation, like Newtonian mechanics, and it was important not to mistake them for truth.

If I were to present to you a plausible-sounding theory, and ask you if you would perhaps be willing to try to find the flaw in it, you might sit down and start thinking through its implications, looking for contradictions with what you know of how the world works.

By the time you had settled into your chair, my colleague would have already torn the thing asunder.  Identified three fatal flaws with its premises, two absurd consequences emerging from its conclusion, and an infinitely relatable anecdote that made its falsehood not only obvious, but visceral.

Hands flash, and a stack becomes a pyramid.

I had a romantic partner who was abused as a child.

If I try—if I muster my attention and put my empathy to work—I can imagine a string that goes something like:

  • Someone just said X, and their face moved just so as they said it.
  • They really mean to say Y.
  • They didn't come right out and say Y because Z.
  • If I respond with A, they'll be angry.  If I respond with B, they'll be furious.
  • If I say Q, though, this will deflect their attention, turn them in a different direction.
  • And if I say P, this will be almost as good as Q, but with the additional benefit of being non-obvious and plausibly deniable.

My partner had practiced loops like this so many times that she did not even notice herself moving through them.  Could not stop herself from moving through them, if she tried—there was no accessible space between the start and the end, no time to even think the word "wait—"

There was just a trigger, and a response.

Cups, assembling themselves upward at terminal velocity.

These are the characteristics of a cup-stacking skill:

  • It is an adaptive response to something in your past.  It served an instrumental purpose.  It paid off.
  • It's something you did over and over again, like a worker on an assembly line. Something so baked into your context that you were practicing it without even noticing, after a while.
  • It happens blindingly quickly—so quickly that, if you do in fact manage to unpack it, and describe all of the steps, people will often literally not believe that your brain could have executed all of them so quickly, and will think that you're making it up.
  • It's the sort of thing anyone could do, and some people are really quite visibly skilled at.  But the thing you're doing goes beyond "visibly skilled."  (Did you know that they had to film Bruce Lee at 32 frames-per-second, because the industry standard 24 fps was too slow to capture his movements?)

And lastly (and most unnervingly):

  • It's the sort of skill you might be completely unaware that you're executing, and might possibly not be able to stop executing—at least not by just telling yourself "stop."  It's like looking at a fish and trying not to categorize it as a fish.

Not everyone has a cup-stacking skill.*  Not everyone experienced the preconditions to develop one.

But everyone I know who has identified one in themselves experiences it as a sort of Greek curse.  I've been working quite hard for the last four years on not reflexively wrenching the frame around to whatever is maximally convenient for my goals, so hard that it leaves others disoriented, and I'm still only successful part of the time.  My colleague said words I interpreted as wishing she could at least build things out of solid blocks sometimes, when she wanted to, rather than living perpetually in mutable uncertainty.  My romantic partner was extremely good at detecting stealth hostility and deflecting incoming abuse—at the cost of running everything through a filter that took ill intent for granted, and always found something it needed to dodge.

Once you do gain control of a cup-stacking skill, it can be something of a minor superpower.  You can accomplish, in a flash of intuitive insight, what takes everyone else minutes or hours of deliberate effort to do.

But until that point, and especially if you're unaware of it, you don't really have it.  It more-or-less has you.

* A reader points out "nonsense. everyone has dozens of cup-stacking skills. most of them are just close to universal. such as walking."  To which I reply "True.  But not everyone has a unique and idiosyncratic cup-stacking skill that has control over them under certain circumstances."  To which they reply "yes but many of the near-universal cup-stacking skills also have control over almost everyone under certain circumstances, which is maybe what i actually wanted to point out."



Outdoor dancing is likely very safe

October 11, 2021 - 04:40
Published on October 11, 2021 1:40 AM GMT

After the spontaneous contra dance at Porchfest, I'm helping organize another one. I wanted to get a better sense of how much covid risk an attendee would be taking, so I ran some numbers on microcovid. If everyone is masked and vaccinated, I count ~2.2 microcovids:

  • ~1.7 from your partner. While your partner is not the only person your head gets close to, you're this close to at most one person at a time, so for simplicity assume it's your current partner.
  • ~0.2 from your neighbors and next/previous neighbors.
  • ~0.2 from your next/previous hands fours.
  • ~0.08 from the hands fours one farther away.

If you have multiple lines close together, you could ~double these numbers. Other social dances are likely ~half as risky.

This is a very low level of risk: about 1% of a cautious risk budget of 200 microcovids/week (1% risk of covid/year).
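The arithmetic above can be reproduced in a short Python sketch that sums the listed per-source estimates and compares the total to the weekly budget. Adding microcovids linearly treats each one as a one-in-a-million chance, which is only a good approximation while the totals stay small. The 52-weeks-per-year conversion is an assumption about how the 1%/year figure arises.

```python
# Per-source estimates from the list above, in microcovids per dance
# (everyone masked and vaccinated).
contributions = {
    "partner": 1.7,
    "neighbors and next/previous neighbors": 0.2,
    "next/previous hands fours": 0.2,
    "hands fours one farther away": 0.08,
}

# Adding microcovids treats each one as a one-in-a-million chance; this is
# only a good approximation while the totals stay small.
total = sum(contributions.values())   # about 2.18, i.e. ~2.2
weekly_budget = 200                   # cautious budget, microcovids per week
share_of_budget = total / weekly_budget

# Assumption: the 1%/year figure comes from spending the full budget
# every week for 52 weeks.
yearly_risk = weekly_budget * 52 / 1_000_000

print(f"per dance: {total:.2f} microcovids ({share_of_budget:.1%} of a week's budget)")
print(f"full budget for a year: {yearly_risk:.2%} chance of covid")
```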

I wish I'd run these numbers sooner: this is probably our last chance for an outdoor dance in Boston before spring.

An outdoor dance in October 2013

We may end up dancing indoors this winter. Over the next few months I think our communities are likely to move away from treating covid as something where we have a duty to make substantial sacrifices to limit spread. Once everyone is vaccinated who wants to be, including boosters and approving the vaccine for kids, I think people will view the tradeoffs very differently.



Good AI alignment online class?

October 11, 2021 - 03:48
Published on October 11, 2021 12:48 AM GMT


I am fascinated with AI alignment, and I am interested in diving deeper into the area. I am going through the recommended readings listed on the MIRI Research Guide, but I wish there were structured online classes covering similar content. I enjoy reading textbooks and learning, but having a structure and a community of people with similar goals would improve my learning experience and my grasp of these powerful ideas.

My employer gives a yearly allowance for structured learning (not self-directed learning). I have wanted to take advantage of it by doing a paid online apprenticeship on AI, focusing on alignment. However, the closest things I have found are online master's degrees in CS from Georgia Tech and other universities. Given that I work full-time, doing a master's degree would be infeasible. Is there anything worthwhile and high quality that can address my need? Preferably with a low time commitment so I can realistically manage it alongside full-time work.

The reasons I am looking for an online course rather than an in-person course are the COVID-19 pandemic and the fact that I live in Melbourne, Australia. Hence, the options for studying AI alignment in-person are limited or non-existent.

Academic background: I graduated with an undergraduate degree in computer science two years ago.


Budapest Less Wrong/ SSC

October 10, 2021 - 19:14
Published on October 10, 2021 4:14 PM GMT

The meetup will be between 2 pm and 5 pm on Sunday.  I'll send the door number and floor when you RSVP. 


Taking advice from Milan, I've decided that it will probably be too cold for everyone to be comfortable meeting outdoors next Sunday, so I'm going to have the meetup at my own apartment, which is a big space and perfect for hanging out indoors. We'll probably open the windows while hanging out for slightly more Covid protection, but the main defence will be that we all ought to be vaccinated. Also, we have a cat and a dog, in case that is a concern for anyone.

Anyway, here are two articles from Scott that I've suggested for reading and discussion.



Timothy Underwood


Secure homes for digital people

October 10, 2021 - 18:50
Published on October 10, 2021 3:50 PM GMT

Being a “digital person” could be scary—if I don’t have control over the hardware I’m running on, then someone else could get my code and run tons of copies in horrible conditions. (See also: qntm’s Lena.)

It would be great to guarantee digital people some control over their situation: 1. to control their local environment and sensations, 2. to avoid unauthorized rewinding or duplicating.

I’ll describe how you could modify the code of a digital person so that they retain this control even if an adversary has access to their source code. This would be very expensive with current cryptography. I think the overhead will eventually become cheap enough that it’s possible to do for some digital people, though it will likely remain expensive enough that it is never applied to most digital people (and with luck most digital people will be able to feel secure for other reasons).

Part 1: the right to control my environment
My ideal
  • I live in a comfortable virtual home. I control all of the details of that world.
  • When people communicate with me, I can choose how/whether to hear them, and how/whether to update my home based on what they say (e.g. to render an avatar for them)
  • Sometimes I may occupy a virtual world where a foreign server determines what I see, feel, or hear. But even then I can place boundaries on my experiences and have the ability to quickly retreat to my home.
  • I have as much control as feasible over my own mental state and simulated body. No one else can tamper directly with them.
  • I can choose to pause myself for as long as I want (or permanently).
  • My local environment is private, and I have access to plenty of tamper-proof storage. I can do whatever I want with computers in my home, including e.g. verifying signatures or carrying on encrypted conversations.
Implementation
  1. First we write a simple environment that reflects all my desiderata (the “home”).
  2. Then I apply indistinguishability obfuscation to (me + home), so that the house becomes private and tamper-proof. (This is an extremely expensive operation, more on that later.)
  3. I distribute the obfuscated home and hopefully destroy any unprotected copies of myself.

One conceptual difficulty is that indistinguishability obfuscation applies to circuits whereas I would like to obfuscate a long-running program. But this can be handled straightforwardly, as discussed in Appendix A.

The home could consume terabytes of memory and teraflops of compute before it added significantly to the expense of running a human-like digital person, so I could live in relative luxury. The home could also negotiate resource requirements with the external world and decide what to do when requested resources are unavailable (e.g. to pause until they become available).

Limitation 1: cost

Indistinguishability obfuscation is extremely expensive, more like a factor of 10000000000 slowdown than 10.

It will get faster with further research, but probably not fast enough to obfuscate the whole person+home. But there are other ways to speed up the process:

  • I think it’s probably possible to have most of the computation be “merely” homomorphically encrypted, and to have an obfuscated controller which verifies and decrypts the results. Fully homomorphic encryption (FHE) could be much faster than obfuscation; if I had to guess I’d say it would converge to something like 2-3 orders of magnitude of slowdown.
  • We can potentially have an obfuscated controller verify a much larger untrusted computation. I don’t know how fast we can make delegated computation, but I could imagine it getting closer to 2x than 100x. It might help further that we are not applying these methods to generic problems but to a very specific structured problem (which probably has quite low circuit depth). One complication is that we need our proof system to be secure even against an adversary who can unwind the prover, but I don’t think this is a huge deal.
  • Delegating computation would preserve integrity but not privacy. So the computation we delegate may need to already be private. Here it seems likely that we can benefit a lot from the structure of the computation. Almost all of our operations are in doing a brain simulation, and we don’t really care about leaking the fact that we are doing a brain simulation, just about leaking the state of the brain. I don’t know how fast this can be made but again I would not be surprised by a factor of 2.

It’s pretty unclear how fast this could get, either from taking some of these techniques to their limits or from thinking of other cleverer ideas. I would not be at all surprised by getting the whole thing down to a factor of 2 slowdown. That said, I also think it’s quite plausible that you need 10x or 10000x.

Limitation 2: security?

The cryptography used in this construction may end up getting broken—whether from a mistaken security assumption, or because the future contains really giant computers, or because we implemented it badly.

The software used in my home may get compromised even if the cryptography works right. An adversary can provide trillions of malicious inputs to find one that lets them do something unintended like exfiltrate my code. With modern software engineering this would be a fatal problem unless the home was extremely simple, but in the long run writing a secure home is probably easier than writing fast enough cryptography.

I may be persuaded to output my source code, letting an adversary run it. I might not give myself the ability to inspect my own source, or might tie my hands in other ways to limit bad outcomes, but probably I can still end up in trouble given enough persuasion. This is particularly plausible if an adversary can rewind and replay me.

Limitation 3: rewinding

In the best case, this scheme guarantees that an attacker can only use my code as part of a valid execution history. But for classical computers there is no possible way to stop them from running many valid execution histories.

An attacker could save a snapshot of me and then expose it to a billion different inputs until they found one in which I responded in a desired way. (Even if I’m cagey enough to avoid this attack in most possible situations, they just have to find one situation where I let my guard down and then escalate from there.) Or I could have revealed information to the outside world that I no longer remember because I’ve been reset to an earlier state.

Someone living in this kind of secure house is protected from the worst abuses, but they still can’t really trust the basic nature of their reality and are vulnerable to extreme manipulation.

This brings us to part 2.

Part 2: the right to a single timeline
My ideal
  • No one should be able to make a second copy of me without my permission, or revert me to a previous state.
  • I should be able to fork deliberately. I can’t force someone to run a second copy of me, but I should be able to give specific permission.
Implementation with trusted hardware

This is easy to achieve if we have a small piece of trusted tamper-resistant hardware that can run cheap computations. We use the same mechanism as in the last section, but:

  • The trusted hardware has a secret key sk, and it maintains an internal counter k.
  • On input x, the trusted hardware signs (x, k) and increments the counter.
  • Whenever someone provides my obfuscated controller an input and tries to step it forward, the obfuscated controller first checks to see that the input has been signed by the trusted hardware with the correct timestep.
  • In order to make a copy, I need to have the public key of another piece of trusted hardware, which I use to initialize a new copy. (Ideally, the manufacturer signs the public key of each piece of trusted hardware they built, and I know the manufacturer’s public key.)
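The counter mechanism can be sketched in Python. This is a toy under loud assumptions. HMAC-SHA256 stands in for the hardware's signature, so the verifier holds the same secret, unlike a real public-key scheme. The class names are hypothetical, and `Controller` is ordinary code rather than an obfuscated program. The sketch shows why a saved snapshot cannot be stepped forward on a different input: the hardware signs each timestep only once.

```python
from __future__ import annotations

import copy
import hashlib
import hmac
import secrets


class TrustedHardware:
    """Hypothetical tamper-resistant device: holds a secret key and a counter k."""

    def __init__(self) -> None:
        self._sk = secrets.token_bytes(32)
        self._k = 0

    def sign_next(self, x: bytes) -> tuple[int, bytes]:
        """Sign (x, k) with the current counter value, then increment k."""
        k = self._k
        tag = hmac.new(self._sk, k.to_bytes(8, "big") + x, hashlib.sha256).digest()
        self._k += 1
        return k, tag

    def verification_key(self) -> bytes:
        # Stand-in for the device's public key: HMAC is symmetric, so the
        # verifier needs the same secret. A real design would use signatures.
        return self._sk


class Controller:
    """Stand-in for the obfuscated controller: it only steps forward on an
    input signed by the trusted hardware for exactly the next timestep."""

    def __init__(self, hw_key: bytes) -> None:
        self._hw_key = hw_key
        self._next_k = 0
        self.steps_taken = 0

    def step(self, x: bytes, k: int, tag: bytes) -> int:
        expected = hmac.new(self._hw_key, k.to_bytes(8, "big") + x, hashlib.sha256).digest()
        if k != self._next_k or not hmac.compare_digest(tag, expected):
            raise ValueError("input not signed for the current timestep")
        self._next_k += 1
        self.steps_taken += 1  # placeholder for advancing the simulation
        return self.steps_taken


hw = TrustedHardware()
me = Controller(hw.verification_key())
snapshot = copy.deepcopy(me)  # adversary saves a copy before timestep 0

k0, tag0 = hw.sign_next(b"first input")
me.step(b"first input", k0, tag0)  # accepted at timestep 0

# The adversary wants the snapshot to take a different input at timestep 0,
# but the hardware has moved on and now only signs with k = 1.
k1, tag1 = hw.sign_next(b"alternative input")
try:
    snapshot.step(b"alternative input", k1, tag1)
    fork_succeeded = True
except ValueError:
    fork_succeeded = False
```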

If I were willing to make a round trip to a trusted third party every time I received a novel input, then I could have them implement this function directly instead of using tamper-proof hardware. The real critical ingredient is me trusting someone on the outside. I’ll discuss how to potentially remove this assumption in the section on quantum computers below.

None of this actually requires my house to be built to guarantee the right to a single timeline—I could start without such a right, and then install a wrapper to enforce a single timeline once there was some hardware I trusted or if it became important enough.

Implementation with 1-of-2 transfer

Suppose that the only kind of trusted hardware is a device that holds two secrets, and will reveal one or the other of them when asked but not both. I think this is somewhat easier to build than general trusted hardware. (Related: locking phones with quantum bits.)

Now suppose there is a trusted party who manufactures a bunch of these devices, with a public key pk. Each device has a serial number n, and its two secrets are signatures that verify under pk: one of (n, 0) and one of (n, 1).

I can still achieve a single-timeline guarantee:

  • At each time, I have a list of 100 secure devices I expect to use to verify my next step. When I am initially created I have to choose 100 devices that someone will need to run me.
  • Each time someone wants to step me forward, they need to provide both the input and the serial numbers for 100 new secure devices to use for the next step.
  • I then apply a cryptographic hash function to the input and the serial numbers of the 100 new devices to get a 100-bit string. I output this string and wait for a response before stepping the simulation forward.
  • They use the nth bit of my string to decide which secret to reveal from their nth device. I verify the signatures for all 100 secrets.
  • If the adversary tries to run me with two different inputs, or to have two different sets of 100 devices to use for the next steps, they will end up with two different 100-bit string challenges (unless they find a hash collision). Stepping me forward for both strings would require getting the signatures for two different bits from the same device, which is impossible if the secure devices work as intended and the signature scheme remains secure.
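A toy Python sketch of the 1-of-2 scheme follows. It makes two assumptions. HMAC-SHA256 stands in for the manufacturer's signatures, so verification reuses the signing key, unlike a real scheme built on pk. The hash is SHA-256, truncated to its first 100 bits. All names are hypothetical. The sketch steps once honestly, then shows that a second branch from the same state would need the other secret from at least one device.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

N_DEVICES = 100  # devices consumed per step, as in the text

# Hypothetical manufacturer key. HMAC stands in for a signature under pk, so
# verification below reuses the key; a real design would use public-key signatures.
_MANUFACTURER_KEY = secrets.token_bytes(32)


def manufacturer_sign(serial: int, bit: int) -> bytes:
    msg = serial.to_bytes(8, "big") + bytes([bit])
    return hmac.new(_MANUFACTURER_KEY, msg, hashlib.sha256).digest()


class OneOfTwoDevice:
    """Holds signatures of (n, 0) and (n, 1); reveals one of them, never both."""

    def __init__(self, serial: int) -> None:
        self.serial = serial
        self._secrets = (manufacturer_sign(serial, 0), manufacturer_sign(serial, 1))
        self._revealed_bit = None  # which secret (0 or 1) has been revealed, if any

    def reveal(self, bit: int) -> bytes:
        if self._revealed_bit is not None and self._revealed_bit != bit:
            raise RuntimeError(f"device {self.serial} will not reveal its other secret")
        self._revealed_bit = bit
        return self._secrets[bit]


def challenge_bits(x: bytes, next_serials: list[int]) -> list[int]:
    """Hash the input and the next step's serial numbers into a 100-bit challenge."""
    data = len(x).to_bytes(8, "big") + x + b"".join(s.to_bytes(8, "big") for s in next_serials)
    digest = hashlib.sha256(data).digest()  # 256 bits; only the first 100 are used
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N_DEVICES)]


def verify_step(serials: list[int], bits: list[int], revealed: list[bytes]) -> bool:
    if not (len(serials) == len(bits) == len(revealed) == N_DEVICES):
        return False
    return all(hmac.compare_digest(sig, manufacturer_sign(s, b))
               for s, b, sig in zip(serials, bits, revealed))


# Devices 0-99 guard the current step; 100-199 and 200-299 are two candidate
# sets an adversary might propose for the next step.
current = [OneOfTwoDevice(n) for n in range(100)]
serials = [d.serial for d in current]

bits_a = challenge_bits(b"input A", list(range(100, 200)))
revealed_a = [d.reveal(b) for d, b in zip(current, bits_a)]
honest_ok = verify_step(serials, bits_a, revealed_a)

# Branching from the same state on a different input gives a different
# challenge, which needs the other secret from at least one device.
bits_b = challenge_bits(b"input B", list(range(200, 300)))
try:
    revealed_b = [d.reveal(b) for d, b in zip(current, bits_b)]
    second_branch = True
except RuntimeError:
    second_branch = False
```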
Implementation with quantum computers

We can probably remove the need for a trusted party if we have quantum computers big enough to perform delegated cryptographic operations (but no big enough quantum computers to run an entire digital person in superposition).

The vague idea in this section looks to me like it should work (I’d give it at least 85% odds), but would require some new ideas and analysis. It’s on much sketchier ground than any of the ideas in previous sections.

The basic idea is to replace the secure trusted hardware with a small quantum computer that holds a quantum secret. At all times the obfuscated controller knows the secret, but it is impossible for anyone external to get two copies of the secret. When we ask the obfuscated controller to advance its simulation, it will first ask the quantum computer to transform the secret in an irreversible way and to provide proof that it has done so.

To achieve this we can basically repurpose a protocol for delegated quantum computation and verifying quantumness of a single quantum device, which is roughly what you need if you don’t trust the external world at all and so need to treat it as one device (it’s an amusing coincidence that those papers are by my old officemate Urmila Mahadev and that I worked with her on this problem in another life). To briefly summarize the trick:

  • The operator knows 100 hard-to-invert functions f_i.
  • For each i, the untrusted quantum computer has a superposition over two values x for which f_i(x) is equal.
  • By measuring the state, the untrusted quantum computer could collapse the superposition to a single pre-image x, but it is provably hard for them to find both pre-images.
  • The prover can also *prove* that they have both values by applying a Hadamard transform.
  • This means that the prover has a given superposition, but they don’t actually know the parts of the superposition, and so the only way they can manipulate it is by applying quantum gates.
  • The obfuscated machine can then tell the untrusted quantum computer a sequence of encrypted operations to perform on this data in order to transform it into a new set of secrets, and then can ask for a set of measurements to both verify that this has been done correctly and ensure that it is irreversible. This step is the one that would require an expert in the field to spend some time.
  • If that actually works then they can use the same idea to fork by stretching the 100 secrets to 400, verifying 200 to confirm that everything was done correctly, and then splitting the remaining 200 into two new secrets, one used by each of my 2 copies.

This protocol fails if an attacker can run the entire home in superposition, since then they can simply rewind the entire process. But if the attacker can’t run the secure home in superposition, then you can extract a classical copy of each input on which they run the home, and so running it on two different inputs would allow you to obtain classical copies of two challenge-responses and so violate the assumption.

This protocol probably provides better support for forking, since I can probably direct the external quantum computer to produce two new secrets without having to find a source of additional trusted hardware.


  • You only obtain security-against-copying *after* you have set up the quantum secret. You can do this by performing a simple protocol from inside your home, and it doesn’t require trusting the quantum device at all. But if an adversary has a copy of you from before you’ve performed this protocol, they could create many copies of you and perform the ritual separately for each of them. (It actually takes some extra work to ensure that an adversary who had a copy of you from before the protocol couldn’t simply break the scheme—I think that can be done but I’m not sure.)
  • There will eventually be quantum computers with quadrillions of qubits, and at that point an attacker (who has access to the final version of the quantum computer) could make many copies of me. If I were naive they could also revive any prior snapshots of me, but I could prevent that if I want to by asking the quantum computer to periodically shred and refresh its secret.
  • For each new input someone wants to send to my home, they need to first consult with a quantum computer. The total cost of the quantum computation is not likely to be too large, but having quantum computers “on site” might be logistically challenging, and round trips could introduce significant latency.
Appendix A: obfuscation for uniform computations

Suppose that I want to obfuscate the program that repeatedly applies the circuit C to a state, i.e. we start from some initial state S[0], then we repeatedly compute (S[t+1], output[t]) = C(S[t], input[t]).

We’ll instead produce an obfuscated “controller” C’, and an appropriate initial state S'[0]. A legitimate operator with access to C’ can simulate my original program, whereas a malicious operator will not be able to do anything other than running multiple copies of me, rewinding to old snapshots, or killing me prematurely.

C’ contains a secret cryptographic key sk. When it receives an input (S'[t], input[t]) it does the following operations:

  • First verify that S'[t] is signed with sk.
  • Then decrypt S'[t] with sk in order to obtain S[t].
  • Now apply C(S[t], input[t]) to obtain (S[t+1], output[t])
  • Now encrypt and sign S[t+1] to obtain S'[t+1]
  • Output (S'[t+1], output[t])

The analysis is left as an easy exercise for the reader (famous last words, especially hazardous in cryptography).

The same idea can be used to obfuscate other kinds of uniform computation, e.g. providing access to secure RAM or having many interacting processors.


Without a phone for 10 days

October 10, 2021 - 10:20
Published on October 10, 2021 5:19 AM GMT

I woke up this morning to a bricked Google Pixel 4. After taking it to a local repair shop for a diagnosis, I was told that a fuse had been blown on the motherboard. A board-level repair would cost half as much as a brand new phone, and I was thinking about upgrading to the new Pixel 6 once it came out later this month. After spending a few hours sorting out account details and replacement options with my carrier, I learned that it would cost me $150 to get a replacement by this coming Monday. What good would it do to pay $150 for a phone that I would only have for a week until upgrading?

And that’s when I realized I had stumbled into a very unique moment in which I had every reason to attempt something I had been hesitantly curious to try: living without a phone. After all, the Pixel 6 was rumored to launch only ten days from now on October 19th. And if I decide at the end that life is better with a smartphone, then I’ll get one.

Okay, so there are a few things I’m a bit worried about. The most obvious one is that I’ll be unreachable by close family and friends during this time. Ten days isn’t a ton of time, so I decided to email those closest to me to tell them about this experiment. A less obvious problem is that I’ll be unable to do typical two-factor authentication, which my university and some other services periodically require. The good news is that I have backup codes saved on my laptop, but it’s kind of a hassle.

I’m very curious to see how this will turn out. I’m hoping that I’ll appreciate the disconnection so much that I won’t want to go back to smartphones. I’ll likely still want the basic call and text functionality, so maybe I’ll go with a simpler phone. I had heard of the Light Phone before and loved the idea, but was afraid of giving up apps like Uber for emergencies. Today I looked into some other “feature phones” and discovered the Nokia 6300, the Punkt. MP02, and the Mudita Pure. Anyway, I’ll probably write at least one more post on this experiment when the ten days are up.

Has anyone else made the switch away from their smartphone? Would love to hear about it below.


The Extrapolation Problem

October 10, 2021 - 08:11
Published on October 10, 2021 5:11 AM GMT

In a previous post I predicted that machine learning is limited by the algorithms it uses (as opposed to hardware or data). This has two big implications:

  • Scaling up existing systems will not render human thinking obsolete.
  • Replacing the multilayer perceptron with a fundamentally different algorithm could disrupt[1] the entire AI industry.

Today's artificial neural networks (ANNs) require far more data to learn a pattern than human beings do. This is called the sample efficiency problem. Some people think the sample efficiency problem can be solved by pouring more data and compute into existing architectures. Other people (myself included) believe it must be solved algorithmically. In particular, I think the sample efficiency problem is related to the extrapolation problem.

Extrapolation and Interpolation

Machine learning models are trained on a dataset. We can think of this dataset as a set of points in a vector space. Our training data is bounded, which means it's confined to a finite region of the vector space. This region is called the interpolation zone. The space outside the interpolation zone is called the extrapolation zone.
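To make the definition concrete, here is a minimal sketch that treats the interpolation zone as the axis-aligned bounding box of the training points. That is an assumption: the post does not pin the region down, and a stricter common choice is the convex hull of the data (the box also admits corners, such as (2, 2) below, that no training point surrounds). The function names are hypothetical.

```python
import numpy as np


def interpolation_zone(train_points):
    """Return the (lower, upper) corners of the axis-aligned box around the data.

    Assumption: the interpolation zone is modelled as a bounding box, not a convex hull."""
    pts = np.asarray(train_points, dtype=float)
    return pts.min(axis=0), pts.max(axis=0)


def in_interpolation_zone(query, zone):
    """True if every coordinate of the query lies within the box."""
    lower, upper = zone
    q = np.asarray(query, dtype=float)
    return bool(np.all((q >= lower) & (q <= upper)))


train = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]])
zone = interpolation_zone(train)
print(in_interpolation_zone([1.0, 1.0], zone))  # True: inside the box
print(in_interpolation_zone([3.0, 1.0], zone))  # False: in the extrapolation zone
```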

Consider an illustration from this paper. The ground truth is in black, the training data in green, and the output of an ANN trained on that data in blue.

A human being could look at the green training data and instantly deduce the pattern in black. The ANN performs great in the interpolation zone and completely breaks down in the extrapolation zone. The ANNs we use today can't generalize to the extrapolation zone.
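The failure mode in the figure can be reproduced with a small sketch. As a stand-in for a trained ANN, it uses a one-hidden-layer ReLU network whose hidden weights are random and frozen and whose output layer is fit by least squares (a random-features model, not the network from the paper). It learns sin(x) on [−π, π] and is then evaluated inside and outside that range; all names and sizes are our assumptions.

```python
import numpy as np


def fit_relu_net(x, y, n_hidden=100, seed=0):
    """One-hidden-layer ReLU net with a frozen random hidden layer.

    Only the output weights are fit, by linear least squares (a random-features model)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_hidden)                          # frozen hidden weights
    c = np.linspace(x.min(), x.max(), n_hidden + 2)[1:-1]  # ReLU kinks inside the data range

    def features(t):
        hidden = np.maximum(w * (t[:, None] - c), 0.0)     # relu(w * (t - c))
        return np.hstack([hidden, np.ones((len(t), 1))])   # plus a bias column

    beta, *_ = np.linalg.lstsq(features(x), y, rcond=None)
    return lambda t: features(t) @ beta


x_train = np.linspace(-np.pi, np.pi, 200)
net = fit_relu_net(x_train, np.sin(x_train))

x_inside = np.linspace(-3.0, 3.0, 101)              # interpolation zone
x_outside = np.linspace(2 * np.pi, 3 * np.pi, 101)  # extrapolation zone
interp_err = np.mean(np.abs(net(x_inside) - np.sin(x_inside)))
extrap_err = np.mean(np.abs(net(x_outside) - np.sin(x_outside)))
print(f"mean error inside: {interp_err:.4f}, outside: {extrap_err:.2f}")
```

Past the last ReLU kink every hidden unit is either switched off or linear, so the network continues as a straight line while sin keeps oscillating: the error inside the training range is small and the error outside it is large.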

One way around this problem is to paper over it with big data. I agree that big data is a practical solution to many real-world problems. However, I do not believe it is a practical solution to all of them. An algorithm that can't extrapolate is an algorithm that can't extrapolate. Period. No amount of data or compute you feed it will get it to extrapolate. Adding data just widens the interpolation zone. If a problem requires extrapolation, then an ANN can't solve it, at least with today's methods.

Under this definition of extrapolation, GPT-3 is bad at extrapolation. An example of extrapolation would be GPT-n inventing a technology using only training data (including prompts) from before that technology was invented. For example, if GPT-n could produce schematics for a transistor using only training data from 1930 or earlier, then my prediction would be falsified.
This is theoretically possible because the field-effect transistor was proposed in 1926 and the Schrödinger equation was actually published in 1926. Feeding GPT-n the correct values of universal constants is allowed. GPT-n would have access to much more compute than was available when the transistor was actually invented in 1947. It'd just be a matter of doing the logic and math, which I predict GPT-n will be forever incapable of.

Other architectures exist which are better at extrapolation.

Neural Ordinary Differential Equations (ODEs)

Neural Ordinary Differential Equations shortened my AI timelines. The math is clever and elegant, but what's really important is what it lets you do. Neural ODEs can extrapolate.
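To show why this can help, here is a deliberately tiny sketch of the Neural ODE idea: the model learns a vector field f with dh/dt = f(h), and predictions come from integrating that field with an ODE solver (fixed-step RK4 here). Two simplifications are ours, not the paper's: f is a single linear layer rather than a multilayer network, and it is fit by regressing finite-difference derivatives instead of backpropagating through the solver. Because the model learns the rule rather than the curve, integrating past the training window (t ≤ 2π) out to t = 20 still tracks the true trajectory. This works only because the true dynamics lie inside the model's function class; the oscillator and all names are hypothetical.

```python
import numpy as np


def rk4_integrate(f, h0, t0, t1, n_steps):
    """Integrate dh/dt = f(h) from t0 to t1 with fixed-step classical Runge-Kutta (RK4)."""
    h = np.asarray(h0, dtype=float)
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        k1 = f(h)
        k2 = f(h + 0.5 * dt * k1)
        k3 = f(h + 0.5 * dt * k2)
        k4 = f(h + dt * k3)
        h = h + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return h


# Training window: one period of a hypothetical oscillator, h(t) = (sin t, cos t).
t = np.linspace(0.0, 2.0 * np.pi, 400)
states = np.stack([np.sin(t), np.cos(t)], axis=1)

# "Train" the vector field f(h) = h @ A by regressing finite-difference derivatives on states.
derivs = np.gradient(states, t, axis=0, edge_order=2)
A, *_ = np.linalg.lstsq(states, derivs, rcond=None)


def learned_field(h):
    return h @ A


# Extrapolate: integrate from t = 0 to t = 20, far past the training window.
h_pred = rk4_integrate(learned_field, states[0], 0.0, 20.0, 2000)
h_true = np.array([np.sin(20.0), np.cos(20.0)])
err = np.max(np.abs(h_pred - h_true))
print(f"max error at t = 20: {err:.1e}")
```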

The Neural ODE paper was published in 2018. I predict the first person to successfully apply this technique to quantitative finance will become obscenely rich. If you become a billionaire from reading this article, please invite me to some of your parties.

  1. I'm using "disrupt" the way Clayton Christensen does in his 1997 book The Innovator's Dilemma. ↩︎


Betting That the S&P 500 Will Drop Over 30 Percent (i.e. Below 3029)

October 10, 2021 - 07:02
Published on October 10, 2021 4:02 AM GMT

If someone knows the best way for an Australian to buy US Put options, please let me know.

This post is somewhat unfinished, unedited, and much less detailed than I wanted it to be, because I recently started a new job.

Empirical prediction

My credence that the S&P 500 index will drop below 3029 at some point before this time next year (i.e. a decrease of over 30 percent from its level at the time of writing) is much higher than both the market’s likely estimate (rational investors would sell their stocks if they thought the index was overvalued at its current level, which at the time of writing is 4327) and my base rate for such large crashes. If I had to estimate my current credence of a 30 percent devaluation happening within the next year (i.e. before July 16, 2022), I’d put it at around 65 percent.


As you might notice from the dates, which precede this post’s publication date, I’ve been writing this for a while. I’ve delayed posting it because I can’t fully justify the probabilities I’m assigning. I take methodology quite seriously (i.e. how do I know that my line of reasoning reliably leads to true conclusions?), so I hate filling out my argument with weakly justified back-of-the-napkin calculations and gut feelings. But that’s all I’ve got here. So take everything I’ve written with a grain of salt.

The market actively fights any prediction. If an indicator of a future market movement is found, it will tend to become less effective over time, as other people learn to adjust their bets to accommodate that information. This means that all of my justifications based on historical data could be very wrong. In each of these instances, I’ll try to give you reasons why a particular historical indicator might not indicate what it once did (note: ran out of time to do this).

To be clear, this is not financial advice. This is an empirical prediction with a justification that might have critical errors. I’m not a permabear, but I think market conditions are weighting the probability distribution of returns heavily toward the negative side of the graph. To give you an idea of my confidence in my prediction: I sold all my stocks on 12 July 2021; the week after, I explained my position to some of my loved ones, and they soon sold their retirement stocks and held cash; currently, I am deciding how to safely execute a large short position. So if I’ve made a mistake in my prediction, let me know.

The base rate of a now-to-trough drop of 30 percent

To calculate my base rate for a drop of more than 30 percent, I used the daily S&P 500 data since January 1986 (roughly when US inflation and US interest rates stabilized). (This data simply tracks the index without reinvested dividends, which is not ideal.) On any random date since then, the probability of a decrease of more than 30 percent at some point during the next year is 6.1 percent. This was way higher than I expected, so let me be extremely clear on what it means.

Put your left index finger on a random point on the S&P 500 graph between January 1986 and July 2020. Put your right index finger on the graph a year later. Put a thumb at the lowest point on the graph between your two index fingers. The probability that your thumb is more than 30 percent lower than your left index finger is 6.1 percent. This can be true even if your right index finger is higher than your left index finger. So why use this metric?
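The base-rate calculation described above can be sketched as follows. This is a reconstruction, not the author’s code: it assumes a list of daily closes, a one-year window of roughly 252 trading days, and counts only start dates that have a full window of data after them. The function name is hypothetical, and the toy series is made up for illustration; it is not S&P 500 data.

```python
from typing import Sequence


def now_to_trough_rate(prices: Sequence[float], window: int, threshold: float = 0.30) -> float:
    """Fraction of start days whose lowest close within the next `window` days
    is more than `threshold` below the start-day close ("left finger" vs "thumb")."""
    n_starts = len(prices) - window
    if n_starts <= 0:
        raise ValueError("series is shorter than one window")
    hits = 0
    for i in range(n_starts):
        trough = min(prices[i + 1 : i + window + 1])  # lowest point after the start day
        if trough < (1.0 - threshold) * prices[i]:
            hits += 1
    return hits / n_starts


# Toy series: the end point (110) is above the start (100), yet the trough (60) is a 40% drop.
print(now_to_trough_rate([100, 60, 110], window=2))  # 1.0
```

With real data you would pass the daily closes and `window=252`; note that the end-of-window price never enters the calculation, which is exactly why the right index finger can sit above the left one.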

Why estimate now-to-trough?

I want to address a particular error I’ve seen a lot of people make. To quote a friend, “Stocks go up and down all the time, but if I leave it in there for 40 years, it’ll almost certainly be higher than it is now.” I actually agree with that statement, but it does not imply that you can’t get higher returns by temporarily selling your stocks (or shorting the market). If you knew with certainty that the market would drop 50 percent within the next month, you’d be a fool not to sell all your stocks now and rebuy closer to the bottom of the crash. The fact that we have to assign probabilities to crashes, rather than give guarantees, does not change anything of substance; it just adds extra steps to the maths. Suppose there’s only a 10 percent chance of the market crashing 30 percent tomorrow and a 90 percent chance of it going up 3 percent. It would still be prudent to move your money into cash, even though the crash is a low-probability event: given those odds, keeping your wealth in stocks has negative expected value, so there is no profitable Kelly bet.
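The arithmetic behind this example can be checked with a short sketch. It assumes the standard Kelly criterion for a bet that gains `gain` with probability `p_up` and loses `loss` with probability `p_down`, whose log-optimal fraction is p_up/loss − p_down/gain; the function names are ours.

```python
def expected_return(p_up: float, gain: float, p_down: float, loss: float) -> float:
    """Expected one-period return of holding the asset."""
    return p_up * gain - p_down * loss


def kelly_fraction(p_up: float, gain: float, p_down: float, loss: float) -> float:
    """Fraction of wealth maximising p_up*log(1 + f*gain) + p_down*log(1 - f*loss),
    assuming p_up + p_down == 1. A negative value means no long position is worth taking."""
    return p_up / loss - p_down / gain


ev = expected_return(p_up=0.9, gain=0.03, p_down=0.1, loss=0.30)
f = kelly_fraction(p_up=0.9, gain=0.03, p_down=0.1, loss=0.30)
print(f"expected return {ev:+.2%}, Kelly fraction {f:+.3f}")
```

The expected return is −0.3 percent and the Kelly fraction is about −1/3, so the log-optimal long position is zero; a negative fraction would correspond to a short.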

The fact that stocks will ultimately be higher does not mean that the people who sell now are making a mistake: there are mechanistic reasons why markets crash even if it’s guaranteed that the S&P 500 will be higher in 40 years than it is now (which I think it basically is). However, if we are going to try to “time the market”, we need to estimate a probability distribution for the highest peak and another for the lowest trough. If we sell too early, we lose. If we think the market will drop lower than it actually does, we’ll be waiting for a time that never comes. It is incredibly hard to time the market correctly, which is why investors who attempt it typically underperform the market in the long run. I’d generally recommend always holding some long positions, even when you think a crash is likely and that those stocks will fall in it. This decreases your maximum drawdown if you’ve bet too early on a bubble. Then keep increasing the size of your short positions as the market rises (without an underlying reason for higher valuations), and decrease them as the market drops.

Are we in the 6.1 percent?

Assuming these odds of now-to-trough drops of over 30 percent haven’t changed too much, the question then is “How much more likely is it that we are in that 6.1 percent today?”

The S&P’s CAPE ratio and PS ratio

The cyclically adjusted price-to-earnings (CAPE) ratio of the S&P 500 is over double its historical average, and it’s a metric designed to evaluate when the stock market is overvalued. The only time the CAPE ratio has ever been higher was in the lead-up to the Dot Com crash. Right now, the CAPE is higher than it was at the market peak before the 1929 crash that led to the Great Depression. If the market does fall, and if this metric is reliable, the market has a long way to go down.
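For readers unfamiliar with CAPE, the calculation can be sketched as below, assuming the common Shiller-style definition: the current price divided by the average of the past ten years of earnings, with each year’s earnings restated in today’s dollars using the consumer price index. Yearly (rather than monthly) inputs and the function name are our simplifications.

```python
from typing import Sequence


def cape_ratio(price: float, earnings: Sequence[float], cpi: Sequence[float], cpi_now: float) -> float:
    """Cyclically adjusted P/E: price over the mean of inflation-adjusted earnings.

    earnings[i] and cpi[i] are the nominal earnings and CPI for past year i
    (ten years in the usual definition); cpi_now restates them in today's dollars."""
    if not earnings or len(earnings) != len(cpi):
        raise ValueError("need matching, non-empty earnings and CPI series")
    real_earnings = [e * cpi_now / c for e, c in zip(earnings, cpi)]
    return price / (sum(real_earnings) / len(real_earnings))


# Hypothetical inputs: flat earnings of 5 and no inflation give a CAPE of 20.
print(cape_ratio(100.0, [5.0] * 10, [250.0] * 10, 250.0))  # 20.0
```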

If you regress either the average now-to-peak or the average now-to-trough on the CAPE ratio, you see a negative relationship at all timescales. I.e., as the CAPE increases, your expected upside is lower and your expected downside is greater, whatever your investment time horizon is. In my rough model, when the CAPE is below 33, peaks exceed troughs at all timescales. At 34, troughs are bigger only on a timescale of 1 to 5 months; at all other timescales, the peaks are bigger. At a CAPE of 36.6, the 12-month trough finally exceeds the 12-month peak, with 6 months being the biggest gap. In July, the actual CAPE was around 38.

The price-to-sales ratio (PS ratio) of the S&P 500 tells a similar story, being at more than double its median value since Jan 2001. Looking at data that goes back to 1993, this is the highest the PS ratio has ever been. The ratio is 38 percent higher than it was before the Dot Com crash.

Note that there could be reasons why the CAPE and PS ratios might not return to their historical average, e.g. increased savings chasing investments, which are not perfectly elastic. There may be risk of persistent inflation, which stocks hedge against. But I don’t think anything that’s happened justifies a CAPE of 38 when its average is 17, except perhaps perpetually low interest rates, and there’s some bad news on that front.

US inflation is high and interest rates are very low

At the time of writing, annualized month-to-month US inflation has been over double the long-run target, which, if it persists, will force the Fed to take actions that increase interest rates earlier than its 2023 forecast. The Fed predicted higher-than-normal inflation, but inflation went beyond the Fed’s predictions, being higher and more persistent. The Fed argues that this inflation is transitory. What it does not have, however, is a good justification for why it thinks all the extra inflation is transitory, as former U.S. Treasury Secretary Larry Summers has pointed out.

Even if inflation “should” be transitory, people’s mere expectation of higher inflation can cause higher inflation: A shop that expects the money it receives to be worth less will raise its prices, and if all shops do so, then the money is indeed worth less. Because inflation is heavily influenced by expectation, it’s hard to predict with precision. The labour market is also going to create inflationary pressure. Wage growth is at a recent high, which can cause inflation: More dollars are competing for the same goods.

In order to reduce inflation, interest rates usually have to increase.

As I’ve been writing this, the Fed has brought forward its reduction in asset purchases, while also stating that it will not increase the interest rate earlier than intended. However, when the Fed reduces its bond purchases, it shifts the demand curve for all bonds to the left, which lowers bond prices and so raises effective interest rates for everyone except the banks.

Interest rates significantly affect asset prices. Unexpected increases in the interest rate increase the discount rate that investors use in their (explicit or implicit) net present value calculations, which decreases the true value of stocks. Rational investors will sell until the trading price reaches that new true value.
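As a rough illustration of the discounting argument above, here is a sketch using the Gordon growth model, a textbook simplification we swap in (the post does not use it) that values a stock as its next dividend divided by the discount rate minus a constant growth rate. The dividend, rates, and function name are hypothetical.

```python
def gordon_value(next_dividend: float, discount_rate: float, growth_rate: float) -> float:
    """Present value of a dividend stream growing forever at growth_rate,
    discounted at discount_rate (Gordon growth model)."""
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed growth rate")
    return next_dividend / (discount_rate - growth_rate)


before = gordon_value(1.0, discount_rate=0.05, growth_rate=0.02)
after = gordon_value(1.0, discount_rate=0.06, growth_rate=0.02)
print(f"value falls from {before:.2f} to {after:.2f} ({after / before - 1:.0%})")
```

In this toy setting, a one-point rise in the discount rate cuts the modelled value from about 33.33 to 25, a quarter of its value; the closer the discount rate sits to the growth rate, the more sensitive the valuation becomes.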

The federal debt-to-GDP ratio is very high

When the Fed buys assets, it adds to the demand side, meaning higher prices and lower interest rates. The Fed is now reducing asset purchases. When interest rates rise on the US’s enormous debt, the US becomes more likely to default, which would likely cause a huge decrease in the S&P 500 index. A mere increase in the likelihood of default should cause a decrease in the index.

Margin debt to GDP ratio is at an all-time high

The total money that is in margin loans is at both an absolute all-time high, and a relative all-time high compared to GDP. Right before the Dot Com crash, the margin-debt-to-GDP ratio peaked at 3 percent. Right before the 2008 crash, the ratio peaked just below 3 percent. Today, the ratio is at 3.9 percent. This is the highest it has ever been. Before each of these three major crashes, the ratio rose rapidly over the course of a few months, rising 52% before 2000 and 60% in 2007. From the bottom of the Covid crash, the rise has been 77%.

However, increasing levels of automation in the financial markets lead me to expect higher margin-debt-to-GDP ratios in correctly valued markets. One reason for this is that when a computer program finds arbitrage, it can, theoretically, leverage infinitely with zero risk. Fewer companies were doing this in the early 2000s. Another reason may be that more investors are better maximizing long-run growth. To misuse a term (but get the idea across), they’re “Kelly betting”, which would require increasing leverage when the S&P 500 is undervalued and decreasing leverage when it is overvalued. We can actually see that the troughs of the margin-debt-to-GDP ratio have gotten higher each time (though this is not strong evidence, as there are only four datapoints). The ratio hasn’t been below 2% since late 2012. And would you look at that: I just had a quick check, and from 1960 to 1995, the ratio never went above 1%.

China’s housing bubble

China’s housing bubble is truly enormous. The narrative about China I’ve heard for most of my life is that it will overtake the US and become the world’s dominant country. I strongly disagree with this claim. Government systems and culture are hard to evaluate quantitatively, but I think they’re probably the most important drivers of long-run economic growth. Good systems, in any domain (not just government), ultimately have some robust logical justification. In the case of public policy and market rules, each good system provably satisfies some optimality criterion (free markets, for instance, satisfy Pareto efficiency). That the CCP declares itself “Communist” suggests it is likely to disregard much of the granularity that good policy must have (the same can be said of policy makers who cheerlead Capitalism and have privatised things in such a way that the incentives of the new system produce incredibly poor results, as in the US prison system).

When you have a government that identifies as Communist, that’s a threat that needs to be taken seriously. The evidence I’ve used to evaluate this threat is anecdotal, and anecdotal evidence is often wrongly dismissed. Some anecdotes carry much more weight than others, which people who don’t understand Bayesian reasoning often fail to consider. Anecdotes are data, and data should update beliefs by an amount that depends on the strength of your prior and on the likelihood of the data under the hypothesis you’re updating relative to its likelihood under all other hypotheses. Some anecdotes are incredibly strong evidence.
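The Bayesian point can be made concrete with a minimal sketch of Bayes’ rule. For simplicity it collapses “all other hypotheses” into a single complement of H. The prior and likelihoods are hypothetical and stand for a weak anecdote and a strong one. The strength of an anecdote is its likelihood ratio: how much more probable it is if H is true than if H is false.

```python
def posterior(prior: float, p_data_given_h: float, p_data_given_not_h: float) -> float:
    """Bayes' rule for a hypothesis H against its complement (not H).

    P(H | data) = P(data | H) P(H) / [P(data | H) P(H) + P(data | not H) P(not H)]
    """
    numerator = prior * p_data_given_h
    return numerator / (numerator + (1 - prior) * p_data_given_not_h)


# Hypothetical numbers, same 10% prior in both cases.
weak = posterior(0.10, 0.6, 0.5)     # likelihood ratio 1.2: nearly as likely either way
strong = posterior(0.10, 0.9, 0.05)  # likelihood ratio 18: very unlikely unless H is true
print(f"weak anecdote: {weak:.2f}, strong anecdote: {strong:.2f}")
```

Starting from the same 10% prior, the weak anecdote barely moves the belief (to about 12%). The strong one moves it to about 67%.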

In this case, my anecdotal evidence is from Winston Sterzel and Matthew Tye, two Westerners who both lived in China for over a decade, both married Chinese women, and both had to escape China with their families under the threat of arrest on false charges of spying. (One of their Canadian friends was arrested, then released hours after Canada released a Huawei executive who had been legitimately arrested.) They have both experienced China’s extreme housing bubble. They’ve talked about

  • the CCP’s concerns about a housing bubble and how they’ve implemented loan restrictions
  • how married couples are getting on-paper divorces so that they can increase the loans they can legally obtain
  • how people take out additional, illegal loans under the names of their family members and friends
  • how the price of housing is something like 40x the annual salary
  • how buyers pay less for a previously lived-in property because of a common superstition that you inherit the previous tenant’s bad luck, so a lot of these houses aren’t producing rent, meaning that house prices go well above the net present value of rental payments
  • the poor quality and poor maintenance of these buildings, leading to cracks in the walls and apartments being demolished before the 70-year lease is up (land in China is not privately owned, only leased out by the CCP)
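The rental-value point in the list above can be made concrete with the standard annuity formula. This sketch assumes a constant rent paid at the end of each year, a hypothetical 5% discount rate, and no resale value once the 70-year lease ends. It also ignores rent growth. On those assumptions, a property’s fundamental value is bounded by the present value of the rent it can produce over its remaining lease.

```python
def pv_of_rent(annual_rent: float, discount_rate: float, years: int) -> float:
    """Present value of a fixed rent paid at the end of each year for `years` years.

    Ordinary annuity: PV = R * (1 - (1 + r) ** -n) / r.
    Assumes constant rent and nothing left over when the lease ends.
    """
    return annual_rent * (1 - (1 + discount_rate) ** -years) / discount_rate


# Hypothetical 5% discount rate over a full 70-year lease.
multiple = pv_of_rent(1.0, 0.05, 70)  # value measured in years of rent
vacant = pv_of_rent(0.0, 0.05, 70)    # an unrented flat produces no rental cash flow
print(f"fair price ≈ {multiple:.1f} years of rent; vacant flat: {vacant}")
```

Even a full 70-year stream of rent is worth only about 19.3 years of rent at this assumed rate. A flat left empty to avoid the lived-in discount produces no rental cash flow at all. A building demolished before its lease is up is worth less still, because `years` shrinks.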

In early July, I told a friend that buying housing in China was like investing in a banana that you’re not going to eat. The investing, cultural, and political environments that these two people describe make me very worried for the people of China, and we’ve seen before that large housing bubbles can take down the world economy. (I was not aware of any particular housing businesses, such as Evergrande, when I wrote this.)


There are a lot of potential triggers for a sharp short-term drop in the S&P 500 index. Every major leading indicator is flashing red, which tells me a drop of over 30 percent is much more likely than it normally is. Either forced sell-offs (due to margin calls) or sell-offs driven by fear could cause a massive and rapid selling cascade. At the time of posting, the S&P 500 is 1.4 percent higher than when I made my prediction. I still claim there is a 65 percent chance that a drop to 3029 will occur sometime before July 16, 2022.