[AN #127]: Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment

Published on December 2, 2020 6:20 PM GMT

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.

Audio version here (may not be up yet).

Please note that while I work at DeepMind, this newsletter represents my personal views and not those of my employer.

This newsletter is an extended summary of the recently released Cartesian frames sequence.

Cartesian Frames (Scott Garrabrant) (summarized by Rohin): The embedded agency sequence (AN #31) hammered in the fact that there is no clean, sharp dividing line between an agent and its environment. This sequence proposes an alternate formalism: Cartesian frames. Note that this is a paradigm that helps us think about agency: you should not be expecting some novel result that, say, tells us how to look at a neural net and find agents within it.

The core idea is that rather than assuming the existence of a Cartesian dividing line, we consider how such a dividing line could be constructed. For example, when we think of a sports team as an agent, the environment consists of the playing field and the other team; but we could also consider a specific player as an agent, in which case the environment consists of the rest of the players (on both teams) and the playing field. Each of these is a valid way of carving up what actually happens into an “agent” and an “environment”; they are frames by which we can more easily understand what’s going on, hence the name “Cartesian frames”.

A Cartesian frame takes choice as fundamental: the agent is modeled as a set of options that it can freely choose between. This means that the formulation cannot be directly applied to deterministic physical laws. It instead models what agency looks like “from the inside”. If you are modeling a part of the world as capable of making choices, then a Cartesian frame is appropriate to use to understand the perspective of that choice-making entity.

Formally, a Cartesian frame consists of a set of agent options A, a set of environment options E, a set of possible worlds W, and an interaction function that, given an agent option and an environment option, specifies which world results. Intuitively, the agent can “choose” an agent option, the environment can “choose” an environment option, and together these produce some world. You might notice that we’re treating the agent and environment symmetrically; this is intentional, and means that we can define analogs of all of our agent notions for environments as well (though they may not have nice philosophical interpretations).

The full sequence uses a lot of category theory to define operations on these sorts of objects and show various properties of the objects and their operations. I will not be summarizing this here; instead, I will talk about their philosophical interpretations.

First, let’s look at an example of using a Cartesian frame on something that isn’t typically thought of as an agent: the atmosphere, within the broader climate system. The atmosphere can “choose” whether to trap sunlight or not. Meanwhile, in the environment, either the ice sheets could melt or they could not. If sunlight is trapped and the ice sheets melt, then the world is Hot. If exactly one of these is true, then the world is Neutral. Otherwise, the world is Cool.
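
To make the atmosphere example concrete, here is a minimal Python sketch of a finite Cartesian frame, assuming we store the interaction function as a lookup table from (agent option, environment option) pairs to worlds. The `CartesianFrame` class and the option labels are hypothetical names chosen for illustration, not notation from the sequence.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Tuple


@dataclass
class CartesianFrame:
    """A finite Cartesian frame: agent options, environment options, and an
    interaction table mapping each (agent option, env option) pair to a world."""
    agent: Tuple[Hashable, ...]
    env: Tuple[Hashable, ...]
    table: Dict[Tuple[Hashable, Hashable], Hashable]

    def world(self, a: Hashable, e: Hashable) -> Hashable:
        return self.table[(a, e)]

    def worlds(self) -> set:
        return set(self.table.values())


def atmosphere_frame() -> CartesianFrame:
    agent = ("trap", "no_trap")   # the atmosphere "chooses" whether to trap sunlight
    env = ("melt", "no_melt")     # the environment decides whether ice sheets melt
    names = {2: "Hot", 1: "Neutral", 0: "Cool"}
    table = {
        # Count how many of the two warming events happen (bools add as ints).
        (a, e): names[(a == "trap") + (e == "melt")]
        for a in agent
        for e in env
    }
    return CartesianFrame(agent, env, table)


frame = atmosphere_frame()
for a in frame.agent:
    print(a, [frame.world(a, e) for e in frame.env])
# trap ['Hot', 'Neutral']
# no_trap ['Neutral', 'Cool']
```

Each printed row lists the worlds that one atmosphere option can lead to, depending on what the ice sheets do.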

(Yes, this seems very unnatural. That’s good! The atmosphere shouldn’t be modeled as an agent! I’m choosing this example because its unintuitive nature makes it more likely that you think about the underlying rule, rather than just the superficial example. I will return to more intuitive examples later.)


A property of the world is something like “it is neutral or warmer”. An agent can ensure a property if it has some option such that no matter what environment option is chosen, the property is true of the resulting world. The atmosphere could ensure the warmth property above by “choosing” to trap sunlight. Similarly, the agent can prevent a property if it can guarantee that the property will not hold, regardless of the environment option. For example, the atmosphere can prevent the property “it is hot” by “choosing” not to trap sunlight. The agent can control a property if it can both ensure and prevent it. In our example, there is no property that the atmosphere can control.
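
The three definitions translate directly into quantifiers over the interaction table. The sketch below hard-codes the atmosphere frame (option names are hypothetical) and checks every subset of the worlds, confirming that none of them is controllable.

```python
from itertools import combinations

# Atmosphere frame: (agent option, environment option) -> world.
TABLE = {
    ("trap", "melt"): "Hot",
    ("trap", "no_melt"): "Neutral",
    ("no_trap", "melt"): "Neutral",
    ("no_trap", "no_melt"): "Cool",
}
AGENT = ("trap", "no_trap")
ENV = ("melt", "no_melt")
WORLDS = ("Hot", "Neutral", "Cool")


def can_ensure(prop):
    """Some agent option lands inside prop whatever the environment chooses."""
    return any(all(TABLE[(a, e)] in prop for e in ENV) for a in AGENT)


def can_prevent(prop):
    """Some agent option stays outside prop whatever the environment chooses."""
    return any(all(TABLE[(a, e)] not in prop for e in ENV) for a in AGENT)


def can_control(prop):
    return can_ensure(prop) and can_prevent(prop)


def all_properties():
    """Every subset of the worlds, i.e. every property."""
    return [frozenset(c) for r in range(len(WORLDS) + 1) for c in combinations(WORLDS, r)]


warm = frozenset({"Hot", "Neutral"})  # "it is neutral or warmer"
hot = frozenset({"Hot"})              # "it is hot"
print(can_ensure(warm), can_prevent(hot))               # True True
print([p for p in all_properties() if can_control(p)])  # []
```

Nothing is controllable because both atmosphere options can lead to Neutral, so no option can land entirely inside a property while the other lands entirely outside it.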

Coarsening or refining worlds

We often want to describe reality at different levels of abstraction. Sometimes we would like to talk about the behavior of various companies; at other times we might want to look at an individual employee. We can do this by having a function that maps low-level (refined) worlds to high-level (coarsened) worlds. In our example above, consider the possible worlds {YY, YN, NY, NN}, where the first letter of a world corresponds to whether sunlight was trapped (Yes or No), and the second corresponds to whether the ice sheets melted. The worlds {Hot, Neutral, Cool} that we had originally are a coarsened version of this, where we map YY to Hot, YN and NY to Neutral, and NN to Cool.
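
Coarsening can be sketched as pushing every outcome of the interaction function through a map on worlds, leaving the agent and environment options unchanged. In this sketch, the `Y`/`N` option labels and the `coarsen_frame` helper are hypothetical.

```python
# Refined worlds: first letter = was sunlight trapped (Y/N),
# second letter = did the ice sheets melt (Y/N).
AGENT = ("Y", "N")   # atmosphere: trap sunlight or not
ENV = ("Y", "N")     # environment: ice sheets melt or not
refined = {(a, e): a + e for a in AGENT for e in ENV}

COARSEN = {"YY": "Hot", "YN": "Neutral", "NY": "Neutral", "NN": "Cool"}


def coarsen_frame(table, world_map):
    """Push every outcome through a map on worlds; the options stay the same."""
    return {key: world_map[w] for key, w in table.items()}


coarse = coarsen_frame(refined, COARSEN)
print(coarse[("Y", "N")], coarse[("N", "N")])  # Neutral Cool
```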


A major upside of Cartesian frames is that given the set of possible worlds that can occur, we can choose how to divide it up into an “agent” and an “environment”. Most of the interesting aspects of Cartesian frames are in the relationships between different ways of doing this division, for the same set of possible worlds.

First, we have interfaces. Given two different Cartesian frames <A, E, W> and <B, F, W> with the same set of worlds, an interface allows us to interpret the agent A as being used in place of the agent B. Specifically, if A would choose an option a, the interface maps this to one of B’s options b. This is then combined with the environment option f (from F) to produce a world w.

A valid interface also needs to be able to map the environment option f to e, and then combine it with the agent option a to get the world. This alternate way of computing the world must always give the same answer.
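
The interface condition is easy to check mechanically for finite frames. In the hypothetical sketch below, a frame is a tuple of agent options, environment options, and a table; `g` maps agent options of the first frame to those of the second, and `h` maps environment options of the second frame back to the first. Brute-forcing all candidate interfaces finds one from a weaker, trap-only atmosphere into the full frame, but none in the other direction.

```python
from itertools import product


def is_interface(frame_c, frame_d, g, h):
    """Check whether (g, h) is a valid interface from C = (A, E, .) to D = (B, F, *).

    Both routes to a world must agree: g(a) * f == a . h(f) for all a in A, f in F.
    """
    agent_c, _, table_c = frame_c
    _, env_d, table_d = frame_d
    return all(table_d[(g[a], f)] == table_c[(a, h[f])] for a in agent_c for f in env_d)


def all_interfaces(frame_c, frame_d):
    """Enumerate every (g, h) pair and yield the valid interfaces."""
    agent_c, env_c, _ = frame_c
    agent_d, env_d, _ = frame_d
    for g_vals in product(agent_d, repeat=len(agent_c)):
        g = dict(zip(agent_c, g_vals))
        for h_vals in product(env_c, repeat=len(env_d)):
            h = dict(zip(env_d, h_vals))
            if is_interface(frame_c, frame_d, g, h):
                yield g, h


ATMOSPHERE = {
    ("trap", "melt"): "Hot", ("trap", "no_melt"): "Neutral",
    ("no_trap", "melt"): "Neutral", ("no_trap", "no_melt"): "Cool",
}
full = (("trap", "no_trap"), ("melt", "no_melt"), ATMOSPHERE)
# A weaker agent that only has the "trap" option, facing the same environment.
weak = (("trap",), ("melt", "no_melt"),
        {k: w for k, w in ATMOSPHERE.items() if k[0] == "trap"})

print(len(list(all_interfaces(weak, full))))  # 1
print(len(list(all_interfaces(full, weak))))  # 0
```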

Since A can be used in place of B, all of A’s options must have equivalents in B. However, B could have options that A doesn’t. So the existence of this interface implies that A is, in a sense, “weaker” than B. (There are a bunch of caveats here.)

(Relevant terms in the sequence: morphism)

Decomposing agents into teams of subagents

The first kind of subagent we will consider is a subagent that can control “part of” the agent’s options. Consider for example a coordination game, where there are N players who each individually can choose whether or not to press a Big Red Button. There are only two possible worlds: either the button is pressed, or it is not pressed. For now, let’s assume there are two players, Alice and Bob.

One possible Cartesian frame is the frame for the entire team. In this case, the team has perfect control over the state of the button -- the agent options are either to press the button or not to press the button, and the environment does not have any options (or more accurately, it has a single “do nothing” option).

However, we can also decompose this into separate Alice and Bob subagents. What does a Cartesian frame for Alice look like? Well, Alice also has two options -- press the button, or don’t. However, Alice does not have perfect control over the result: from her perspective, Bob is part of the environment. As a result, for Alice, the environment also has two options -- press the button, or don’t. The button is pressed if Alice presses it or if the environment presses it. (The Cartesian frame for Bob is identical, since he is in the same position that Alice is in.)
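
A sketch of the two frames from the button game, with hypothetical names and the tables built directly from the rule “the button is pressed if anyone presses it”. The ensure/prevent checks show the difference: the team controls whether the button is pressed, while Alice can ensure that it is pressed but cannot prevent it.

```python
from itertools import product

PRESS, WAIT = "press", "wait"


def outcome(*choices):
    """The button is pressed if anyone presses it."""
    return "pressed" if PRESS in choices else "not_pressed"


# Team frame: the agent picks both players' actions; the environment only has "noop".
team = {((a, b), "noop"): outcome(a, b) for a, b in product((PRESS, WAIT), repeat=2)}
# Alice's frame: Bob is folded into the environment, which can also press the button.
alice = {(a, e): outcome(a, e) for a, e in product((PRESS, WAIT), repeat=2)}


def options(table):
    return {a for a, _ in table}, {e for _, e in table}


def can_ensure(table, prop):
    agents, envs = options(table)
    return any(all(table[(a, e)] in prop for e in envs) for a in agents)


def can_prevent(table, prop):
    agents, envs = options(table)
    return any(all(table[(a, e)] not in prop for e in envs) for a in agents)


pressed = {"pressed"}
print(can_ensure(team, pressed), can_prevent(team, pressed))    # True True
print(can_ensure(alice, pressed), can_prevent(alice, pressed))  # True False
```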

Note, however, that this decomposition isn’t perfect: given the Cartesian frames for Alice and Bob, you cannot uniquely recover the original Cartesian frame for the team. This is because both Alice’s and Bob’s frames say that the environment has some ability to press the button. We know that this ability comes only from Alice and Bob themselves, but given just the frames we can’t be sure that there isn’t a third person Charlie who also might press the button. So, when we combine Alice and Bob back into the frame for a two-person team, we don’t know whether or not the environment should have the ability to press the button. This makes the mathematical definition of this kind of subagent a bit trickier, though it still works out.

Another important note is that this is relative to how coarsely you model the world. We used a fairly coarse model in this example: only whether or not the button was pressed. If we instead used a finer model that tracked which subset of people pressed the button, then we would be able to uniquely recover the team’s Cartesian frame from Alice and Bob’s individual frames.
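
The following sketch illustrates why the coarseness of the world model matters. It builds Alice’s frame twice, once assuming the other possible presser is Bob and once assuming it is an outsider Charlie, under both world models; the helper names are hypothetical.

```python
from itertools import product

PRESS, WAIT = "press", "wait"


def coarse_world(pressers):
    """Coarse model: only records whether the button was pressed."""
    return "pressed" if pressers else "not_pressed"


def fine_world(pressers):
    """Fine model: records exactly which people pressed the button."""
    return frozenset(pressers)


def alice_frame(other, world_model):
    """Alice's frame when the environment's press comes from `other`."""
    return {
        (a, e): world_model({name for name, act in (("alice", a), (other, e)) if act == PRESS})
        for a, e in product((PRESS, WAIT), repeat=2)
    }


# Coarse model: Alice's frame looks the same whether the other presser is Bob
# or an outsider Charlie, so the frame alone cannot tell us who it is.
print(alice_frame("bob", coarse_world) == alice_frame("charlie", coarse_world))  # True
# Fine model: the worlds record who pressed, so the two frames differ.
print(alice_frame("bob", fine_world) == alice_frame("charlie", fine_world))      # False
```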

(Relevant terms in the sequence: multiplicative subagents, sub-tensors, tensors)

Externalizing and internalizing

This decomposition isn’t just for teams of people: even a single “mind” can often be thought of as the interaction of various parts. For example, hierarchical decision-making can be thought of as the interaction between multiple agents at different levels of the hierarchy.

This decomposition can be done using externalization. Externalization allows you to take an existing Cartesian frame and some specific property of the world, and then construct a new Cartesian frame where that property of the world is controlled by the environment.

Concretely, let’s imagine a Cartesian frame for Alice that represents her decision on whether to cook a meal or eat out. If she chooses to cook a meal, then she must also decide which recipe to follow. If she chooses to eat out, she must decide which restaurant to eat out at.

We can externalize the high-level choice of whether Alice cooks a meal or eats out. This results in a Cartesian frame where the environment chooses whether Alice is cooking or eating out, and the agent must then choose a restaurant or recipe as appropriate. This is the Cartesian frame corresponding to the low-level policy that must pursue whatever subgoal is chosen by the high-level planning module (which is now part of the environment). The agent of this frame is a subagent of Alice.
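
A toy model of the externalized cooking frame, under a simplifying assumption made only for this illustration: with just two high-level modes, each low-level option can be represented as a pair “(recipe if cooking, restaurant if eating out)”. This is not the sequence’s general definition of externalization (which works for any partition of the worlds), and the recipe and restaurant names are hypothetical.

```python
from itertools import product

RECIPES = ("pasta", "curry")        # hypothetical recipes
RESTAURANTS = ("diner", "bistro")   # hypothetical restaurants
MODES = ("cook", "eat_out")

# Alice's original frame: she picks a meal directly; the environment is trivial.
meals = [("cook", r) for r in RECIPES] + [("eat_out", r) for r in RESTAURANTS]
alice = {(meal, "noop"): meal for meal in meals}

# Toy externalized frame: the environment now picks the mode, so each low-level
# option must say what to do in either case (a recipe AND a restaurant).
low_level = list(product(RECIPES, RESTAURANTS))


def low_level_world(option, mode):
    recipe, restaurant = option
    return ("cook", recipe) if mode == "cook" else ("eat_out", restaurant)


externalized = {(opt, mode): low_level_world(opt, mode) for opt in low_level for mode in MODES}

# Both frames reach the same worlds, but no low-level option can guarantee that
# Alice cooks: that decision now belongs to the environment.
print(set(alice.values()) == set(externalized.values()))                             # True
print(any(all(externalized[(o, m)][0] == "cook" for m in MODES) for o in low_level))  # False
```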

The reverse operation is called internalization, where some property of the world is brought under the control of the agent. In the above example, if we take the Cartesian frame for the low-level policy, and then internalize the cooking / eating out choice, we get back the Cartesian frame for Alice as a unified whole.

Note that in general externalization and internalization are not inverses of each other. As a simple example, if you externalize something that is already “in the environment” (e.g. whether it is raining, in a frame for Alice), that does nothing, but when you then internalize it, that thing is now assumed to be under the agent’s control (e.g. now the “agent” in the frame can control whether or not it is raining). We will return to this point when we talk about observability.

Decomposing agents into disjunctions of subagents

Our subagents so far have been “team-based”: the original agent could be thought of as a supervisor that got to control all of the subagents together. (The team agent in the button-pressing game could be thought of as controlling both Alice’s and Bob’s actions; in the cooking / eating out example Alice could be thought of as controlling both the high-level subgoal selection and the low-level policy that executes on the subgoals.)

The sequence also introduces another decomposition into subagents, where the superagent can be thought of as a supervisor that gets to choose which of the subagents gets to control the overall behavior. Thus, the superagent can do anything that either of the subagents could do.

Let’s return to our cooking / eating out example. We previously saw that we could decompose Alice into a high-level subgoal-choosing subagent that chooses whether to cook or eat out, and a low-level subgoal-execution subagent that then chooses which recipe to make or which restaurant to go to. We can also decompose Alice as a choice between two subagents: one that chooses which restaurant to go to, and one that chooses which recipe to make. The union of these subagents is an agent that first chooses whether to go to a restaurant or to make a recipe, and then uses the appropriate subagent to choose the restaurant or recipe: this is exactly a description of Alice.
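
A sketch of this “union of agents” construction (the sequence calls it a sum), assuming its definition: the agent options are the disjoint union of the two subagents’ options, and the environment options are pairs with one option from each subagent’s environment. The subagents and their option names are hypothetical.

```python
from itertools import product


def frame_sum(c, d):
    """Sum of frames C = (A, E, table_c) and D = (B, F, table_d) over the same worlds.

    The agent picks an option from either frame (tagged so the union is disjoint);
    the environment picks one option from each frame, and only the one belonging
    to the chosen subagent affects the world.
    """
    agent_c, env_c, table_c = c
    agent_d, env_d, table_d = d
    agent = [("C", a) for a in agent_c] + [("D", b) for b in agent_d]
    env = list(product(env_c, env_d))
    table = {}
    for tag, x in agent:
        for e, f in env:
            table[((tag, x), (e, f))] = table_c[(x, e)] if tag == "C" else table_d[(x, f)]
    return agent, env, table


# Hypothetical subagents with trivial environments.
restaurant_picker = (("diner", "bistro"), ("noop",),
                     {(r, "noop"): ("eat_out", r) for r in ("diner", "bistro")})
recipe_picker = (("pasta", "curry"), ("noop",),
                 {(r, "noop"): ("cook", r) for r in ("pasta", "curry")})

agent, env, table = frame_sum(restaurant_picker, recipe_picker)
print(len(agent), env)  # 4 [('noop', 'noop')]
```

The resulting agent has exactly Alice’s four meal choices against a trivial environment.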

(Relevant terms in the sequence: additive subagents, sub-sums, sums)

Committing and assuming

One way to think about the subagents of the previous example is that they are the result of Alice committing to a particular subset of choices. If Alice commits to eating out (but doesn’t specify at what restaurant), then the resulting frame is equivalent to the restaurant-choosing subagent.

Similarly to committing, we can also talk about assuming. Just as commitments restrict the set of options available to the agent, assumptions restrict the set of options available to the environment.
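
Committing and assuming can be sketched as filters on the refined atmosphere frame: a commitment keeps only the agent options that guarantee a property, and an assumption keeps only the environment options that guarantee it. This is one simple reading of the definitions, and the helper names are hypothetical.

```python
# Refined atmosphere frame: agent Y/N = trap sunlight or not,
# environment Y/N = ice sheets melt or not, world = the two letters.
AGENT = ("Y", "N")
ENV = ("Y", "N")
TABLE = {(a, e): a + e for a in AGENT for e in ENV}


def commit(agent, env, table, prop):
    """Restrict the agent to options that guarantee prop for every environment option."""
    kept = tuple(a for a in agent if all(table[(a, e)] in prop for e in env))
    return kept, env, {(a, e): table[(a, e)] for a in kept for e in env}


def assume(agent, env, table, prop):
    """Restrict the environment to options that guarantee prop for every agent option."""
    kept = tuple(e for e in env if all(table[(a, e)] in prop for a in agent))
    return agent, kept, {(a, e): table[(a, e)] for a in agent for e in kept}


trapped = {"YY", "YN"}  # "sunlight is trapped"
melted = {"YY", "NY"}   # "the ice sheets melt"
print(commit(AGENT, ENV, TABLE, trapped)[0])  # ('Y',)
print(assume(AGENT, ENV, TABLE, melted)[1])   # ('Y',)
```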

Just as we can union two agents together to get an agent that gets to choose between two subagents, we can also union two environments together to get an environment that gets to choose between two subenvironments. (In this case the agent is more constrained: it must be able to handle the environment regardless of which way the environment chooses.)
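
The environment-side union (the product) mirrors the agent-side construction: environment options form a disjoint union, so each agent option must be a pair saying what to do against either sub-environment. The sketch assumes this definition and uses two hypothetical one-column restrictions of the refined atmosphere frame.

```python
def frame_product(c, d):
    """Product of C = (A, E, table_c) and D = (B, F, table_d) over the same worlds.

    The environment picks an option from either frame (tagged so the union is
    disjoint), so the agent must pick one option from EACH frame, ready for
    whichever sub-environment the environment chooses.
    """
    agent_c, env_c, table_c = c
    agent_d, env_d, table_d = d
    agent = [(a, b) for a in agent_c for b in agent_d]
    env = [("C", e) for e in env_c] + [("D", f) for f in env_d]
    table = {}
    for a, b in agent:
        for tag, x in env:
            table[((a, b), (tag, x))] = table_c[(a, x)] if tag == "C" else table_d[(b, x)]
    return agent, env, table


# Two restrictions of the refined atmosphere frame (worlds are two letters:
# sunlight trapped?, ice melted?): one where the ice melts, one where it doesn't.
melts = (("Y", "N"), ("Y",), {(a, "Y"): a + "Y" for a in "YN"})
stays = (("Y", "N"), ("N",), {(a, "N"): a + "N" for a in "YN"})

agent, env, table = frame_product(melts, stays)
# The agent option ("Y", "N") means "trap sunlight if the ice melts, otherwise don't".
print(table[(("Y", "N"), ("C", "Y"))], table[(("Y", "N"), ("D", "N"))])  # YY NN
```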

(Relevant terms in the sequence: product)


The most interesting (to me) part of this sequence was the various equivalent definitions of what it means for something to be observable. The overall story is similar to the one in Knowledge is Freedom: an agent is said to “observe” a property P if it is capable of making different decisions based on whether P holds or not.

Thus we get our first definition of observability: a property P of the world is observable if, for any two agent options a and b, the agent also has an option that is equivalent to “if P then a else b”.
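
The “if P then a else b” definition can be checked by brute force on small frames. The sketch below uses a hypothetical umbrella example: the environment picks the weather, and each agent option is a policy from weather to whether to carry an umbrella.

```python
WEATHER = ("rain", "sun")


def make_frame(policies):
    """Agent options are named umbrella policies (weather -> bool); the
    environment picks the weather; the world records (weather, umbrella)."""
    return {(name, w): (w, policy[w]) for name, policy in policies.items() for w in WEATHER}


def is_observable(table, prop):
    """For all agent options a, b there must be some c acting as
    'if prop then a else b' against every environment option."""
    agents = {a for a, _ in table}
    envs = {e for _, e in table}

    def acts_as_if_then_else(c, a, b):
        return all(
            table[(c, e)] == (table[(a, e)] if table[(c, e)] in prop else table[(b, e)])
            for e in envs
        )

    return all(any(acts_as_if_then_else(c, a, b) for c in agents) for a in agents for b in agents)


ALL_POLICIES = {
    "always": {"rain": True, "sun": True},
    "never": {"rain": False, "sun": False},
    "if_rain": {"rain": True, "sun": False},
    "if_sun": {"rain": False, "sun": True},
}
rich = make_frame(ALL_POLICIES)
poor = make_frame({k: ALL_POLICIES[k] for k in ("always", "never")})
raining = {("rain", True), ("rain", False)}
umbrella = {("rain", True), ("sun", True)}

print(is_observable(rich, raining))   # True: every conditional policy on rain exists
print(is_observable(poor, raining))   # False: nothing implements "if rain then always else never"
print(is_observable(rich, umbrella))  # False: the agent itself decides this property
```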

Intuitively, this is meant to be similar to the notion of “inputs” to an agent. A neural net should be able to express arbitrary computations over its inputs, and so if we view the neural net as “choosing” what computation to do (by “choosing” what its parameters are), then the neural net can have its outputs (agent options) depend in arbitrary ways on the inputs. Thus, we say that the neural net “observes” its inputs, because what the neural net does can depend freely on the inputs.

Note that this is a very black-and-white criterion: we must be able to express every conditional policy on the property for it to be observable; if even one such policy is not expressible, then the property is not observable.

One way to think about this is that an observable property needs to be completely under the control of the environment, that is, the environment option should completely determine whether the resulting world satisfies the property or not -- nothing the agent does can matter (for this property). To see this, suppose that there was some environment option e that didn’t fully determine a property P, so that there are agent options a and b such that the world corresponding to (a, e) satisfies P but the one corresponding to (b, e) does not. Then our agent cannot implement the conditional policy “if P then b else a”, because it would lead to a self-referential contradiction (akin to “this sentence is false”) when the environment chooses e. Thus, P cannot be observable.

Environment control is necessary but not sufficient for observability: it is possible for the environment to fully control P while the agent is still unable to always condition on P. So we do need something extra. Nevertheless, this intuition suggests a few other ways of thinking about observability. The key idea is to identify a decomposition of the agent based on P that should only work if the environment has all the control, and then to identify a union step that puts the agent back together and automatically adds in all of the policies that are conditional on P. I’ll describe these definitions here; the sequence proves that they are in fact equivalent to the original definition above.
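
A quick check of this gap on a hypothetical two-policy umbrella frame: the environment fully decides whether it rains, yet with only the unconditional policies “always” and “never” available, the agent cannot implement “carry an umbrella if it rains, otherwise don’t”, so rain is not observable there.

```python
WEATHER = ("rain", "sun")

# Hypothetical frame with only two unconditional umbrella policies.
POLICIES = {"always": {"rain": True, "sun": True}, "never": {"rain": False, "sun": False}}
table = {(name, w): (w, p[w]) for name, p in POLICIES.items() for w in WEATHER}


def env_determines(table, prop):
    """True if, for each environment option, every agent option agrees on whether prop holds."""
    agents = {a for a, _ in table}
    envs = {e for _, e in table}
    return all(len({table[(a, e)] in prop for a in agents}) == 1 for e in envs)


raining = {("rain", True), ("rain", False)}
umbrella = {("rain", True), ("sun", True)}
print(env_determines(table, raining))   # True: the environment alone settles it
print(env_determines(table, umbrella))  # False: the agent's choice matters
```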

First, recall that externalization and internalization are methods that allow us to “transfer” control of some property from the agent to the environment and vice versa. Thus, if all the control of P is in the environment, one would hope that internalization followed by externalization just transfers the control back and forth. In addition, when we externalize P, the externalization process will enforce that the agent can condition on P arbitrarily (this is how it is defined). This suggests the definition: P is observable if and only if internalizing P followed by externalizing P gives us back the original frame.

Second, if the environment has all of the control over P, then we should be able to decompose the agent into two parts: one that decides what to do when P is true, and one that decides what to do when P is false. We can achieve this using assumptions, that is, the first agent is the original agent under the assumption that P is true, and the second is under the assumption that P is false. Note that if the environment didn’t have perfect control over P, this would not work, as the environment options where P is not guaranteed to be true or false would simply be deleted, and could not be reconstructed from the two new agents.

We now need to specify how to put the agents back together, in a way that includes all the conditional policies on P. There are actually two variants in how we can do this:

In the first case, we combine the agents by unioning the environments, which lets the environment choose whether P is true or not. Given how this union is defined, the new agent is able to specify both what to do given the environment’s choice, as well as what it would have done in the counterfactual case where the environment had decided P differently. This allows it to implement all conditional policies on P. So, P is observable if and only if decomposing the frame using assumptions on P, and then unioning the environments of the resulting frames gives back the original frame.
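
The first reconstruction can be tried on the same hypothetical umbrella frames. Because assuming rain keeps only the “rain” environment option and assuming no rain keeps only “sun”, the rebuilt frame has the same environment options as the original, so for these tiny examples we compare frames by their sets of distinct rows. That is a simplification of the sequence’s notion of equivalence, and all helper names are hypothetical.

```python
WEATHER = ("rain", "sun")
ALL_POLICIES = {
    "always": {"rain": True, "sun": True},
    "never": {"rain": False, "sun": False},
    "if_rain": {"rain": True, "sun": False},
    "if_sun": {"rain": False, "sun": True},
}
RAINING = {("rain", True), ("rain", False)}
NOT_RAINING = {("sun", True), ("sun", False)}


def make_frame(names):
    return {(n, w): (w, ALL_POLICIES[n][w]) for n in names for w in WEATHER}


def agents_envs(table):
    return {a for a, _ in table}, {e for _, e in table}


def assume(table, prop):
    """Keep only environment options that force prop whatever the agent does."""
    agents, envs = agents_envs(table)
    kept = [e for e in envs if all(table[(a, e)] in prop for a in agents)]
    return {(a, e): table[(a, e)] for a in agents for e in kept}


def frame_product(c, d):
    """Union the environments; each agent option is a pair, one per sub-frame.
    (The two environment sets here are disjoint, so no tagging is needed.)"""
    (agents_c, envs_c), (agents_d, envs_d) = agents_envs(c), agents_envs(d)
    out = {}
    for a in agents_c:
        for b in agents_d:
            out.update({((a, b), e): c[(a, e)] for e in envs_c})
            out.update({((a, b), f): d[(b, f)] for f in envs_d})
    return out


def rows(table):
    """Distinct behaviours: what each agent option yields, per environment option."""
    agents, envs = agents_envs(table)
    return {tuple(table[(a, e)] for e in sorted(envs)) for a in agents}


def rebuild(table, prop, complement):
    return frame_product(assume(table, prop), assume(table, complement))


for names in (list(ALL_POLICIES), ["always", "never"]):
    frame = make_frame(names)
    print(rows(rebuild(frame, RAINING, NOT_RAINING)) == rows(frame))
# True for the four-policy frame, then False for the two-policy frame
```

In the two-policy frame the rebuilt agent gains conditional policies the original lacked, which is exactly why rain fails to be observable there.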

In the second case, after getting agents via assumption on P, you extend each agent so that in the case where its assumption is false, it is as though it takes a noop action. Intuitively, the resulting agent is an agent that is hobbled so that it has no power in worlds where P comes out differently than was assumed. These agents are then combined into a team. Intuitively, the team selects an option of the form “the first agent tries to do X (which only succeeds when P is true) and the second agent tries to do Y (which only succeeds when P is false)”. Like the previous decomposition, this specifies both what to do in whatever actual environment results, as well as what would have been done in the counterfactual world where the value of P was reversed. Thus, this way of combining the agents once again adds in all conditional policies on P. So, P is observable if and only if decomposing the frame using assumptions on P, then hobbling the resulting frames in cases where their assumptions are false, and then putting the agents back in a team, is equivalent to the original frame.


Cartesian frames do not have an intrinsic notion of time. However, we can still use them to model sequential processes, by having the agent options be policies rather than actions, and having the worlds be histories or trajectories rather than states.

To say useful things about time, we need to broaden our notion of observables. So far I’ve been talking about whether you can observe binary properties P that are either true or false. In fact, all of the definitions can be easily generalized to properties P that can take on one of N values. We’ll be using this notion of observability here.

Consider a game of chess where Alice plays as white and Bob as black. Intuitively, when Alice is choosing her second move, she can observe Bob’s first move. However, the property “Bob’s first move” would not be observable in Alice’s Cartesian frame, because Alice’s first move cannot depend on Bob’s first move (since Bob hasn’t made it yet), and so when deciding the first move we can’t implement policies that condition on what Bob’s first move is.

Really, we want some way to say “after Alice has made her first move, from the perspective of the rest of her decisions, Bob’s first move is observable”. But we know how to remove some control from the agent in order to get the perspective of “everything else” -- that’s externalization! In particular, in Alice’s frame, if we externalize the property “Alice’s first move”, then the property “Bob’s first move” is observable in the new frame.

This suggests a way to define a sequence of frames that represent the passage of time: we define the T-th frame as “the original frame, but with the first T moves externalized”, or equivalently as “the (T-1)-th frame, but with the T-th move externalized”. Each of these frames is a subagent of the original frame, since we can think of the full agent (Alice) as the team of “the agent that plays the first T moves” and “the agent that plays the (T+1)-th move onwards”. As you might expect, as “time” progresses, the agent loses controllables and gains observables. For example, by move 3 Alice can no longer control her first two moves, but she can now observe Bob’s first two moves, relative to Alice at the beginning of the game.

Rohin's opinion: I like this way of thinking about agency: we’ve been talking about “where to draw the line around the agent” for quite a while in AI safety, but there hasn’t been a nice formalization of this until now. In particular, it’s very nice that we can compare different ways of drawing the line around the agent, and make precise various concepts around this, such as “subagent”.

I’ve also previously liked the notion that “to observe P is to be able to change your decisions based on the value of P”, but I hadn’t really seen much discussion about it until now. This sequence makes some real progress on conceptual understanding of this perspective: in particular, the notion that observability requires “all the control to be in the environment” is not one I had until now. (Though I should note that this particular phrasing is mine, and I’m not sure the author would agree with the phrasing.)

One of my checks for the utility of foundational theory for a particular application is to see whether the key results can be explained without having to delve into esoteric mathematical notation. I think this sequence does very well on this metric -- for the most part I didn’t even read the proofs, yet I was able to reconstruct conceptual arguments for many of the theorems that are convincing to me. (They aren’t and shouldn’t be as convincing as the proofs themselves.) However, not all of the concepts score so well on this -- for example, the generic subagent definition was sufficiently unintuitive to me that I did not include it in this summary.


I'm always happy to hear feedback; you can send it to me, Rohin Shah, by replying to this email.


An audio podcast version of the Alignment Newsletter is available. This podcast is an audio version of the newsletter, recorded by Robert Miles.


Recursive Quantilizers II

Published on December 2, 2020 3:26 PM GMT

I originally introduced the recursive quantilizers idea here, but didn't provide a formal model until my recent Learning Normativity post. That formal model had some problems. I'll correct some of those problems here. My new model is closer to HCH+IDA, and so is even closer to Paul Christiano-style systems than my previous one.

However, I'm also beginning to suspect that quantilizers aren't the right starting point. I'll state several problems with quantilizers at the end of this post.

First, let's reiterate the design criteria, and why the model in Learning Normativity wasn't great.


Here are the criteria from Learning Normativity, with slight revisions. See the earlier post for further justifications/intuitions behind these criteria.

  1. No Perfect Feedback: we want to be able to learn with the possibility that any one piece of data is corrupt.
    1. Uncertain Feedback: data can be given in an uncertain form, allowing 100% certain feedback to be given (if there ever is such a thing), but also allowing the system to learn significant things in the absence of any certainty.
    2. Reinterpretable Feedback: ideally, we want rich hypotheses about the meaning of feedback, which help the system to identify corrupt feedback, and interpret the information in imperfect feedback. To this criterion, I add two clarifying criteria:
      1. Robust Listening: in some sense, we don't want the system to be able to "entirely ignore" humans. If the system goes off-course, we want to be able to correct that.
      2. Arbitrary Reinterpretation: at the same time, we want the AI to be able to entirely reinterpret feedback based on a rich model of what humans mean. This criterion stands in tension with Robust Listening. However, the proposal in the present post is, I think, a plausible way to achieve both.
  2. No Perfect Loss Function: we don't expect to perfectly define the utility function, or what it means to correctly learn the utility function, or what it means to learn to learn, and so on. At no level do we expect to be able to provide a single function we're happy to optimize. This is largely due to a combination of Goodhart and corrupt-feedback concerns.
    1. Learning at All Levels: Although we don't have perfect information at any level, we do get meaningful benefit with each level we step back and say "we're learning this level rather than keeping it fixed", because we can provide meaningful approximate loss functions at each level, and meaningful feedback for learning at each level. Therefore, we want to be able to do learning at each level.
    2. Between-Level Sharing: Because this implies an infinite hierarchy of levels to learn, we need to share a great deal of information between levels in order to learn meaningfully. For example, Occam's razor is an important heuristic at each level, and information about what malign inner optimizers look like is the same at each level.
  3. Process Level Feedback: we want to be able to give feedback about how to arrive at answers, not just the answers themselves.
    1. Whole-Process Feedback: we don't want some segregated meta-level which accepts/implements our process feedback about the rest of the system, but which is immune to process feedback itself. Any part of the system which is capable of adapting its behavior, we want to be able to give process-level feedback about.
    2. Learned Generalization of Process Feedback: we don't just want to promote or demote specific hypotheses. We want the system to learn from our feedback, making generalizations about which kinds of hypotheses are good or bad.
Failed Criteria

The previous recursive-quantilization model failed some criteria:

  • No reinterpretable feedback. I didn't provide any method for achieving that.
  • No whole-process feedback. The way I set things up, the initial distributions are judged only on their later consequences. This leaves them wide open to inner optimizers and other problematic reasoning steps.
    • We can fix this by allowing the user to give direct feedback on the initial distributions as well, but then there's no mechanism for Learned Generalization of that particular feedback. So we're caught in the middle, unable to satisfy both those criteria at once.

The current proposal solves both problems, and due to an analogy to iterated amplification, may also be more computationally feasible.

The New Proposal

Like iterated amplification, the new proposal consists of both an idealized definition of aligned behavior (HCH, in the context of iterated amplification) and a candidate approximation of this ideal (like iterated amplification itself, which is supposed to approximate HCH).

The Ideal

The object which quantilization will select on will be referred to as "question-answering systems", or QAS for short. This is what I called a "UTAA" in the previous post. As before, this is one object which has opinions about the safe distribution for quantilization (you can ask it "what's a safe distribution over QAS to quantilize on?"), about the value function ("give me a value function to judge the quality of QAS"), and about the object-level solution to whatever problem you're trying to get this whole setup to solve (you ask it your object-level questions).

So the goal is to get a really good QAS, where "really good" means highly capable and highly aligned. Since we're specifying the ideal here, we get to use unbounded processing power in doing so.

Humans provide an initial "safe" distribution on QASs, and an initial loss function which will be used to judge QASs for their quality. We then quantilize. This yields a new distribution over QASs, which we use to define another stage of quantilization (this time with safe distribution and value function coming from the QASs themselves), and so on, until we reach an equilibrium where an additional step does not add anything.

As before, we can imagine this as an infinite or very large tree of quantilization:

(Note that, previously, I treated Q as a three-argument stochastic function, requiring a safe distribution, a value function (AKA loss function), and a parameter telling it how much to optimize. Here, I am suppressing the third parameter. Perhaps we keep it fixed, or perhaps we ask a third QAS for it, as before -- I have no improved suggestions for how we estimate that parameter at the moment.)
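
For reference, a minimal sketch of the basic quantilizer Q for a finite set of options, with the optimization parameter q made explicit: keep the top q fraction of the safe distribution's probability mass as ranked by the value function, and renormalize. This is the standard single-step quantilizer rather than the recursive scheme described here, and the function names are hypothetical.

```python
import random


def quantilize(base, value, q):
    """Distribution of a q-quantilizer over a finite set of options.

    base:  dict mapping option -> probability (the "safe" distribution)
    value: function scoring an option (higher is better)
    q:     fraction of the base distribution's mass to keep, 0 < q <= 1

    Keeps the top-q mass of `base` ranked by `value` (splitting the boundary
    option's mass if needed) and renormalizes. Ties keep their input order.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    remaining = q
    result = {}
    for option in sorted(base, key=value, reverse=True):
        if remaining <= 1e-12:  # tolerance for floating-point leftovers
            break
        take = min(base[option], remaining)
        if take > 0:
            result[option] = take / q
        remaining -= take
    return result


def sample_quantilizer(base, value, q, rng=random):
    """Draw one option, treating Q as a stochastic function."""
    dist = quantilize(base, value, q)
    options = list(dist)
    return rng.choices(options, weights=[dist[o] for o in options], k=1)[0]


safe = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
scores = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 0.0}
print(quantilize(safe, scores.get, 0.5))  # {'b': 0.5, 'c': 0.5}
```

With q = 1 the quantilizer simply returns the safe distribution; smaller q optimizes harder while staying within the support of the safe distribution.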

The tree structure means that a QAS is never evaluating itself; it is being evaluated by another QAS sampled from an independent multi-level quantilization, which can spot mistakes in its own way.

Now, loss functions in general (including both the initial human-given loss function, and the improved loss functions proposed by the QASs) need access to the entire experience of the system. The initial human-given loss function may be something like "pick out explicit training feedback humans give, and translate it into a numerical score for a QAS, ignoring natural-language comments made by the humans, evidence from the context, etc." The loss functions proposed by the QAS are supposed to get cleverer, interpreting natural-language feedback given by humans, using contextual clues, etc.
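
To illustrate how narrow the initial loss function can be, here is a hypothetical sketch that scores a QAS only from explicit numerical ratings and ignores attached natural-language comments. The record format, the rating scale, and the choice to return 0.0 when there is no feedback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeedbackRecord:
    """One hypothetical piece of logged human feedback about a QAS."""
    qas_id: str
    rating: Optional[float]  # explicit score in [-1, 1], or None if absent
    comment: str = ""        # natural-language remarks (ignored below)


def initial_loss(qas_id: str, records: List[FeedbackRecord]) -> float:
    """Naive human-given loss: the negated mean explicit rating for this QAS.
    Comments are deliberately ignored; later, QAS-proposed loss functions are
    supposed to learn to interpret them."""
    ratings = [r.rating for r in records if r.qas_id == qas_id and r.rating is not None]
    return -sum(ratings) / len(ratings) if ratings else 0.0


records = [
    FeedbackRecord("q1", 1.0, "great"),
    FeedbackRecord("q1", 0.0, "actually, this answer was unsafe"),
    FeedbackRecord("q2", None, "not sure about this one"),
]
print(initial_loss("q1", records))  # -0.5
```

The second comment on `q1` carries information this loss function cannot see, which is the gap the later, QAS-proposed loss functions are meant to close.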

You might ask, "shouldn't we build simple natural-language understanding into the initial loss function?" -- we could, but that's supposed to be redundant. There should be no reason to train a separate natural-language understanding system, rather than use explicit feedback to train the QAS to understand natural-language feedback.

Uncertain Feedback

The initial loss function should have the ability to learn from uncertain feedback, to satisfy the Uncertain Feedback criterion. (We can implement one of the options mentioned in Learning Normativity.)

Balancing Robust Listening and Arbitrary Reinterpretation

The idealized objective presented in this section offers a nice combination of "robust listening" (not ever reaching a state where it totally fails to respond to human feedback) and "arbitrary reinterpretation" (being able to use rich models to totally reinterpret the meaning of human feedback). These two objectives would seem to be at odds, since a system with an arbitrary model of the meaning of human feedback can ignore anything.

In the proposal here, direct human feedback always shapes the initial dynamic. If you see a system do something stupid and add direct feedback against this to the data, then QASs which engage in this behavior will be ruled out in the first step of quantilization. You can also add strong evidence against QAS which fail to judge QASs harshly for violating such principles (and which fail to punish non-punishers, recursively). So, human feedback always has total control to shape the ultimate result.

At the same time, however, only the initial dynamic is required to obey this feedback in a naive way. At later stages, QASs are supposed to understand human feedback better, identifying feedback which should be interpreted less literally. 

This means the equilibrium result is both fully dependent on a 100% literal interpretation of human feedback, and at the same time, fully the result of a more careful, nuanced interpretation of what humans meant.

Process-Level Feedback

Also, it's important to note that the initial loss function needs to understand process-level feedback. You can imagine that the dataset includes the following:

  • Extensive human-generated Q&A behavior, much like would be needed for Iterative Amplification. This is the bulk of what loads general world-knowledge into QAS that come out of the first round of quantilization.
  • Extensive labeling of QAS as "ok" and "bad", to demonstrate how to examine QAS source code (/ neural network / etc) and evaluate when something looks wrong. This could include many toy examples which illustrate bad behavior, as well as real examples gathered during training. The QAS should be trained to imitate this behavior when evaluating other QAS.
  • Similarly, examples of humans examining QAS execution and labeling steps as bad, rather than just examining the source code directly.

You can imagine humans engaging in an iterative process where they examine the behavior of many QAS created by early attempts at this process, which creates more training data for later iterations. However, since we're still only describing the ideal, that's not realistic; humans never have access to the true output of the ideal process.

Let's move on to describing the iterative-amplification analogue.

The Implementation Proposal

The basic idea is to take iterative amplification, and replace the HCH-like "amplification" operation with quantilization-based amplification.

At every stage of the process, we're not learning a single QAS, but rather, learning a distribution over QAS.

Define the distribution $A_1$ as the result of quantilizing the safe distribution given by humans, using the value function given by humans.
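As a minimal sketch, assume a finite safe distribution represented as a dict of probabilities and a fixed optimization parameter `q` (the text leaves open how that parameter is set). A quantilizer then samples from the top-`q` probability mass of the safe distribution, as ranked by the value function, splitting the mass of the boundary element if needed. The names `quantilize_dist` and `quantilize` are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, Mapping, TypeVar

T = TypeVar("T", bound=Hashable)

def quantilize_dist(safe: Mapping[T, float], value: Callable[[T], float],
                    q: float) -> Dict[T, float]:
    """Return the q-quantilizer's output distribution over the support of `safe`.

    Keeps the top-q probability mass of `safe` ranked by `value` (taking only
    part of the boundary element's mass) and renormalizes by q.
    """
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    out: Dict[T, float] = {}
    remaining = q
    for x in sorted(safe, key=value, reverse=True):
        if remaining <= 0:
            break
        take = min(safe[x], remaining)
        if take > 0:
            out[x] = take / q  # never more than safe[x] / q
        remaining -= take
    return out

def quantilize(safe, value, q, rng=random):
    """Draw one sample from the quantilized distribution."""
    dist = quantilize_dist(safe, value, q)
    xs = list(dist)
    return rng.choices(xs, weights=[dist[x] for x in xs], k=1)[0]

safe = {"a": 0.5, "b": 0.3, "c": 0.2}
value = {"a": 0.0, "b": 1.0, "c": 2.0}.get
dist = quantilize_dist(safe, value, q=0.4)  # {'c': 0.5, 'b': 0.5}
action = quantilize(safe, value, q=0.4, rng=random.Random(0))
```

The comment on `out[x]` is the property the risk argument later in the post relies on: no element's probability grows by more than a factor of 1/q.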

The distribution $D_1$ is the distillation of $A_1$; i.e., we learn a more computationally efficient approximation of the distribution $A_1$ using some distribution-modeling technique such as GANs.

The distribution $A_2$ is defined by sampling two QAS from $D_1$, asking one for the value function and the other for the safe distribution, and quantilizing.

The distribution $D_2$ is obtained by learning an efficient approximation of $A_2$.

And so on. $A_n$ is defined as the output of a quantilizer which takes its value function and safe distribution from (independent) samples from $D_{n-1}$. $D_n$ is a learned approximation of $A_n$.
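The loop can be sketched as a toy Python simulation. Everything here is an assumption made for illustration: QASs are just integer ids in a small pool, each QAS's proposed value function and safe distribution are random tables, and "distillation" is replaced by fitting an empirical distribution to samples rather than by a GAN. It shows only the data flow: $A_1$ from human inputs, then repeated distill-and-requantilize steps using two independent draws from the previous $D$.

```python
import random
from collections import Counter

Q = 0.2  # fixed optimization parameter (assumption; the text leaves this open)

def quantilize_sample(safe, value, q, rng):
    """Sample from the top-q probability mass of `safe`, ranked by `value`."""
    xs, ws, remaining = [], [], q
    for x in sorted(safe, key=value, reverse=True):
        if remaining <= 0:
            break
        take = min(safe[x], remaining)
        xs.append(x)
        ws.append(take)
        remaining -= take
    return rng.choices(xs, weights=ws, k=1)[0]

def make_toy_pool(k, rng):
    """Hypothetical pool: QAS i proposes a value table and a safe distribution over QAS ids."""
    values = [{j: rng.random() for j in range(k)} for _ in range(k)]
    safes = []
    for _ in range(k):
        w = [rng.random() + 1e-9 for _ in range(k)]  # strictly positive weights
        total = sum(w)
        safes.append({j: wj / total for j, wj in enumerate(w)})
    return values, safes

def distill(sampler, n_samples, rng):
    """Stand-in for distillation: fit an empirical distribution to samples."""
    counts = Counter(sampler(rng) for _ in range(n_samples))
    return {x: c / n_samples for x, c in counts.items()}

def iterate(human_safe, human_value, values, safes, rounds, n_samples=2000, seed=0):
    """Return [D_1, ..., D_rounds] for the toy amplify-and-distill loop."""
    rng = random.Random(seed)

    def a_sampler(r):
        # A_1: quantilize the human safe distribution using the human value function.
        return quantilize_sample(human_safe, human_value.get, Q, r)

    history = []
    for _ in range(rounds):
        d = distill(a_sampler, n_samples, rng)  # D_n approximates A_n
        history.append(d)
        ids, probs = list(d), list(d.values())

        def a_sampler(r, ids=ids, probs=probs):
            # A_{n+1}: two independent draws from D_n, one proposing the value
            # function and the other proposing the safe distribution.
            v_qas, s_qas = r.choices(ids, weights=probs, k=2)
            return quantilize_sample(safes[s_qas], values[v_qas].get, Q, r)

    return history

rng = random.Random(1)
K = 10
values, safes = make_toy_pool(K, rng)
human_safe = {j: 1 / K for j in range(K)}
human_value = {j: float(j) for j in range(K)}  # toy human preference for higher ids
history = iterate(human_safe, human_value, values, safes, rounds=5)
```

With a perfect `distill`, each $D_n$ would equal $A_n$ exactly; the empirical stand-in only approximates it, which is the role the learned model plays in the real proposal.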

If the distillation step were perfect, this would converge to the desired equilibrium.

Note, however, that this assumes that all human feedback is available from the beginning. Realistically, we want more interactive training, so that humans can observe the intermediate outputs of the process and provide new training data to account for problems. The vast majority of training data involving QAS evaluation may come from such interaction.

I visualize this as "iterated amplification takes us upward, but not rightward" in the following coordinates:

If we start amplifying something with very little human-input knowledge, we're likely to get total nonsense out; the QASs just don't know enough to optimize any coherent thing over successive amplification steps. If we start with a moderate amount of knowledge, the QASs might sputter into nonsense, or might get on the right track, but also have a high risk of producing highly-capable but malign agents. At some level of sufficient human input, we fall into the "basin of corrigibility" and get aligned outputs no matter what (perhaps even the same exact benign output).

Yet, it's hard to provide everything up-front. So, more realistically, the picture might look like this:

The golden line is the standard set by the idealized model of recursive quantilizers, where we have all the training data before starting the amplification process. The green line is a more realistic line, where we make amplification moves and increase training data in a mixed way, but avoid over-amplifying an under-trained model.

It's really important to stay close to the x-axis early in the process, because the system itself is determining how to evaluate the loss function -- so it's evaluating the very meaning of the training data (in line with the Reinterpretable Feedback criterion). It is therefore very important that we don't let the system drift too far in the direction of an extreme reinterpretation of the training data (in line with the Robust Listening criterion). At the very start of the training process, I imagine you'd often restart the training process from scratch with all the new feedback in place, rather than trust the system to understand new data.

In other words, we don't expect x-dimension moves to work if we're too high in the y-dimension:

Unfortunately, it's difficult to know in which region x-dimension moves work, so it's difficult to know when amplification would keep us within that region versus take us out of it.

Another way to put it: this implementation puts the "robust listening" criterion at serious risk. The partially amplified agent can easily stop listening to human feedback on important matters, about which it is sure we must be mistaken.

Really, we would want to find an engineering solution around this problem, rather than haphazardly steering through the space like I've described. For example, there might be a way to train the system to seek the equilibrium it would have reached if it had started with all its current knowledge.

Comparison to Iterated Amplification

Because this proposal is so similar to iterated amplification, it bears belaboring the differences, particularly the philosophical differences underlying the choices I've made.

I don't want this to be about critiquing iterated amplification -- I have some critiques, but the approach here is not mutually exclusive with iterated amplification by any means. Instead, I just want to make clear the philosophical differences.

Both approaches emphasize deferring the big questions, setting up a system which does all the philosophical deliberation for us, rather than EG providing a correct decision theory.

Iterated Amplification puts humans in a central spot. The amplification operation is giving a human access to an (approximate) HCH -- so at every stage, a human is making the ultimate decisions about how to use the capabilities of the system to answer questions. This plausibly has alignment and corrigibility advantages, but may put a ceiling on capabilities (since we have to rely on the human ability to decompose problems well, creating good plans for solving problems).

Recursive quantilization instead seeks to allow arbitrary improvements to the deliberation process. It's all supervised by humans, and initially seeded by imitation of human question-answering; but humans can point out problems with the human deliberation process, and the system seeks to improve its reasoning using the human-seeded ideas about how to do so. To the extent that humans think HCH is the correct idealized reasoning process, recursive quantilization should approximate HCH. (To the extent it can't do this, recursive quantilization fails at its goals.)

One response I've gotten to recursive quantilization is "couldn't this just be advice to the human in HCH?" I don't think that's quite true.

HCH must walk a fine line between capability and safety. A big HCH tree can perform well at a vast array of tasks (if the human has a good strategy), but in order to be safe, the human must operate under a set of restrictions, such as "don't simulate unrestricted search in large hypothesis spaces" -- with the full set of restrictions required for safety yet to be articulated. HCH needs a set of restrictions which provide safety without compromising capability.

In Inaccessible Information, Paul draws a distinction between accessible and inaccessible information. Roughly, information is accessible if we have a pretty good shot at getting modern ML to tell us about it, and inaccessible otherwise. Inaccessible information can include intuitive but difficult-to-test variables like "what Alice is thinking", as well as superhuman concepts that a powerful AI might invent.

A powerful modern AI like GPT-3 might have and use inaccessible information such as "what the writer of this sentence was really thinking", but we can't get GPT-3 to tell us about it, because we lack a way to train it to.

One of the safety concerns of inaccessible information Paul lists is that powerful AIs might be more capable than aligned AIs due to their ability to utilize inaccessible information, where aligned AIs cannot. For example, GPT-5 might use inhuman concepts, derived from its vast experience predicting text, to achieve high performance. A safe HCH would never be able to use those concepts, since every computation within the HCH tree is supposed to be human-comprehensible. (Therefore, if the result of Iterated Amplification was able to use such concepts, we should be worried that it did not successfully approximate HCH.)

Paul proposes learning the human prior as a potential solution. As I understand it, the basic idea is that HCH lacks Deep Learning's ability to absorb vast quantities of data and reach new conclusions. By learning the human prior, Paul seeks to learn the human response to those vast quantities of data. This would allow an HCH-like approach to learn the same "alien concepts" that a misaligned AI might learn.

I don't really understand how Paul sees HCH and learned priors as fitting together. Recursive quantilization seeks to get around this difficulty by training the QASs on lots of data in a way similar to big-data ML. As I emphasized before, recursive quantilization seeks to allow arbitrary improvements to the reasoning process, so long as they are improvements.

Of course, recursive quantilization also needs to restrict the QAS reasoning processes via safety constraints (using process-level feedback to rule out types of reasoning which can't be verified as safe). This could lead to the same problems with inaccessible information. But if it does, that's because no human-approved reasoning can approve of reasoning which utilizes the inaccessible information -- not even after many rounds of amplifying the human-seeded knowledge about how to judge reasoning as safe/unsafe. So, I would argue that if an approach broadly like recursive quantilization can't do it, then no robustly aligned method can do it.

Reasons Why Quantilizers Might Not Be Appropriate

As I mentioned at the beginning, I've been feeling like quantilizers might be the wrong starting point for the Learning Normativity agenda. There's still some hope, but there are a few problems which make me want to look for other directions.

Quantilizers bound risk, but iteration increases risk arbitrarily.

The whole argument behind quantilization is as follows:

  1. The safe distribution has below c probability of catastrophe.
  2. The limited optimization samples from the top q of the safe distribution.
  3. So, the risk of catastrophe from the limited optimization is at most c/q.
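A quick numerical check of this argument, as a sketch with made-up numbers: a uniform safe distribution over 1000 toy actions, 5 of which are labeled catastrophic (so c = 0.005), quantilized with q = 0.05. When the value function adversarially ranks the catastrophes highest, the bound c/q = 0.1 is attained (up to floating-point error); a value function unrelated to catastrophe (here an arbitrary pseudo-random ranking) stays far below it.

```python
def quantilizer_dist(safe, value, q):
    """Top-q probability mass of `safe`, ranked by `value`, renormalized by q."""
    out, remaining = {}, q
    for x in sorted(safe, key=value, reverse=True):
        if remaining <= 0:
            break
        take = min(safe[x], remaining)
        out[x] = take / q
        remaining -= take
    return out

def catastrophe_prob(dist, catastrophic):
    """Total probability that `dist` assigns to catastrophic actions."""
    return sum(p for x, p in dist.items() if x in catastrophic)

safe = {i: 1 / 1000 for i in range(1000)}  # 1000 equally likely toy actions
catastrophic = set(range(5))                # 5 of them are catastrophic
c = catastrophe_prob(safe, catastrophic)    # about 0.005
q = 0.05

# Worst case: the value function ranks the catastrophic actions highest.
worst = catastrophe_prob(quantilizer_dist(safe, lambda i: -i, q), catastrophic)
# A value function unrelated to catastrophe: a fixed pseudo-random permutation.
typical = catastrophe_prob(
    quantilizer_dist(safe, lambda i: (i * 7919) % 1000, q), catastrophic)
```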

In other words, the bound on the probability of catastrophe grows exactly in proportion to how much we optimize: restricting to the top q of the safe distribution multiplies the bound by 1/q.

Recursive quantilization does this over and over and over again. Naively, as we seek the fixed point, it seems like our bound on catastrophic risk gets worse and worse until we have no bound at all.
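One naive way to read "worse and worse" is sketched below. It assumes (as a simplification, not a claim from the text) that each round's output serves as the next round's safe distribution, so the bound is divided by q every round and equals c/q^n after n rounds. With illustrative values c = 10^-6 and q = 0.05, the bound first reaches 1, and so says nothing, at round 5.

```python
def naive_bounds(c, q, rounds):
    """Catastrophe bound after each round, if every round divides the bound by q."""
    bounds, b = [], c
    for _ in range(rounds):
        b = min(1.0, b / q)  # a probability bound is uninformative once it reaches 1
        bounds.append(b)
    return bounds

def first_vacuous_round(c, q, max_rounds=1000):
    """First round at which the naive bound says nothing (None if never, within max_rounds)."""
    for n, b in enumerate(naive_bounds(c, q, max_rounds), start=1):
        if b >= 1.0:
            return n
    return None

print(first_vacuous_round(1e-6, 0.05))  # 5
```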

There could be a way around this. We could get a better risk bound, which took into account the improved optimization from each step in the recursive quantilization. The story could be something like this:

  • In each quantilization step, we have an overwhelming chance of increasing the quality of value functions and safe distributions proposed in the next round.
  • This means (with overwhelming odds) we get better at detecting catastrophe, so we can actively optimize against problems which the previous round of quantilization had to merely hedge against by avoiding over-optimizing.
  • Therefore, the chance of catastrophe from each round of quantilization can be dramatically lower than the previous. Although these chances do add up over successive rounds of quantilization, they can sum to a negligible probability overall, even in the limit.
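The arithmetic behind the last bullet can be sketched under a hypothetical assumption: if each round's catastrophe risk is a fixed fraction `decay` of the previous round's, a union bound over rounds sums a geometric series, which converges to `first_round_risk / (1 - decay)` however many rounds run. Constant per-round risk, by contrast, grows linearly with the number of rounds. The numbers are illustrative only.

```python
def total_risk_bound(first_round_risk, decay, rounds):
    """Union bound over `rounds` rounds, assuming each round's risk is `decay` times the last."""
    return sum(first_round_risk * decay ** k for k in range(rounds))

def limit_bound(first_round_risk, decay):
    """Closed form of the same sum as rounds -> infinity; valid only for 0 <= decay < 1."""
    if not 0 <= decay < 1:
        raise ValueError("the series only converges for 0 <= decay < 1")
    return first_round_risk / (1 - decay)

shrinking = total_risk_bound(1e-4, 0.5, 1000)  # approaches limit_bound(1e-4, 0.5) = 2e-4
constant = total_risk_bound(1e-4, 1.0, 1000)   # grows linearly: 1000 * 1e-4 = 0.1
```

The open problem the next paragraph names is exactly whether anything justifies a `decay` below 1.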

It would be nice to put together such a story, but it isn't clear how to argue that the improvement in value function and proposal distribution cuts into the probability of catastrophe by a sufficient amount.

Quantilizers assume catastrophes are rare in random choices, but in the recursive setting, too much randomness can create value drift.

Quantilization assumes we can get a "safe distribution" which might produce mediocre actions, but at least has a very low probability of catastrophic risk.

Supposing the scheme could produce highly capable systems at all, the recursive quantilizer environment seems like one where randomized actions have a good chance of producing catastrophic risk, so safe distributions would be incredibly difficult to engineer.

Imagine we're computing a big recursive quantilization tree. Randomly selecting a poor value function at one point will have a cascading effect. It will feed into a quantilization step which selects a bad QAS, which will go on to produce either the value function or the safe distribution for another quantilization, and so on. There's not fundamentally any corrective effect -- if either of the two inputs to quantilization is poor enough, then the output will be poor.
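A toy model of this cascade, offered as a simplification for illustration rather than anything from the text: assume each quantilization step independently selects a poor output with probability e, and that a poor input always produces a poor output. Since a depth-d tree contains 2^d - 1 quantilization steps, the chance that its root is good is (1 - e)^(2^d - 1), which collapses quickly as the tree deepens.

```python
def good_prob(e, depth):
    """P(root output is good) in a toy recursive-quantilization tree with no correction.

    Assumptions (illustrative only): leaves are good human-given inputs; each
    quantilization node independently picks a poor output with probability e;
    a node's output is good only if both of its inputs are good and its own
    draw is not poor.
    """
    g = 1.0
    for _ in range(depth):
        g = (1 - e) * g * g
    return g

for d in (1, 5, 10):
    print(d, good_prob(0.01, d))  # equals (1 - e) ** (2 ** d - 1)
```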

There's a basin of attraction here, provided the proposed safe distribution always contains good proposals with non-negligible probability, and the value function always has enough of the right meta-principles to correct specific errors that may be introduced through random error. But it's quite important that the output of each quantilization be better than the previous one. If not, then we're not in a basin of attraction.

All of this makes it sound quite difficult to propose a safe distribution. The safe distribution needs to already land you within the basin of attraction (with very high probability), because drifting out of that basin can easily create a catastrophe.

Here's a slightly different argument. At each quantilization step, including the very first one, it's important that we find a QAS which actually fits our data quite well, because it is important that we pin down various things firmly in order to remain in the basin of attraction (especially including pinning down a value function at least as good as our starting value function). However, for each QAS which fits the data quite well and points to our desired basin of attraction, there are many alternative QAS which don't fit our data well, but point to very different, but equally coherent, basins of attraction. (In other words, there should be many equally internally consistent value systems which have basins of attraction of similar size.)

Since these other basins would be catastrophic, c (the probability of catastrophe under the safe distribution) is higher than q (the amount of optimization we need to hit our narrow target).

This means the safe distribution has to be doing a lot of work for us.
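The role of c and q can be seen through the standard quantilizer guarantee. If the safe distribution puts probability at most c on catastrophe, then a q-quantilizer, which samples from the top q fraction of that distribution as ranked by the value function, has catastrophe probability at most c/q. When c > q, that bound exceeds 1 and guarantees nothing. The minimal Python sketch below assumes a hypothetical discrete safe distribution over three QASs, including a coherent rival basin that the value function scores slightly higher than the intended one. All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    prob: float        # probability under the safe (base) distribution
    value: float       # score assigned by the value function
    catastrophic: bool


def quantilize(options: list[Option], q: float) -> dict[str, float]:
    """q-quantilizer: keep the top-q probability mass of the safe distribution
    (ranked by value), then renormalize. The boundary option may be split."""
    ranked = sorted(options, key=lambda o: o.value, reverse=True)
    remaining = q
    dist: dict[str, float] = {}
    for o in ranked:
        if remaining <= 0:
            break
        take = min(o.prob, remaining)
        dist[o.name] = take / q
        remaining -= take
    return dist


def catastrophe_prob(options: list[Option], dist: dict[str, float]) -> float:
    bad = {o.name for o in options if o.catastrophic}
    return sum(p for name, p in dist.items() if name in bad)


# Hypothetical safe distribution: the rival basin has more base mass than
# the narrow intended target, and the value function slightly prefers it.
OPTIONS = [
    Option("intended_basin", 0.01, 0.9, False),
    Option("rival_basin", 0.05, 1.0, True),
    Option("mediocre", 0.94, 0.1, False),
]
C = sum(o.prob for o in OPTIONS if o.catastrophic)  # c = 0.05

if __name__ == "__main__":
    for q in (0.5, 0.01):
        p_cat = catastrophe_prob(OPTIONS, quantilize(OPTIONS, q))
        print(f"q={q}: P(catastrophe)={p_cat:.2f}, bound c/q={C / q:.2f}")
```

With q = 0.5, the catastrophe probability stays within c/q = 0.1. With q = 0.01 (the mass of the intended target), c > q, the bound is 5, and this quantilizer selects the rival basin every time.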

Like the previous problem I discussed, this isn't necessarily a showstopper, but it does say that we'd need some further ideas to make recursive quantilization work, and suggests to me that quantilization might not be the right way to go.

Other Concerns
  • Quantilizers don't have the best handles for modeling human philosophical deliberation over time. In other words, I don't think recursive quantilization absorbs the lesson of radical probabilism. In particular, although recursive quantilization involves iteratively improving a picture of "good reasoning", I think it lacks a kind of stability -- the picture of good reasoning must be entirely reconstructed each time, from "first principles" (IE from the principles developed in the previous step). I currently see no guarantee that recursive quantilization avoids being Dutch-Book-able over these stages, or any other such dynamic optimality notion.
  • Quantilizers aren't great for modeling a collection of partial models. Since a quantilizer spits out one (moderately) optimized result, I have to focus on single QASs, rather than collections of experts which cover different areas of expertise. This means we don't get to break down the problem of reasoning about the world.
  • Quantilizers don't put world models in a central spot. By putting optimization in a central spot, we sort of sideline reasoning and belief. This obscures the mechanism of updating on new information.


Prizes for Last Year's 2018 Review

Published on December 2, 2020 11:21 AM GMT

About a year ago, LessWrong held its first annual review, where we looked over the best posts from 2018. The LessWrong team offered $2000 in prizes for the top post authors, and (up to) $2000 in prizes for the best reviews of those posts.

For our top post authors, we have decided to award.... *drumroll*

For Reviews, there are three tiers of prize ($300, $200, $100):

Not for reviews, but for discussion in the review, $50 apiece goes to Richard Ngo and Rohin Shah.

Prizewinners, we'll reach out to you in a week or so to give you your prize-money.

Congratulations to all the winners!


The LessWrong 2019 Review

Published on December 2, 2020 11:21 AM GMT

Today is the start of the 2019 Review, continuing our tradition of checking which things written on LessWrong still hold up a year later, and helping build an ongoing canon of the most important insights developed here on LessWrong.

The whole process will span 8 weeks, starting on December 1st: 

  • From December 1st to the 14th, any user that was registered before January 1st 2019 can nominate any post written in 2019 to be considered for the review.
  • From December 14th to January 11th, any user can leave reviews on any posts with at least two nominations, ask questions of other users and the author, and make arguments for how a post should be voted on in the review.
  • From January 11th to January 25th, any LessWrong user can vote on the nominated posts, using a voting system based on quadratic voting. (There will be two votes, one for 1000+ karma users, and one for all users.)

But before I get more into the details of the process, let's go up a level.

Why run a review like this?

The Review has three primary goals: 

  1. Improve our incentives, feedback, and rewards for contributing to LessWrong. 
  2. Create a highly curated "Best of 2019" sequence and physical book
  3. Create common knowledge about the LW community's collective epistemic state about the most important posts of 2019

Improving our incentives and rewards

Comments and upvotes are a really valuable tool for allocating attention on LessWrong, but they are ephemeral and frequently news-driven, with far-from-perfect correlation to the ultimate importance of an idea or an explanation. 

I want LessWrong to be a place for Long Content. A place where we can build on ideas over decades, and an archive that helps us collectively navigate the jungle of infinite content that spews forward on LessWrong every year.

One way to do that is to take some time between when you first see a post and when you evaluate it. That's why today we are starting the 2019 review, not the 2020 review. A year is probably enough time to no longer be swept away in the news or excitement of the day, but recent enough that we can still remember and write down how an idea or explanation has affected us. 

I also want LessWrong not to be overwhelmed by research debt:

Research debt is the accumulation of missing interpretive labor. It’s extremely natural for young ideas to go through a stage of debt, like early prototypes in engineering. The problem is that we often stop at that point. Young ideas aren’t ending points for us to put in a paper and abandon. When we let things stop there the debt piles up. It becomes harder to understand and build on each other’s work and the field fragments.

There needs to be an incentive to clean up ideas that turned out to be important but badly presented. This is the time for authors to get feedback on which of their posts turned out to be important, and to correct minor errors, clean up the prose, and polish them. It is also the time for others to see which concepts still lack a good explanation after at least a whole year has passed, and maybe to take the time to finally write that good canonical reference post.

Creating a highly curated sequence and book

The internet is not great at preserving things for the future. Also, books feel real to me in a way that feels very hard to achieve for a website. Also, they look beautiful: 

One of the books printed for the 2018 Review.

Of course, when you show up to LessWrong, you can read Rationality: A-Z, you can read The Codex, and you can read HPMoR, but historically we haven't done a great job at archiving and curating the best content of anyone who isn't Scott or Eliezer (and even for Scott and Eliezer, it's hard to find any curation of the content they wrote in recent years). When I showed up, I wished there had been a "Best of 2012" book and sequence that would have helped me find the best content from the years before I was active (and maybe we should run a "10-year Review" so that I can figure out what the best posts from 2010 and beyond are).

Create common knowledge

Ray says it pretty well in last year's Review announcement post:

Some posts are highly upvoted because everyone agrees they're true and important. Other posts are upvoted because they're more like exciting hypotheses. There's a lot of disagreement about which claims are actually true, but that disagreement is crudely measured in comments from a vocal minority.

Now is the time to give your opinions much more detail, distinguish between a post being an interesting hypothesis versus a robust argument, and generally help others understand what you think, so that we can discover exciting new disagreements and build much more robustly on past and future work. 

What does it look like concretely?

Nominating

Nominations really don't have to be very fancy. Some concrete examples from last year: 

Reading Alex Zhu's Paul agenda FAQ was the first time I felt like I understood Paul's agenda in its entirety as opposed to only understanding individual bits and pieces. I think this FAQ was a major contributing factor in me eventually coming to work on Paul's agenda. – evhub on "Paul's research agenda FAQ"


This post not only made me understand the relevant positions better, but the two different perspectives on thinking about motivation have remained with me in general. (I often find the Harris one more useful, which is interesting by itself since he had been sold to me as "the guy who doesn't really understand philosophy".) 

– Kaj Sotala on "Sam Harris and the Is-Ought Gap"

But sometimes they can be a bit more substantial:

This post:

  • Tackles an important question. In particular, it seems quite valuable to me that someone who tries to build a platform for intellectual progress attempts to build their own concrete models of the domain and tries to test those against history
  • It also has a spirit of empiricism and figuring things out yourself, rather than assuming that you can't learn anything from something that isn't an academic paper
  • Those are positive attributes and contribute to good epistemic norms on the margin. Yet at the same time, a culture of unchecked amateur research could end up in bad states, and reviews seem like a useful mechanism to protect against that

This makes this suitable for a nomination.

– jacobjacob on "How did academia ensure papers were correct in the early 20th Century?"

Overall, a nomination doesn't need to require much effort. It's also OK to just second someone else's nomination (though do make sure to actually add a new top-level nomination comment, so we can properly count things). 


We awarded $1500 in prizes for reviews last year. The reviews that we awarded the prizes for really exemplify what I hope reviews can be. The top prize went to Vanessa Kosoy; here's an extract from one of her reviews:

From Vanessa Kosoy on "Clarifying AI Alignment":

In this essay Paul Christiano proposes a definition of "AI alignment" which is more narrow than other definitions that are often employed. Specifically, Paul suggests defining alignment in terms of the motivation of the agent (which should be, helping the user), rather than what the agent actually does. That is, as long as the agent "means well", it is aligned, even if errors in its assumptions about the user's preferences or about the world at large lead it to actions that are bad for the user.


In contrast, I will argue that the "motivation-competence" decomposition is not as useful as Paul and Rohin believe, and the "definition-optimization" decomposition is more useful.


The review makes good arguments against the main thrust of the post it is reviewing, while also putting the article into a broader context that helps me place it in relation to other work in AI Alignment. She argues for an alternative breakdown of the problem: instead of modeling it as the problems of "motivation and competence", model it as the problems of "definition and optimization". She connects both the decomposition proposed in the essay she is critiquing and the one she proposes to existing research (including some of her own), and generally makes a point I am really glad to see surfaced during the review.

To be more concrete, this kind of ontology-level objection feels like one of the most valuable things to add during the review phase, even if you can't propose any immediate alternative (i.e. reviews of "I don't really like the concepts this post uses, it feels like reality is more neatly carved by modeling it this way" seem quite valuable and good to me).

Zack M. Davis was joint-second winner of prizes for reviews last year. Here's an extract from a review of his.

Zack's review "Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think": 

Reply to: Meta-Honesty: Firming Up Honesty Around Its Edge-Cases


A potential problem with this is that human natural language contains a lot of ambiguity. Words can be used in many ways depending on context. Even the specification "literally" in "literally false" is less useful than it initially appears when you consider that the way people ordinarily speak when they're being truthful is actually pretty dense with metaphors that we typically don't notice as metaphors because they're common enough to be recognized legitimate uses that all fluent speakers will understand.


Zack wrote a whole post, which I really liked, making the core argument that while it might make sense to try really hard to figure out the edge-cases of lying, it's probably better to focus on understanding the philosophy and principles behind reducing other forms of deception, like strategic ambiguity, heavily filtered evidence, or misleading metaphors.

Arguing that a post, while maybe making accurate statements, appears to put its emphasis in the wrong place, and encouraging action that seems far from the most effective, also seems generally valuable, and a good class of review. 


You can trial-run the vote UI here (though you can't submit any votes yet). Here is also a screenshot of what it looked like last year:

UI when first opening the review, during the basic voting pass
UI when using quadratic voting mode and selecting a post

How do I participate?

Now for some more concrete instructions on how to participate:


Starting today (December 1st), if you have an account that was registered before the 1st of January 2019, you will see a new button on all posts from 2019 that will allow you to nominate them for the 2019 review:

It's at the top of the triple-dot menu.

Since a major goal of the review is to see which posts had a long-term effect on people, we are limiting nominations to users who signed up before 2019. If you were actively reading LessWrong before then, but never registered an account, you can ping me on Intercom (the small chat bubble in the bottom right corner on desktop devices), and I will give your account nomination and voting privileges.

I recommend using the All Posts page for convenience, where you can group posts by year, month, week, and day. Here are the two I use the most:


Starting on December 14th, you can write reviews on any post that received at least two nominations. For the following month, my hope is that you read the posts carefully, write comments on them, and discuss:

  • How has this post been useful?
  • How does it connect to the broader intellectual landscape?
  • Is this post epistemically sound?
  • How could it be improved?
  • What further work would you like to see on top of the ideas proposed in this post?

I would consider the gold standard for post reviews to be SlateStarCodex book reviews (though obviously shorter, since posts are shorter than books).

As an author, my hope is that you take this time to note where you disagree with the critiques, help other authors arrange followup work, and, if you have the time, update your post in response to the critiques (or just polish it up in general, if it seems like it has a good chance of ending up in the book).

This page will also allow you to see all posts above two nominations, and how many reviews they have, together with some fancy UI to help you navigate all the reviews and nominations that are coming in.


Starting on January 11th, any user who registered before 2019 can vote on any 2019 post that has received at least one review. The vote will use quadratic voting, with each participant having 500 points to distribute. To help handle the cognitive complexity of the quadratic voting, we also provide you with a more straightforward "No", "Neutral", "Good", "Important", "Crucial" scale that you can use to prepopulate your quadratic voting scores.
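For readers unfamiliar with quadratic voting, here is a minimal Python sketch of the budget arithmetic. It assumes the standard rule that casting n votes (for or against) on one post costs n² points. The post gives the 500-point budget but not the exact cost rule or how the coarse scale maps to votes, so SCALE_TO_VOTES and all function names here are hypothetical.

```python
import math

BUDGET = 500  # points per participant, as stated in the announcement

# HYPOTHETICAL starting vote counts for the coarse scale; the values the
# LessWrong UI actually uses to prepopulate scores are not given in the post.
SCALE_TO_VOTES = {"No": -1, "Neutral": 0, "Good": 1, "Important": 4, "Crucial": 9}


def vote_cost(votes: int) -> int:
    """Quadratic voting (assumed rule): n votes on one post cost n**2 points."""
    return votes * votes


def ballot_cost(ballot: dict[str, int]) -> int:
    return sum(vote_cost(v) for v in ballot.values())


def prepopulate(ratings: dict[str, str]) -> dict[str, int]:
    """Turn coarse ratings into an initial quadratic ballot."""
    return {post: SCALE_TO_VOTES[label] for post, label in ratings.items()}


def within_budget(ballot: dict[str, int], budget: int = BUDGET) -> bool:
    return ballot_cost(ballot) <= budget


if __name__ == "__main__":
    ballot = prepopulate({"Post A": "Crucial", "Post B": "Important", "Post C": "No"})
    print(ballot, ballot_cost(ballot), within_budget(ballot))
    # Largest number of votes that fits on a single post: floor(sqrt(500)).
    print(math.isqrt(BUDGET))
```

Under this cost rule, strong opinions get expensive fast: 22 votes on a single post would use 484 of the 500 points.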

You can give the vote system a spin here on the posts from 2018, to get a sense of how it works and what the UI will look like.

Last year, only users above 1000 karma could participate in the review and vote. This year, we are going to break out the vote into two categories, one for users above 1000 karma, and one for everyone. I am curious to see if and how they diverge. We might make some adjustments to how we aggregate the votes for the "everyone" category, like introducing some karma-weighting. Overall I expect we will give substantial prominence to both rankings, but weight the 1000+ karma user ranking somewhat more heavily when deciding what to include in the final sequence and book. To be more concrete, I am imagining something like a 70:30 split of attention and prominence favoring the 1000+ karma users' vote.

Prizes and Rewards 

I think this review process is really important. To put the LessWrong team's money where its mouth is, we are again awarding $2000 in prizes to the top posts as judged by the review, and up to $2000 in prizes for the best reviews and nominations (as judged by the LW mod team). These are the nominations and reviews from last year to which we awarded prizes.

Public Writeup and Aggregation

At the end of the vote, we are going to publish an analysis with all the vote results again. 

Last year, we also produced an (according to me) really astonishingly beautiful book with all the top essays (thanks to Ben Pace and Jacob Lagerros!) and some of the best comments on reviews. I can't promise we are going to spend quite as much time on the book this year, but I expect it to again be quite beautiful. See Ben's post with more details about the books, and with the link to buy last year's book if you want to get a visceral sense of them.

This year's book might look quite different from last year's, but anyone who is featured in it will still get a copy. So even just writing a good comment can secure your legacy.

Good luck, think well, and have fun!

This year, just as we did last year, we are going to replace the "Recommendations & From the Archives" section of the site with a section that just shows you posts you haven't read that were written in 2019. 

I really enjoyed last year's review, and am looking forward to an even greater review this year. May our epistemics pierce the heavens!


The LessWrong Book is Available for Pre-order

Published on December 2, 2020 11:21 AM GMT

For the first time, you can now buy the best new ideas on LessWrong in a professionally designed book set.

The set is named:

A Map that Reflects the Territory: Essays by the LessWrong Community

It is available for pre-order here.

The standard advice for creating things is "show, don't tell", so first some images of the books, followed by a short FAQ by me (Ben).

The full five-book set. Yes, that’s the iconic Mississippi river flowing across the spines.
Each book has a unique color.
The first book: Epistemology.
The second book: Agency.
The third book: Coordination.
The fourth book: Curiosity.
The fifth book: Alignment.

FAQ

What are the books in the set?

The essays have been clustered around five topics relating to rationality: Epistemology, Agency, Coordination, Curiosity, and Alignment.

How small are the books?

Each book is 4x6 inches, small enough to fit in your pocket. This was the book size that, empirically, most beta-testers found that they actually read.

Can I order a copy of the book?

Pre-order the book here for $29. If you are in the US, it will arrive before Christmas. We currently sell to North America, Europe, and Australia. You'll be able to buy the book on Amazon in a couple of weeks.

Does this book assume I have read other LessWrong content, like The Sequences?

No. It's largely standalone, and does not require reading other content on the site, although it will be enhanced by having engaged with those ideas.

Can I see an extract from the book?

Sure. Here is the preface and first chapter of Curiosity, specifically the essay Is Science Slowing Down? by Scott Alexander.

What exactly is in the book set?

LessWrong has an annual Review process (the second of which is beginning today!) to determine the best content on the site. We reviewed all the posts on LessWrong from 2018, and users voted to rank the best of them, the outcome of which can be seen here.

Of the more than 2000 LessWrong posts reviewed, this book contains 41 of the top voted essays, along with some comment sections, some reviews, a few extra essays to give context, and some preface/meta writing.

I'm new — what is this all about? What is 'rationality'?

A scientist is not simply someone who tries to understand how biological life works, or how chemicals combine, or how physical objects move, but someone who uses the general scientific method in all areas, which allows them to empirically test their beliefs and discover what's true in general.

Similarly, a rationalist is not simply someone who tries to think clearly about their personal life, or who tries to understand how civilization works, or who tries to figure out what's true in a single domain like nutrition or machine learning; a rationalist is someone who is curious about the general thinking patterns that allow them to think clearly in all such areas, and who seeks to understand the laws and tools that help them make good decisions in general.

Just as someone seeking to understand science and the scientific method might look into a great number of different fields (electromagnetism, astronomy, medicine, and so on), someone seeking to understand generally accurate and useful cognitive algorithms would explore a lot of fields and areas. The essays in this set explore questions about arguments, aesthetics, artificial intelligence, introspection, markets, game theory, and more, which all shed light on the core subject of rationality.

Who is this book for?

This book is for people who want to read the best of what LessWrong has to offer. It's for the people who read best away from screens, away from distractions. It's for people who do not check the site regularly, but would still like to get the top content. For many people this is the best way to read LessWrong.

I think there are a lot of people who find the discussion on LessWrong interesting, or are interested in the ideas, or found LessWrong's early discussion of the coronavirus personally valuable, or who know Scott Alexander got started on LessWrong, and would like to see what we're about. This book is one of the best ways to do that.

Show me the table of contents?

Sure thing. Here's each book in order.

  • A Sketch of Good Communication (Ben Pace)
  • Babble (Alkjash)
  • Local Validity as a Key to Sanity and Civilization (Eliezer Yudkowsky)
  • The Loudest Alarm is Probably False (Patrick LaVictoire)
  • Varieties of Argumentative Experience (Scott Alexander)
  • More Babble (Alkjash)
  • Naming the Nameless (Sarah Constantin)
  • Toolbox-thinking and Law-thinking (Eliezer Yudkowsky)
  • Prune (Alkjash)
  • Toward a New Technical Explanation of Technical Explanation (Abram Demski)
  • Noticing the Taste of Lotus (Michael 'Valentine' Smith)
  • The Tails Coming Apart As Metaphor For Life (Scott Alexander)
  • Meta-Honesty: Firming up Honesty Around its Edge-Cases (Eliezer Yudkowsky)
  • Explaining Insight Meditation and Enlightenment in Non-Mysterious Terms (Kaj Sotala)
  • Being a Robust Agent (Raymond Arnold)
  • Anti-social Punishment (Martin Sustrik)
  • The Costly Coordination Mechanism of Common Knowledge (Ben Pace)
  • Unrolling Social Metacognition: Three Levels of Meta are not Enough (Andrew Critch)
  • The Intelligent Social Web (Michael 'Valentine' Smith)
  • Prediction Markets: When Do They Work? (Zvi Mowshowitz)
  • Spaghetti Towers (Georgia Ray)
  • On the Loss and Preservation of Knowledge (Samo Burja)
  • A Voting Theory Primer (Jameson Quinn)
  • The Pavlov Strategy (Sarah Constantin)
  • Inadequate Equilibria vs Governance of the Commons (Martin Sustrik)
  • Is Science Slowing Down? (Scott Alexander)
  • What Motivated Rescuers during the Holocaust? (Martin Sustrik)
  • Is There an Untrollable Mathematician? (Abram Demski)
  • Why Did Everything Take So Long? (Katja Grace)
  • Is Clickbait Destroying Our General Intelligence? (Eliezer Yudkowsky)
  • What Makes People Intellectually Active? (Abram Demski)
  • Are Minimal Circuits Daemon-Free? (Paul Christiano)
  • Is There Something Beyond Astronomical Waste? (Wei Dai)
  • Do Birth Order Effects Exist? (Eli Tyre, Bucky, Raymond Arnold)
  • Hyperbolic Growth (Paul Christiano)
  • Specification Gaming Examples in AI (Viktoria Krakovna)
  • Takeoff Speeds (Paul Christiano)
  • The Rocket Alignment Problem (Eliezer Yudkowsky)
  • Embedded Agents (Abram Demski & Scott Garrabrant)
  • FAQ about Iterated Amplification (Alex Zhu)
  • Challenges to Christiano's Iterated Amplification Proposal (Eliezer Yudkowsky)
  • Response to FAQ on Iterated Amplification (Eliezer Yudkowsky)
  • Robustness to Scale (Scott Garrabrant)
  • Coherence Arguments Do Not Imply Goal-Directed Behavior (Rohin Shah)

Who made this book set?

I (Ben Pace) and Jacob Lagerros made these books, alongside our colleagues on the LessWrong Team: Oliver Habryka, Raymond Arnold, Ruby Bloom, and Jim Babcock.

Can I give this book as a gift?

Yes. This is a well-designed, beautiful set of books, designed to be relatively self-contained and not require having read LessWrong before, and that look attractive on coffee-tables and bookshelves, suitable for friends, partners, and family members who read non-fiction.

What about the book called 'Alignment'? Isn't that going to be very technical and have lots of assumptions about AI?

For those who have no knowledge of the subject of AI alignment, the book is structured to help motivate the topic, starting with questions about AI progress and risks, before moving into the meat of open questions about the subject.

The Alignment book will be tough reading for those not acquainted with the ongoing discourse around the topic, but I think it will still be rewarding for those who read it.

I have a blog, and might want to review the book. Can I get a review copy?

Yes! I'm offering free copies of the book for review. I'd love to get reviews from critics of the rationality community, members of the rationality community, people who don't really know what the community is about but know that SlateStarCodex is awesome, and more.

If you'd like to review the book and would like a free copy, fill out this form and I'll get back to you. (Or you can just email me at benitopace@gmail.com if that works better for you.) If you're not sure if your blog is cool enough, your blog is probably cool enough.

Also, you should know that if you write a public review of the essay collection, I'll put a link to your review on the official landing page for the book, whether it's positive, negative, or not-even-on-that-spectrum.

(No, tweets don't count, though I guess tweet threads can, but I prefer blog posts. I reserve the right to not include things I read as primarily trolling.)

I have a podcast and might be interested in talking with you about LessWrong. Are you interested in coming on?

Yes. I'm interested in appearing on a few podcasts to let people know about the book. Concretely, I'd propose a joint-appearance with myself and Oliver Habryka, where we can talk about LessWrong, our vision for its future as an institution, how we think it fits into the broader landscape of intellectual progress, the challenges of managing internet forums, and more. No podcast too small (or too big, I guess). If you like LessWrong and you'd like us to come on, we're happy to do it. Email me at benitopace@gmail.com.

I'd like something from you that's not a podcast or a book. Can I reach out?

Yeah, reach out. If you run a newsletter, a mailing list, a google group, or something, and think some of your users would like to know about the book, I'd appreciate you sharing it there with a sentence or two about why you think LessWrong is interesting or worth reading. And if you'd like my input on something, happy to give it via email.

I have a question not answered here?

There's a comment box right below.

Remind me again, how can I pre-order it?

Here it is. If you pre-order to a US shipping address, it will arrive before Christmas.

If you need to chat with us about your order or anything, use the intercom widget in the bottom right.


Aphorisms on motivation

Published on December 2, 2020 8:55 AM GMT

Epistemic status: I have a good track record for keeping myself motivated, but this wasn't always the case. These are the insights that I've had over the years while I was figuring this out. My beliefs haven't changed much lately and I find myself referring to them all the time. 

As always with this kind of murky folk psychology, YMMV, but don't discard these ideas too quickly.

The grand unified theory of motivation: Your gas pedal is always 100% pushed. If you don't go forward, it means your subagents are pushing you in opposite directions. Integrate your conflicting aliefs and you'll start moving forward. Integrate your aliefs completely and you will have solved motivation.

As with people, your s1 will not listen to you if you don't listen to it. The best way to make it feel heard is to pass its ITT. If you can risk it, endorse the alief publicly. Then you can change it.

Aliefs will be 10x less reasonable if you shut them out. If you embrace them, only their most nuanced part remains.

Your s1 won't stop complaining about that marshmallow as long as it's in front of you. Eat it or throw it away, but don't let it sit there slowly deconstructing your internal consistency.

Fatigue can be physical, but it is nearly always emotional, arising from internal strife.

"Ego depletion" only makes sense if you don't self-identify with your s1. If you do, you will simply experience it as a preference.

People are just a bag of subagents. If you interact with them, your bags just mix. If they offend you, that's often because they've expressed a subagent that you repress in yourself. Your reaction is how you would react to this part of you internally.

100% of your "willpower" comes from your incentive landscape and 0% comes from training. If you want more willpower, make it impossible to do the bad thing. Put your distractions in a time-locked box.

Most of your hidden motivations will be hidden because they're not socially desirable, and/or because they're not consistent with your identity. I am not a paragon of virtue. I am just scared to own up to my vices, and it's keeping me from integrating myself. The same is true for you and for everyone who isn't completely enlightened.

You are not smarter than your s1. The truth is somewhere in between. Be humble, listen to it, find a middle ground, become integrated.


Should I take glucosamine?

Published on December 2, 2020 5:20 AM GMT

This paper came up on my news feed today: https://www.jabfm.org/content/jabfp/33/6/842.full.pdf

According to the research, daily intake of glucosamine/chondroitin supplements for at least a year is associated with a 39% reduction in all-cause mortality.

This is a cohort study, not a randomized trial, so the evidence is not conclusive. Individuals taking supplements were self-selected. Perhaps there is another factor not controlled for in the paper. The researchers are calling for a more thorough study.

However, a 39% reduction appears to be a pretty large effect. And glucosamine/chondroitin supplements are fairly inexpensive, available over-the-counter, and seem likely to be relatively harmless even if the effect is illusory. Maybe the kind of person who takes this kind of bet comes out ahead, even if they're wrong sometimes.

I'm not looking for mere social permission or a gut-feel answer here. I want to understand your reasoning. How should a rationalist reason about this? Do we plug numbers into a Bayesian net? Should I take the bet and start supplementing daily? Is the effect too small to bother with? Should I wait for more evidence?


Eth2 Staking explained as a Financial Instrument

15 hours 52 minutes ago
Published on December 2, 2020 3:33 AM GMT

A major cryptocurrency can now be applied as an income generating asset.

If you don’t keep up with cryptocurrency news, you probably don’t know that today marks the beginning of one of the most important updates in one of the largest cryptocurrencies out there.

Ethereum, the largest cryptocurrency by transactions per day, and the second largest cryptocurrency by market capitalization, has started transitioning its consensus protocol from Proof of Work to Proof of Stake. This marks the beginning of Eth2 (also known as Ethereum 2.0 or Serenity).

I am not going to explain what Ethereum is or the main differences between Proof of Work and Proof of Stake. There are plenty of resources out there that explain what Ethereum is far better than I ever will. Instead, I am going to explain why Eth2 Staking is a financial instrument using finance terminology.

Eth2 Staking is very similar to purchasing a bond.

Staking (depositing) Eth2 in the Beacon Chain is the equivalent of calling your broker and buying a corporate or government bond. That said, today it is far easier to buy a bond than to deposit Eth2. To buy a bond, you call your broker/financial advisor, who asks a trading desk to source the bond. After agreeing on the price, the trade is executed and you see the bond in your account the next day. Staking Eth2 today is a far more complex process. However, some intermediaries have started to provide staking services, and I would not be surprised if in a couple of years major financial institutions also offered staking services, which would make Eth2 Staking just as easy as buying a bond.

There is no end date to Eth2 Staking. This is similar to perpetual bonds.

To stake Eth2, you need 32 ETH ($19,611 USD). This is the equivalent of the Face Value (FV) Minimum Piece when buying a bond. Most FV minimum pieces for bonds are either 1,000, 2,000, 10,000 or 200,000.

If you want to stake more Eth2, you need to do it in 32 ETH increments. This is the equivalent of the FV Minimum Increment when buying a bond. Most bonds have minimum increments of 1,000, although rarely some bonds have higher increments. The Santander 5.179% 2025 bond is one of those rare bonds that has a minimum piece of 200,000 and a minimum increment of 200,000.
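The minimum-piece and minimum-increment rules described above share the same structure, so a single check covers both. This is a minimal Python sketch; the function names are illustrative, and it works in whole units only (whole ETH, or whole currency units of face value).

```python
def is_valid_amount(amount: int, min_piece: int, increment: int) -> bool:
    """True if amount equals the minimum piece plus a whole number of increments."""
    return amount >= min_piece and (amount - min_piece) % increment == 0


def largest_valid_amount(holdings: int, min_piece: int, increment: int) -> int:
    """Largest amount not exceeding holdings that satisfies the piece/increment rule."""
    if holdings < min_piece:
        return 0
    return min_piece + (holdings - min_piece) // increment * increment


# Eth2 staking: 32 ETH minimum, 32 ETH increments.
print(largest_valid_amount(100, 32, 32))           # 96 ETH, i.e. three validators
# A bond with a 200,000 minimum piece and 200,000 increments (like the Santander example).
print(is_valid_amount(300_000, 200_000, 200_000))  # False
```

In Eth2, each 32 ETH increment corresponds to a separate validator, so someone holding 100 ETH could run three validators and would keep 4 ETH unstaked.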

Today, you can only stake (deposit) Eth2. Once the Eth2 protocol reaches phase 1.5, you will be able to exit (withdraw) your stake and the returns generated. There will not be a secondary market for Eth2 stakes. This is the equivalent of only being able to buy or sell a bond directly from the issuer: every time you buy the bond from the issuer, the issuer creates a new piece, and every time you sell your bond back to the issuer, the issuer takes that piece out of circulation. This is one of the major differences between Eth2 Staking and trading a traditional bond, since traditional bonds have secondary markets.

The return you receive for staking Eth2 is inversely correlated with the amount of Eth2 staked in the protocol.

The current APR for staking Eth2 is 16.7% (source: Eth2 Launchpad).

Over time, the amount of Eth2 staked will increase, reducing the returns for everyone in the protocol. The reward schedule is designed so that the return falls as the total stake grows (per-validator rewards scale with the inverse square root of the total amount staked), which means it will never reach 0% APR. The APR is expected never to fall below 1.2%. Considering that today the one-month US Treasury Bill pays 0.084%, the minimum APR is quite attractive. This is another major difference between Eth2 Staking and a traditional bond. A traditional bond has either a fixed rate (like the Santander example above, which has a fixed rate of 5.179%) or a variable (floating) rate, which is typically a reference rate (LIBOR being the most common one in recent years) plus a spread. The return earned staking Eth2 will oscillate over time and will depend on the participants' trust in the Eth2 consensus protocol.
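To see how the return shrinks as more ETH is staked, here is a minimal Python sketch. It assumes the phase 0 reward rule as I understand the beacon chain specification, in which the per-validator base reward is proportional to the inverse square root of the total ETH staked; the constants below are my reading of the phase 0 spec and should be treated as assumptions. It is an idealized upper bound: it ignores penalties, missed duties, compounding, and the price of ETH.

```python
import math

# Phase 0 parameters as I understand the beacon chain spec (assumptions).
BASE_REWARD_FACTOR = 64
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
GWEI_PER_ETH = 10**9

EPOCHS_PER_YEAR = 365.25 * 24 * 3600 / (SECONDS_PER_SLOT * SLOTS_PER_EPOCH)


def approx_max_apr(total_staked_eth: float) -> float:
    """Idealized, non-compounded APR for a validator that never misses a duty.

    The spec's base reward is roughly
        effective_balance * BASE_REWARD_FACTOR / sqrt(total_staked_gwei) / 4,
    and a perfect validator collects about four base rewards per epoch, so the
    per-epoch return is about BASE_REWARD_FACTOR / sqrt(total_staked_gwei).
    """
    total_gwei = total_staked_eth * GWEI_PER_ETH
    return BASE_REWARD_FACTOR / math.sqrt(total_gwei) * EPOCHS_PER_YEAR


for staked in (1_000_000, 4_000_000, 16_000_000):
    print(f"{staked:>12,} ETH staked -> ~{approx_max_apr(staked):.1%} APR")
```

The shape is what matters here: every time the total stake quadruples, the return halves. It keeps falling as more ETH is staked but never reaches zero, and because the ETH supply is finite, the return is bounded below.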

This makes Eth2 a novel income-generating financial instrument. In the future, once Eth2 is fully running (phase 1.5 and beyond), an individual will be able to enter and exit the protocol based on the current return on staking Eth2. It's too early to know what the equilibrium return will be, but if I had to guess, it would be a combination of the market's trust in the Eth2 protocol (you can call this the 'credit risk' of Eth2) and the current risk-free rate.

You can stake Eth2 today if you have at least 32 ETH. However, that comes with significant risks, listed below.

  1. In phase 0, there are no withdrawals. Your ETH will be locked for an undetermined period of time.
  2. As explained in this link, setting up an Eth2 stake is quite complicated at the moment. Messing up the setup phase could mean losing your entire stake.
  3. Once your Eth2 stake is successfully set up, you need to ensure that your validator is running properly most of the time. Failing to do so incurs penalties.
  4. The Eth2 consensus protocol might not work as intended, which could significantly reduce the value of Eth2 relative to USD.

Most of these risks will disappear once the protocol reaches phase 1.5 and more intermediaries begin to set up staking services.

As we begin phase 0 of this new journey, investors will be watching closely how this new protocol develops. If Eth2 works as intended, Staking Eth2 will become a new type of asset to invest in, one that theoretically should have a low correlation with the market in general and should generate outsize returns relative to the risk free rate.


Poll: Any interest in an editing buddy system?

17 hours 7 minutes ago
Published on December 2, 2020 2:18 AM GMT

There was a popular post today linked from Hacker News on the benefits of hiring a freelance editor. Looking into it, I think that what's described is really "developmental" and "structural" editing, as described here.

Maybe there's no replacement for a professional editor. But for a community like this one, with (I think) such a high ratio of contributors to readers, I wondered if it might be valuable to try to arrange a kind of "buddy system" where we give feedback to each other to improve our writing.

For now, please don't worry about how that might work. At the moment I'd just like to gauge if there is enough interest that this is something that might be worth trying to organize.

Would you be interested in participating in this?

[poll]{No}{I'd edit but not submit}{I'd edit and submit}

How good do you think you would be at editing? Take the average front-page post at Lesswrong. If you took 30 minutes to give comments, and the author used your feedback, how much better would that post get, as measured by upvotes?

[poll]{Less than 1.5x}{1.5x-2x}{2x-4x}{4x or more}


In a multipolar scenario, how do people expect systems to be trained to interact with systems developed by other labs?

December 1, 2020 - 23:04
Published on December 1, 2020 8:04 PM GMT

I haven't seen much discussion of this, but it seems like an important factor in how well AI systems deployed by actors with different goals manage to avoid conflict (cf. my discussion of equilibrium and prior selection problems here).

For instance, would systems be trained

  • Against copies of agents developed by other labs (possibly with measures to mask private information)?
  • Simultaneously with other agents in a simulator that each developer has access to?
  • Against copies of themselves?
  • Against distributions of counterpart policies engineered to have certain properties? What would those properties be?


Open & Welcome Thread - December 2020

December 1, 2020 - 20:03
Published on December 1, 2020 5:03 PM GMT

(The usual boilerplate:)

  • If it’s worth saying, but not worth its own post, here's a place to put it.
  • And, if you are new to LessWrong, here's the place to introduce yourself.
    • Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.

The Open Thread sequence is here. The tag is here.


Forecasting Newsletter: November 2020

December 1, 2020 - 20:00
Published on December 1, 2020 5:00 PM GMT

Index
  • Highlights
  • In The News
  • Prediction Markets & Forecasting Platforms
  • United States Presidential Elections Post-mortems
  • Hard To Categorize
  • Long Content

Sign up here or browse past newsletters here.

In the News

DeepMind claims a major breakthrough in protein folding (press release, secondary source)

DeepMind has developed a piece of AI software called AlphaFold that can accurately predict the structure that proteins will fold into in a matter of days.

This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research.

Figuring out what shapes proteins fold into is known as the "protein folding problem", and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world.

In the results from the 14th CASP assessment, released today, our latest AlphaFold system achieves a median score of 92.4 GDT overall across all targets. This means that our predictions have an average error (RMSD) of approximately 1.6 Angstroms, which is comparable to the width of an atom (or 0.1 of a nanometer). Even for the very hardest protein targets, those in the most challenging free-modelling category, AlphaFold achieves a median score of 87.0 GDT.
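To make the two accuracy measures in this quote concrete, here is a minimal Python sketch of RMSD and a simplified GDT score (presumably GDT_TS, the standard CASP variant: the percentage of Cα atoms within 1, 2, 4 and 8 Å of their true positions, averaged over the four cutoffs). It assumes the predicted coordinates are already optimally superimposed on the experimental structure; real GDT calculations search over superpositions separately for each cutoff. The function names are illustrative, not taken from any CASP tool.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched atoms, in the input units (e.g. Å)."""
    if len(pred) != len(true) or not pred:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    squared = sum(math.dist(p, t) ** 2 for p, t in zip(pred, true))
    return math.sqrt(squared / len(pred))


def gdt_ts_simplified(pred: Sequence[Point], true: Sequence[Point],
                      cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Percentage of residues within each distance cutoff, averaged over cutoffs.

    Simplification: assumes pred is already superimposed onto true.
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100 * sum(fractions) / len(cutoffs)
```

A score of 100 means every residue lies within 1 Å of its true position. Because the score averages over loose and tight cutoffs, a structure that is roughly right everywhere still earns substantial partial credit.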

Crucially, CASP chooses protein structures that have only very recently been experimentally determined (some were still awaiting determination at the time of the assessment) to be targets for teams to test their structure prediction methods against; they are not published in advance. Participants must blindly predict the structure of the proteins.

The Organization of the Petroleum Exporting Countries (OPEC) forecasts slower economic growth and slower growth in oil demand (primary source, secondary source). In particular, it forecasts long-term growth for OECD countries (which I take to mean that growth from the covid recovery is not counted) to be below 1%. On the one hand, their methodology is opaque; on the other hand, I expect them to actually be trying to forecast growth and oil demand, because these directly determine how many barrels it is optimal for them to produce.

Google and Harvard's Global Health Institute update their US covid model and publish it at NeurIPS 2020 (press release), aiming for it to be robust, interpretable, extendable, and to have longer time horizons. They're also using it to advertise various Google products. It has been extended to Japan.

Prediction Markets & Forecasting Platforms

Gnosis announces the GnosisDAO (announcement, secondary source), an organization governed by prediction markets (i.e., a futarchy): "The mission of GnosisDAO is to successfully steward the Gnosis ecosystem through futarchy: governance by prediction markets."

Metaculus have a new report on forecasting covid vaccines, testing and economic impact (summary, full report). They also organized moderator elections and are hiring for a product manager.

Prediction markets have kept pricing "Trump will not be president in February" at $0.85 to $0.90 ($0.90 as of now; the contract resolves to $1 if Trump isn't president in February). Non-American readers might want to explore PolyMarket or FTX; American readers with some time on their hands might want to actually put some money into PredictIt. Relatedly, some members of the broader Effective Altruism and rationality communities made a fair amount of money betting on the election.
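For readers unfamiliar with binary contracts, the arithmetic behind these prices is short. A minimal Python sketch, assuming a contract that pays $1 if the event happens and $0 otherwise: the price is roughly the market's probability, and buying at $0.90 earns about 11% if the contract pays out, over the couple of months until February. Fees and the time value of money are ignored unless you set `fee_on_profit`, a hypothetical parameter for venues that take a cut of winnings.

```python
def implied_probability(price: float, payout: float = 1.0) -> float:
    """Market-implied probability, ignoring fees and the time value of money."""
    return price / payout


def return_if_correct(price: float, payout: float = 1.0,
                      fee_on_profit: float = 0.0) -> float:
    """Fractional return on the capital tied up if the contract pays out.

    fee_on_profit is a hypothetical parameter for venues that take a share of
    winnings; set it to whatever the platform you use actually charges.
    """
    profit = (payout - price) * (1.0 - fee_on_profit)
    return profit / price


print(f"{implied_probability(0.90):.0%} implied")
print(f"{return_if_correct(0.90):.1%} return if Trump is not president in February")
```

Whether that is a good bet depends on how much more than 90% you believe the true probability is, and on what else the locked-up capital could earn until resolution.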

CSET recorded Using Crowd Forecasts to Inform Policy with Jason Matheny, CSET's Founding Director, previously Director of IARPA. I particularly enjoyed the verbal history bits, the sheer expertise Jason Matheny radiated, and the comments on how the US government currently makes decisions.

Q: Has the CIA changed its approach to using numbers rather than words?
A: No, not really. They use some prediction markets, but most analytic products are still based on verbiage.

As a personal highlight, I was referred to as "top forecaster Sempere" towards the end of this piece by CSET. I've since then lost the top spot, and I'm back to holding the second place.

I also organized the Forecasting Innovation Prize (LessWrong link), which offers $1000 for research and projects on judgemental forecasting. For inspiration, see the project suggestions. Another post of mine, Predicting the Value of Small Altruistic Projects: A Proof of Concept Experiment might also be of interest to readers in the Effective Altruism community. In particular, I'm looking for volunteers to expand it.

Negative Examples

Release of Covid-19 second wave death forecasting 'not in public interest', claims Scottish Government

The Scottish Government has been accused of "absurd" decision making after officials blocked the release of forecasting analysis examining the potential number of deaths from a second wave of Covid-19.

Officials refused to release the information on the basis that it related to the formulation or development of government policy and was "not in the public interest" as it could lead to officials not giving "full and frank advice" to ministers.

The response also showed no forecasting analysis had been undertaken by the Scottish Government over the summer on the potential of a second wave of Covid-19 on various sectors.

United States Presidential Election Post-mortems

Thanks to the Metaculus Discord for suggestions for this section.

Independent postmortems
  • David Glidden's (@dglid) comprehensive spreadsheet comparing 538, the Economist, Smarkets and PredictIt in terms of Brier scores for everything. tl;dr: Prediction Markets did better in closer states. (see here for the log score.)
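For reference, the two scoring rules mentioned here can be written in a few lines. This is a minimal Python sketch; note that sign conventions for the log score vary between sources (some report the negative, as a "log loss"), and the forecasts in the usage lines are hypothetical, not Glidden's data.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; 0 is perfect, and always forecasting 50% scores 0.25.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def log_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean natural log of the probability given to what actually happened.

    Higher (closer to 0) is better. A forecast of exactly 0 for an event that
    happens is infinitely bad; math.log(0) raises ValueError here.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum(math.log(p if o == 1 else 1.0 - p)
               for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical forecasts for three states (1 = the candidate won the state).
model = [0.70, 0.40, 0.90]
market = [0.60, 0.45, 0.85]
outcomes = [1, 1, 1]
for name, probs in (("model", model), ("market", market)):
    print(name, round(brier_score(probs, outcomes), 3), round(log_score(probs, outcomes), 3))
```

The log score punishes confident misses much more harshly than the Brier score does, which is why the two can rank forecasters differently.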

  • Hindsight is 2020; a nuanced take.

  • 2020 Election: Prediction Markets versus Polling/Modeling Assessment and Postmortem.

    "We find a market that treated day after day of good things for Biden and bad things for Trump, in a world in which Trump was already the underdog, as not relevant to the probability that Trump would win the election."

    Markets overreacted during election night.

    [On methodology:] "You bet into the market, but the market also gets to bet into your fair values. That makes it a fair fight." [Note: see here for a graph through time, and here for the original, though less readable, source.]

    ...polls are being evaluated, as I've emphasized throughout, against a polls plus humans hybrid. They are not being evaluated against people who don't look at polls. That's not a fair comparison.

  • Partisans, Sharps, And The Uninformed Quake US Election Market. tl;dr: "I find myself really torn between wanting people to be more rational and make better decisions. And then also, like, well, I want people to offer 8-1 on Trump being in office in February."

American Mainstream Media

Mostly unnuanced.

FiveThirtyEight. Andrew Gelman.

As we've discussed elsewhere, we can't be sure why the polls were off by so much, but our guess is a mix of differential nonresponse (Republicans being less likely than Democrats to answer, even after adjusting for demographics and previous vote) and differential turnout arising from on-the-ground voter registration and mobilization by Republicans (not matched by Democrats because of the coronavirus) and maybe Republicans being more motivated to go vote on election day in response to reports of 100 million early votes.

Hard to Categorize

Forbes on how to improve hurricane forecasting:

...to greatly improve the hurricane intensity forecast, we need to increase the subsurface ocean measurements by at least one order of magnitude...

One of the most ambitious efforts to gather subsurface data is Argo, an international program designed to build a global network of 4,000 free-floating sensors that gather information like temperature, salinity and current velocity in the upper 2,000 meters of the ocean.

Argo is managed by NOAA's climate office that monitors ocean warming in response to climate change. This office has a fixed annual budget to accomplish the Argo mission. The additional cost of expanding Argo's data collection by 10 times doesn't necessarily help this office accomplish the Argo mission. However, it would greatly improve the accuracy of hurricane forecasts, which would benefit the NOAA's weather office — a different part of NOAA. And the overall benefit of improving even one major hurricane forecast would be to save billions [in economic losses], easily offsetting the entire cost to expand the Argo mission.

In wake of bad salmon season, Russia calls for new forecasting approach:

In late October, Ilya Shestakov, head of the Russian Federal Agency for Fisheries, met with Russian scientists from the Russian Research Institute of Fisheries and Oceanography (VNIRO) to talk about the possible reasons for the difference. According to scientists, the biggest surprises came from climate change.

"We have succeeded in doing a deeper analysis of salmon by the combination of fisheries and academic knowledge added by data from longstanding surveys," Marchenko said. "No doubt, we will [be] able to enhance the accuracy of our forecasts by including climate parameters into our models."

Political Polarization and Expected Economic Outcomes (summary)

"87% of Democrats expect Biden to win while 84% of Republicans expect Trump to win"

"Republicans expect a fairly rosy economic scenario if Trump is elected but a very dire one if Biden wins. Democrats ... expect calamity if Trump is re-elected but an economic boom if Biden wins."

Dart Throwing Spider Monkey proudly presents the third part of his Intro to Forecasting series: Building Probabilistic Intuition

A gentle introduction to information charts: a simple tool for thinking about probabilities in general, but in particular for predictions with a sample size of one.

A YouTube playlist with forecasting content (h/t Michal Dubrawski).

Farm-level outbreak forecasting tool expands to new regions

An article with some examples of Crime Location Forecasting, and on whether it can be construed as entrapment.

Why Forecasting Snow Is So Difficult: Because it is very sensitive to initial conditions.

Google looking for new ways to predict cyber-attackers' behavior.

Long Content

Taking a disagreeing perspective improves the accuracy of people's quantitative estimates, but this depends on the question type.

...research suggests that the same principles underlying the wisdom of the crowd also apply when aggregating multiple estimates from the same person – a phenomenon known as the "wisdom of the inner crowd"

Here, we propose the following strategy: combine people's first estimate with their second estimate made from the perspective of a person they often disagree with. In five pre-registered experiments (total N = 6425, with more than 53,000 estimates), we find that such a strategy produces highly accurate inner crowds (as compared to when people simply make a second guess, or when a second estimate is made from the perspective of someone they often agree with). In explaining its accuracy, we find that taking a disagreeing perspective prompts people to consider and adopt second estimates they normally would not consider as viable option, resulting in first- and second estimates that are highly diverse (and by extension more accurate when aggregated). However, this strategy backfires in situations where second estimates are likely to be made in the wrong direction. Our results suggest that disagreement, often highlighted for its negative impact, can be a powerful tool in producing accurate judgments.

...after making an initial estimate, people can be instructed to base their additional estimate on different assumptions or pieces of information. A demonstrated way to do this has been through "dialectical bootstrapping" where, when making a second estimate, people are prompted to question the accuracy of their initial estimate. This strategy has been shown to increase the accuracy of the inner crowd by getting the same person to generate more diverse estimates and errors... ...as a viable method to obtain more diverse estimates, we propose to combine people's initial estimate with their second estimate made from the perspective of a person they often disagree with... ...although generally undesirable, research in group decision-making indicates that disagreement between individuals may actually be beneficial when groups address complex problems. For example, groups consisting of members with opposing views and opinions tend to produce more innovative solutions, while polarized editorial teams on Wikipedia (i.e., teams consisting of ideologically diverse sets of editors) produce higher quality articles...

These effects occur due to the notion that disagreeing individuals tend to produce more diverse estimates, and by extension errors, which are cancelled out across group members when averaged. ...we conducted two (pre-registered) experiments...

People who made a second estimate from the perspective of a person they often disagree with benefited more from averaging than people who simply made a second guess.

... However, although generally beneficial, this strategy backfired in situations where second estimates were likely to be made in the wrong direction. [...] For example, imagine being asked the following question: "What percent of China's population identifies as Christian?". The true answer to this question is 5.1% and if you are like most people, your first estimate is probably leaning towards this lower end of the scale (say your first estimate is 10%). Given the position of the question's true answer and your first estimate, your second estimate is likely to move away from the true answer towards the opposite side of the scale (similar to the scale-end effect [45]), effectively hurting the accuracy of the inner crowd.

We predicted that the average of two estimates would not lead to an accuracy gain in situations where second estimates are likely to be made in the wrong direction. We found this to be the case when the answer to a question was close to the scale's end (e.g., an answer being 2% or 98% on a 0%-100% scale).
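The backfire case is easy to see with numbers. A minimal Python sketch using the passage's own example (true answer 5.1%, first estimate 10%) plus two hypothetical second estimates: averaging helps when the two estimates bracket the truth, but when the second estimate moves away from the true answer, as it tends to near the end of the scale, the average is worse than the first guess alone.

```python
def inner_crowd(first: float, second: float, truth: float) -> dict:
    """Compare the first estimate's error with the error of the two-estimate average."""
    average = (first + second) / 2
    return {
        "first_error": abs(first - truth),
        "average_error": abs(average - truth),
    }


TRUTH = 5.1    # percent of China's population identifying as Christian (from the passage)
FIRST = 10.0   # the illustrative first estimate from the passage

# Hypothetical second estimates: one that brackets the truth, one pushed the wrong way.
print(inner_crowd(FIRST, 3.0, TRUTH))   # average 6.5: closer to 5.1 than 10 was
print(inner_crowd(FIRST, 30.0, TRUTH))  # average 20.0: further from 5.1 than 10 was
```

With a true answer near 0%, there is little room for a second estimate to land below the truth, so most "diverse" second guesses end up on the same side as the first one, only further away.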

A 2016 article attacking Nate Silver's model, key to understanding why Nate Silver is often so smug.

Historical Presidential Betting Markets, in the US before 2004.

...we show that the market did a remarkable job forecasting elections in an era before scientific polling. In only one case did the candidate clearly favored in the betting a month before Election Day lose, and even state-specific forecasts were quite accurate. This performance compares favorably with that of the Iowa Electronic Market (currently [in 2004] the only legal venue for election betting in the United States). Second, the market was fairly efficient, despite the limited information of participants and attempts to manipulate the odds by political parties and newspapers. The extent of activity in the presidential betting markets of this time was astonishingly large. For brief periods, betting on political outcomes at the Curb Exchange in New York would exceed trading in stocks and bonds.

Covering developments in the Wall Street betting market was a staple of election reporting before World War II. Prior to the innovative polling efforts of Gallup, Roper and Crossley, the other information available about future election outcomes was limited to the results from early-season contests, overtly partisan canvasses and straw polls of unrepresentative and typically small samples. The largest and best-known nonscientific survey was the Literary Digest poll, which tabulated millions of returned postcard ballots that were mass mailed to a sample drawn from telephone directories and automobile registries. After predicting the presidential elections correctly from 1916 to 1932, the Digest famously called the 1936 contest for Landon in the election that F. Roosevelt won by the largest Electoral College landslide of all time. Notably, although the Democrat's odds prices were relatively low in 1936, the betting market did pick the winner correctly. The betting quotes filled the demand for accurate odds from a public widely interested in wagering on elections. In this age before mass communication technologies reached into America's living rooms, election nights were highly social events, comparable to New Year's Eve or major football games. In large cities, crowds filled restaurants, hotels and sidewalks in downtown areas where newspapers and brokerage houses would publicize the latest returns and people with sporting inclinations would wager on the outcomes. Even for those who could not afford large stakes, betting in the run-up to elections was a cherished ritual. A widely held value was that one should be prepared to "back one's beliefs" either with money or more creative dares. Making freak bets (where the losing bettor literally ate crow, pushed the winner around in a wheelbarrow or engaged in similar public displays) was wildly popular.

Gilliams (1901, p. 186) offered "a moderate estimate" that in the 1900 election "there were fully a half-million such [freak] bets, about one for every thirty voters." In this environment, it is hardly surprising that the leading newspapers kept their readership well informed about the latest market odds.

The newspapers recorded many betting and bluffing contests between Col. Thomas Swords, Sergeant of Arms of the National Republican Party, and Democratic betting agents representing Richard Croker, Boss of Tammany Hall, among others. In most but not all instances, these officials appear to bet in favor of their party's candidate; in the few cases where they took the other side, it was typically to hedge earlier bets.

...In conclusion, the historical betting markets do not meet all of the exacting conditions for efficiency, but the deviations were not usually large enough to generate consistently profitable betting strategies using public information.

The newspapers reported substantially less betting activity in specific contests and especially after 1940. In part, this reduction in reporting reflected a growing reluctance of newspapers to give publicity to activities that many considered unethical. There were frequent complaints that election betting was immoral and contrary to republican values. Among the issues that critics raised were moral hazard, election tampering, information withholding and strategic manipulation.

In response to such concerns, New York state laws did increasingly attempt to limit organized election betting. Casual bets between private individuals always remained legal in New York. However, even an otherwise legal private bet on elections technically disqualified the participants from voting—although this provision was rarely enforced—and the legal system also discouraged using the courts to collect gambling debts. Anti-gambling laws passed in New York during the late 1870s and the late 1900s appear to put a damper on election betting, but in both cases, the market bounced back after the energy of the moral reformers flagged. Ultimately, New York's legalization of parimutuel betting on horse races in 1939 may have done more to reduce election betting than any anti-gambling policing. With horseracing, individuals interested in gambling could wager on several contests promising immediate rewards each day, rather than waiting through one long political contest.

The New York Stock Exchange and the Curb Market also periodically tried to crack down. The exchanges characteristically did not like the public to associate their socially productive risk-sharing and risk-taking functions with gambling on inherently zero-sum public or sporting events. In the 1910s and again after the mid-1920s, the stock exchanges passed regulations to reduce the public involvement of their members. In May 1924, for example, both the New York Stock Exchange and the Curb Market passed resolutions expressly barring their members from engaging in election gambling. After that, while betting activity continued to be reported in the newspapers, the articles rarely named the participants. During the 1930s, the press noted that securities of private electrical utilities had effectively become wagers on Roosevelt (on the grounds that New Deal policy initiatives such as the formation of the Securities and Exchange Commission and the Tennessee Valley Authority constrained the profits of existing private utilities).

A final force pushing election betting underground was the rise of scientific polling. For newspapers, one of the functions of reporting Wall Street betting odds had been to provide the best available aggregate information [...] The scientific polls, available on a weekly basis, provided the media with a ready substitute for the betting odds, one not subject to the moral objections against gambling.

In summer 2003, word leaked out that the Department of Defense was considering setting up a Policy Analysis Market, somewhat similar to the Iowa Electronic Market, which would seek to provide a market consensus about the likelihood of international political developments, especially in the Middle East. Critics argued that this market was subject to manipulation by insiders and might allow extremists to profit financially from their actions.

Note to the future: All links are added automatically to the Internet Archive. In case of link rot, go there and input the dead link.

"I'd rather be a bookie than a goddamned poet." — Sherman Kent, 1964, when pushing for more probabilistic forecasts and being accused of trying to turn the CIA into "the biggest bookie shop in town."


Luna Lovegood and the Chamber of Secrets - Part 3

December 1, 2020 - 15:43
Published on December 1, 2020 12:43 PM GMT

"I solemnly swear that I am up to no good."

Fred and George had carried concealed broomsticks at all times since the previous year's troll attack. When they heard Harry Potter's prophecy at the Quidditch final they flew just over the castle walls and then Apparated to the graveyard where they salvaged their map along with several rings, amulets and strange devices.

"Have you visited every room on this map?" Luna asked.

"Excuse me?" George said.

"What part of the school have you never visited?" Luna enunciated each word.

"Did she just insult us?" Fred said.

"Perhaps our reputation is in disrepair," George said.

"What makes you think there exists a single room in the school we hadn't explored by the end of our first year?" Fred said.

"Are you telling me you visited every room in Hogwarts in your first year?" Luna said.

"We make no such claim," George said, "For if we were to make such a claim then we would include not just mere rooms but also secret passages, pocket dimensions, secret dimensions, pocket passages, and docket sassages."

"Surely someone of your reputation must have visited every room, passage, dimension, sassage and chamber by the end of your first year," Luna said.

"Surely," Fred said.

"Had you visited the Chamber of Secrets before Headmistress McGonagall announced its existence today?" Luna said.

Someone said a rude word.

"Where is the Chamber of Secrets?" Luna asked.

"It's this complex of tunnels," George said, "It connects this painting of Salazar Slytherin to this girls' bathroom and these places over here. This path goes to the Hogsmeade graveyard."

"Has the Chamber of Secrets always been on this map?" Luna asked.

"Yes," Fred said.

"How do you know?" Luna asked.

"I remember it," Fred said, "I just never really noticed it before McGonagall's announcement."

"Are there any other rooms on this map you haven't noticed?" Luna asked.

The students failed to find anything they hadn't found before.

"I have an idea," Luna said, "First we are going to look at this map normally. We are going to list every room and secret passage we know of, including the Chamber of Secrets. We are going to count them. Then you are going to conjure a grid over this map. We are going to count every single room without identifying them. Then we will compare the two numbers."

The two numbers came up exactly the same.

"Wait a minute. I have another idea. Give me that list. Is there any room in this castle you haven't been to?" Luna said.

"No," George said, "We've been everywhere important except the Chamber of Secrets."

"Let's go over the map again," Luna said, "List each room you've been inside."

"…and that's the broom closet we trapped Percy in along with his girlfriend, his ex-girlfriend, his ex-ex-girlfriend and Peeves," Fred finished.

Luna tore up little bits of parchment and covered up each room Fred and George had visited. She had covered nearly all of the Marauder's Map. There were just a few unimportant rooms that didn't really count. Then Luna caught herself. This must be what a Muggle-Repelling Charm felt like.

Luna deliberately read off the unimportant rooms she had just nearly written off.

  • The Chamber of Secrets
  • The Stone Citadel (under construction)
  • The Room of Requirement
  • The Forgotten Library


Book Review: 'History Has Begun: The Birth of a New America' by Bruno Maçães

December 1, 2020 - 14:47
Published on December 1, 2020 11:47 AM GMT

This is a linkpost for https://www.themetasophist.com/chapter/on-history-has-begun-by-bruno-macaes

One of the iconic events during the fall of Hellenic-Classical civilisation was the burning of the library of Alexandria. The conquering Muslims apparently thought that the books therein would “either contradict the Koran, in which case they are heresy, or they will agree with it, so they are superfluous.”

The story may be apocryphal, but it seems the tendency is very real, particularly in the US. Removing authors from reading lists, the toppling of statues, the renaming of buildings: is this a passing fad, or a world-historical event? Just as a nascent Arabic culture once cast off the forms of Hellenism, is a nascent American culture throwing off European patterns of thought and behaviour?

To help answer this question I turned to the new book “History Has Begun: The Birth of a New America” by Bruno Maçães. The book is not really seeking to solve a problem, but rather to describe a phenomenon called political virtualism. 

The Idea of Political Virtualism

The underlying view of the book seems to be that the US is not undergoing a decline, but a metamorphosis: liberalism is mutating, and a new dispensation is starting to form. Maçães terms the emerging outlook political virtualism, meaning an immersion in stories and fantasies, none of which are held to be final. Importantly, fiction is not used to mask reality, as in Russia, but to replace it.

He uses a number of examples to illustrate this. Whereas politicians such as Reagan and Schwarzenegger used their acting skills to appear more credible as politicians, newer politicians use their political skills to perform in a way more suited to actors. Trump’s approach to governance was driven by what would look good on television. Of his own election night, he said “it was one of the greatest nights in the history of television”, and his obsession with cable news coverage has continued ever since.

Maçães also indicts Ocasio-Cortez on the same count — judging that the Green New Deal is lacking as a policy plan but complete as a movie script, he quotes the politician herself once stating that: “We have to become master storytellers. Everyone in public service needs to be a master storyteller.”

But according to Maçães, this tendency also manifests in many other forms, such as college students pretending to be drunk in order to live up to the role they are expected to play.

He also offers Silicon Valley as an example, whose elite “is so enamoured of the eureka moment of pure insight that it resists coming back to Earth in order to give a final shape to social relations” and “dream of a future world but is not necessarily interested in finding a role for everyone else in those dreams.” He believes the support for a universal basic income by some Valley thinkers amounts to an effort “to create a safe playground where every technological dream can be freely pursued because no one will be seriously damaged or harmed by it.”

But from where did this trend emerge?

According to Maçães, the trend was foreshadowed in the literary and philosophical works of Sinclair Lewis, William James and John Rawls. For William James, there were many truths, and he sought to create a liberalism where “people can go on with their separate lives and their separate philosophies”. For Rawls, the effort to make everyone think and act identically is unworkable. His solution was to “allow many different doctrines to flourish, provided they all agreed on some fundamental political principles.”

But the most accurate incarnation of the new dispensation is to be found in Lewis’s novel, the main character of which begins life by simply imitating everyone around him. He sees this as a good thing, as it pushes people to “produce — produce — produce!” Eventually realising how meaningless this situation is, his solution is to “preserve the social and economic fabric in its current form but grant each person the freedom to break with convention in his or her actions.” Maçães’s interpretation is that the tendency to “enhance or embellish reality becomes a vital operating system, and it is precisely this longing that Lewis suggests is the difference between Americans and Europeans.”

Is Political Virtualism unique to today’s America?

The idea of political virtualism is certainly a powerful lens: upon learning about it, it is difficult not to notice it repeatedly. Two examples include the common references to the election as a season finale complete with plot twists such as Trump contracting the virus, and the viral meme depicting the victorious Democrats as the Avengers. 

How new is the phenomenon? It has possibly existed in miniature before. Was the notion not already present in Shakespeare, when he wrote “all the world's a stage, and all the men and women merely players”? And Trump is not the first politician to compare himself to an actor: Suetonius remarked that on the last day of the Emperor Augustus, he asked his friends if he had played his part well in the comedy of life. Later, he said in Greek “since well I’ve played my part, all clap your hands, and from the stage dismiss me with applause.”

And the world has been embellished before, namely through religion. Pre-enlightenment superstitions included holy wells, fairies, and guardian spirits floating over a city. Is the tendency described by Maçães simply another episode in the rise and fall of rationalism which, according to Oswald Spengler, every society will experience? Because in political virtualism there are certainly some echoes of what Weber described as the “great enchanted garden” which was the traditional world.

To explain the phenomenon, elsewhere Maçães has offered wokeness as an example of virtualism:

Wokeness is virtual, not radical politics. The woke left has more or less deliberately abandoned every project of social transformation. Instead, it creates a public performance, a reality show of social progress and asks us to play our role. That it is not radical is amply shown by the way it seamlessly fits with corporate America. 

But perhaps a better explanation for Wokeism is re-religiofication, as described by Toynbee and Spengler. Many have already noticed the similarities between Wokeness and certain strands of Christianity.

But why should we be witnessing such a phenomenon now? Because, as described by Eric Hoffer in The True Believer, when the lives of a section of society begin to worsen, their expectations are frustrated and so they turn to mass movements to give meaning to their frustrated lives, especially those movements that take on the guise of a holy cause. When conventional sources of meaning such as career and family are harder to obtain, fanatical movements thus spread like disease in a nutritionally deprived population. While the displacement and unemployment caused by the virus probably accelerated the growth of these movements, their formation was facilitated by prior discontent, probably caused by a hopefully temporary mixture of inequality and elite overproduction as described by Peter Turchin.

The fact that wokeness does not have any concrete idea of social transformation is another similarity with Christianity, which itself avoided any concrete criticism of the Roman social order and preferred to focus on the rich in general.

Notwithstanding the above points, it does seem that there is something qualitatively different about this episode. Perhaps the best example of virtualisation is QAnon, a conspiracy theory in which Donald Trump is pitted against a paedophiliac deep state. Game designer Adrian Hon has even likened it to an alternate-reality game, as people must follow and discover clues to solve mysteries. Indeed, those whose relatives became caught up in the game sensed that they were “inhabiting a different reality.” Maçães does not mention QAnon in the version of the book I read, presumably because it is a relatively recent phenomenon.

Perhaps what needs to be explained is the degree of dramatisation and immersion, as opposed to the phenomenon itself. Technology is the obvious factor here, providing another datapoint for the materialist conception of history: social media gave us the stage to exercise our newfound dramatic sense, acquired over decades of exposure to TV and cinema. Maçães rightly points out that online “everyone is a television character.”

But is there another reason why virtualism is particularly prevalent in the US? The American media market is the biggest in the West. We know from standard economic theory that the greater the market size, the greater the degree of product differentiation. And the greater the degree of product differentiation, the more closely the product will match the tastes of the consumer. American media is thus likely to be more appealing, or immersive, to the average American than the media in any other Western country is to its own audience.

The major US media companies have actively nurtured this. Trump may have been bad for their mental health, but he was great for their revenues. Maçães offers an interesting anecdote about Dean Baquet, executive editor of the New York Times, in which Baquet describes the pivot the paper would make: readers had grown tired of the Russia story. He described that as Chapter One of the Trump story; the next one would be about racism. As Maçães notes, the allusion to literature rather than journalism is telling.

To convey what a political virtualism might look like, Maçães is fond of referring to the TV series Westworld, where robots almost indistinguishable from humans populate and play host to the wealthy in a Western-themed park. But in Westworld, the owner of the park, the Man in Black, became obsessed with discovering something true in it. Meanwhile, the architect of the park, Robert Ford, secretly prepared his creations to go out into the real world.

Such redeeming features are rarely found in the contemporary US media class, who seem blithely unconcerned with reality and intent on keeping their audience in a fictional and confrontational world. Their true resemblance is perhaps not to any Westworld character but to Elliot Carver, the Bond villain from Tomorrow Never Dies who tries to start a war between the UK and China in order to increase media revenues.

On this count, some of them are not doing too badly. In the past five years, the revenues of The New York Times have increased by around 15pc; the share price by 200pc.  If the phenomenon was virtual, the rewards were very much material. 

The Future of Political Virtualism

As technology continues to develop in the future, one can only imagine that Maçães believes that the tendency towards virtualisation will accelerate. 

Can we imagine that an entire populace could become enchanted by political virtualism? Let’s return to Westworld. When the robots became aware of their situation, many of them naturally wanted to leave. As for the one human more immersed in the park than anyone else, the Man in Black, one of the rare times we see him smile is when he gets shot in the arm by a rebelling host, because now, finally, the consequences were real.

This reflects a general principle: in a world too real, people will want to escape from it. But the same may be true for a world that is too unreal, in which case the world proposed by Maçães will not be an equilibrium. To some degree the author accounts for this by saying there should be a kill switch for any given fiction. But the existence of such a switch indicates that the overall realisation of the model will be curtailed.

This is not a bad thing, as a full application of virtualism could weaken society. To survive, a society must do more than entertain and entrance: it must solve problems. Substantial amounts of attention devoted to the unreal would draw attention away from pressing issues. Worryingly, virtualism could even be self-reinforcing: problems left unattended will go unresolved, deepening the desire to escape from a worsening reality.

Another doubt relates to the fact that for the political virtualist model to work, nobody should try to impose their story on others. But normally, the most engaging stories have an ending that feeds into some world-historical narrative with a clear enemy. Evangelical Christians talk about the Antichrist and the Left talks about fascism. Although both threats are to some degree exaggerated or even manufactured, they are nonetheless projected onto disbelieving others. While this tendency remains intact, it is difficult to see how either side could live and let live in the way they would need to for virtualism to be a stable state of affairs. But Maçães provides insufficient indication of how such a modus vivendi might be established.

There are also issues with the internationalisation of his vision. His chapter on foreign policy is standard realist analysis in slightly different language — whereas an offensive realist such as John Mearsheimer would say the US should prevent any single Eurasian power from dominating the Eurasian continent, Maçães says the US should ensure that each part can pursue their own way of life.

But would a US elite immersed in fantasy really be able to do that? Mental models consciously developed in one context will unconsciously be applied in another, and stories developed at home will inevitably be projected abroad: we have already seen this with the domestic narrative of political liberalisation being projected onto other countries such as China and Turkey, despite nothing of the sort happening. Belief in narratives inhibited perception of the facts, the sine qua non of a realist policy.

Moreover, pursuing realism requires one to focus on the capabilities of other nations and not their intentions, as the latter can shift rapidly and so cannot be relied upon. And yet stories centre intentions, character development, and plot twists above all. That could mean that policy will bias towards a focus on oscillating intentions and fleeting emotions, whereas a realist foreign policy should discount or even ignore the same.

However, in another sense, such unpredictability or inscrutability could help the US pursue a realist policy. If allied countries know that the US will always be there to provide a security guarantee, they will be slower to invest in their own defence. An unpredictable US could thus be necessary for an offshore balancing style strategy to work, as containing a powerful rival is more viable if neighbouring countries are willing and able to defend themselves.

Implications for Metasophism

Are there any favourable features of political virtualism that could be used to enhance societal cohesion and creativity? At this stage it is important to note that the book mostly aims to describe what Maçães sees as an emergent phenomenon; he does not advocate it per se. However, he is enthusiastic:

I believe the quest for total immersion is the holy grail of modern politics. A society of stories would be able to create new experiences and genuine feelings and thoughts in a completely artificial environment. The possibilities are endless.

But perhaps the key advantage of political virtualism he describes as follows:

Democracy is less the incorporation of input from voters than the constant appeal to viewers with new content, new projects and new possibilities. Even if one movement or concept is often dominant, having more access to the public and greater resources, there are other alternatives struggling to survive and new ones being prepared. The goal of the state is similar to that of a scriptwriter: to bring all the different characters and stories together, deciding which should have room to grow and which should play a supporting role or move to the background.

I find this part of the vision attractive, namely because it is important to ensure a certain competition among narratives and worldviews. As Pareto saw with his theory of the circulation of elites, it is when one worldview monopolises institutions that things start to go wrong. However, it is precisely state institutions that have been monopolised in the past. This is why the Metasophist system includes a decentralised way of selecting elites and allocating resources to creative groups, with the explicit goal of ensuring a healthy competition among worldviews and narratives. Some of the young would propose projects, but it is their peers who would decide which get to go ahead.

Moreover, as mentioned above, there is a problem with incommensurable narratives. In order to make the narratives commensurable, they need to lead to the same point; in Aristotelian terms, they need to have the same telos. In Metasophism, the common destination of all narratives and actions is ensuring the survival of humanity and the discovery of the meaning of life (assuming such a thing is possible). As I have previously described, that is something to which each individual and group could contribute; their narrative could therefore be centred around it.

For the above two reasons, it seems that Metasophism already has integrated the fundamental advantages of virtualism.


In an arresting thought, the philosopher Roger Scruton once wrote that the person who set fire to the library of Alexandria did the greatest service to civilisation, as it meant that scholars in subsequent ages no longer needed to pore over mediocre tomes. He is probably right: people must be free to set off on their own path rather than constantly retreading the well-worn paths of the past. If America sets off on its own path while Europe refines its own outlook, both may end up gaining from the informed perspective of the other.

In the same spirit, I am somewhat positive about political virtualism, even if I have an overpowering bias against the fact that it might ignore or even deny the idea of reality. It probably cannot work exactly as Maçães describes, but it could move at least one part of the West somewhat closer to a Metasophist condition where the validity of different paths and outlooks is recognised, and this shift in mindset is urgently needed.

In any case, the book is a highly engaging and thought-provoking read.


My Fear Heuristic

December 1, 2020 - 11:00
Published on December 1, 2020 8:00 AM GMT

My friends and family call me "risk tolerant". I wasn't always this way. It is the result of a 3-year-long scientific experiment.

You must do everything that frightens you…Everything. I’m not talking about risking your life, but everything else. Think about fear, decide right now how you’re going to deal with fear, because fear is going to be the great issue of your life, I promise you. Fear will be the fuel for all your success, and the root cause of all your failures, and the underlying dilemma in every story you tell yourself about yourself. And the only chance you’ll have against fear? Follow it. Steer by it. Don’t think of fear as the villain. Think of fear as your guide, your pathfinder.

The Tender Bar by J.R. Moehringer

When I was 18 I discovered a useful heuristic. Whenever I didn't know what to do I would pick whatever not-obviously-stupid[1] option frightened me the most.

My indecisions always centered around choosing between a scary unpredictable option and a comfortable predictable option. Since the comfortable option was always predictable, I always knew what the counterfactual would have been whenever I chose the scary option. If I chose the scary option then I could weigh the value of both timelines after the fact.

As an experiment, I resolved to choose the scarier option whenever I was undecided about what to do. I observed the results. Then I recorded whether the decision was big or little and whether doing what scared me more was the right choice in retrospect. I repeated the procedure 30-ish times for small decisions and 6-ish times for big decisions. If I were properly calibrated then picking the scary option would result in the correct choice 50% of the time.


  • For my 30-ish small decisions, picking the scary option was correct 90% of the time.
  • For my 6-ish big decisions, picking the scary option was correct 100% of the time.
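To put these numbers against the 50% calibration baseline, one could compute a one-sided binomial tail: the chance of doing at least this well if each scary pick were a coin flip. This is a minimal sketch under loudly stated assumptions: "30-ish" is treated as exactly 30 small decisions with 27 correct (90%), "6-ish" as exactly 6 big decisions with all 6 correct, and each decision is assumed independent. The function name `binomial_upper_tail` is hypothetical, introduced only for this illustration.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` hits in `trials` independent tries,
    each succeeding with probability `p` (a one-sided binomial tail)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: "30-ish" small decisions taken as exactly 30, 90% correct = 27.
small = binomial_upper_tail(27, 30)
# Assumption: "6-ish" big decisions taken as exactly 6, all correct.
big = binomial_upper_tail(6, 6)

print(f"small decisions: {small:.2e}")  # prints 4.22e-06
print(f"big decisions:   {big:.4f}")    # prints 0.0156
```

Under these assumptions, both results sit far below what a calibrated 50% coin flip would be expected to produce, though with so few decisions and self-graded outcomes the numbers are only suggestive.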

The above results underestimate the utility of my fear heuristic. My conundrums were overwhelmingly social. The upsides earned me substantial value. The downsides cost me trivial embarrassments.

I terminated the experiment when my fear evaporated. The only things I still feared were obviously stupid activities like jumping off of buildings and unimportant activities like handling large arthropods. I had deconditioned myself out of fear.

I didn't lose the signal. I had just recalibrated myself.

  1. "Stupid" includes anything that risks death or permanent injury. ↩︎


Notes on Sincerity and such

December 1, 2020 - 08:09
Published on December 1, 2020 5:09 AM GMT

This post examines a cluster of virtues that includes straightforwardness, frankness, sincerity, earnestness, candor, and parrhêsia. I hope it will help people who want to know more about these virtues and how to nurture them. I am a technical writer by trade and have developed some strong opinions about the value of, and the techniques of, clear and accurate communication, and so I will also draw on that experience to inform this post.

What are these virtues?

“[L]et sincerity and ingenuousness be thy refuge, rather than craft and falsehood: for cunning borders very near upon knavery. Wisdom never uses nor wants it. Cunning to the wise, is as an ape to a man.” ―William Penn

These virtues have to do with communicating in a way that is clear, precise, efficient, and useful. They show respect for those you are communicating with by “giving it to them straight” and not forcing a lot of second-guessing and interpretation.

I briefly mentioned some of them in my post on the related virtue of honesty, but now I want to look at them more closely.

In short, what these virtues have in common is “saying what you mean, and meaning what you say” (but also not talking a lot of rot that’s not to the point). If honesty covers “the truth,” the rest of these virtues help to cover “the whole truth, and nothing but the truth.”

Other closely related virtues include trustworthiness, reliability, and authenticity.

In opposition to these virtues are things like beating around the bush, candy-coating, ambiguity, euphemism, flattery, winks-and-nods, insinuations, exaggerations, incantations, ostentation, deflection, pretension, evasion, false modesty, irony, sarcasm, manipulativeness, insincerity, flippancy (making light of serious matters), changing the subject, playing rhetorical motte and bailey, or being “all hat and no cattle.”

There is some tension between these virtues and the virtues of tact and discretion (see below).

These virtues span a spectrum of outspokenness. On the more reserved end, you may rarely speak, but when you do, you speak sincerely and straightforwardly to the point. Towards the middle, you may try to anticipate what people would want to know and, with frankness and candor, express this, warts and all, whether they ask or not. At the unreserved extreme, you may feel compelled to reveal those things that people don’t want to know but need to be confronted with: this is the parrhêsia that made the Cynic philosophers notorious (and sometimes unwelcome).

I think maybe if we all exhibited parrhêsia we’d get sick of it pretty quick, but in small doses it’s valuable. It’s the person with a bit of parrhêsia who is the first to call out someone on their racist joke or sexist assumption, or to mention the elephant in the room, or to laugh at the emperor’s new clothes, or to confront someone about their drinking problem while everyone else keeps to the conspiracy of silence.

Being sincere isn’t always about what you communicate, but sometimes about what you won’t. If you feel the need to be mysterious, if you like to keep people guessing, if you present yourself as something of a code and judge your friends by their ability to crack it… well, you might want to consider how to be more straightforward instead.

But what about tact and discretion?

“[He] looked from me to the forms and back again, giving me the exact kind of smile of someone who, on Christmas morning, has just unwrapped an expensive present he already owns.” ―David Foster Wallace, The Pale King

Tact has to do with communicating in a way that will not hurt feelings, step on taboos, or in other ways be impolite or off-putting. Discretion can mean steering clear of topics that might raise hackles or open old wounds, or it can also mean keeping secrets and not being a blabbermouth about things that weren’t your business to begin with.

These things seem at first glance to be in conflict with candor. One possibility is that they are, and that maybe this shows that tact, discretion, frankness, and candor are not all virtues after all. Another possibility is that they are all virtues, but that we should not always expect virtues to fit together flawlessly in a mutually-compatible way: they are after all not commandments handed down from on high, but merely generalizations about human character traits shaped by generations of folk psychology. Another possibility is that they are compatible after all, but that it takes a little extra discernment and skill to make them work together nicely.

Let’s consider this last possibility:

There’s a fine line between giving someone an unpleasant answer accompanied with “a spoonful of sugar to help the medicine go down” and giving them the answer so candy-coated that the bitter truth can no longer be tasted at all. Sometimes not talking about something is a peaceful way to agree to disagree, or to mind one’s own business; other times it can be complicity with a foolish demand to ignore the elephant in the room.

This is to say that tact and discretion can certainly be deployed in the service of insincerity and cover-up; but that doesn’t prove that they are necessarily always deployed that way. If you can learn to be tactful in a sincere way, to be candid about your discretion, you will have found a way to improve both sets of virtues.

Consider the phone-call-ending phrase “I guess I should let you get back to work.” It usually means, more frankly, “I think we’re done now; let’s end this call.” The first phrase is often (and usually pretty obviously) insincere, if only a little grating; the second phrase somehow seems too blunt, maybe a little rude, implying that you’re eager to be rid of an unpleasant duty. I can’t help but think that a skillful person, well-practiced in both sincerity and tact, could come up with more graceful ways to bring such a call to an end.

Consider also this remarkable essay by philosopher Agnes Callard. She writes of some sort of trauma she endured long ago, and of some sort of neuroatypicality she experiences, but she steadfastly refuses to give names to either of those things. She wants to talk about them, but if she names them, she suspects we will use those names to apply a familiar template to her and her experiences, and then we will interpret what she says according to that template.

“And that means I can’t talk to you. No one can sincerely assert words whose meaning she knows will be garbled by the lexicon of her interlocutor.…

“[I]t chafes at me that you have decided that if I want to talk… with you, I have to follow your rules, and let you trample all over me.”

In this way she uses her discretion, even her blunt lack of candor, as a way of being straightforward and sincere in a way that she feels would otherwise be unavailable to her.

But what about irony, sarcasm, and stuff like that?

“When people speak in a very elaborate and sophisticated way, they either want to tell a lie, or to admire themselves. You should not believe such people. Good speech is always clear, clever, and understood by all.” ―Leo Tolstoy

People use irony and sarcasm, understatement and hyperbole, parody and caricature, modest proposals and other such rhetorical devices to express themselves creatively in different registers. This can be entertaining, witty, and clever of course, but also sometimes insightful and poignant and biting in a way that would be difficult to match with more straightforward ways of speaking.

This raises the question of whether the virtues in this collection are ones that threaten to make us dull and to limit our expressive range.

If you are hesitant to give up these shades of your conversational palette, consider instead that there may be better and worse ways to use them. For example, if you speak ironically but in a way that is understood as such by those you are speaking with, that’s very different from speaking ironically in a way that some of your audience gets, and gets to feel superior about, at the expense of those whose heads you’re speaking over.

If you use hyperbole in a fun way, as another form of shared-irony, that may be innocent enough. But if you use it excessively or unthinkingly — if you seemingly can never be concerned by what you see in the news without being “deeply troubled” by it, for instance — consider recalibrating your verbiage.

I suspect that this is one of those cases where we will have to rely on the spirit of the law rather than any firm prohibitions. Use these rhetorical registers, but use them carefully, and question your motives for using them. If you resort to caricature to make a joke or simplify an example, maybe that’s all fun and games, but if you use it to reinforce a stereotype or paint a complex grey area in black-and-white to hide its complexity, consider that you may have taken a step too far.

There is a lot of rotten use of this stuff going around these days. Someone says something insincere and offensive, and the next thing out of their mouths is something along the lines of “I was only putting it out there,” “That’s just what some people are saying,” “I only wanted to see how you’d react,” “Hah; you thought I was being literal,” “I sure triggered the outgroup with that!” Don’t be that guy.

But what about flirtation?

  It were as
possible for me to say I loved nothing so well as
you: but believe me not; and yet I lie not; I
confess nothing, nor I deny nothing.

―Beatrice, in Much Ado About Nothing

Flirtation is a form of communication that is indirect and ambiguous and that stubbornly talks around the main topic without addressing it directly. You remain coy and veiled, hinting and feinting at your intentions and feelings rather than stating them outright. Rather than being sincere and straightforward, you create and amplify uncertainties. Irony abounds and the earnest are out of their depth: Playing hard-to-get is sometimes the only way to get got.

Flirtation has been likened to a game. As in a game, the players are sincere in trying to play well; but the game itself is a sort of make-believe. The moves in the game refer to the game, and not to the world outside. (The hockey player does not play the game in order to put the puck in the net, but puts the puck in the net because that is how you play the game.)

Telling someone who is flirting that they ought to be more straightforward and candid is like telling a courting peacock he’ll never get off the ground flapping his tail feathers like that, or like telling someone playing Monopoly that they should probably switch to real money if they want to have any hope of buying seaside property in New Jersey. It’s missing the point.

My guess is that flirting of this sort is ubiquitous, cross-cultural, and ancient. But I may be overgeneralizing from the culture I grew up with and what it has found meaningful from other cultures and times. If flirting is one of the essential things human language is for, and it operates by its own set of rules, perhaps it is best to sandbox it appropriately, like our other games, and play it as best we can. But if flirting is merely a sort of insincerity we allow ourselves to indulge in during courtship, maybe we should see if we can disarm it with surprising candor and begin our romances on a more sincere note. That might work out better. To me, it’s an open question. I’ve experimented with both modes, and with many experiments under my belt by now, all I can say for certain is that you’d be a fool to take any romance and relationship advice from me.

But these days, with an increasing percentage of couples meeting through on-line dating sites, it seems that at least some flirting has become moot. There’s little point in being coy with the person you’ve met through looking-for-love-dot-com. The secret’s out.

I had gotten through writing this whole section before I remembered the whole pickup artist scene and its ruthless cultivation of insincerity in pursuit of the ol’ in-and-out. I don’t want to dwell on it, but one way of looking at it is that it just takes some of the logic of flirtation to extreme conclusions: if flirtation includes pretension, manipulation, evasion, flattery, and the like, why not just declare no-holds-barred and play to win on your terms?

But what about culture jamming?

For a long while I was a fan, scholar, and practitioner of “culture jamming,” and I admiringly cataloged historical pranks, hoaxes, frauds, impostures, counterfeits, tricksters, trolls, fakery, hacktivism, performance art, fauxvertising, forgeries, scams, modest proposals, and things of that sort at a 20th-century Web 1.0 site called Sniggle.net: The Culture Jammer’s Encyclopedia.

I explained:

Most of this site highlights deception, but it’s not because I have a thing for liars and cheats. I think there’s a brand of immunizing deception that helps us to expose and correct the lies we tell ourselves and the webs of falsehood that make up our societies. Harmless fibs can remind us that we’ve dropped our guard and let the Big Lies in.

In an interview I doubled down on the therapeutic explanation of culture jamming:

A whole lot of the evil of the last century was conducted by people who followed rather sheep-like the twisted consensus reality of their societies. What the trickster does is to find flaws in that consensus reality and to construct creative performances to exploit and uncover those flaws. If this happens enough, perhaps people will come to develop an instinctive distrust of consensus reality and will be more likely to see reality as it is.

I don’t think time has been kind to my theories. The increasing prominence of outrageous lies and fakery has not immunized people against their effects or caused people to become more vigilant, as I had naïvely hoped might be the case. Instead of the collapse of illegitimate authority leading to people thinking for themselves, people seem to have said “well, I guess there’s no way to know what nonsense to believe, so I’ll just believe whichever nonsense I like the best.”

I still think there is something useful in the art of culture jamming, and I still admire a well-delivered hoax in a good spirit. I wasn’t all wrong, I don’t think. But today, older and wiser (I hope), I choose to err on the side of sincerity instead.

But what about framing?

“Political language — and with variations this is true of all political parties, from Conservatives to Anarchists — is designed to make lies sound truthful and murder respectable, and to give an appearance of solidity to pure wind.” ―George Orwell, Politics and the English Language

Framing (or “spin”) is the attempt to fit revealed or asserted facts into a rhetorical framework in such a way that they will lead people to desired conclusions or away from undesirable ones. When this is called “spin” it usually implies purposeful dishonesty; when it is called “framing” its proponents sometimes claim it can be done in the service of clarity and honesty, or to defend against spin.

However, I more often see framing deployed as a way of trying to manipulate the audience into adopting a certain belief without putting that belief forward and defending it explicitly: in other words, as just spin with a new name. Here, for example, is George Lakoff, a scholar of framing, explaining a textbook example:

The phrase “Tax relief” began coming out of the White House starting on the very day of Bush’s inauguration. It got picked up by the newspapers as if it were a neutral term, which it is not. First, you have the frame for “relief.” For there to be relief, there has to be an affliction, an afflicted party, somebody who administers the relief, and an act in which you are relieved of the affliction. The reliever is the hero, and anybody who tries to stop them is the bad guy intent on keeping the affliction going. So, add “tax” to “relief” and you get a metaphor that taxation is an affliction, and anybody against relieving this affliction is a villain.

So he recommends this frame instead:

It is an issue of patriotism! Are you paying your dues, or are you trying to get something for free at the expense of your country? It’s about being a member. People pay a membership fee to join a country club, for which they get to use the swimming pool and the golf course. But they didn’t pay for them in their membership. They were built and paid for by other people and by this collectivity. It’s the same thing with our country — the country as country club, being a member of a remarkable nation.

This is just trading one unstated and poorly-defended set of background assumptions for another. To me, describing tax cuts as “relief” seems more accurate and honest than describing them as “reduced country club membership fees,” but they’re both examples of manipulative spin. A more straightforward and respectful way of discussing this issue would be simply to describe the proposed tax law changes, what effects they could reasonably be expected to have, and whether or not you think that would be a good thing.

This then brings up the larger debate — which I don’t want to wade into now — about whether one has to leave one’s virtues behind and fight dirty when one steps into the ring of political action.

If you are thinking of your argument in terms of what framing and spin to apply to it, you’ve probably left the field of candor, frankness, sincerity, and so forth, and it’s time to take a U-turn if you want those virtues back.

Manipulative framing and other forms of insincerity are so ubiquitous in political and culture-war discussions that I think it’s usually best to avoid them entirely so as not to be infected. Mute every pundit. Kill your television. Ignore politicians. “Read not the Times. Read the Eternities.”

Appendix: The trouble with passive voice sentences

Professionally I am a technical writing consultant in the software industry. A company will sometimes call me in to fix things when nobody can understand the documentation that was written by the engineers who created the software. If I have time, one of the things I will do for such a company is write up (or occasionally present a class on) “technical writing for software engineers in one easy lesson.”

That lesson is essentially this: hunt down and kill every passive voice sentence. If they can grok that, they’re 80% of the way to being able to write reasonable technical documentation themselves.

Quick grammar review: An active voice sentence has a subject, verb, and object: “Elon unwisely tweeted insider information.” (Imperative sentences are an exception: they do not have an explicit subject, but the subject is implicitly whomever the sentence is uttered to.) A passive voice sentence may leave out the subject entirely: “Insider information was unwisely tweeted.” A missing subject means missing information, which means ambiguity, which makes trouble for precise technical communication.

“The transmit box should be checked.” Who is responsible for checking it? Do I need to check it somehow? or ought I to make sure someone else has checked it? or is it not being checked a suspicious error condition I should be on guard for? There is no way for me to know based on that passive voice description.
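To make the "hunt down every passive voice sentence" lesson concrete, here is a rough Python sketch of a checker that flags a form of "to be" followed by something that looks like a past participle. It is an illustrative heuristic, not a tool the author describes: the `flag_passive` name and the short list of irregular participles are hypothetical, and it will miss unlisted irregular verbs and misfire on adjectives such as "is tired".

```python
import re

# Forms of "to be" that usually introduce a passive construction.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# A small sample of irregular past participles (a hypothetical, incomplete
# list; a real checker would need a full list or a part-of-speech tagger).
IRREGULAR_PARTICIPLES = {
    "shot", "made", "done", "seen", "given", "taken", "written",
    "known", "shown", "built", "sent", "kept", "found", "told",
}

# "to be" + an optional "-ly" adverb + the next word, which we capture.
PASSIVE_PATTERN = re.compile(rf"\b{BE_FORMS}\s+(?:\w+ly\s+)?(\w+)\b", re.IGNORECASE)


def flag_passive(sentence: str) -> list[str]:
    """Return words that look like past participles following a form of "to be"."""
    hits = []
    for match in PASSIVE_PATTERN.finditer(sentence):
        word = match.group(1).lower()
        # Guess: "-ed" for regular verbs, the sample set for irregular ones.
        if word.endswith("ed") or word in IRREGULAR_PARTICIPLES:
            hits.append(word)
    return hits


for s in [
    "Elon unwisely tweeted insider information.",
    "Insider information was unwisely tweeted.",
    "The transmit box should be checked.",
]:
    print(s, "->", flag_passive(s) or "no passive found")
```

A flagged sentence is only a prompt to ask "who does this?"; the fix is still a human rewrite that names the subject.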

Passive voice sentences are often accurate without being complete or precise. Their accuracy makes them seem non-problematic to those who write them, while their imprecision makes them non-helpful to those who read them. People often write in the passive voice out of laziness, or because it can sound a little more formal and academic and so has a false air of sophistication to it. But people sometimes also use it to hide their ignorance or to sweep things under the rug.

The so-called “past exonerative tense” usually takes advantage of the blame-dodging obscurity of the passive voice. It’s most notoriously put to use by police department press releases that test the limits of grammar to avoid straightforwardly describing officer misconduct. “Handcuffed Man Fatally Shot Inside Police Cruiser” reads the headline. “The man was shot multiple times by the officer’s service weapon, the police spokeswoman said.”

In summary: I recommend that you look very skeptically on any passive-voice sentence you intend to deploy that excludes the subject — not just in technical writing, but in any communication that you want to be clear and precise.


What is “protein folding”? A brief explanation

December 1, 2020 - 05:46
Published on December 1, 2020 2:46 AM GMT

Today Google DeepMind announced that their deep learning system AlphaFold has achieved unprecedented levels of accuracy on the “protein folding problem”, a grand challenge problem in computational biochemistry.

What is this problem, and why is it hard?

I spent a couple years on this problem in a junior role in the early days of D. E. Shaw Research, so it’s close to my heart. Here’s a five-minute explainer.

Proteins are long chains of amino acids. Your DNA encodes these sequences, and RNA helps manufacture proteins according to this genetic blueprint. Proteins are synthesized as linear chains, but they don’t stay that way. They fold up into complex, globular shapes:

A protein from the bacterium Staphylococcus aureus. Wikimedia / E A S

One part of the chain might coil up into a tight spiral called an α-helix. Another part might fold back and forth on itself to create a wide, flat piece called a β-sheet:

Wikimedia / Thomas Shafee

The sequence of amino acids itself is called primary structure. Components like this are called secondary structure.

Then these components fold up among themselves to create unique, complex shapes. This is called tertiary structure:

An enzyme from the bacterium Colwellia psychrerythraea. Flickr / Argonne National Lab
The protein RRM3. Wikimedia / Biasini et al

This looks like a mess. Why does this big tangle of amino acids matter?

Protein structure is not random! Each protein folds in a specific, unique, and largely predictable way that is essential to its function. The physical shape of a protein gives it a good fit to targets it might bind with. Other physical properties matter too, especially the distribution of electrical charge within the protein, as shown here (positive charge in blue, negative in red):

Surface charge distribution of Oryza sativa Lipid Transfer Protein 1. Wikimedia / Thomas Shafee

If a protein is essentially a self-assembling nanomachine, then the main purpose of the amino acid sequence is to produce the unique shape, charge distribution, etc. that determines the protein’s function. (How exactly this happens, in the body, is still not fully understood, and is an active area of research.)

In any case, understanding structure is crucial to understanding function. But the DNA sequence only gives us the primary structure of a protein. How can we learn its secondary and tertiary structure—the exact shape of the blob?

This problem is called “protein structure determination”, and there are two basic approaches: measurement and prediction.

Experimental methods can measure protein structure. But it isn’t easy: an optical microscope can’t resolve the structures. For a long time, X-ray crystallography was the main method. Nuclear magnetic resonance (NMR) has also been used, and more recently, a technique called cryogenic electron microscopy (cryo-EM).

X-ray diffraction pattern of a SARS protease. Wikimedia / Jeff Dahl

But these methods are difficult, expensive, and time-consuming, and they don’t work for all proteins. Notably, proteins embedded in the cell membrane, such as the ACE2 receptor that SARS-CoV-2 (the virus that causes COVID-19) binds to, fold in the lipid bilayer of the cell and are difficult to crystallize.

Wikimedia / CNX OpenStax

Because of this, we have only determined the structure of a tiny percentage of the proteins that we’ve sequenced. Google notes that there are 180M protein sequences in the Universal Protein database, but only ~170k structures in the Protein Data Bank: fewer than one structure per thousand sequences.

We need a better method.

Remember, though, that the secondary and tertiary structures are mostly a function of the primary structure, which we know from genetic sequencing. What if, instead of measuring a protein’s structure, we could predict it?

This is “protein structure prediction”, or colloquially, the “protein folding problem,” and computational biochemists have been working on it for decades.

How could we approach this?

The obvious way is to directly simulate the physics. Model the forces on each atom, given its location, charge, and chemical bonds. Calculate accelerations and velocities based on that, and evolve the system step by step. This is called “molecular dynamics” (MD).
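A minimal Python sketch of that step-by-step loop, using the velocity Verlet integrator (a common choice for MD, though the post doesn't say which integrator any particular package uses). The "force field" is a hypothetical single harmonic bond between two atoms, standing in for the bond, angle, van der Waals, and electrostatic terms a real force field includes; `bond_forces`, `total_energy`, and all parameter values are illustrative assumptions.

```python
import math

def bond_forces(pos, k=1.0, r0=1.0):
    """Forces from one harmonic bond between atoms 0 and 1 (toy force field)."""
    d = [pos[1][a] - pos[0][a] for a in range(3)]
    r = math.sqrt(sum(c * c for c in d))
    f = k * (r - r0)  # positive when the bond is stretched
    f0 = [f * c / r for c in d]  # pulls atom 0 toward atom 1 when stretched
    return [f0, [-c for c in f0]]  # equal and opposite force on atom 1

def total_energy(pos, vel, masses, k=1.0, r0=1.0):
    """Kinetic energy plus the harmonic bond's potential energy."""
    r = math.dist(pos[0], pos[1])
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, vel))
    return kinetic + 0.5 * k * (r - r0) ** 2

def velocity_verlet(pos, vel, masses, force_fn, dt, n_steps):
    """Advance positions and velocities in place by n_steps timesteps of size dt."""
    forces = force_fn(pos)
    for _ in range(n_steps):
        for i, m in enumerate(masses):
            for a in range(3):
                vel[i][a] += 0.5 * dt * forces[i][a] / m  # half-step velocity update
                pos[i][a] += dt * vel[i][a]  # full-step position update
        forces = force_fn(pos)  # the expensive part in a real simulation
        for i, m in enumerate(masses):
            for a in range(3):
                vel[i][a] += 0.5 * dt * forces[i][a] / m  # second half-step
    return pos, vel

pos = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]  # bond stretched to 1.5 (rest length 1.0)
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
masses = [1.0, 1.0]
e0 = total_energy(pos, vel, masses)
velocity_verlet(pos, vel, masses, bond_forces, dt=0.01, n_steps=1000)
e1 = total_energy(pos, vel, masses)
print(f"energy before: {e0:.5f}, after: {e1:.5f}")
```

With a small enough timestep the total energy stays nearly constant. In a real run, the `force_fn` call inside the loop is where almost all the time goes.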

The problem is that this is extremely computationally intensive. A typical protein has hundreds of amino acids, which means thousands of atoms. But the environment also matters: the protein interacts with surrounding water when folding. So you have more like 30k atoms to simulate. And there are electrostatic interactions between every pair of atoms, so naively that’s ~450M pairs, an O(N²) problem. (There are smart algorithms to make this O(N log N).) Also, as I recall, you end up needing to run for something like 10⁹ to 10¹² timesteps. It’s a pain.
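The arithmetic behind the ~450M figure, as a quick Python check. It assumes the post's round number of 30k atoms and counts each unordered pair once; the N log N line only indicates scale (base-2 log, constant factors ignored), not the cost of any particular fast algorithm.

```python
import math

n_atoms = 30_000  # protein plus surrounding water, the post's round number
pairs = n_atoms * (n_atoms - 1) // 2  # each unordered pair counted once
print(f"{pairs:,} atom pairs")  # 449,985,000 atom pairs

# Rough scale of an O(N log N) method for the same N (constant factors ignored).
n_log_n = n_atoms * math.log2(n_atoms)
print(f"N log2 N is roughly {n_log_n:,.0f}, about {pairs / n_log_n:,.0f}x smaller")

# Naive pairwise evaluations over a whole run, for the post's timestep range.
for steps in (10**9, 10**12):
    print(f"{steps:.0e} steps -> {pairs * steps:.1e} pair evaluations")
```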

OK, but we don’t have to simulate the entire folding process. Another approach is to find the structure that minimizes potential energy. Objects tend to come to rest at energy minima, so this is a good heuristic. The same model that gives us forces for MD can calculate energy. With this approach, we can try a whole bunch of candidate structures and pick the one with lowest energy. The problem, of course, is where do you get the structures from? There are just way too many: molecular biologist Cyrus Levinthal estimated 10³⁰⁰ (!). Of course, you can be much smarter than trying all of them at random. But there are still too many.
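A toy Python sketch of the "generate candidates, keep the lowest energy" idea. Everything here is an illustrative assumption: the structures are random 8-bead chains with unit bond length rather than real amino acids, the energy is a simple Lennard-Jones term between non-neighboring beads, and candidates are drawn blindly at random, which is exactly the naive search that Levinthal's estimate shows cannot scale.

```python
import math
import random

def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Toy Lennard-Jones energy summed over non-neighboring bead pairs."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip directly bonded neighbors
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            total += 4 * epsilon * (sr6 * sr6 - sr6)
    return total

def random_chain(n_beads, rng, bond=1.0):
    """A random walk with fixed bond length: one crude candidate structure."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        z = rng.uniform(-1.0, 1.0)  # uniform z and phi give a uniform direction on the sphere
        phi = rng.uniform(0.0, 2.0 * math.pi)
        s = math.sqrt(1.0 - z * z)
        x, y, w = coords[-1]
        coords.append((x + bond * s * math.cos(phi), y + bond * s * math.sin(phi), w + bond * z))
    return coords

rng = random.Random(0)
candidates = [random_chain(8, rng) for _ in range(2000)]
best = min(candidates, key=lj_energy)
print(f"lowest toy energy among {len(candidates)} candidates: {lj_energy(best):.3f}")
```

Real structure-search methods replace the blind `random_chain` sampler with much smarter moves, but the scoring step has the same shape.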

So there have been many attempts to get faster at doing these kinds of calculations. Anton, the supercomputer from D. E. Shaw Research, used specialized hardware—a custom integrated circuit. IBM also has a computational bio supercomputer, Blue Gene. Stanford created Folding@Home to leverage the massively distributed power of ordinary home computers. The Foldit project from UW makes folding a game, to augment computation with human intuition.

Still, for a long time, no technique was able to predict a wide variety of protein structures with high accuracy. A biennial competition called CASP, which compares algorithms against experimentally measured structures, saw top scores of 30–40%… until recently:

Median accuracy of predictions in the free modelling category for the best team each year. Google DeepMind

So how does AlphaFold work? It uses multiple deep neural nets to learn different functions relevant to each protein. One key function is a prediction of the final distances between pairs of amino acids. This guides the algorithm to the final structure. In one version of the algorithm (described in Nature and Proteins), they then derived a potential function from this prediction, and applied simple gradient descent—which worked remarkably well. (I can’t tell from what I’ve been able to read today if this is still what they’re doing.)
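To illustrate the "predict pairwise distances, turn them into a potential, run gradient descent" pipeline, here is a toy Python sketch. It is not AlphaFold: the hypothetical `fold_from_distances` helper is handed exact distances taken from a made-up helix-like set of points, whereas the real system derives its potential from neural-network predictions. The potential here is simply the squared error between current and target distances.

```python
import math
import random

def stress(coords, target):
    """Potential: squared error between current and target pairwise distances."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2 for (i, j), d in target.items())

def stress_grad(coords, target):
    """Analytic gradient of `stress` with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), d in target.items():
        r = math.dist(coords[i], coords[j])
        coef = 2.0 * (r - d) / r
        for a in range(3):
            g = coef * (coords[i][a] - coords[j][a])
            grad[i][a] += g
            grad[j][a] -= g
    return grad

def fold_from_distances(target, n, steps=3000, lr=0.01, seed=0):
    """Hypothetical helper: plain gradient descent from a random starting structure."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    start = stress(coords, target)
    for _ in range(steps):
        grad = stress_grad(coords, target)
        for i in range(n):
            for a in range(3):
                coords[i][a] -= lr * grad[i][a]
    return coords, start, stress(coords, target)

# A made-up "true" structure: six points along a helix. Its full distance
# map stands in for what a predictor would output.
true_coords = [(math.cos(t), math.sin(t), 0.5 * t) for t in range(6)]
target = {(i, j): math.dist(true_coords[i], true_coords[j])
          for i in range(6) for j in range(i + 1, 6)}

coords, start, end = fold_from_distances(target, n=6)
print(f"stress: {start:.4f} at the random start, {end:.6f} after gradient descent")
```

Distances alone don't fix handedness, so the result can come out as a mirror image of the target. A real predictor also has to cope with noisy and missing distances.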

A general advantage of AlphaFold over some previous methods is that it doesn’t need to make assumptions about the structure. Some methods work by splitting the protein into regions, figuring out each region, then putting them back together. AlphaFold doesn’t need to do this.

DeepMind seems to be calling the protein folding problem solved, which strikes me as simplistic, but in any case this appears to be a major advance. Experts outside Google are calling it “fantastic”, “gamechanging”, etc.

Between protein folding and CRISPR, genetic engineering now has two very powerful new tools in its toolbox. Maybe the 2020s will be to biotech what the 1970s were to computing.

Congrats to the researchers at DeepMind on this breakthrough!


Links for Nov 2020

December 1, 2020 - 04:31
Published on December 1, 2020 1:31 AM GMT

I started collecting various links on FB and figured I'd cross-post. This is the second post; there's a link at the top to the first post, from a month ago.


Giving Tuesday 2020

December 1, 2020 - 01:30
Published on November 30, 2020 10:30 PM GMT

As they did in 2018 and 2019, Facebook is running donation matching for Giving Tuesday. The match is "real" in that you can choose which charity receives the money, but not "real" in that the money will go to some charity whether or not you donate. Because I think some charities are much higher priority than others, however, from my perspective the match is real.

As in previous years, they have a limit of $20k/person and $2,499/donation. More details and instructions at EA Giving Tuesday.

I am planning to (nearly) max out all of my credit cards, and donate over $20k:

  • We are not sure exactly when the matching clock actually starts, so I'm planning to start several seconds early.

  • Some of my donations may get declined.

  • You can donate with a credit card, and Facebook is covering the processing fees.

This means that you can get cash back on donations, which is 1-2% of potentially quite a lot of money.
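A quick Python check of the numbers in this post, assuming every donation is made at the $2,499 maximum and the cash-back rate falls somewhere in the 1-2% range mentioned above (actual rates depend on the card). It shows why "over $20k" means nine donations: eight maximum donations come to $19,992, just short of the per-person limit.

```python
import math

PER_PERSON_CAP = 20_000   # match limit per person, from the post
PER_DONATION_CAP = 2_499  # maximum size of a single matched donation, from the post

# Maximum-size donations needed to reach the per-person limit.
needed = math.ceil(PER_PERSON_CAP / PER_DONATION_CAP)
print(f"{needed} donations of ${PER_DONATION_CAP:,} = ${needed * PER_DONATION_CAP:,}")
print(f"{needed - 1} donations would total ${(needed - 1) * PER_DONATION_CAP:,}")

# Cash back on $20,000 of donations at the low and high ends of the assumed range.
for rate in (0.01, 0.02):
    print(f"{rate:.0%} cash back on ${PER_PERSON_CAP:,}: ${PER_PERSON_CAP * rate:,.0f}")
```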

I made some practice donations today to refamiliarize myself with the interface. While these donations are not eligible for the match, they still have credit card processing fees waived, so they are still a good deal. I donated the $2,499 maximum for each test, in the vague hope that this might make my credit card processors think that $2,499 charges from Facebook are normal.

I've set an alarm for tomorrow morning, half an hour before the timer begins. The main thing I want to check tomorrow is that Facebook has not reinstated the "confirm your donation" dialog box from previous years.

This is a mad scramble for free money, but with practice and preparation you have a good chance of directing some of it. I managed the full $20k in 2018, but only $12.5k in 2019 due to increasing competition. If I even get one donation matched this year, however, an extra $2,499 to a charity I think is doing really good work is still worth the hassle.



Non-Book Review: Patterns of Conflict

December 1, 2020 - 00:05
Published on November 30, 2020 9:05 PM GMT

Soviet MiG-15 (left) and American F-86 (right)

From 1951 to 1953, the Korean War saw the first major dogfights between jet aircraft - mainly the American F-86 and the Soviet MiG-15. On paper, the MiG-15 looked like the superior aircraft: it could out-climb, out-accelerate, out-turn, and generally out-perform the F-86 by most performance measures.

US Air Force colonel and fighter pilot John Boyd, however, would argue that these are the wrong metrics to consider. Winning a dogfight isn’t about out-accelerating the enemy, it’s about outsmarting the enemy. It’s about being able to predict their next move faster than they can predict your next move. It’s about “getting inside their OODA loop” - observing, orienting, deciding and acting faster than they can, making a decisive move before they have an opportunity to respond.

To evaluate the F-86 vs the MiG-15, we need to look at the whole OODA loop - observation, orientation, decision and action. Looking at aircraft performance metrics only tells us about the “action” component - i.e. which actions the aircraft allows the pilot to take. If we look at the other components, the MiG looks less dominant:

  • The F-86 had better visibility, with a full “bubble” cockpit; pilots had fewer blind spots
  • F-86 pilots had more training on average, allowing them to better track the fight in their mind (i.e. “orient”) and make faster decisions

Even in terms of performance, the MiG wasn’t strictly dominant - the F-86 could dive faster, for instance. The MiG could gain energy faster, but the F-86 could lose energy faster - and if the goal is to move in a way the enemy didn’t expect or can’t match, that’s often just as good.

So under this view, the F-86 had the advantage in many ways. An F-86 pilot could see the enemy better, and keep track of what was going on better. Even if the MiG could outmaneuver them in theory, the F-86 pilots could make good decisions faster, leaving the MiG pilots confused and eventually gunning them down. Statistics from the war back up this picture - both sides wildly exaggerated kill ratios in their favor, but however we slice the data, the F-86 beat the MiG-15 at least as often as it lost, and a kill ratio of 2:1 in favor of the F-86 is a typical estimate. Even against more experienced MiG pilots, the F-86 traded slightly better than 1:1.

Beyond Dogfights

The real insight is that these ideas apply beyond dogfights. Boyd’s real claim to fame - and the topic of his Patterns of Conflict presentation - is to turn “get inside the enemy’s OODA loop” into the foundational idea of military doctrine, from the tactical level (i.e. maneuvering on the battlefield) to the strategic (i.e. choosing which battles to fight). Patterns of Conflict illustrates the idea through six hours of examples from military history. In the interest of time, I’ll instead use an abstract example.

A group of soldiers in the Blue Army has been ordered to take a hill currently occupied by Green artillery. Unbeknownst to them, it’s a trap: a larger group of Green soldiers hides in some trees near the hill. Ideally, from the Blue perspective, the trap will be seen through and their soldiers will pull back rather than be wiped out. But there are a lot of ways that can go wrong…

No Observation

The simplest failure mode is that the hidden Greens may not be seen at all, at least not before the Blue forces are past the point of no return. The Greens can camouflage themselves, set up smokescreens, or actively target Blue scouts/surveillance/intelligence.

Lack of information is deadly.

No Integration of Information

A more subtle failure mode is that the Blues may “see” the Green trap, but may not put the pieces together. Perhaps a scout on the other side of the trees notices tracks leading in, but that part of the force is under a separate command chain and the information never gets where it needs to go. Perhaps the Greens are able to interfere with communications, so the information can’t get to anyone in time, or isn’t trusted. Perhaps the Greens leave fake tracks into many different clumps of trees, so the Blues know that there may be some Greens hiding somewhere but have no idea where or whether there’s actually anyone in the trees at all. Perhaps the Blues can’t see into the trees, and know that the Greens like to set up ambushes, but the Blues don’t think about the fact that they can’t see into the trees - they forget to account for the uncertainty in their own map. Or maybe they do account for the uncertainty, but take the risk anyway.

Ambiguity and lack of communication are, from the Greens’ perspective, very suitable substitutes for outright lack of information or deception. Exactly this sort of tactic was prominent in WWII - for instance, the US fielded an entire “ghost army” of 1100 actors and set designers, complete with inflatable tanks and fake radio transmissions.

Inflatable tank. That’s some Scooby-Doo shit right there.

No Decision

Now we get into the classic problem of hierarchical management: the Blue attackers notice the ambush waiting in the trees, but they’ve been ordered to take the hill, so they charge in anyway. The people on the ground don’t have the authority to make decisions and change plans as new information arrives, and passing decisions back up the command chain would be prohibitively slow.

At first glance, this looks like simple inefficiency in the Blues’ organization, but the Greens do actually have some methods to create this sort of inefficiency. In order to counter the problem, the Blues need to delegate significant authority to lower-level officers at the front - but this requires that they trust those lower-level officers to make the right decisions. If the Greens pick off competent officers, or place agents in the Blues’ organization, then the Blues won’t be able to trust their low-level officers’ decisions as much - so they’ll be forced to rely on more centralized, command-and-control decision making.

This sort of strategy plays a major role in guerrilla warfare, à la Mao. One does not pick off the incompetent people within the enemy’s hierarchy; they are assets. And if the enemy knows you have agents within their organization, that’s not a bad thing - their own attempts to tighten control will leave the organization less flexible and responsive, and inflexibility and unresponsiveness are exactly the weaknesses which guerrilla warfare attempts to leverage.

On the flip side, the German command structure in WWII provides a positive example of an organizational structure which does this well. Officers were generally given thorough and uniform training, allowing them to understand how other officers thought and coordinate with minimal communication. Low-level officers were given considerable decision latitude - their orders came with a “schwerpunkt”, a general objective, with the details left flexible. One good example of this in everyday life: a group of friends is meeting at a movie theater for a new release. If a few are late, the others know to buy them tickets and save seats - they coordinate even without communication.

No Action

Finally, even if the Blues see the Green ambush, pass that information where it needs to go, and decide to abort the attack, they may be physically incapable of calling it off. Maybe the attackers are already in too deep, or maybe lines of communication have been cut between the decision-makers and the attackers.

At the end of the day, nothing else matters unless there’s at least some action we can take which is less bad than the others.

Lessons for Organizations

Going really abstract for a moment: conflict is a game, in which “players” make moves. The point of an OODA loop is to see what move the other players are making, and choose our own best move in response. We want to make our move a function of their move, rather than just blindly choosing a move independent of what the other players are doing. In order to do that, we need to observe whatever we can about other players’ behavior, integrate that information into a model/map to predict what they’re going to do, decide what to do in response, and then act on it. Observation, orientation, decision, action - and it all has to happen fast enough that we can act after we have enough information to figure out the other players’ actions, while still acting either before they do or at the same time.
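A toy Python illustration of "make our move a function of their move", using rock-paper-scissors as a stand-in for conflict (an illustrative choice, not one from Boyd or the post). The reactive player observes the opponent's move and plays its counter; with `stale=True` it only sees the opponent's previous move, modeling a loop that runs too slowly. All names here are hypothetical.

```python
import random

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value
COUNTER = {loser: winner for winner, loser in BEATS.items()}  # the move that beats each move

def play(rounds=1000, stale=False, seed=0):
    """Reactive player vs. a blind player who ignores everything and picks at random."""
    rng = random.Random(seed)
    tally = {"reactive": 0, "blind": 0, "draw": 0}
    last_seen = rng.choice(MOVES)  # whatever a slow observer saw before round 1
    for _ in range(rounds):
        their_move = rng.choice(MOVES)  # blind: chosen independently of our move
        # Observe/orient: a fast loop sees this round's move, a slow one only the last round's.
        observed = last_seen if stale else their_move
        my_move = COUNTER[observed]  # decide/act: our move is a function of theirs
        if BEATS[my_move] == their_move:
            tally["reactive"] += 1
        elif BEATS[their_move] == my_move:
            tally["blind"] += 1
        else:
            tally["draw"] += 1
        last_seen = their_move
    return tally

print("fast loop: ", play())
print("stale loop:", play(stale=True))
```

The fast loop wins every round; the loop that acts on stale information does no better than chance against an opponent who moves at random.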

The “action” parts are usually pretty obvious - we have to be physically able to make a move. The “observation” parts are sometimes overlooked (as in the F-86 vs MiG-15 example, it’s often hard to notice the value of information), but also pretty obvious once you think about it. Personally, I think the real meat of OODA loops is in the “orient” and “decide” steps, especially when applying the idea to organizations.

The point of orientation and decision is to make our own actions a function of the actions of others, while still acting before or at the same time as they do. Drawing an analogy to biology: what’s interesting about living things is that they act differently in response to different environments, performing actions which are tailored to their circumstances. Their actions are a function of their environment (including other organisms in the environment). Even single-celled organisms (think E. coli) observe their environment and act differently depending on what they observe.

How do we make organizations behave like organisms? How do we make the actions of the organization a function of the environment? The hard parts of this, in the OODA framework, are orientation and decision. Orientation: information needs to be passed to the right people, and integrated together to answer decision-relevant questions about the environment. Decision: the people who have the information need to decide what to do in order to further the organization’s goals. And the real challenge: for ideal performance, this all needs to be done locally; a centralized command will have neither the speed nor the capacity to integrate all information and make all decisions about everything.

Boyd provides insights on this mainly by thinking about how to break it. How can we make an organization unable to effectively pass around and integrate information? We can cut lines of communication. We can bombard them with noise - i.e. fake information, like the ghost army. We can take hard-to-predict actions in general - keep our options open, maybe flip a coin every now and then. How can we make an organization unable to make decisions based on their information? Make them paranoid - pick off competent people, insert agents, generally force them to rely less on low-level decision making and more on centralized command-and-control. On our side, we want our people to understand each other well and be able to coordinate without extensive communication or centralized intervention - communication is slow and expensive, and central command is a bottleneck.

Moving beyond military applications, incentives are one piece of the puzzle - we want to incentivize individuals to act in the organization’s interest. But I think the bigger message here is that it’s really an information processing problem. How do we get information where it needs to go, without centralized integration? How can distributed decision-makers coordinate without slow, expensive communication? Even the incentives problem is largely an information problem - how do we communicate a general objective (i.e. schwerpunkt) to low-level decision makers, while still leaving them the freedom to make decisions based on new information as it comes up? Even assuming everyone is acting in the organization’s best interest, it’s still hard to pass information and coordinate when there’s too much information (including background knowledge of different specialists) for everyone to know everything all the time. These are the information-processing problems involved in designing an organization to have a fast, efficient OODA loop.