LessWrong.com News
A community blog devoted to refining the art of rationality

An Exercise in Rational Cooperation and Communication: Let's Play Hanabi

Published on April 14, 2021 8:36 AM GMT

Why Play Hanabi?

Hanabi is a game requiring modeling others' minds, communication, and strategy. Its unique challenges and cooperative (rather than adversarial) objective have piqued the interest of the AI community, which recently began using it as a testing environment for AI agents. 

Hanabi is a card game in which 2-5 players must cooperate to put on a dazzling firework display. The cards in the game represent different stages of fireworks. The most basic version of the game has five different colors of fireworks (cards), each of which has five stages, numbered 1-5, that must be set up in ascending order. The players must play cards from their hands to add them to the communal display on the table. The players' score at the end of the game is the sum of the latest stage of each color they successfully deployed; reaching stage 5 for all 5 colors receives the maximum score of 25.

However, a slight wrinkle: players must hold their cards so that they face away from them; no player is allowed to look at their own cards, only those of other players. To play their cards at the right time, each player must rely on clues from their teammates.

I like this game a lot. I also think it is good for people to play, because it forces you to think about how others will interpret your communications when they don't have the same information or perspective that you do. The communication skills and theory of mind that Hanabi requires are also valuable in real life - in handling interpersonal conflicts with people you ultimately want to cooperate with, or in explaining your expertise to a layperson, for example. (I believe this strongly enough that I will volunteer myself as a partner for any reader interested in trying the game online; just leave a comment or send a message.)

Hanabi Rules

Instead of or in addition to reading this section, you can watch a humorous explainer video.

To start, shuffle all the cards together and deal 5 cards to each player in a 2 or 3 player game, or 4 cards to each player in a 4 or 5 player game. Place the rest of the cards in the middle of the table and set up the 8 clue tokens and 3 fuse tokens face-up.

Players take turns until the game is over. On their turn, a player must take one of three actions:

  1. Give a hint to another player
  2. Play a card from their hand
  3. Discard a card from their hand

Give a Hint: To give a hint, a player picks one other player, then points to cards in that player's hand that match either a number or a color (e.g., "this card is a 5" or "these three cards are red"). They must point to all the cards that match that color or number. This is the only way to communicate card values or colors in Hanabi. To give a hint, the player must flip over one of the 8 clue tokens. If there are no tokens left to flip over, the player cannot give a hint.

Example: Bob is holding a red 3, a green 2, a blue 2, and a blue 1. Alice wants to give Bob a clue. Two of the clues she is allowed to give are "This card is a 3" and "These two cards are blue," while pointing at the corresponding card(s). She is not allowed to say "This card is a 2" while pointing to only one of the two 2s; she must indicate both cards if she wants to tell Bob which cards are 2s.
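The "point to every matching card" rule is easy to state as code. Here is a minimal sketch in Python; the card and hint representations are illustrative assumptions, not from any official implementation:

```python
# Sketch of Hanabi's hint rule: a hint names a color or a number, and
# must indicate every card in the target player's hand that matches.

def indicated_cards(hand, hint):
    """Return the positions a hint points to.

    hand: list of (color, number) tuples, e.g. [("red", 3), ("green", 2)]
    hint: ("color", "red") or ("number", 2)
    """
    kind, value = hint
    index = 0 if kind == "color" else 1
    return [i for i, card in enumerate(hand) if card[index] == value]

# Bob's hand from the example above.
bob = [("red", 3), ("green", 2), ("blue", 2), ("blue", 1)]
indicated_cards(bob, ("number", 3))      # [0]: "this card is a 3"
indicated_cards(bob, ("color", "blue"))  # [2, 3]: "these two cards are blue"
```

Note that a hint of `("number", 2)` necessarily returns both matching positions, which is why Alice cannot single out just one of Bob's 2s.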

Play a Card: If a player thinks one of their cards is ready to be added to the display, they can announce they are playing a card, and then put it on the table. If it is ready to be added to the display, great! The card is added to its color's pile and the players' score increases by 1. If instead it is too early to play that card, or another copy of that card has already been played, it is discarded and the players lose one of the three fuse tokens. If all three fuse tokens are lost, the display explodes and the players score 0 points.

Example: The game has just begun and there are no cards on the table. On the first turn, Alice tells Bob he is holding a 1. On Bob's subsequent turn, he plays the card Alice pointed out. It is a blue 1, which is added to the display.

Example: In the middle of the game the green fireworks are at stage 3 and the yellow fireworks are at stage 2. Bob is holding a card that he thinks is a green 4, so he announces he is playing a card and puts it on the table. However, Bob's card was actually a yellow 4, which cannot be played before a yellow 3. Bob discards his yellow 4 and one fuse token.

Playing a 5 of a particular color suit completes that color firework and un-flips a clue token as a bonus.
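The play action's bookkeeping can be summarized in a short sketch, assuming a simple representation of the shared state (the field and function names here are my own, for illustration):

```python
# Sketch of resolving "play a card": a card fits the display iff its
# number is exactly one above its color's current pile height.

def play_card(piles, card, fuses, clues, max_clues=8):
    """piles: dict mapping color -> highest stage played (0 if none).
    Returns the updated (piles, fuses, clues)."""
    color, number = card
    if piles.get(color, 0) == number - 1:  # card is the next stage
        piles[color] = number
        if number == 5 and clues < max_clues:
            clues += 1                     # completing a suit un-flips a clue token
    else:                                  # too early, or a duplicate: misfire
        fuses -= 1                         # card is discarded and a fuse is lost
    return piles, fuses, clues

# Bob's misplay from the example above: green at stage 3, yellow at stage 2.
piles = {"green": 3, "yellow": 2}
piles, fuses, clues = play_card(piles, ("yellow", 4), fuses=3, clues=5)
# piles is unchanged and fuses drops from 3 to 2
```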

Discard a Card: Players can announce they are discarding a card and then remove one of the cards in their hand from the game. Doing this returns one clue token to its unflipped state, allowing the team to give another clue in the future. Other than playing 5s, this is the only way to un-flip a clue token. Most cards in the game have multiple copies. Since each color suit only needs one card of each value to be played, there are plenty of cards that can be "safely" discarded.

Whenever a player loses a card by playing or discarding, they draw another at the end of their turn. When the deck runs out, everyone takes one last turn, and then the game ends. The game can also end early if the display is completed.

Optional Rules

There are lots of variations on the base game which range from fun to challenging to absolutely diabolical. I will outline the two most common below. 

Rainbow Cards: The game also includes a rainbow suit, which can be included as a 6th suit alongside the other 5 solid colors. Depending on the desired difficulty, players can have the rainbow cards be their own distinct color alongside the basic 5, or have the rainbow cards match all color clues. In the latter case, a player who is told one of their cards is red, for instance, cannot be sure if the indicated card is red or rainbow until they receive more information.

Black Powder: An expansion to the base game adds a "black powder" suit of cards, which are added to the deck like the rainbow cards. Black cards differ from the other cards in two ways. First, they cannot be indicated by any color clues (e.g. "this card is black" is an illegal clue). Second, they must be played in descending order, instead of the normal ascending order that the other suits use.

Why is this on LessWrong?

Playing Hanabi online over the past month or two taught me lessons about communication. It's also a case study in how an anonymous crowd finds and uses Schelling points. 

People who play Hanabi on a particular site (or in person with the same group of people) will gradually evolve, by group consensus, norms and expectations about how to play. For example, in online settings:

  • Most experienced players will discard from the right side of their hand, and will expect others to do the same. 
  • If a color clue indicates multiple cards, in the absence of any other information, it is assumed that the leftmost card indicated by that clue is ready to be played. 

The benefit of having such norms is hopefully obvious; players can take advantage of "pre-loaded" information to correctly indicate which cards to play while using fewer clues. Most, if not all, of the norms are arguably the "best" norms that could be chosen, often because they are Schelling points in player strategy that arise naturally from the rules of the game. For example, with respect to the above norms: 

  • The "oldest" cards - that a player has held for the longest time - are displayed on the right in online games. If a player's rightmost card were important, other players would have had the most opportunity to indicate that to be the case. Since they chose not to, it is likely safe to discard. 
  • Newly drawn cards are displayed on the left side of a player's hand. Color clues usually mean "play." So if a color clue touches multiple cards, it is most natural to assume it is intended for the newest card that it touches - if your teammate had wanted you to play the older card, they would have said so, instead of waiting until now.
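Assuming hands are displayed newest-on-the-left (position 0) and oldest-on-the-right, the two conventions above reduce to a pair of one-line defaults; the function names are mine, not from any site's ruleset:

```python
# Sketch of the two online-play conventions: position 0 is the newest
# (leftmost) card, the last position is the oldest (rightmost).

def default_play_target(indicated_positions):
    """A color clue touching several cards is read, absent other
    information, as 'play the newest (leftmost) card it touches'."""
    return min(indicated_positions)

def default_discard(hand_size):
    """Absent any information, discard the oldest (rightmost) card."""
    return hand_size - 1

default_play_target([1, 3])  # 1: the newest of the touched cards
default_discard(4)           # 3: the rightmost slot in a 4-card hand
```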

The casual reference to Schelling points is not to be overlooked - anonymous players online approaching Hanabi with a common strategy is not unlike the New York City question.

A highly ranked player will be expected by other highly ranked players to know these informal rules, to the point where not following them is met with confusion (if not outright hostility - it is internet gaming, after all). However, there isn't perfect consensus about particular edge cases of these norms, and there are sometimes situations where perfectly communicating information to another player is simply not possible. With respect to this reality, there are basically two types of players, which I will nickname "Goofus" and "Gallant."

Goofus believes that the correct way to play Hanabi is to perfectly understand and follow the informal rules. If a player he gave a clue to misunderstands it and misplays, Goofus is likely to be confused or angry. To Goofus, to play Hanabi is to execute an algorithm that he has only partial control of. He relies on other players to correctly execute their parts of the algorithm, and feels helpless when they don't.

Gallant understands that what makes communication good or bad is whether or not it is correctly understood by its recipient. If a player he gave a clue to misunderstands it and misplays, Gallant asks himself, "why didn't my clue mean what I thought it meant?" To Gallant, to play Hanabi is to set an objective; the algorithm of clue-giving and interpreting can and must change to reach it.

(An aside: This post was inspired by my experience after I had the poor fortune to be matched with a Goofus yesterday. During our game, he/she gave me a clue that was ambiguous: I could tell that the indicated card was important, but not if it was ready to be played immediately or if it should be saved for later. In fact, he/she had meant "play this card now," but I instead took the risk-averse action and discarded some other card instead of playing. In response the player sent me an angry message, intentionally (I think) lost the game for us, and then wrote a negative comment about me, visible to other players, about how I don't follow the right informal norms of play. I glanced at the player's recent activity: it contained two full pages of the player writing negative comments about people with whom he/she had played Hanabi, stretching back weeks. It struck me as impressive that one can consistently fail at communicating and yet insist that they are communicating correctly.)

I see this as the ultimate "moral of the story" when it comes to Hanabi: the value of good communication lies in whether or not it is properly understood, and ultimately is measured by whether or not it produces the desired behavior in its recipient. 

Don't be Goofus. Be Gallant.

Where to Play or Buy Hanabi

I play Hanabi regularly online (usually with strangers) on Board Game Arena; there are a few other websites with different features available as well. You can also order a physical copy of the game from your favorite online retailer or board game shop. The game components are simple enough that you could make your own set, with some effort.

A benefit of playing online instead of in person is that most online implementations of Hanabi track clues for you, eliminating the memory aspect of the game and allowing you to focus only on the logic and communication. (In my opinion, this is a serious plus.)

Hanabi can be played in less than half an hour, and I recommend it for adults and families with children aged 8-10 and up, or perhaps even younger if they are particularly clever.


Against "Context-Free Integrity"

Published on April 14, 2021 8:20 AM GMT

Sometimes when I talk to people about how to be a strong rationalist, I get the impression they are making a specific error.

The error looks like this: they think that good thinking is good thinking irrespective of environment. If they just learn to avoid rationalization and setting the bottom-line first, then they will have true beliefs about their environment, and if there's something that's true and well-evidenced, they will come to believe it in time.

Let me give an extreme example.

Consider what a thoughtful person today thinks of a place like the Soviet Union under Stalin. This was a nation with evil running through its streets. People were vanished in the night, whole communities starved to death, information sources were controlled by the powerful, and many other horrendous things happened every day.

Consider what a strong rationalist would have been like in such a place, if they were to succeed at keeping sane. 

(In reality a strong rationalist would have found their way out of such places, but let us assume they lived there and couldn't escape.)

I think such a person would be deeply paranoid (at least Mad-Eye Moody level), understanding that the majority of their world was playing power games and trying to control them. They'd spend perhaps the majority of their cognition understanding the traps around them (e.g. what games they were being asked to play by their bosses, what sorts of comments their friends would report them for, etc) and trying to build some space with enough slack to occasionally think straight about the narratives they had to live out every day. It's kind of like living in The Truman Show, where everyone is living a narrative, and punishing you / disbelieving you when you deviate. (Except much worse than what happened in that show.)

Perhaps this is too obvious to need elaborating on, but the cognition of a rationalist today who aims to come to true beliefs about the Soviet Union, and the cognition of a rationalist in the Soviet Union who aims to come to true beliefs about the Soviet Union, are not the same. They're massively different. The latter of them is operating in an environment where basically every force of power around you is trying to distort your beliefs on that particular topic – your friends, your coworkers, the news, the police, the government, the rest of the world.

(I mean, certainly there are still today many distortionary forces about that era. I'm sure the standard history books are altered in many ways, and for reasons novel to our era, but I think qualitatively there are some pretty big differences.)

No, coming to true beliefs about your current environment, especially if it is hostile, is very different from coming to true beliefs about many other subjects like mathematics or physics. Being in the environment can be especially toxic, depending on the properties of that environment and what relationship you have to it.

By analogy, I sometimes feel like the person I'm talking to thinks that if they just practice enough Fermi estimates and calibration training, notice rationalization in themselves, and practice the principle of charity, then they'll probably have a pretty good understanding of the environment they live in and be able to take positive, directed action in it, even if they don't think carefully about the political forces acting upon them.

And man, that feels kinda naive to me.

Here's a related claim: you cannot get true beliefs about what are good actions to take in your environment without good accounting, and good record-keeping. 

Suppose you're in a company that has an accounting department that tells you who is spending money and how. This is great, you can reward/punish people for things like being more/less cost-effective. 

But suppose you understand one of the accounting people is undercounting the expenses of their spouse in the company. Okay, you need to track that. (Assume you can't fire them for political reasons.) Suppose another person is randomly miscounting expenses depending on which country the money is being spent. Okay, you need to track that. Suppose some people are filing personal expenses as money they spent supporting the client. Okay, now you need to distrust certain people's reports more-so.

At some point, to have accurate beliefs here, it is again not sufficient to avoid rationalization and be charitable and be calibrated. You need to build a whole accounting system for yourself to track reality.

[A]s each sheep passes out of the enclosure, I drop a pebble into a bucket nailed up next to the door. In the afternoon, as each returning sheep passes by, I take one pebble out of the bucket. When there are no pebbles left in the bucket, I can stop searching and turn in for the night. It is a brilliant notion. It will revolutionize shepherding.

The Simple Truth

I sometimes see quite thoughtful and broadly moral people interact with systems I know to have many power games going internally. Moral Mazes, to some extent or another. The system outputs arguments and trades, and the person sometimes engages with the arguments and sometimes engages in the trade, and thinks things are going well. But I feel like, if they knew the true internal accounting mechanisms in that entity, then they would be notably more disgusted with the parts of that system they interacted with. 

(Imagine someone reading a scientific paper on priming, and seeking deep wisdom in how science works from the paper, and then reading about the way science rewards replications.)

Again, I sometimes talk to such a person, and they can't "see" anything wrong with the system, and if they introspect they don't find a trace of any rationalization local to the situation. And if they've practiced their calibration and fermis and charity, they think they've probably come to true beliefs and should expect that their behavior was net positive for the world. And yet I sometimes feel that it clearly wasn't.

Sometimes I try to tell the people what I can see, and that doesn't always go well. I'm not sure why. Sometimes they have a low prior on that level of terrible accounting, so don't believe me slash think it's more likely that I'm attempting to deceive them. 

More often I think they're just not that interested in building that detailed of a personal accounting system for the thing they're only engaging with some of the time, it's more work than it's worth to them, so they get kind of tired of talking about it. They'd rather believe the things around them are pretty good rather than kinda evil. Evil means accounting, and accounting is boooring.

Anyway. All this is me trying to point to an assumption that I suspect some people make, an assumption I call "Context-Free Integrity", where someone believes they can interact with complex systems, and as long as they themselves are good and pure, their results will be good and pure. But I think you need to actually build your own models of the internals of the complex systems before you can assess this claim.

...writing that down, I notice it's too strong. Eliezer recommends empirical tests, and I think you can get a broad overall sense of the morality of a system with much less cost than something like "build a full-scale replica accounting model of the system in google sheets". You can run simple checks to see what sorts of morality the people in the system have (do they lie often? do they silence attempts to punish people for bad things? do they systematically produce arguments that the system is good, rather than trying to simply understand the system?) and also just look at its direct effects in the world.

(In my mind, Zvi Mowshowitz is the standard-bearer on 'noping out' of a bad system as soon as you can tell it's bad. The first time was with Facebook, where he was way in advance of me coming to realize what was evil about it.)

Though of course, the more of a maze the system is, the more it will actively obscure a lot of these checks, which itself should be noted and listed as a major warning. Just as many scientific papers will not give you their data, only their conclusions, many moral mazes will not let you see their results, or tell you metrics that are confusing and clearly goodharted (again on science, see citation count).

I haven't managed to fully explain the title of this post, but essentially I'm going to associate all the things I'm criticizing with the name "Context-Free Integrity". 

Context-Free Integrity (noun): The notion that you can have true beliefs about the systems in your environment you interact with, without building (sometimes fairly extensive) models of the distortionary forces within them.


Auctioning Off the Top Slot in Your Reading List

Published on April 14, 2021 7:11 AM GMT

Or, A Nicer Way to Commodify Attention

Observation 1: I read/listen to a lot of public intellectuals--podcasters, authors, bloggers, and so on--and I frequently find myself thinking:

Ugh, I hate it when this guy spouts ignorant nonsense about [some domain]. I wish he would just read [relevant book]. Then he’d at least be aware of the strongest counterarguments.

Sometimes a single public thinker will trigger this reaction repeatedly, year after year, apparently never confronting [relevant book].

Observation 2: Sometimes podcasters/bloggers will poll their followers with questions like, “Hey guys, who should I interview next?” or “what book should I review next?”


Having observed these things I wonder if they could be improved by monetary transactions.

For a specific example, it would be cool for David Deutsch to let the highest bidder choose a book for him to read and review. If we're lucky (or spendthrift), we get to see him finally give a considered response to the particular claims in Human Compatible or Superintelligence or Life 3.0.

Less specifically, there are a lot of cognoscenti who command substantial influence while holding themselves to disappointingly low epistemic standards. For example, Sean Carroll is a science communicator who dabbles in a little bit of everything on the side; and although I consider his epistemic standards to be above-average, I can tell he has not read the best of Slate Star Codex. I think if he did read a post such as "Asymmetrical Weapons", there’s a decent chance he would feel compelled to raise the bar for himself.

I have some close friends who sometimes spew (what I perceive to be) ignorant drivel. For some of them, I might be willing to pay a surprisingly high price to see them write a review of a book that cogently challenges their stupid priors. I would pay the highest price for friends that I already know can update on sound arguments, I would pay a lower price to find out if a friend has that ability, and for the attention of the hopelessly obstinate I would not want to pay anything.

Why This Wouldn’t Work
  • It's easy to underestimate how pervasive and sticky those signaling/tribal motivations are. The gains from trade available here might not be enough to overcome the pressure to protect narratives and maintain appearances.
  • There might be perverse incentives. Maybe the cognoscenti want to charge more, so they reduce supply (that is, they read less). Maybe they spout more ignorant nonsense in order to increase the bids.

Those are just off the top of my head. I would appreciate it if someone who understands economics could give better reasons for why this wouldn't work.


Intermittent Distillations #2

Published on April 14, 2021 6:47 AM GMT

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making (Andrew Critch and Stuart Russell)

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making


A policy (over some partially observable Markov decision process (POMDP)) is Pareto optimal with respect to two agents with different utility functions if it is not possible to construct a policy that achieves higher utility for one of the agents without doing worse for the other agent. A result by Harsanyi shows that for agents that have the same beliefs, Pareto optimal policies act as if they are maximizing some weighted sum of the two agents' utility functions. However, what if the agents have different beliefs?

Interestingly, if two agents disagree about the world, it is possible to construct policies that each agent, by its own lights, expects to do better under. For example, suppose that Alice and Bob are deciding how to split a cake. Suppose also that the cake is either red or green. Alice believes the cake is red with probability 0.9 and Bob believes the cake is green with probability 0.9. A policy that says "If the cake is red, give it to Alice. If the cake is green, give it to Bob." will be viewed favorably by both of them. In fact, the sum of the utility Alice expects to get and the utility Bob expects to get is greater than can be achieved by any policy maximizing a weighted linear combination of their two utility functions.
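The arithmetic here is easy to check: under the conditional policy each agent expects to get the cake with probability 0.9, so their expected utilities sum to 1.8, while any policy that splits the cake without looking at its color can only make the expectations sum to 1. A quick sketch:

```python
# Alice believes the cake is red with probability 0.9; Bob believes it
# is green with probability 0.9. Each agent's utility is their expected
# share of the cake.

p_red_alice = 0.9    # Alice's credence that the cake is red
p_green_bob = 0.9    # Bob's credence that the cake is green

# Conditional policy: red -> all to Alice, green -> all to Bob.
eu_alice = p_red_alice                  # Alice expects the cake w.p. 0.9
eu_bob = p_green_bob                    # so does Bob, by his own lights
total_conditional = eu_alice + eu_bob   # 1.8

# Any color-independent policy gives Alice a share s and Bob 1 - s, so
# the expected utilities sum to s + (1 - s) = 1 regardless of beliefs.
s = 0.5
total_unconditional = s + (1 - s)       # 1.0
```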

Intuitively, when Alice and Bob both agree to the conditional policy, they're betting against each other about empirical facts about the world. More specifically, Alice can be viewed as offering to bet Bob that the cake is red, which Bob readily accepts. In this way, the conditional policy ties the expected utility of the two agents to previously irrelevant facts about their world models, giving them both higher expected utility from their perspectives.

The key result of the paper shows that all Pareto-optimal policies will have an implicit "bet settling" mechanism. One way of thinking about this is that since any difference in empirical beliefs can produce positive-sum bets between agents, a Pareto-optimal policy must implicitly make all such bets between those agents. Loosely speaking, the result shows that any policy that is Pareto-optimal with respect to a collection of agents will maximize a weighted linear combination of "how much does this agent benefit" and "how well did this agent predict empirical observations." Since Harsanyi assumes the agents have the same beliefs about the world, the second component is identical for all agents, so Harsanyi's theorem is a special case of the authors' result.

The result implies that if a contract between parties is designed to be Pareto-optimal, it will tend to "settle bets" among the empirical beliefs of those parties (provided they have different beliefs). The authors suggest that making this "bet settling" explicit might improve contract efficiency and transparency.


A perspective I've been developing recently is something along the lines of "betting is fundamental." For instance, Dutch book arguments pin down Bayes' rule as the proper update formula (given some assumptions). If you relax the Dutch books to those that are efficiently computable, you get Logical Induction (sorta).

Abram Demski writes in The Bayesian Tyrant:

It is a truth more fundamental than Bayes' Law that money will flow from the unclever to the clever.

This paper represents another place where betting arises out of seemingly unrelated considerations.

I also appreciate the frequent intuitive motivation of the result.

I have a slight lingering confusion about how the assumption that agents have knowledge about other agents' beliefs interacts with Aumann's Agreement theorem, but I think it works because they don't have common knowledge about each other's rationality? I suspect I might also be misunderstanding the assumption or the theorem here.

AI Governance: A Research Agenda (Allan Dafoe)

AI Governance: A Research Agenda

Alignment Newsletter Summary


This research agenda divides the field of AI Governance into three rough categories: the technical landscape, the political landscape, and ideal governance. The technical landscape roughly seeks to answer questions like "what's technologically feasible?", "how quickly is AI going to develop?", and "what safety strategies are possible?"

The political landscape seeks to answer roughly the same set of questions for governance: what's politically possible? What's the economic/political impact of AI going to be?

Finally, the ideal governance section points out that the previous two sections depend on an understanding of what the "good futures" are going to look like and that we currently don't have a clear picture.

Given that this is already a summary of an entire field, my summary is hopelessly lossy, so I have not made much of an effort to be thorough.


Before reading this, I had a bunch of questions about what the key problems in AI governance were. After reading this, it turns out that the key problems are just the obvious things: what's going to happen? is that going to be good or bad? what do we want to happen? how can we use what we have to get what we want? It's both heartening and disappointing to learn that I wasn't missing anything major.

I expected some sort of broader vision about how the entire landscape fit together, but instead, I just got an endless series of questions. This feeling might be an artifact of how this is a research agenda and so isn't trying to provide answers to questions. If anything, this agenda gave me a much better sense of how tangled the entire field is. Turns out that trying to take actions that have positive consequences a million years down the line is a very difficult problem.

Also, note that I skimmed large portions of this because I was familiar with the problems, so I might have missed key sections that made connections to a broader picture.

Alignment By Default (John S. Wentworth)

Alignment By Default

Alignment Newsletter Summary


To what extent is AI Alignment going to be a big problem? More specifically, if we just made a powerful AI system by pre-training a model on an extremely large dataset, then fine-tuned it on something like "do the thing the human wants", what are the chances that it's going to be aligned?

A number of results in neural network transparency demonstrate that image classifiers/generators seem to learn abstractions like "curve", "tree" and "dog." Since the data we use to train the AI system contains a lot of information about human values, it is likely that a powerful model will learn "human values" as an abstraction that gives it better predictive power. For example, GPT-N will be able to produce compelling moral philosophy papers, a task that is made easier by having a strong conception of human values.

One might hope, then, that the model learns human values as a useful abstraction during pre-training, and that this concept then gets "pointed to" during fine-tuning. However, Goodhart's law virtually guarantees that whatever we fine-tune our model on will not be maximally satisfied by human values, so the hope is that pre-training will create a basin of attraction where human values are a good enough proxy for our training metric (which is itself a proxy for human values), that the model ends up using its proxy of human values as a proxy for our proxy for human values.

If this works, what's next? One key task our AI systems will be used for is to build the next generation of AI systems. If our systems are aligned using a learned abstraction of human values, how useful will they be for this task?

We can roughly outline two approaches for building the next generation of AI systems. The first approach relies on rigorous theory and provable correctness. The second approach relies on experimentation and empirical evidence about generalization capabilities. A system aligned using a proxy as a proxy for a proxy does not seem reliable enough to use for rigorous alignment. However, if we condition upon pre-training conveying a robust abstraction of human values and fine-tuning reliably finding that abstraction, then the second, more approximate approach to AI alignment is a lot more promising. In other words, if systems are aligned by default, the alignment problem is easy, suggesting that even shoddily aligned systems can be used to make additional progress.


The author gives 10% to systems being aligned by default, which is around the same range that I give. I think work like aligning superhuman models will give empirical evidence as to whether pre-training conveys some notion of "human values" to a model and how difficult "pointing to" that abstraction turns out to be.

The point where I think the above story for alignment by default is most likely to go wrong is in "pointing to" human values. In particular, I am concerned that agents will become deceptively aligned faster than they will start using their model of human values as a proxy. See Does SGD Produce Deceptive Alignment for further discussion.

Zoom In: An Introduction to Circuits (Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter)

Zoom In: An Introduction to Circuits

Alignment Newsletter Summary


The Circuits agenda seeks to understand the functioning of neural networks by analyzing the properties of individual neurons (features), then analyzing how they combine together into algorithms (circuits). This agenda claims that features are the fundamental units of neural networks, that they can be understood, and that they combine into circuits that are themselves understandable. Furthermore, the agenda claims that these features and circuits will be universal across networks, e.g. if a circuit is found that appears to detect curves in one vision network, similar circuits will be found in other vision networks. Other motifs that appear in multiple places are the "union motif", where two features are combined by a union operation, and a "mirror motif", where neurons often have mirror images. For instance, a dog detector might be composed of "leftwards facing dog" and "rightwards facing dog" combined with a union operation.

But how do we know that a "curve detector" is really detecting curves? Couldn't it be detecting something else? The authors offer 7 pieces of evidence.

  • If you optimize an image that maximally activates the "curve neuron", it looks like a curve.
  • If you take the dataset examples that maximally activate the "curve neuron", they all have curves in them.
  • If you manually draw curves, they strongly activate the neuron.
  • If you rotate the examples that strongly activate the neuron, the activation curve looks approximately bell-shaped, which is what you would expect if the activation tracked how well the curve lined up with the detector's preferred orientation.
  • If you look at the weights, they look like a curve.
  • If you look at the neurons that are downstream of the "curve neuron", they typically involve curved things, e.g. circles.
  • If we hand-implement a curve detection algorithm that is based on our understanding of the "curve neuron", it detects curves.

The authors also explore high/low-frequency detector neurons and naturally occurring symmetries in neural networks in more detail.
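
To make the first line of evidence concrete, here is a toy sketch of feature visualization by gradient ascent. Everything here is invented for illustration (a hand-made "neuron" over a 1-D "image", not the actual InceptionV1 curve detector from the paper); the point is only the mechanism: optimize the input to maximize a neuron's activation and inspect what comes out.

```python
# Toy feature visualization: ascend the gradient of a neuron's activation
# with respect to the input. The hypothetical "neuron" below fires most
# strongly on a fixed template pattern (a stand-in for "a curve").

TEMPLATE = [0.0, 0.3, 0.7, 1.0, 0.7, 0.3, 0.0]  # the pattern the neuron "wants"

def activation(x):
    """Toy neuron: activation peaks when the input matches the template."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TEMPLATE))

def visualize(steps=200, lr=0.1):
    x = [0.5] * len(TEMPLATE)  # start from a flat gray "image"
    for _ in range(steps):
        # analytic gradient of the activation w.r.t. each input "pixel"
        grad = [2 * (ti - xi) for xi, ti in zip(x, TEMPLATE)]
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x

img = visualize()
print([round(v, 2) for v in img])  # converges toward the template pattern
```

In a real network the gradient comes from autodiff through the whole model rather than a closed form, but the loop is the same shape.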

I once heard Anna Salamon describe "the map/territory game", a form of third-person mental narration that involves inserting the phrase "whose map was not the territory" after each time you say your own name. For example, I might say "Mark, whose map was not the territory, sat at his computer describing the map/territory game." The point of the game is to repeatedly emphasize that your beliefs about the world are not the world itself.

As I was reading the circuits thread, I found myself making similar mental motions. "The authors, whose map is not the territory, write that these neurons are high/low-frequency detectors." Are they? Are they really? How do they know?

Overall, I'm impressed by the rigor of the analysis. The evidence isn't quite overwhelming, but it is highly suggestive. I currently think that the ability to understand neural networks will be critical to building aligned systems, so I'm excited about furthering this work.

In terms of the style of this work, I am reminded of a line from Hamming's famous talk You and Your Research:

One of the characteristics of successful scientists is having courage. Once you get your courage up and believe that you can do important problems, then you can. If you think you can't, almost surely you are not going to. Courage is one of the things that Shannon had supremely. You have only to think of his major theorem. He wants to create a method of coding, but he doesn't know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, "What would the average random code do?" He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. Who but a man of infinite courage could have dared to think those thoughts? That is the characteristic of great scientists; they have courage. They will go forward under incredible circumstances; they think and continue to think.

Imagine wanting to understand neural networks and deciding the best way to do that is to look at the individual weights of the network until you understand exactly what they're doing. Who but a person of infinite courage would have thought that this was possible? And yet it was!

Thoughts on Human Models (Ramana Kumar and Scott Garrabrant)

Thoughts on Human Models

Alignment Newsletter Summary


A common way to assess the ability of software engineering candidates is to have them solve simple coding problems. Imagine two candidates, one that knows they're going to be assessed in this way and one that does not. Intuitively, the results of the assessment are more reliable for the candidate that does not know about it. More specifically, the candidate that doesn't know about the assessment is likely to have applied less optimization pressure to doing well on it, so the results are less the result of adversarial Goodharting. In other words, we want the assessment to provide an independent check on the engineer's performance; if the engineer knows about the check, it is no longer independent.

In machine learning, engineers are careful to maintain a clear separation of the training set and the test set. This separation allows the test set to be used as an independent check for overfitting. If the model was optimized for doing well on the test set, then the check would no longer be independent.
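
As a minimal (toy) illustration of why the check must be independent: a model that simply memorizes its training data looks perfect on that data, and only a held-out set it never saw exposes the failure to generalize.

```python
# Hypothetical toy data: the underlying rule is y = 2x.
train = {1: 2, 2: 4, 3: 6}
test = {4: 8, 5: 10}   # held out; the model never sees these pairs

# "Training" by pure memorization of input-output pairs.
model = dict(train)

def predict(x):
    return model.get(x, 0)  # clueless off the training set

train_acc = sum(predict(x) == y for x, y in train.items()) / len(train)
test_acc = sum(predict(x) == y for x, y in test.items()) / len(test)
print(train_acc, test_acc)  # perfect on train, useless on test
```

If the memorizer had been shown the test set during training, both numbers would be 1.0 and the check would tell us nothing.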

If we're trying to train models to do tasks that humans cannot do easily, we might not have a "test set" in the strictest sense. For example, we might train a predictive model, but evaluate its outputs in terms of human approval. How can we get an independent check for performance?

Naively, since we're not training on human approval, any check will be independent. However, the relevant question is whether there's any channel by which the model learns about how it's going to be assessed, not just whether the model is being trained on the assessment metric. Since humans are doing the final evaluation, any method by which the model learns how humans tend to behave will contaminate the check, similar to how a software engineering candidate learning about how the interviewer thinks will also contaminate the interview.

In other words, the capability of human modeling makes it easier to deceive human-judgment-based checks, in the same way that knowing the contents of the test makes it easier to game. If we want to be confident our AI systems are aligned, we want their capability profiles to be as far away from deception as possible, suggesting we should train systems that cannot do human modeling, for example, STEM AI.


I am sympathetic to the view that agents that have no human models will find it very difficult to be deceptive. My main concern is that human models are basically required for models to be performance competitive. For example, the authors mention that it's hard to see how vaguely defined tasks can be automated without some sort of human modeling. Additionally, since the largest datasets are something like "stuff that humans tend to do" it seems very difficult to get large amounts of training data that won't include information that can be used to construct human models.

One can also think of a spectrum of "how much do you restrict the data you give your models?" Excluding any data that carries information about humans is one extreme; giving the model all of Common Crawl is the other. Other points on this spectrum are "don't give the model information about itself" and "don't give the model information about deception." However, it seems like if you want data-cleaning to lend any semblance of safety, you have to be at the restrictive extreme. Otherwise, you might as well include as much data as possible for capabilities.


What if AGI is very near?

April 14, 2021 - 03:11
Published on April 14, 2021 12:05 AM GMT

Consider the following observations:

  • The scaling hypothesis is probably true: we will likely continue to see great improvements in AI capabilities as model sizes increase.
    • Sutskever mentioned that models currently under development already have dramatic and qualitative improvements (e.g. going more multimodal) over those already made in 2020.
  • AI model sizes are likely to increase very quickly over the short term.
    • NVIDIA’s CEO: “We expect to see models with greater than 100 trillion parameters by 2023". Something nearly 600x the size of GPT-3, given its already shocking performance, is scary to imagine, to say the least.
    • Even if OpenAI is cautious and will not go ahead with potentially catastrophic projects (dubious), the wide availability and rapidly decreasing cost of efficient hardware, along with publicly available information about how to train GPT-like architectures, means that some organization will achieve scaling.
  • We will likely not be able to solve AI alignment within the next few years, even if AI safety research were to speed up dramatically.
    • Near-term deep learning AIs cannot plausibly be anything but badly misaligned.


  • What takeoff speeds are likely with large deep learning models, if they attain AGI? Are medium takeoffs more plausible due to deep learning type AIs having less obvious “tweaks” or ways to recursively self improve by altering code?
    • The key extension of that question is: how will society react to the presence of AGI? Will it even shut it down, stop development, and wait for safety?
    • Perhaps takeoff speeds are moot beyond human level, because even a human-level AGI would have the wits to proliferate itself over the internet to computers all over the world, making its eradication impossible once it has come into existence. If so, would its creation mean a slow but certain, inexorable death?
  • Given short term AGI, what options are available to reduce the risk of existential catastrophe? Would global nuclear war or a similarly devastating event that prevents technological progress be the only thing which could stave off AGI annihilation in the near term?

More serious thought needs to be given to this, to solemnly consider it as a looming possibility. 


Keylogging: Continuous Convenient Cryonics

April 14, 2021 - 02:13
Published on April 13, 2021 11:13 PM GMT

If you are like me, you don't really remember further than a year back, and versions of you a decade apart are further from each other than from the most similar other human. In such a case, cryonics doesn't cut it - you'll be dead before you die. I have long hoped that the future will resurrect a continuum of my past selves, so that I might actually make it there. And it sure looks like GPT-5 can do that! The accuracy will merely depend on the available data. And so I have decided to log my thoughts to the extent that my fingers can keep up.

Why identify with future reconstructions?

Definitions for continuity of identity and consciousness are only important to decide which entities to care about. Evopsychologically, humans care because of game theory. Characters in books and dreams and GPT outputs have more claim to consciousness than real-life animals, but the latter's choices actually impact us. Therefore, a wrapper around GPT-5 that acts indistinguishably from a human, that is guaranteed to continue running on some server, that does whatever they want on the internet, like making money and friends, should be treated by us as a human with rights. Analogously, you ought to consider a faithful reproduction of your behavior, abilities and access level as yourself.

How is this convenient?

You don't need to invest effort keeping your log organized. Install a keylogger, type anywhere. The rest is software. Cryonics requires money, bureaucracy and is questionably legal. If progress stops, such a log will still come in handy. GPT-4 can match all excerpts against millions of abstracts of papers to show you approaches you missed. Even GPT-3 could be fine-tuned on the log so in your old age, you at least have a caricature of your young self to talk to, one still more accurate than your memories.


The Case for Extreme Vaccine Effectiveness

April 14, 2021 - 00:08
Published on April 13, 2021 9:08 PM GMT

I owe tremendous acknowledgments to Kelsey Piper, Oliver Habryka, Greg Lewis, and Ben Shaya. This post is built on their arguments and feedback (though I may have misunderstood them).

I plead before the Master of Cost-Benefit Ratios. “All year and longer I have followed your dictates. Please, Master, can I burn my microCovid spreadsheets? Can I bury my masks? Pour out my hand sanitizer as a libation to you? Please, I beseech thee.”

“Well, how good is your vaccine?” responds the Master. 

“Quite good!” I beg. “We’ve all heard the numbers, 90-95%. Even MicroCOVID.org has made it official: a 10x reduction for Pfizer and Moderna!” 

The Master of Cost-Benefit Ratio shakes his head. “It helps, it definitely helps, but don’t throw out that spreadsheet just yet. One meal at a crowded restaurant is enough to give even a vaccinated person hundreds of microCovids. Not to mention that your local prevalence could change by a factor of 5 in the next month or two, and that’d be half the gains from this vaccine of yours!”

I whimper. “But what if...what if vaccines were way better than 10x? What about a 100x reduction in the risks from COVID-19?” 

He smiles. “Then we could go back to talking about how fast you like to drive.”

In its most extreme form, I have heard it claimed that the vaccines provide 10x reduction against regular Covid, 100x against severe Covid, and 1000x against death. That is, for each rough increase in severity, you get 10x more protection.

This makes sense if we think of Covid as some kind of "state transition" model where there's a certain chance of moving from lesser to more severe states, and vaccines reduce the likelihood at each stage.

I think 10x at multiple stages is too much. By the time you're at 1000x reduction, model uncertainty is probably dominating. I feel more comfortable positing up to 100x, maybe 500x reduction. I dunno.

There is a more limited claim of extreme vaccine effectiveness that I will defend today:

  1. In the case of the Pfizer vaccine (and likely Moderna too), the effectiveness in young healthy people is 99% against baseline symptomatic infection, or close to it.
  2. That we can reasonably expect the effectiveness of the vaccine against more severe cases of Covid to be greater than effectiveness against milder cases of Covid.

(Maybe it's 2x more effective against severe-Covid and 3x more effective against death compared to just getting it at all. Something like that, it doesn't have to be 10x– it'd still be a big deal because more severe outcomes are where most of the disutility lies.)

It's a very simple argument, really. First, the data very clearly suggests effectiveness of ~99% for young people, with nice tight confidence intervals. Second, across all the data we see trends of increasing effectiveness against increasing severity, granted that the confidence intervals are wide in some cases. Third, a very reasonable (imo) mechanistic model supports this interpretation of the data.

The 1.2 Million-Person Pfizer Israeli Observational Study

This observational study matched ~600k vaccinated people 1:1 with ~600k demographically similar controls. It covered the period December 20 to February 1. As far as I know, it is by far the largest Covid-19 vaccine study published to date. The other studies are clinical trials with sample sizes on the order of 20k-40k, and some other observational studies, typically with healthcare workers, in the single-digit thousands.

Why it's not as big as it sounds

Before we get to looking at data, I think it's important to note why uncertainty remains despite the huge N. To start with, the outcomes are all quite rare. Eyeballing it, Israel had a Covid-19 prevalence of ~0.5% during the study period. Out of a million people, a few thousand might be expected to actually catch Covid. Of that few thousand, only dozens or hundreds will progress to more severe forms of Covid. When sample sizes are in the dozens, confidence intervals are wide.
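
To see roughly how wide these intervals get, we can compute a normal-approximation binomial confidence interval for an outcome rate with event counts in the ballpark of the study's severe cases. The numbers below are illustrative stand-ins, not the study's exact person-time figures.

```python
import math

def ci_width(events, n, z=1.96):
    """Normal-approximation 95% CI for an event rate of `events` out of `n`."""
    p = events / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# ~hundreds of thousands at risk, but only dozens of severe outcomes
lo, hi = ci_width(55, 600_000)
print(f"rate {55/600_000:.6f}, 95% CI ({lo:.6f}, {hi:.6f})")
```

Even with 600k people, 55 events gives a confidence interval spanning roughly ±25% of the point estimate, and the ratio of two such uncertain rates (the effectiveness estimate) is wider still.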

The authors report:

During a mean follow-up of 15 days (interquartile range, 5 to 25), 10,561 infections [6,101 control vs 4,460 vaccinated] were documented...of which 5,996 [2,494 vs 2,071] (57%) were symptomatic Covid-19 illness, 369 required hospitalization [259 vs 110], 229 were severe cases of Covid-19 [174 vs 55], and 41 resulted in death [32 vs 9].

See Appendix B for complete breakdown of outcomes by period and vaccination status. 

What's more, those numbers are for the entire study period (44 days). Only in a subset of days had participants been vaccinated long enough for it to be a real test. Nominally it is a 1.2 million person study, but practically, when you're looking at results 2 weeks after the first dose (~day 14) or one week after the second dose (~day 28), the numbers are much lower. ~80% lower.

96% of participants received a second dose on day 21 or after; 95% received it before day 24 

All that to say, sample sizes aren't as big as they sound. Well, let's look at the results. This is the main outcome table. Definitions in Appendix A.

It's a bit hard to track trends formatted like that, so here's an equivalent graph:

Left: 1-RR for 14-20 days after 1st dose; 
Middle: 21-27 days after 1st dose; 
Right: 7 days after 2nd dose until end of follow-up

To me, the headline result is that at 2nd-dose + 7 days, vaccine effectiveness against Symptomatic Illness is 94% (87-98). Pretty good! Also, efficacy clearly rises from the earlier to later periods after vaccine administration.

Unfortunately, we don't see efficacy improvements moving to the right on the rightmost graph (more severe outcomes), counter to claims of extreme vaccine effectiveness. We do see upwards-right trends in the earlier periods (left and middle graphs). 

Well, 2 out of 3 ain't bad! Too bad the last one is the one we care about most.

It's okay. I've got more. The astute reader will have noted that the above graphs have "subgroup = Full" in their title. The study made available endpoints (outcomes) for multiple subgroups. Below are some of them. (Here are ALL OF THEM.)

Since there was data for it, I also added in the "2nd dose and 6 days after" period.

To provide a sense of the changing sample size between time periods: there were on average 221k participants per day in each experimental group between days 14-20, 160k between days 21-27, and 39k from days 28-44.

Many of the points are missing. The authors did not compute vaccine effectiveness if the control and vaccine group combined did not have 10 or more instances of an outcome. For example, the Age 16-39 subgroup did not have 10 instances of hospitalization, severe-Covid, or death for almost all of the time periods.

In cases where there are 10 instances of an outcome in the control group but 0 in the vaccine group, a value is reported without confidence intervals, e.g. the dots in the Females Subgroup, 2nd-2nd+6 period.

The up-and-to-right shape of the graphs persists across subgroups, except for Males and when the values are already maxing out near 100% (the right-most graphs). Overall, I think this is suggestive of the general trend that vaccines are more effective against progressively more severe outcomes. I also suspect that an uncertainty model which accounted for the correlations between neighboring values would shrink the error bars relative to the naive bootstrap method.

I'll comment more below on why the flat/missing trend in the rightmost graph doesn't bother me much, beyond the fact that the small sample size makes it hard for that graph to show much at all.

Also! If you prefer Tables, here's the top half from the paper itself corresponding to the above graphs:

As someone falling in the Age 16-39 subpopulation, I'm quite pleased to see 99% effectiveness against Symptomatic Infection, with a nice tight 96-100 confidence interval. This is higher than any number anyone had cited to me, and is approximately a 5x increase in how effective I believe my Pfizer vaccine to be. That's even before we get to more severe outcomes.

So why is the absence of data showing increasing efficacy not evidence of absence?

Because we expect to see data that looks like this even in worlds where the vaccine is 100% effective (at least for all vaguely healthy people). To be evidence against something, it has to be less common in worlds where that thing is true, and that's not the case here.

Why would we see these numbers with a 100% effective vaccine?

"Saturation" and "noise" 

When you have 100 true positives and 3 false positives, the false positives aren't such a big deal. When you have 0 true positives and 3 false positives, the false positives can change the entire picture.

I argue this is very likely what is going on with Covid-vaccine effectiveness, above and beyond sample sizes. 

Consider that PCR Covid-19 tests have both a false-negative and a false-positive rate (FPR). According to this random site I found by Googling that looks legit enough, the FPR for Covid-19 tests is between 0.2% and 0.9%. Let’s choose a point estimate of 1% for the FPR to be safe, but require the test be run twice for every case to compensate. Both runs coming back falsely positive happens 1% × 1% of the time, so now our effective FPR is 0.01%.

Let’s now imagine using this test on a 99% effective vaccine (the same argument holds for 99.9% and 99.99% even more so). We run an RCT with 100,000 people receiving the vaccine and 100,000 receiving placebo. Covid-19 prevalence is 0.1% in the region our hypothetical test is running. 

99 people from the control group catch actual Covid and receive positive test results (we lose one to a realistic false-negative rate of 10%, which, run twice, becomes 1%), plus 0.01% false positives, for a total of 109. From the treatment group with the 99% effective vaccine, we get 1 true-positive and 10 false-positive test results. Our final effectiveness estimate is 1 - 11/109 = 90%.

90%! And that’s from what is in truth a 99% effective vaccine. The control is mostly unaffected by the noise (109 vs actual 100) but the treatment is enormously changed. Instead of 1, it’s 11. 
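
The arithmetic above can be reproduced in a few lines. All the inputs are the post's hypothetical numbers (100k per arm, 0.1% prevalence, 99% true effectiveness, doubled-up tests), not real study data.

```python
# How test noise drags a truly 99% effective vaccine down to a measured ~90%.
N = 100_000           # participants per arm
prevalence = 0.001    # 0.1% catch Covid during the study
true_ve = 0.99
fnr = 0.10 ** 2       # 10% false-negative rate, test run twice -> 1%
fpr = 0.01 ** 2       # 1% false-positive rate, test run twice -> 0.01%

control_true = N * prevalence * (1 - fnr)    # 99 real cases detected
control_pos = control_true + N * fpr         # + 10 false positives = 109
treat_true = N * prevalence * (1 - true_ve)  # 1 real breakthrough case
treat_pos = treat_true + N * fpr             # + 10 false positives = 11

measured_ve = 1 - treat_pos / control_pos
print(round(measured_ve, 3))  # ~0.899, well below the true 0.99
```

The false positives barely move the control arm but multiply the treatment arm's count by eleven, which is the whole effect.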

Of course, the outcomes we’re interested in are hospitalization, severe Covid, and death. I’d expect the false positives on these to be lower than for having Covid at all, but across tens of thousands of people (the Israel study did still have thousands even in later periods), it’s not crazy that some people would be very ill with pneumonia and also get a false positive on Covid. The lower false positives on these outcomes are likely balanced by the much lower rate of them occurring even in the control group.

The greatest noise of all is selection effects

Never mind fluke false positive tests, on priors we have reason to suspect that the people who are still getting severe Covid, or even being hospitalized, despite being vaccinated are very likely not like you. Why isn’t the number of vaccinated people who ended up in critical condition zero?

Because within the Israel observational study, the vaccinated group contains some very sick people. (In a half-million-person group with no exclusion criteria screening them out, there simply will be; and beyond that, we know many participants are elderly and many explicitly meet risk criteria such as cancer, obesity, pulmonary disease, Type 2 diabetes, etc. See Table 1. Demographic and Clinical Characteristics of Vaccinated Persons and Unvaccinated Controls at Baseline.)

Vaccines work by stimulating an immune response. If your immune system is in tatters, unable to manufacture healthy antibodies or something, your vaccine might not do much. You might fare little better than the unvaccinated.

There’s actually a term for immune system failure in the elderly: immunosenescence. A couple of papers on that topic: Immunosenescence and vaccine failure in the elderly (2009) and Immunosenescence: A systems-level overview of immune cell biology and strategies for improving vaccine responses (2019)

Quoting the abstract from the first one:

An age-related decline in immune responses in the elderly results in greater susceptibility to infection and reduced responses to vaccination. This decline in immune function affects both innate and adaptive immune systems...Essential features of immunosenescence include: reduced natural killer cell cytotoxicity on a per cell basis; reduced number and function of dendritic cells in blood; decreased pools of naive T and B cells; and increases in the number of memory and effector T and B cells...Consequently, vaccine responsiveness is compromised in the elderly, especially frail patients...In the future, the development and use of markers of immunosenescence to identify patients who may have impaired responses to vaccination, as well as the use of end-points other than antibody titers to assess vaccine efficacy, may help to reduce morbidity and mortality due to infections in the elderly.

The last line there suggests that antibodies can be present even when a vaccine overall is not protective against disease.

Don’t get old, kids; it’s bad for you.

When I see that the treatment group has lower but not zero severe cases than the control group, I assume it’s coming from the very ill, the immunocompromised and immunosenescent.

I am pretty confident that I am not in those groups. I’m pretty sure my vaccine elicited some real response (I had some side effects, not terrible, but some). When I see the effectiveness numbers showing globally that there’s still some chance of really bad outcomes, I adjust them downwards because they were very likely not happening to people with remotely my level of health. 

To invoke the math of the previous section regarding false positives: the immunocompromised might contribute only 10% of the severe Covid patients in the control group (because their prevalence is low), but after the filter/selection effect of vaccination, they make up ~95% of severe cases in the treatment group.

Here’s a diagram for good measure:

This is a causal diagram. Arrows indicate the direction of causality but not the specific causal relationship. (Having a poor immune system, e.g. because you’re older, might make you more likely to be vaccinated, but of course, in our study we’re conditioning on vaccination status, so it doesn’t matter.)

Since we didn’t condition on immune system health, we can’t expect that a naive interpretation of the table of efficacies (above) tells us the true relationship between vaccines and outcomes.

Well, we can approximately filter. The vaccine efficacy is higher for the Age 16-39yr subgroup (99%, 96-100) vs the entire population (94%, 87-98). That's not just younger people getting less Covid; the vaccine worked better on them.

The Age 16-39yr subgroup didn't have zero cases of symptomatic disease (though it may have had zero hospitalizations, etc.), but as above, I'm guessing those cases almost all occurred in people who knowably had weak immune systems even within this overall healthier group.

Priors & Trends

In the above section I argued two things:

  • that we see something of a trend towards increasing effectiveness with increasing severity in the Pfizer Israeli mass study;
  • that even if the vaccine were 99.9% effective, we would expect to observe effectiveness data lower than this.

In this section, I want to offer 1) a plausible mechanistic model for why this should be true, 2) further indications of increasing effectiveness from other trials and studies.

Shifting the Distribution of “Infection”/Viral Load

Epistemic status: I'm not a biology/virology/immunology person and this feels kinda hand-wavy to me.

This post started with a table that lists a progression of discrete outcomes: Documented Infection, Symptomatic, Hospitalization, Severe Disease, Death...and I’ve been referring to them since. Obviously, the underlying biological reality isn’t quite as discrete as that.

It’s probably more something like there’s a continuous value of how infected you are, and the higher that value gets, the worse your condition will be.

How infected you are is probably a fuzzy thing in reality, but viral load might be an adequate proxy. It’s been documented that viral load varies together with Covid-19 severity. See SARS-CoV-2 viral load is associated with increased disease severity and mortality and Saliva viral load is a dynamic unifying correlate of COVID-19 severity and mortality

The graphs in the second paper particularly make this point. 

(a) Comparison of first recorded saliva viral load between individuals hospitalized for COVID-19 and non-hospitalized individuals within the first 10 days from symptom onset using a two-sided t-test. Comparison of only first recorded saliva viral load measurements amongst (b) moderate and severe disease or (c) alive and deceased individuals throughout the course of disease. 

Presumably, your viral load is the result of competition between the virus replicating and your immune system fighting it. Vaccination gives a significant boost to your immune system. (Cf the oft-cited claim about “4x lower” viral load in vaccinated people)

To engage in some inexpert armchair speculation, I’d guess that in the virus-immune system race, about x% of the time the virus gets the upper hand and makes an infected person symptomatic (~Level 1), and then in x% of those cases the virus wins out again and progresses to Level 2 before the immune system can stop it. So the virus progresses two levels (x%)^2 of the time: if x% is 20%, then overall 4% of cases get two levels worse.

At each level, the virus only has an x% of winning out and progressing to the next level. Alternatively, in each time period, there’s some chance, y%, that the immune system will catch up and win. 

In cases where someone has a weak immune system (high x%, small y%), increasing levels of case severity aren’t much less likely than earlier ones. You might get an approximately flat effectiveness curve.

But suppose that someone is vaccinated and has a real boost to their immune system. Intuitively, I’d reckon they’re now at (x/5)% for each stage. For the virus to progress two levels, that’s ((x/5)%)^2, or 0.16% when x% is 20%.

Maybe it’s a factor 2 or 3 instead of 5, but either way, it’d be a compounding effect. More severity means the virus has to replicate more times, which is more time for the immune system to catch up and beat it, ergo less chance for it to get that bad.
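
A minimal sketch of this compounding model, assuming the post's hypothetical per-stage win probability of 20% and a 5x per-stage reduction from vaccination:

```python
def p_reach(level, x):
    """Probability the virus progresses `level` severity steps when each
    step succeeds independently with probability x."""
    return x ** level

x_unvax = 0.20         # virus wins a round 20% of the time, unvaccinated
x_vax = x_unvax / 5    # vaccination cuts the per-round win chance 5x (assumed)

for level in (1, 2, 3):
    ve = 1 - p_reach(level, x_vax) / p_reach(level, x_unvax)
    print(level, round(ve, 4))  # effectiveness compounds with severity
```

Under these assumptions, effectiveness goes 80% against Level 1, 96% against Level 2, 99.2% against Level 3: exactly the "more effective against more severe outcomes" shape the data hints at.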

It's tough being a dude

Incidentally, Silva et al (2021) who provided the graphs of viral load immediately above, also had this to say re male vs female:

Y-axis is Saliva Viral Load Log (GE/ml)

The difference in viral loads lines up with the Male subgroup having worse vaccine efficacy than the Female subgroup: 88% (71-98) vs 96% (90-100) against symptomatic infections at 2nd dose + 7 days. It is also the case that across the world, women live longer and die less often from cardiovascular disease, cancer, diabetes, and chronic respiratory disease (Our World in Data).


Johnson & Johnson & Friends

Although the J&J vaccine is not an mRNA vaccine like today's star, Pfizer's BNT162b2, its clinical trial tracked multiple endpoints that show our hoped-for trend.

  • ~40,000 participants (treatment and control)
  • 65% of cohort is 18-59yr

The design/reporting of the J&J clinical trial differs from others, particularly the large Pfizer observational study. The time periods and outcomes are defined differently. 

To avoid copying multiple tables, I’ve extracted the numbers as I’ve understood them.


Onset at Least 14 Days

| Endpoint | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| Moderate to Severe, 18-59 yrs | 95 | 260 | 63.7% (53.9-71.6) |
| Moderate to Severe, >=60 yrs | 21 | 88 | 76.3% (61.6-86.0) |
| Severe/Critical, 18-59 yrs | 12 | 52 | 76.9% (56.2-88.8) |
| Severe/Critical, >=60 yrs | 7 | 28 | 75.1% (41.7-90.8) |
| Requiring Medical Intervention | 2 | 14 | 85.7% (37.8-98.4) |

Onset at Least 28 Days

| Endpoint | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| Moderate to Severe, 18-59 yrs | 66 | 193 | 66.1% (53.3-75.8) |
| Moderate to Severe, >=60 yrs | 14 | 41 | 66.2% (36.7-83.0) |
| Severe/Critical, 18-59 yrs | 5 | 33 | 85% (61.2-95.4) |
| Severe/Critical, >=60 yrs | 3 | 15 | 80.2% (30-96.3) |
| Requiring Medical Intervention | 0 | 7 | 100% (31.1-100) |
  • Moderate Covid-19: positive test, plus any one severe symptom or two regular symptoms like fever, sore throat, or cough
  • Severe/Critical: 3+ regular symptoms, or things like respiratory rate ≥30 breaths/minute, heart rate ≥125 beats/minute, oxygen saturation (SpO2) ≤93%, shock, admission to ICU, or death
  • Requiring Medical Intervention: hospitalization, ICU admission, mechanical ventilation, and/or ECMO.

There were deaths in the J&J study, all within the placebo group.

The authors note that all these cases occurred at study sites in South Africa. Hmmm.

Using a somewhat different formula, the authors also report on interim asymptomatic results. They present four different operationalizations of which I choose two, the ones with the highest and lowest efficacy after 29 days. See Table 20 for further detail.


Day 1-Day 29

| Case definition | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| FAS seronegative at baseline, +PCR and/or serology, no signs or symptoms | 159 | 182 | 12.5% (-8.9-29.7) |
| Seroconverted without previous symptoms | 84 | 180 | 22.6% (-3.9-42.5) |

After Day 29

| Case definition | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| FAS seronegative at baseline, +PCR and/or serology, no signs or symptoms | 22 | 54 | 59.7% (32.8-76.6) |
| Seroconverted without previous symptoms | 10 | 37 | 74.2% (47.88-88.6) |

Despite the J&J trial using overlapping criteria, both by explicitly lumping outcomes together and within the endpoint definitions themselves, we see the progression we’d expect to see if vaccines work better to prevent worse outcomes than they do milder ones. Modulo confidence intervals, that is.

While efficacy against moderate to severe Covid-19 is 65% (for 18-59 years old), it jumps to 85% for severe alone, granted the overlap in confidence intervals. 0 cases in the vaccine group fell into the Requiring Medical Intervention endpoint, compared to 7 with placebo. It’s as good as we could hope to see with this data.

However, there isn’t a clear jump between “asymptomatic” and “moderate to severe”. Partly that's because the operationalization isn’t clear (59.7% to 66% is a jump), and the error bars are still wide. The After Day 29 antibody test was conducted at Day 71 and had only been completed by ~30% of participants at the time of publication.

On net, I think the overall endpoint trend lines up with increasing vaccine efficacy against more severe outcomes. The asymptomatic vs symptomatic comparison doesn't clearly show it, but part of that is that I don't really understand the groups or how to interpret them.

Let's get clinical, clinical

Moderna Clinical Trial

The Moderna Phase III Trial tracked Covid and severe Covid but no other endpoints. There were no severe Covid cases or deaths in the vaccine group, versus 30 severe Covid cases and one death in the control group. That doesn’t let you compute a precise value, but it is consistent with the vaccines being very good.

  • 59% of cohort is aged 18-64 yr and not in any risk categories
  • Further breakdown of participant characteristics in Table 1
  • Asymptomatic Infections and Hospitalizations were not tracked.
  • Table S13 in the appendix provides a very detailed breakdown of symptoms. No severe symptoms at all occurred in the treatment group, granted that very few occurred in the control group either.


Pfizer Clinical Trial

  • 58% of cohort is aged 16-55yr, median 52yr, range 16-89
  • 35% have BMI >= 30
  • After the second dose, there were 5 cases of severe Covid in the control group and 1 in treatment, with no reported deaths.

It’s nice (for me) to note that the vaccine efficacy, and particularly the confidence intervals, are higher for the 16-55yr group: 95.6% with CI 89.4%-98.6%. That doesn’t prove the main point, but it is in line with the immunosenescence model.

No Covid deaths are reported in either the placebo or vaccine groups. Table S5 from the appendix lists severe-Covid outcomes: a total of 9 after the first dose in the control group and 1 in the vaccine group, for an 89% reduction with a confidence interval of 20.1%-99.7%. If we break it down into different time periods (before/after 1st/2nd dose), we end up with confidence intervals like (-3800% to 100%). Yes, maybe taking the vaccine will increase your chance of severe Covid by 39x!
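For concreteness, here's how a point estimate like that 89% figure falls out of raw case counts. This is a sketch assuming roughly equal-sized treatment and placebo arms (as in this trial), where efficacy reduces to one minus the case ratio:

```python
# Vaccine efficacy from raw case counts, assuming equal-sized arms:
# VE = 1 - (attack rate in vaccine arm / attack rate in placebo arm),
# which reduces to 1 - (vaccine cases / placebo cases).

def vaccine_efficacy(cases_vaccine, cases_placebo):
    return 1 - cases_vaccine / cases_placebo

ve_severe = vaccine_efficacy(1, 9)  # 1 severe case vs 9 in control
print(f"{ve_severe:.0%}")           # 89%
```

With counts this small, the point estimate is almost meaningless on its own, which is why the confidence interval (20.1%-99.7%) is the thing to look at.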

As expected, the data doesn’t show that the reduction in severe Covid is greater than in milder Covid, but it doesn’t show the opposite either.

As described above, we have to note the greatly reduced sample size in the later periods.

Evidence of Increased Asymptomatic/Symptomatic Ratios

This was already shown in the mass Pfizer study, but several other sources indicate the ratio of asymptomatic-to-symptomatic cases is increased for vaccinated people. In other words, vaccination works better against symptomatic Covid (more severe) than asymptomatic Covid (less severe). This is at the milder end of "severity" compared to hospitalization, severe Covid, and death, but it suggests the same trend; plus, there's more data than when looking at more severe outcomes.

I've copied most of the numbers here from MicroCOVID.org; see their analysis for calculation details.

| Study | Unvaccinated (symptomatic vs asymptomatic) | Vaccinated (symptomatic vs asymptomatic) |
|---|---|---|
| Mass Pfizer Study, Day 28+ | 210 vs 191 | 31 vs 59 |
| MicroCOVID Moderna/CDC | 185 vs 37 | 11 vs 11 |
| MicroCOVID J&J | 351 vs 182 | 117 vs 159 |
| MicroCOVID AstraZeneca | 248 vs 73 | 84 vs 57 |
Pfizer's asymptomatic percentage goes from 48% -> 65%, Moderna's from 17% -> 50%, J&J's from 34% -> 57%, and AstraZeneca's from 23% -> 40%.
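Those percentages are just the asymptomatic share of each group's cases; a quick sketch, using the Pfizer and Moderna counts quoted above:

```python
# Recomputing the asymptomatic percentages from the raw
# "symptomatic vs asymptomatic" case counts.

def asymptomatic_fraction(symptomatic, asymptomatic):
    return asymptomatic / (symptomatic + asymptomatic)

# (symptomatic, asymptomatic) counts: unvaccinated first, then vaccinated
pfizer = asymptomatic_fraction(210, 191), asymptomatic_fraction(31, 59)
moderna = asymptomatic_fraction(185, 37), asymptomatic_fraction(11, 11)

print([f"{f:.1%}" for f in pfizer])   # unvaccinated vs vaccinated share
print([f"{f:.1%}" for f in moderna])
```

A rising asymptomatic share among vaccinated cases is exactly what you'd expect if vaccination shifts infections toward the milder end.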

I expect the very different absolute numbers to come from the widely varying study methodologies as much as differences between the vaccines. (AstraZeneca uses home tests, for example.) 

I didn't exhaustively look through all possible sources of asymptomatic vs symptomatic efficacy. I would be very interested if someone had a credible source not showing this trend.

I also didn't scrutinize these calculations much, so I wouldn't be that surprised if it turned out there were deep flaws that undermine the trend seen here.

It gets better

If we go back to the big Israeli Pfizer observational study, we see increasing vaccine effectiveness as more time passes since first/second dose. Unfortunately, the study didn't have enough time/data to show us things two weeks after the 2nd dose.

Fortunately, there was a follow-up. On March 11, Pfizer/Israel Ministry of Health made the following press release:

Findings from the analysis were derived from de-identified aggregate Israel MoH surveillance data collected between January 17 and March 6, 2021, when the Pfizer-BioNTech COVID-19 Vaccine was the only vaccine available in the country and when the more transmissible B.1.1.7 variant of SARS-CoV-2 (formerly referred to as the U.K. variant) was the dominant strain. Vaccine effectiveness was at least 97% against symptomatic COVID-19 cases, hospitalizations, severe and critical hospitalizations, and deaths. Furthermore, the analysis found a vaccine effectiveness of 94% against asymptomatic SARS-CoV-2 infections. For all outcomes, vaccine effectiveness was measured from two weeks after the second dose.

The lack of actual paper makes this a little harder to interpret, but I don’t find it surprising given that (1) at this later date in their roll-out, an even greater proportion of people will be young and healthy, (2) this data is only counting two weeks after the second dose, whereas the previous large observational study only had a “7 days after 2nd dose until end of follow-up” (maximum of day 28 to 44).

And, of no small significance, the Pfizer vaccine appears fully effective against the UK variant. (Yay!!)

If the vaccine is showing 97+% amongst everyone, I would expect that's at least as true when you filter for younger/healthier people and filter out those with comorbidities. 

What I believe

I believe that what I wrote above supports my initial assertions:

  1. In the case of the Pfizer vaccine (and likely Moderna too), the effectiveness in young healthy people is 99% or close to it.
  2. That we can reasonably expect the effectiveness of the vaccine against more severe Covid to be greater than effectiveness against milder cases of Covid.

The initial Pfizer mass study has 99% (96-100) for the age 16-39yr group, and the subsequent follow-up gives 97% for everyone. At baseline for symptomatic cases, we're talking 30-100x reductions, which is hella good.

Further, across multiple studies, vaccines, and outcomes we see trends of increasing effectiveness against progressively severe outcomes. In some cases, we don't definitively see it, but that's easily attributable to lack of sample size and inherent limitations in methodology due to noise and selection effects.

If we're talking 99% against symptomatic cases (100x reduction), then I think it's reasonable to expect at least that for hospitalization, 99.5% (200x). Hence the title, extreme vaccine effectiveness.

What about J&J (and AstraZeneca)? Granted, the effectiveness numbers for J&J look lower than for Pfizer and Moderna, but I think they're higher than MicroCOVID.org's numbers imply. First, we get 85% (61.2-95.4) effectiveness against Severe/Critical in the 18-59yr subgroup, and that number matters more. Second, that is a very wide age range; I would bet that restricting it to 18-39 would show an improvement relevant to most of those reading this. Lastly, I suspect all the factors mentioned above (selection effects, noise/saturation) affect it too, making the result lower than it would otherwise be.

On net, J&J might not be quite as extremely effective as the mRNA vaccines, but it's no pushover either.

AstraZeneca isn't on offer in the Bay, and was recently abandoned in my home country of Australia too, so I apologize for not examining it.

Tell me where I’m wrong

I want the case I've made today to be true, but even more than that I want to believe true things (and I certainly don't want people to believe false things because of me). If you think any of this is wrong, PLEASE SAY SO.

Many thanks!


Ben Shaya's Thoughts

Ben Shaya, one of the people responsible for MicroCOVID.org's models, has been kindly taking time to discuss the topic of vaccine effectiveness with me. 

He wrote a document arguing Why I think vaccines don’t bring the chance of severe COVID to 0. While that's a stronger claim than I would make, his arguments and models are still valuable when thinking about the topic generally.

Read them all in his doc, but I'll highlight one that I found very relevant to mechanistic models of Covid severity:

There's a bit of mechanistic nuance - COVID tends to start with an upper respiratory tract infection - that's when nasal swabs work - and then moves to the lower respiratory tract (lungs) - which is where it screws over your blood oxygen. There's research suggesting that the immune response of the lining of your windpipe is what determines whether the virus reaches your lungs: https://www.biorxiv.org/content/10.1101/2021.02.20.431155v1.abstract?%3Fcollection=

That is to say, there isn't a single "immune response" that determines how bad your infection is; immune response in your upper respiratory tract determines if you get COVID while immune response in your lungs and windpipe determine whether you get a severe reaction. Immune response in your blood correlates with these, but it's not the same as either.

(this comes from Riley Drake, who is a virologist and one of the authors on the paper, and who strongly cautioned against reading too deeply into antibody concentrations as a proxy for immunity)

When we treat these three immune systems as separate, we see there are at least 3 hidden models - upper respiratory response, lower respiratory response, and blood response. Of the 3, the side effects to the vaccine only reflect the blood response. The overall efficacy of the vaccine only reflects the upper response.

Further, we know that humans have significant heterogeneity in all three of these responses, since only some people get severe covid, some people get lung damage from mild covid, and some people get exposed to COVID and don't contract the virus at all.

Note further that assuming a strong immune response protects you is also not necessarily true; a very strong response can cause cytokine storms (which we now know how to handle, but which will land you in a hospital).

This gets at more gears in immune response and is the kind of thing that can expose where simple state transition models and immune response as a single thing don't hold up.

What if the vaccine boosts one part of the immune system but not another? In that case, you might see the vaccine be very effective against symptomatic and asymptomatic Covid, but not more severe disease. If the virus makes its way deep into your lungs, and the lungs are protected by a different immune response that isn't helped by the vaccine, then, conditional on having gotten to that point, you might not be better off than a non-vaccinated person.

All this to say we should be cautious in putting too much stock in simple mechanistic models.

What if vaccines are all or nothing?

The model behind the claim of extreme vaccine efficacy is that even if your post-vaccine immune response isn’t enough to stop you getting Covid at all, it should be stronger than it would have otherwise been, and you’ll do better at fighting off severe-Covid. This takes vaccine efficacy as a continuous thing.

But maybe the vaccine is 100% effective against all outcomes! So long as it’s correctly transported and administered, that is. Except sometimes vaccines are left at high temperature for too long, the delicate contents are damaged, and the people receiving them are effectively not vaccinated. If this happens 5% of the time, then 95% of people are completely immune to Covid and 5% are identical to the unvaccinated. Whatever chance they had of getting severe Covid before, it’s the same now.

In this world, not knowing whether your vaccine was a dud or not, post-vaccine you should assume you have a 95% reduction across all outcomes equally. 

I originally found this argument very persuasive. How could I assume that the “continuous” model of vaccine-immune response was true? But actually, most things in the real world are continuous, particularly in biology. People aren’t old or young, but somewhere on a continuous measure. Immune response isn’t an all-or-nothing affair, and some people’s bodies will produce more antibodies than others. Some enough to stymie any symptoms at all, but some only enough to prevent them getting hospitalized.

One person told me that the mRNA vaccines induce 60x the antibodies of a recovered Covid patient (closest source I found), such that even if your response was weaker, it should still be more than powerful enough to deal with any actual Covid. Therefore, we should take people still getting quite sick with Covid as a sign that some people’s vaccines must not be working at all.

I would be surprised if complete vaccine failure didn't occur some of the time; the question is how often. Suppose we have a vaccine that, when it works, confers a 100x reduction (99% efficacy). If it fails to work at all 5% of the time, we'd see 1 - (0.01*0.95 + 1*0.05) = 94% efficacy overall. Something like that could be a big part of what we observe.
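The mixture arithmetic above, as a minimal sketch (the 99% working efficacy and 5% failure rate are the illustrative numbers from the text):

```python
# "All-or-nothing" mixture: a vaccine that confers some efficacy when it
# works, but fails completely some fraction of the time. Observed efficacy
# is one minus the blended relative risk.

def observed_efficacy(efficacy_when_working, failure_rate):
    relative_risk = (1 - efficacy_when_working) * (1 - failure_rate) + 1.0 * failure_rate
    return 1 - relative_risk

print(f"{observed_efficacy(0.99, 0.05):.0%}")  # 94%
```

Notice the failure rate dominates: even a perfect vaccine that fails 5% of the time can't show better than 95% observed efficacy against any outcome.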

Also, Kelsey Piper says that she couldn't find anything about any other vaccines working this way (all or nothing).

Caveats – read before you act

In my long case for the extreme effectiveness of vaccines, there are some topics of crucial practical importance. These imply that maybe you don't want to throw caution to the wind just yet. I'm not sure; I didn't get around to looking into these.


Long-Covid is not an endpoint tracked by any of the studies I've looked at. I would think that'd be related to increased viral load and behave like an outcome more severe than just a symptomatic case, but there isn't data for that. Anecdotally, I've heard of a couple of cases where someone experienced mild Covid yet was dramatically affected for months afterwards.

My conservative gut estimate is that your odds of getting long-Covid are reduced by a vaccine by as much as your chance of getting symptomatic Covid at all, but not necessarily any more than that.


The Israeli study showed supreme efficacy even against the much-feared UK variant (B.1.1.7). However, there are fears that the vaccines aren't nearly as good against the South African variant (B.1.351) or the Brazilian P.1 variant.

A new study published on MedRxiv last week, Evidence for increased breakthrough rates of SARS-CoV-2 variants of concern in BNT162b2 mRNA vaccinated individuals, states that vaccinated individuals were more likely to contract B.1.1.7 and B.1.351 than unvaccinated controls, suggesting both variants are more resistant to vaccination than other strains. (Caveat: I didn't read the paper in detail.)

...we performed a case-control study that examined whether BNT162b2 vaccinees with documented SARS-CoV-2 infection were more likely to become infected with B.1.1.7 or B.1.351 compared with unvaccinated individuals. 

Vaccinees infected at least a week after the second dose were disproportionally infected with B.1.351 (odds ratio of 8:1). Those infected between two weeks after the first dose and one week after the second dose, were disproportionally infected by B.1.1.7 (odds ratio of 26:10), suggesting reduced vaccine effectiveness against both VOCs under different dosage/timing conditions.

Not good.

The CDC's CovidTracker page actually has some nice dashboards for tracking variant proportion, though I haven't looked into the data quality. Their brief is helpful too.


There's a breakdown by US state too. I'm in California and am pleased to see that currently, B.1.351 is only 0.3% of the Covid cases and P.1 is 1.6%

Of course, if B.1.351 and friends are resistant to vaccination, we will see them rise in prevalence.

This needs more investigation. Without looking into it more, the sensible strategy would be something like act according to your local prevalences and beliefs about how vaccine-resistant the different strains are. Right now the suspicions are on B.1.351 and P.1, but they're uncommon in California (0.3% and 1.6% respectively).

If you've got the time and skill to look into this more, please do, I can provide you money and glory.

Spreading to the Unvaccinated

Even if the vaccine protects you 200x against more severe outcomes, that doesn't help the unvaccinated if they catch Covid from you when you had an asymptomatic or mild case. This means that until such time as those you interact with most are vaccinated, you might want to be more conservative in your microCovids.

MicroCOVID.org calculated reductions in contagiousness for vaccinated people (10x reduction for Pfizer/Moderna, 3x for J&J), and that's what I'd stick to myself if interacting with the unvaccinated. (But hey, California is just about at universal eligibility, now is the time!)


Modified bases in mRNA vaccines against Covid-19

April 13, 2021 - 16:00
Published on April 13, 2021 1:00 PM GMT

What are the modified RNA bases in mRNA Covid vaccines, and/or how are they metabolised in the human body? Have you heard about any safety issues with these modified bases?

For Pfizer, I know that the modified base is N1-methylpseudouridine. From there, Wikipedia directs me to the page on pseudouridine, which is naturally present in tRNA. However, it is not the same chemical. I do not know how easy or difficult it is for the human body to remove the methyl moiety.

For Moderna, I do not know what the modifications are at all.

The reason I am worried: I have seen an unsourced claim on Facebook that Moderna had trouble getting several of their products approved because of the toxicity of the modified bases. Maybe it is a failure mode to let an unsourced claim start a process of worrying and searching for information on the modified bases. But the process has already started in my head... I would very much appreciate some details, if you guys know them. (Apart from the obvious info that the clinical trials did not show any short- or medium-term problem.)


How do you deal with decision paralysis?

April 13, 2021 - 09:01
Published on April 13, 2021 6:01 AM GMT

I have too many choices in too many things and I can see that it's crippling my ability to get things done. 

I'm struggling a great deal now to come up with plans and stick with them. I know too many choices can be bad, but I'm not sure what to do about it.

I think the solution from this video on the paradox of choice is to lower expectations but how do you actually do that, routinely?


Wanting to Succeed on Every Metric Presented

April 12, 2021 - 23:43
Published on April 12, 2021 8:43 PM GMT

There’s a tendency to want to score high on every metric you come across. When I first read Kegan’s 5 stages of adult development, I wanted to be a stage 5 meta-rationalist! Reading the meditation book “The Mind Illuminated” (TMI), I wanted to be stage 10 (and enlightened and stage 8 jhana and…)!  I remember seeing this dancer moonwalk sideways and wanting to be that good too! 

This tendency is harmful.

But isn’t it good to want to be good at things? Depends on the "things" and your personal goals. What I’m pointing out is a tendency to become emotionally invested in metrics and standards, without careful thought on what you actually value. If you don’t seriously investigate your own personal preferences and taste, you may spend years of your life invested in something you don’t actually care about. By adding this habit of reflection, you could become much happier than you are right now.

[Note: I believe most people are bad at figuring out what they actually value and prefer. For example, I thought skilled pianists are cool and high status, but when I actually became decent enough to wow your average Joe, being cool in those moments wasn’t as cool as I thought it would be. As they say, “Wanting is better than having”.]

There’s a difference between wanting to score 100’s/all A+’s and scoring well enough to get a job. There’s a difference between reading multiple textbooks cover-to-cover and reading the 40% or so that seems relevant to your tasks. There are tradeoffs; you can’t optimize for everything. When you come across a metric you really want to score highly on, nail down the tradeoffs in fine-grained detail. What about this do you actually care about? What’s the minimum you could score on this metric and still get what you want? What do you actually want? Speaking out loud or writing this out is good for getting an outside view and noticing confusion.

Noticing this pattern is half the battle. To make it concrete, here are examples from my life:

Running - I ran cross country and track for 3 years, but then I realized I don’t enjoy running long distance. Later I found out that sprinting is fun! If I was better at knowing my values, I could’ve just played ultimate frisbee with friends instead.

Dancing - I used to imagine dancing at weddings and such and looking really cool! I remember being really self-conscious and slightly miserable when I did dance in front of others. Trying to impress people is disappointing (and trying to be cool is so uncool). Now I value dancing because it’s fun and a good workout; I don’t worry about recording myself and consistently improving or dancing hypotheticals.

Kegan’s 5 stage development - I used to want to be stage 5, and I remember reading lots of David Chapman’s work to figure this out. I believe I benefited from this, but I ironically would’ve understood it better if I considered my values better. Now I value it as a useful framing for how large segments of people interpret the world. [See? I pointed out that it’s just another system with its own set of limits. I’m a cool kid now, right?]

Meditation - Becoming enlightened or TMI stage 10 sounded really cool! I’ve spent 100’s of hours meditating now, but I would’ve been much better off if I crystallized in my head the skills being optimized and how improving those skills improved my life. It wasn’t the “wanting to be enlightened prevented becoming enlightened” trope, but optimizing for a fuzzy “enlightened” metric was worse than more tractable metrics with clear feedback.

What I value now from meditation is being happier, accepting reality, being okay with metaphysical uncertainty (not freaking out when realizing I can’t directly control all my thoughts, or noticing my sense of self being constructed), and maintaining awareness of context, all of which are much clearer metrics that I actually care about.

Grades - I wanted all A’s and to work my hardest on every assignment, wasting a lot of time I could’ve spent elsewhere! Afterwards, I learned to do just enough to graduate and signal with my GPA that I’m a hard worker/smart. [Once, I missed my final exam where I needed a 60 to keep an A, dropping me to a C. I thought it was hilarious. Thanks Nate!]

People’s Opinions - I used to be emotionally affected by most everybody’s social rewards and punishments (i.e. attention and praise vs ignoring and criticism). I’ve felt awful and disappointed so many times because of this! I’ve come to realize that I actually only care about <10 people’s opinions, and they all care about me and know me well. [Note: this is separate from taking someone’s thoughts into consideration; I couldn’t think of a better word than “opinions”]

The post that prompted this was Specializing in problems we don’t understand. Great post! I noticed the compulsion to work on this problem immediately without considering my current context and goals, so I wrote this post instead.

Topics people in this community may benefit from re-evaluating are:

  • Existential AI risks and other EA areas. Not just whether or not you actually want to work in these fields, but also “do you actually enjoy pursuing it the way you are currently pursuing it?” 
  • Reading text books cover-to-cover and doing all the exercises 
  • Writing posts and comments in this forum in general

So… do you feel compelled to succeed according to the metric I’ve presented?


Using Flashcards for Deliberate Practice

April 12, 2021 - 22:07
Published on April 12, 2021 7:07 PM GMT

Interleaving Deliberate Practice

One criticism of using automated flashcard systems, like Anki, for conceptual topics, like math or physics, is that they don't offer you an easy way to practice solving problems. In theory, you might be able to use some flashcard platforms to automatically generate new problems to solve. However, this seems like a complicated, time-consuming, and ongoing challenge.

An alternative idea is to create flashcards that tell you what exercises to solve from a textbook. Such a flashcard might read:

"Solve 5 problems from Chapter 1.1 pg. 10-13 of Zill's First Course in Differential Equations textbook."

Assume that the spacing of flashcard reviews starts at about 4 days apart, and increases by about 2.5x each time. If so, you'd see that flashcard about 4 times in the first two months, 6 times in the first year, and 9 times over 15 years.
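Those review counts follow from the assumed 4-day initial interval and 2.5x multiplier, counting the initial pass as day 0; a quick sketch:

```python
# Review schedule implied by the assumptions above: first interval ~4 days,
# each subsequent interval ~2.5x longer, counting day 0 as the first pass.

def review_days(first_interval=4, multiplier=2.5, horizon_days=15 * 365):
    """Days on which the card is seen within the horizon."""
    days, day, interval = [0], 0, first_interval
    while day + interval <= horizon_days:
        day += interval
        days.append(day)
        interval *= multiplier
    return days

schedule = review_days()
print(sum(1 for d in schedule if d <= 60))   # 4 reviews in the first two months
print(sum(1 for d in schedule if d <= 365))  # 6 in the first year
print(len(schedule))                         # 9 over 15 years
```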

Imagine you wanted to spread your practice out over 15 years at increasing intervals, for a total of 9 reviews. Also, imagine there were 60 problems in Chapter 1.1. In that case, you'd want to assign yourself 60 ÷ 9 ≅ 7 problems every time that flashcard came up.

Since you'd most likely not want to do all the easy problems first and leave the challenge problems for 15 years later, you might want to refine this by requiring that you work on every 9th problem. Hence, on your first review, you'd do problems 1, 10, 19, 28, 37, 46, and 55 from Chapter 1.1. On the second review, you'd do problems 2, 11, 20, 29, 38, 47, and 56.
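The every-9th-problem rotation can be written down directly, using the 60-problem, 9-review example above:

```python
# Which problems to work on a given review: on review k, take problems
# k, k+9, k+18, ... so easy and hard problems mix across reviews.

def problems_for_review(review_number, total_problems=60, total_reviews=9):
    """1-indexed problem numbers assigned to the given review."""
    return list(range(review_number, total_problems + 1, total_reviews))

print(problems_for_review(1))  # [1, 10, 19, 28, 37, 46, 55]
print(problems_for_review(2))  # [2, 11, 20, 29, 38, 47, 56]
```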

As you go, you'd need to keep a record of which problems you'd solved. You could do this in several ways:

  1. Keep track in the flashcard itself, by editing it with the next set of "to-do" problems so that you'll know just what to do when it flashes up.
  2. Mark off the problems in a physical book, or use a highlight annotation in an e-book.
  3. Keep a physical document, such as a notebook, where you work the exercises for a specific subject so that you can look up what you did last time you worked on Chapter 1.1.
  4. Keep a separate document, like a text file, that notates just what you have to work on next.

Of all these options, I like #1 the best. This whole strategy presumes you'll keep track of your flashcards long-term, and this way you don't have to worry about losing any other document. It's built into your workflow.

You could also use this for non-textbook-based forms of learning. You might have lots of physical tasks you wanted to learn about. Look up a website that gives lists of applied projects, and factor the projects that look interesting into flashcards:

Microscope projects

Bioinformatics and programming

Computer programming projects

Linux system administrator projects

Graded piano pieces sorted by difficulty

Photography projects

Core strengthening exercises

Learning what you could learn

Problems With This Approach

There's often a premium on getting projects done quickly. It costs time to switch between tasks, and there's a reward for finishing. Certainly, this method is not appropriate for work-related tasks.

Flashcards may not be the best way to randomize projects. For example, you might want to interleave 50 different grade 1 piano pieces for your first year of learning the piano, but then abandon them when you move up to grade 2. To do that, you might want to have a special deck for "piano" where you insert one flashcard for every passage of a piece you've learned. Perhaps you have one flashcard for every 8-16 measures, for example.

If you practice one hour per day, you'd divide that hour by the number of flashcards you have to do, to allocate the amount of time per flashcard. Then you'd select "again/hard/good/easy" depending on how that passage was sounding at the end of those few minutes of practice.

More challenging is when a task doesn't lend itself to factoring into flashcards very easily. For example, solving a bioinformatics problem on Rosalind.info often requires a lot of programming and creative mathematical thinking. They also take long enough in many cases that you wouldn't want to solve the entire project in a single day. Yet you finish an individual project quickly enough that you wouldn't want to be returning to it 15 years later.

It might therefore be better to use a simple randomizer to pick a project category. Imagine you wanted to learn 50 diverse skills over several years. Each project used to practice those skills requires at least several hours, if not several days. Some of them "stack."

In that case, you might want to put all your possibilities on the rows of a spreadsheet. Then pick a row using a random number generator to select the next skill. On that row, you'd include a link to a list of projects for that skill. Then use the random number generator again to pick a project that's at your skill level.
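A minimal sketch of that spreadsheet-plus-RNG idea; the skill and project names here are invented placeholders, not recommendations:

```python
# Random project selection: pick a skill row at random, then a project
# from that skill's list. All names below are made-up examples.

import random

skills = {
    "microscopy": ["prepare a slide", "image pond water", "measure cell sizes"],
    "programming": ["write a CLI tool", "build a web scraper", "parse a CSV"],
    "piano": ["grade 1 piece", "grade 2 piece", "grade 3 piece"],
}

def pick_project(skills, rng=random):
    """Return a (skill, project) pair chosen uniformly per level."""
    skill = rng.choice(sorted(skills))
    project = rng.choice(skills[skill])
    return skill, project

skill, project = pick_project(skills)
print(skill, "->", project)
```

In a real setup the project list per skill would be a link to an external list, and you'd filter it to your current skill level before choosing.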


How long would you wait to get Moderna/Pfizer vs J&J?

April 12, 2021 - 21:29
Published on April 12, 2021 6:23 PM GMT

My assessment:

  • People who can easily continue to guard against significant COVID risks for several weeks without much downside other than quality of life should wait several weeks for Pfizer or Moderna.
  • (People who have to expose themselves to a non-trivial amount of COVID risk no matter what should take the J&J vaccine if they'd have to wait several more weeks for Pfizer / Moderna. I haven't run the numbers on this, but at some point I'd expect the additional risk exposure from being unvaccinated for several weeks to outweigh the difference in efficacy.)

My rationale:

1. Moderna and Pfizer provide significantly better protection against non-severe infections.

  • J&J provides:
    • 28 days after the injection: 66% protection against moderate to severe COVID infections (72% "in the United States", but I don't know to what extent that is robust to further spread of foreign variants in the US) and 85% against severe disease [1][2]
    • 48 days after the injection: 100% protection against severe COVID [3]
  • Moderna and Pfizer provide 94.1% and 95% protection, respectively, against symptomatic infection after the 2nd dose [1]

2. I'd expect that even non-severe infections increase your risk of long-term lingering effects (in addition to being fairly unpleasant in the meantime, but I'm less concerned about that). 

  • I don't have great evidence for #2 yet. While mild infections carry a non-trivial risk of long COVID [4], it seems that even initially asymptomatic cases account for about a third of long COVID cases [5]. I would hypothesize that the risk of long COVID is significantly lower for asymptomatic cases than for symptomatic ones, but I haven't researched that much yet.

Zvi said in his 2/4 COVID post, "I’d pay a substantial amount to get Pfizer or Moderna instead of J&J if I could get either one today, but given the choice between waiting and taking what’s available, I will happily accept the J&J vaccine now rather than hold out for Pfizer or Moderna."


What am I missing? Is this just a difference in the weight we place on resuming higher-risk activities sooner rather than later? Or am I overweighting the superior efficacy of the Pfizer and Moderna vaccines?

[1] https://www.statnews.com/2021/02/02/comparing-the-covid-19-vaccines-developed-by-pfizer-moderna-and-johnson-johnson/

[2] https://www.fda.gov/media/146265/download?fbclid=IwAR1eMK87XHF7ibOLoojaoH-ZFPNgTuOLqgkqur9D1SCtSGbrQj3A3VT5C5k 

[3] https://twitter.com/VirusesImmunity/status/1355149007220310019 

[4] https://www.webmd.com/lung/news/20210111/even-mild-cases-of-covid-can-leave-long-haul-illness-study-shows#1

[5] https://www.pharmacytimes.com/view/study-many-long-haul-covid-19-patients-were-asymptomatic-during-initial-infection 


D&D.Sci April 2021: Voyages of the Gray Swan

April 12, 2021 - 21:23
Published on April 12, 2021 6:23 PM GMT

You were prepared for gratitude, a commendation from the Admiral, your own department, parades in your name. You were also prepared to hear that your ‘list of helpful suggestions for ensuring supply ships survive random encounters’ was an impudent insult to the collective intellect of High Command, and receive a public execution for your trouble. What you weren’t prepared for was what happened: being allocated a modest stipend, assigned to a vessel, and told that if you’re so clever you should implement your plans personally.

You have 100gp to spend, and your options are as follows:

  • Shark repellent (40gp): Coating the underside of the ship in shark repellent would ensure that no journey would feature shark attacks; however, Vaarsuvius’ Law (“every trip between plot-relevant locations will have exactly one random encounter”) means something else would attack instead.
  • Arming the carpenters (20gp): You’ve given up trying to understand what it is about woodwork that makes its practitioners so good at fighting Crabmonsters, but your findings are undeniable: arming the ship’s carpenters would halve the damage done by Crabmonster attacks.
  • Tribute to the Merpeople (45gp): Offering tribute to the Merpeople would ensure they won’t attack the ship, similar to the effect of shark repellent.
  • Extra oars (1gp/oar, up to twenty): There’s enough space in the lower decks to add up to twenty more oars, so when fleeing is the best option, the entire crew can work together to escape. Each extra oar would decrease the damage done by Krakens and Demon Whales by 2%.
  • Extra cannons (10gp/cannon, up to three): You wouldn’t think these ships could fit more artillery, but clever ergonomics allow you to add up to three more cannons. Your studies suggest each cannon would reduce the damage suffered in Nessie and Pirate attacks by 10%.
  • Rifles for the Crow’s Nest (35gp): Arming the Crow’s Nest with state-of-the-art rifles would give lookouts a 70% chance of ensuring a given Harpy attack does no damage.
  • Foam swords (15gp): Giving the deck crew novelty foam swords to wield alongside their standard-issue cutlasses would improve their effectiveness when fighting Water Elementals, reducing the damage these creatures do by 60%.

You’re completely confident in the effectiveness of your ideas, but much less confident that you know which combination would make the best use of your limited budget. To investigate this angle, you’ve procured a record of random encounters encountered by the ships travelling your assigned route; unfortunately, it’s missing some important information for the ships that sank, due to everyone who could fill in those details being dead.

As you board the Gray Swan (why do they give these ships such charmingly unique names when they’re all built and operated identically?), it occurs to you that this might have been intended as an execution after all. The dataset suggests that without any of your clever plans, the survival rate for a journey along your route is a little below 90%, and the Gray Swan is scheduled to make ten trips – five northbound voyages, five southbound – in quick succession. Hopefully this indicates nothing more than your superiors wanting to test your interventions very very thoroughly.

Your top priority is to save your skin. Secondary priorities are minimizing total damage taken and spending as little gold as possible, to impress High Command and return to their good graces.

What will you do?


  • As a passenger, you’ll be kept away from any fights, but the Gray Swan has no lifeboats; keeping the ship from sinking is necessary and sufficient to ensure your survival.
  • Ships are fully repaired every time they make port.
  • Interventions stack such that two 10% reductions are equivalent to one 20% reduction.
  • Each journey takes a month; it is currently Month 5, Year 1406.

I’ll be posting an interactive letting you test your decision, along with an explanation of how I generated the dataset, sometime next Monday. I’m giving you a week, but the task shouldn’t take more than a few hours; use Excel, R, Python, a priori knowledge, or whatever other tools you think are appropriate. Let me know in the comments if you have any questions about the scenario.
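If you attack this in Python, a first step might be enumerating the combinations that fit the budget. The intervention names below are shorthand for the options above; scoring combinations against the encounter record is the actual puzzle and is left out:

```python
from itertools import product

# (cost in gp, max count) per intervention, from the scenario above.
items = {
    "shark_repellent": (40, 1),
    "arm_carpenters": (20, 1),
    "merpeople_tribute": (45, 1),
    "oars": (1, 20),
    "cannons": (10, 3),
    "rifles": (35, 1),
    "foam_swords": (15, 1),
}
budget = 100

def affordable_combos(items, budget):
    """Yield every (counts, cost) combination within budget."""
    names = list(items)
    ranges = [range(items[n][1] + 1) for n in names]
    for counts in product(*ranges):
        cost = sum(c * items[n][0] for c, n in zip(counts, names))
        if cost <= budget:
            yield dict(zip(names, counts)), cost

n_combos = sum(1 for _ in affordable_combos(items, budget))
print(n_combos, "affordable combinations")
```

The search space is small enough to evaluate every affordable combination exhaustively against whatever damage model you fit to the data.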

If you want to investigate collaboratively and/or call your decisions in advance, feel free to do so in the comments; however, please use spoiler tags or rot13 when sharing inferences/strategies/decisions, so people intending to fly solo can look for clarifications without being spoiled.


Post-COVID Integration Rituals

April 12, 2021 - 19:54
Published on April 12, 2021 4:54 PM GMT

I'm writing this because I am realizing we're at the cusp of the end of a very long "retreat." 

At MAPLE (Monastic Academy), we go on retreat once a month, and so we exit retreat once a month. 

This can be quite a shock to the system, so we deliberately set aside a day for integration purposes, where we slowly transition out of retreat into our normal work period. 

For those of you who are champing at the bit for in-person social interaction, I would advise being intentional about how you go about this. 

It might be good if people made deliberate efforts to emphasize integration, as part of their transition process. It might be worthwhile to think through how you want your "integration rituals" to go. 

Considerations for Rituals 

I advocate for rituals—they seem to be an unintended victim of our modern age. We have forgotten about them. They hold no space in our minds as a thing to value. In the past, we wouldn't really have to think about our rituals; they would just happen as part of traditions passed down to us. Since we are in somewhat of a "tradition-scarce" age, it takes activation energy to prioritize things like intentional rituals. 

However, this pandemic has meaning to people on every level. To the individual, to the family, to the group house, to the community, to the workplace, to the globe. 

And so it is obviously a meaningful event for it to end. For the enforced retreat to come to a close, however gradually or unevenly the vaccination phase may take place across groups, states, and nations. 

What kinds of meanings would you want to symbolically acknowledge? What would help your body, heart, and mind through this transition? What things need to be set aside now? What things need to be seeded and fostered now? What things need to be remembered? What things need to be envisioned for the future? 

Who should be involved? Who do we acknowledge? Perhaps our own selves, having gone through this process. Perhaps the people we've lost. Perhaps the new life that has emerged. Perhaps the people we've been together with through this period. Perhaps the people we now are able to connect with in person. 

Who is coming back together for this integration? Who is the new "we"? Who are we now? 

The Integration Period

The other consideration here is how to integrate in such a way that you don't: 

  • Burn out on too much social interaction all at once
  • Get too disoriented 
  • Destabilize or harm others
  • Rush unskillfully or heedlessly forward
  • Stay stuck in a dead past with old assumptions

I'd consider doing an integration event with other people—the people you love or the people you plan on communing with. 

Integration events can include:

  • Shared meals with intentional topics of conversation
    • This can include norms for taking turns or only one person speaking at a time
  • Circling or other group practices
  • Fire rituals / letting go rituals
  • Singing or chanting together
  • Sharing intentions about the future
  • Creating accountability or sharing commitments; each person shares how they want to show up in the next day, week, or month* 
  • Sharing stories or expressions* 
  • Each person takes X minutes asking for something from the whole group (e.g. touch, listening, fun group activity, etc.)* 
  • Each person shares a song with the group (using Spotify or something), and everyone listens
  • Whatever other ways you come up with to create common knowledge about what just happened, where we are, and where we're going

*I have specific examples on how to do these in particular. Feel free to ask me for details.

Consider how you can facilitate workplace integrations to occur. This may require some leadership on your part. How can you lead the way forward in a way that promotes emotional health in the workplace? How do you lead people through a group process that allows people to settle in, to orient, and not "pretend" like everything is the same as it was? Because it probably is not the same as it was! But also not everything is different, either. What is the same? What is different? Honor both. 

People will need time to orient, adjust, and stabilize. 

Even if some people are quick and others are slow, everyone should acknowledge this is a group process. This isn't something anyone is going through alone. I'd emphasize interconnection and interdependence, while honoring the individuality and uniqueness of each person's experience. 

We should acknowledge that some people may have quite a difficult time with this transition—they may need extra care and attention. 

Please consider how to best take care of yourselves and others. This may be a confusing and destabilizing phase. Or it may lead to a manic release of energy that results in regrettable actions and breaking things unnecessarily. Can we leave room for certain mistakes while minimizing harm? 

If you have leadership capacity, or space for holding others, now seems like a good time to be available. 

Personally, I am more worried about "too fast" than "too slow." People moving too quickly to the next phase without reflection. People pretending that things are "back to normal." People acting like everything is OK or that nothing significant happened. People wanting to escape into work, activity, social energy, mania. 

"Too slow" or "getting stuck" may also be quite a problem however. Like people who've lived in a cave emerging into bright light—if they get stunned or pained, they might curl back into their caves or get frozen in immobility. Residual fears may create a more paranoid atmosphere, a mistrusting atmosphere. It may take time to rebuild social cohesion, comfort, trust. 

Whatever ways people can stay connected to themselves and each other seem good. Whatever ways you can cultivate a sense of calm ease, appreciation, and purpose seem good. 


You may want to deliberately take time to appreciate this past period of time. No matter how hard it was or how miserable it was. It's generally a healthy approach to find ways to appreciate your lives, your efforts, and that of others. I recommend taking time (individually or as a group) to do gratitude rituals. 

This post doesn't contain that many specific prescriptions on how to go about this. It's more a way to open up the conversation in case you haven't already done that. 

I think many, if not all, of us will need to step into our own forms of leadership in this whole process, and this may be a good opportunity for practicing leadership qualities and skills. I believe it is needed now, so I invite you to take the opportunity. 


A New Center? [Politics] [Wishful Thinking]

April 12, 2021 - 18:19
Published on April 12, 2021 3:19 PM GMT

Political polarization in the USA has been increasing for decades, and has become quite severe. This may have a variety of causes, but it seems highly probable that the internet has played a large role, by facilitating the toxoplasma of rage to an unprecedented degree.

Recently I have the (wishful) feeling that the parties have moved so far apart that there is "room in the center". The left is for people who are fed up with the extremes of the right. The right is for people who are fed up with the extremes of the left. But where do people go if they've become fed up with both extremes?

The question is: how would the new center work? There's not room for a new political party; plurality voting makes that too difficult, because if the new party doesn't gain more than 1/3rd of the vote, it's basically a wasted vote.

Here is my proposal for what it could look like:

  • Rather than operating as a traditional political party, New Center would attempt to be a formalized group of swing voters: it makes recommendations about which candidates from other parties to vote for. Given how some elections are consistently very close (most notably, the US presidential election), New Center might be able to achieve a kingmaker status even with only a relatively small portion of voters.
  • In order to accomplish this, New Center has to make recommendations which credibly represent centrist values (and only centrist values).
  • The New Center needs a strong set of criteria by which it judges politicians. These criteria must be based on a critique of the extreme left and the extreme right, to capture people's frustrations with both sides.
  • Registering with the movement might involve pledging your vote to their recommended candidates. In return, registering might give you a voice in the selection process.
    • For example, New Center candidates might be selected by New Center members rating other parties' candidates on each New Center criterion. Of course, this process is easily manipulated by rating your favorite candidates highly on every criterion; however, arranging the ballot this way nudges people to judge honestly. Also, strategic voting here isn't that bad, provided that those who join are actually fairly centrist in the first place.
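A minimal sketch of such a selection mechanism, with hypothetical members, candidates, and criteria. Using the median rating per criterion is a design choice not specified above; it blunts any single member inflating a favorite:

```python
from statistics import median

# Hypothetical ballots: member -> candidate -> {criterion: rating 0-10}.
ballots = {
    "member_a": {"cand_x": {"bipartisanship": 7, "civility": 8},
                 "cand_y": {"bipartisanship": 3, "civility": 4}},
    "member_b": {"cand_x": {"bipartisanship": 6, "civility": 9},
                 "cand_y": {"bipartisanship": 10, "civility": 10}},
    "member_c": {"cand_x": {"bipartisanship": 8, "civility": 7},
                 "cand_y": {"bipartisanship": 2, "civility": 3}},
}

def recommend(ballots):
    """Score each candidate by the median member rating on each criterion,
    summed over criteria; return the top candidate and all scores."""
    candidates = next(iter(ballots.values())).keys()
    scores = {}
    for cand in candidates:
        criteria = next(iter(ballots.values()))[cand].keys()
        scores[cand] = sum(
            median(b[cand][crit] for b in ballots.values()) for crit in criteria
        )
    return max(scores, key=scores.get), scores

winner, scores = recommend(ballots)
print(winner, scores)
```

Here member_b's inflated ratings for cand_y don't move the medians, so the broadly-rated cand_x wins.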

It might also be good for the initial set of criteria, or at least the rhetoric, to appeal to moderate libertarians as well, since that's a pre-existing group which considers its issues to be orthogonal to the usual political spectrum. I would personally think the core values of the new center should resemble Scott Alexander's take on classical liberalism:

So let’s derive why violence is not in fact The One True Best Way To Solve All Our Problems. You can get most of this from Hobbes, but this blog post will be shorter.

Suppose I am a radical Catholic who believes all Protestants deserve to die, and therefore go around killing Protestants. So far, so good.

Unfortunately, there might be some radical Protestants around who believe all Catholics deserve to die. If there weren’t before, there probably are now. So they go around killing Catholics, we’re both unhappy and/or dead, our economy tanks, hundreds of innocent people end up as collateral damage, and our country goes down the toilet.

So we make an agreement: I won’t kill any more Catholics, you don’t kill any more Protestants. The specific Irish example was called the Good Friday Agreement and the general case is called “civilization”.

So then I try to destroy the hated Protestants using the government. I go around trying to pass laws banning Protestant worship and preventing people from condemning Catholicism.

Unfortunately, maybe the next government in power is a Protestant government, and they pass laws banning Catholic worship and preventing people from condemning Protestantism. No one can securely practice their own religion, no one can learn about other religions, people are constantly plotting civil war, academic freedom is severely curtailed, and once again the country goes down the toilet.

So again we make an agreement. I won’t use the apparatus of government against Protestantism, you don’t use the apparatus of government against Catholicism. The specific American example is the First Amendment and the general case is called “liberalism”, or to be dramatic about it, “civilization 2.0”

Every case in which both sides agree to lay down their weapons and be nice to each other has corresponded to spectacular gains by both sides and a new era of human flourishing.

The classical-liberal rhetoric of the new center might be very similar to counterweight, except that counterweight only combats the extremes of one side (as expressed vividly by their name), rather than extremes on both sides.


On Falsifying the Simulation Hypothesis (or Embracing its Predictions)

April 12, 2021 - 13:50
Published on April 12, 2021 12:12 AM GMT

Disclaimer: This is my first post on this website, I tried to follow the proper etiquette, but please let me know if something is off.  :)

Briefly about me: former academic (PhD in theoretical physics, quantum black holes, string theory, information paradox) turned entrepreneur (currently building a company in the AI/Robotics space).


A widespread belief surrounding the Simulation Hypothesis (SH) is that being or not being in a simulation doesn't really have any implication for our lives. Or equivalently, SH is often criticised as unscientific and unfalsifiable, since no definite universal testable predictions have (so far) been made. By universal prediction I mean a prediction that all (or at least a very large part) of the simulations must make. 

In this post I would like to challenge this view by noticing that, in the space of all simulations, some families of simulations are more likely than others. Knowing at least the rough behaviour of the probability distribution over the space of simulations then allows us to extract probabilistic predictions about our reality, therefore bringing SH into the realm of falsifiable theories. Of course, there will be some assumptions to stomach along the way. 

The whole line of reasoning of this post can be summarised in a few points: 

1- We are equally likely to be in one of the many simulations.

2- The vast majority of simulations are simple.

3- Therefore, we are very likely to be in a simple simulation.

4- Therefore, we should not expect to observe X, Y, Z, ...


I will now expand on those points.


1- We are equally likely to be in one of the many simulations.

First of all, let's assume that we are in a simulation. Since we have no information that could favour a given simulation, we should treat our presence in any given simulation as equally likely among all the simulations. This "bland indifference principle" tells us that what matters is the multiplicity of a given reference class of simulations, that is, what percentage of all possible simulations belongs to that reference class. The definition of a reference class of a civilisation simulation is tricky and subjective, but for our purposes it is enough to fix a definition; the rest of the post will apply to that definition. For instance, we may say that a simulation in which WWII never started is part of our reference class, since we can conceive of being reasonably "close" to such an alternative reality. But a simulation in which humans have evolved tails may be considered out of our reference class. Again, the choice is pretty much arbitrary, though I haven't fully explored what happens for "crazy" choices of the reference class.


2- The vast majority of simulations are simple.

This is pretty much the core assumption of the whole post. In particular, we arrive at it if we assume that, in the space of all simulations ever run, the likelihood that a given simulation is run is inversely correlated with its computational complexity. We can call this the Simplicity Assumption (SA). The SA mainly follows from the instantaneous finiteness of the resources available to the simulators (all the entities that will ever run civilization simulations: governments, AIs, lonely developers, etc.). By instantaneous I mean that the simulators may have infinite resources in the long run, for instance due to an infinite universe, but that they should not be able to harness infinite energy at any given time. 

We observe this behaviour in many systems: a large number of small instances, a medium number of medium-sized instances, and a small number of large ones. For instance, the lifetime of UNIX processes has been found to scale roughly as 1/T, where T is the CPU age of the process. Similarly, many human-made artifacts have been found to follow Zipf's-law-like distributions. 
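As a toy illustration, suppose the SA takes the same 1/T-like form seen in UNIX process lifetimes, i.e. a simulation class of computational complexity c gets weight 1/c (an assumed functional form, not a derived law). Most of the probability mass then lands on the simplest classes:

```python
# Toy Simplicity Assumption: weight complexity class c by 1/c, normalize.
# The 1/c form is an assumption mirroring the 1/T observation above.
complexities = [2 ** k for k in range(20)]   # classes of complexity 1, 2, 4, ..., 2^19
weights = [1.0 / c for c in complexities]
total = sum(weights)
probs = [w / total for w in weights]

mass_simplest_three = sum(probs[:3])
print(round(mass_simplest_three, 3))  # the 3 simplest classes hold ~87.5% of the mass
```

Under this weighting, an observer applying the indifference principle of point 1 should bet heavily on being in one of the simplest simulations compatible with their observations.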

In the case of civilization simulations, there are multiple observations that point to the SA being valid:

-While the first ancestor simulation may be a monumental government-size project, at some point the simulators will be so advanced that even a single developer will be able to run a huge number of simulations. At that point, any simulator will be able to decide between running a single bleeding-edge simulation or, for instance, 10^6 simpler ones.
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}  simple simulations. While it is reasonable to imagine the majority of simulators not being interested in running simple simulations, it’s hard to imagine that ALL of them would not be interested (this is similar to the flawed solutions to the Fermi's paradox claiming that ALL aliens are not doing action X). It is enough for a small number of simulators to make the second decision to quickly outnumber the number of times complex simulations have been run. The advantage for simple simulations will only become more dramatic as the simulators get more computational power. 

-If simulations are used for scientific research, the simulators will be interested in settling on the simplest possible simulation that is still complex enough to feature all the elements of interest, and then running that simulation over and over.

-Simple simulations are the only simulations that can be run inside nested simulations or on low-powered devices.

A partial illustration (partial because there is no intelligent observer inside!) is given by Atari games. Take Asteroids. No doubt more complex and realistic space-shooting games exist nowadays. But the fact that Asteroids is so simple has allowed it to be embedded as a playable game inside other games (a nested game!) and used as a reinforcement learning benchmark. So if we simply count the number of times an Asteroids-like space-shooting game (this is our reference class) has been played, the original Asteroids is well positioned to be the most played space-shooting game ever.

The exact scaling of the SA is unclear. One day we may be able to measure it, if we become advanced enough to run many ancestor simulations. In what follows, let's suppose the scaling is at least Zipf's-law-like: if simulation A takes n times more computation than simulation B, then A is n times less likely than B in the space of all simulations.
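The Zipf's-law-like weighting can be sketched as a toy model (this is my own minimal formalization, not the formal model from the longer writeup): each simulation gets a weight inversely proportional to its computational cost, and the weights are normalized into probabilities.

```python
# Toy sketch of the Zipf's-law-like weighting over simulations:
# weight proportional to 1/cost, normalized to a probability distribution.

def zipf_weights(costs):
    """Normalized probabilities proportional to 1/cost."""
    raw = [1.0 / c for c in costs]
    total = sum(raw)
    return [w / total for w in raw]

# Simulation B costs 1 unit; simulation A costs n = 10 times more:
p_a, p_b = zipf_weights([10.0, 1.0])
print(p_b / p_a)  # A is 10 times less likely than B
```

Any other strictly decreasing weighting would give qualitatively similar conclusions; the 1/cost form is just the linear-scaling case assumed in the text.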


3- Therefore, we are very likely to be in a simple simulation.

This follows from 1+2.


4- Therefore, we should not expect to observe X, Y, Z, ...

We don't know how the simulation is implemented, but we only need a lower bound on how complexity scales in a simulation; we can then factor out our ignorance of the implementation details by computing how likely one simulation is relative to another. Let's assume an incredible level of optimisation, namely that the simulators can simulate the whole universe, including the interactions of all its entities, with O(N) complexity, where N is the number of fundamental entities (quantum fields, strings, etc.; it doesn't matter what the real fundamental entity is). Nor do we really care about the approximation level used, how granular the simulation is, whether time is dilated, or whether large parts of the universe are just an illusion, since the SA tells us that the most likely simulations are those with the highest level of approximation. Taking the highest approximation level compatible with the experience of our reference class, the lower bound on the computational cost is proportional to the time the simulation is run multiplied by the number of fundamental entities simulated. Since our universe is roughly homogeneous at large scales, N is also proportional to how large the simulated space is.

Now consider a civilization simulation A that simulates our solar system in detail while mocking the rest of the universe, and a simulation B that simulates the whole Milky Way in detail while mocking the rest. Simulating the Milky Way in detail is about 10^12 times harder, if we count the number of stars and black holes. According to the SA with linear scaling, being in simulation B is therefore about 10^12 times less likely than being in A. Some interesting predictions follow: we are very unlikely to achieve significant interstellar travel or to build von Neumann probes, and we are not going to meet extraterrestrial civilizations unless they are very close, which in turn explains the Fermi paradox.
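The back-of-envelope arithmetic behind the 10^12 figure can be written out explicitly (the body counts are rough orders of magnitude; treating the solar system as one detailed star is my simplifying assumption):

```python
# Rough order-of-magnitude arithmetic behind the 10^12 figure above.
detailed_bodies_A = 1.0    # simulation A: one star (the Sun) in detail
detailed_bodies_B = 1e12   # simulation B: Milky Way stars + black holes

cost_ratio = detailed_bodies_B / detailed_bodies_A
# Under linear (Zipf-like) scaling, the likelihood ratio equals the
# cost ratio: being in B is ~1e12 times less likely than being in A.
print(cost_ratio)
```

The point is that under linear scaling no further modelling is needed: the likelihood penalty is just the ratio of detailed entity counts.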

Similarly, given two simulations of the same patch of simulated space, long-lived simulations are less likely than short-lived ones. In particular, universes with infinite lifetimes have measure zero.

More generally, this argument applies to any other feature that would provide a large enough "optional" jump in the complexity of our universe. Notice that the argument is significantly weakened if super-efficient ways of simulating a universe can exist (log(N) or better, depending on how sharp the SA distribution is).
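To see how much a super-efficient simulator would weaken the argument, here is a toy comparison of the likelihood penalty under linear versus logarithmic cost scaling (the baseline entity count N_small = 1e50 is an arbitrary placeholder of mine, not a figure from the post):

```python
import math

# Toy comparison: likelihood penalty for a simulation 10^12 times larger,
# under linear vs logarithmic cost scaling.
N_small = 1e50            # arbitrary placeholder entity count
N_large = N_small * 1e12  # the 10^12-times-larger simulation

penalty_linear = N_large / N_small                   # cost ~ N     -> 1e12
penalty_log = math.log(N_large) / math.log(N_small)  # cost ~ log N -> ~1.24

print(penalty_linear, penalty_log)
```

Under log(N) scaling the penalty collapses from twelve orders of magnitude to a factor close to 1, which is why the argument's predictions hinge on the simulation cost not being super-efficiently compressible.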

In turn, if humanity were to achieve these feats, it would be a pretty strong indication that we do not live in a simulation after all. Of course the SH can never be completely falsified, but the same is true of any physical theory with a tunable parameter. What we can do is make the SH arbitrarily unlikely, for instance by colonizing larger and larger regions of space. One might even argue that the achievements we have already made, such as the exploration of the solar system, are already a strong argument against the SH; but this depends on the exact shape of the SA.

In this post I have tried to keep details and subtleties to a minimum. For those interested in digging deeper, I have written a longer writeup, available here: https://osf.io/ca8se/

Please let me know your comments; critiques of the assumptions of this post are very welcome.