LessWrong.com News

A community blog devoted to refining the art of rationality

Poll: Which variables are most strategically relevant?

Published on January 22, 2021 5:17 PM GMT

Which variables are most important for predicting and influencing how AI goes?

Here are some examples:

  • Timelines: “When will crazy AI stuff start to happen?”
  • Alignment tax: “How much more difficult will it be to create an aligned AI vs an unaligned AI when it becomes possible to create powerful AI?”
  • Homogeneity: "Will transformative AI systems be trained/created all in the same way?"
  • Unipolar / Multipolar: "Will transformative AI systems be controlled by one organization or many?"
  • Takeoff speeds: "Will takeoff be fast or slow (or hard or soft, etc.)?"

We made this question to crowd-source more entries for our list, along with operationalizations and judgments of relative importance. This is the first step of a larger project.


  1. Answers should be variables that are importantly different from the previous answers. It’s OK if there’s some overlap or correlation. If your variable is too similar to a previous answer, instead of making a new answer, comment on the previous answer with your preferred version. We leave it to your judgment to decide how similar is too similar.
  2. Good operationalizations are important. If you can give one or more as part of your answer, great! If you can’t, don’t let that stop you from answering anyway. If you have a good operationalization for someone’s variable, add your operationalization as a comment to that variable.
  3. Upvote variables that you think are important, and strong-upvote variables that you think are very important. You can also downvote variables that you think are unimportant or overrated.
  4. The relevant sense of importance is importance for predicting and influencing how AI goes. For example, “Will AIs in the long-term future be aligned?” is important in some sense, but not that helpful to think about, so shouldn’t score highly here.


Is there an academic consensus around Rent Control?

Published on January 22, 2021 4:18 PM GMT

Rent control is a type of policy where a cap is placed on what a landlord may charge tenants.

I've seen two sources that suggest that there is an academic consensus against rent control:

  1. A 2009 review.
  2. A survey finding that 81% of economists oppose rent control.

I'm not sure how much faith to put in these, and how non-controversial this topic is in practice (perhaps there are important subcases where it is a good policy). 

Are there strong claims for rent control policies in relevant cases that are supported by a non-trivial amount of economists?

(Yonatan Cale thinks that there is a consensus against rent control. Help me prove him wrong and give him Bayes points!)


Bayesian Charity

Published on January 22, 2021 2:49 PM GMT

Cross-posted from Living Within Reason

In philosophy, the Principle of Charity is a technique in which you evaluate your opponent’s position as if it made the most amount of sense possible given the wording of the argument. That is, if you could interpret your opponent’s argument in multiple ways, you would go for the most reasonable version.

– UnclGhost

There is a corollary to the Principle of Charity which I’m calling Bayesian Charity. It says that, in general, you should interpret your opponent’s wording to be advocating the most popular position.

This is implied by Bayes Theorem. Your prior for whether someone believes something unpopular should be related to its popularity. Since an unpopular belief, by definition, has few believers, your prior for whether someone believes an unpopular position should be lower the less popular the position is, and you should update your prior based on how clear their statement was.

Extraordinary claims require extraordinary evidence. If someone said something to me suggesting that the moon landing was real, I wouldn’t think twice about it. Most people believe the moon landing was real. However, if someone said something suggesting that the US has an underground base on Mars, I’m going to ask for clarification.

Context matters, of course. If you’re at a flat Earther convention, your prior should be much higher that any random person believes in a flat Earth. But in general, if someone says something that almost no one believes, it’s probably a good idea to confirm you understood what they meant before assuming they hold such a fringe opinion.
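The update rule behind this can be sketched numerically. Below is a minimal illustration with made-up numbers (the priors and likelihoods are assumptions for illustration, not figures from the post):

```python
def posterior_fringe(prior, p_statement_given_fringe, p_statement_given_mainstream):
    """Bayes' rule: probability the speaker holds the fringe view,
    given that they made a statement consistent with it."""
    evidence = (prior * p_statement_given_fringe
                + (1 - prior) * p_statement_given_mainstream)
    return prior * p_statement_given_fringe / evidence

# An ambiguous remark: very likely if the speaker holds the fringe view,
# but occasionally made (e.g. as a joke) by mainstream believers too.
likelihood_fringe = 0.9
likelihood_mainstream = 0.05

# General population: almost no one believes in the Mars base,
# so even a suggestive remark leaves the posterior low (~0.018).
print(posterior_fringe(0.001, likelihood_fringe, likelihood_mainstream))

# At a convention of believers, the same remark is strong evidence (~0.947).
print(posterior_fringe(0.5, likelihood_fringe, likelihood_mainstream))
```

The low-prior case is why the post recommends asking for clarification rather than concluding the speaker holds the fringe view: with a fringe prior of 0.1%, even a fairly suggestive statement leaves the posterior under 2%.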


Reflections on "Psycho-Pass"

Published on January 22, 2021 9:41 AM GMT

Psycho-Pass takes place in a cyberpunk dystopia ruled by a totalitarian AI dictator. Cyberpunk stories are often about evading the law. What makes Psycho-Pass special is its protagonist is a police officer.

Tsunemori Akane's job is to suppress crime. This involves suppressing violent criminals, which is a good thing. The AI's surveillance state makes it possible to suppress crime before it happens, which is even better. Potential criminals often include punks, radicals, gays, artists, musicians, visionaries and detectives which is…

Wait a minute.


If Psycho-Pass had been written in America, then Tsunemori's character arc would be a journey of disillusionment. She would be commanded to do something unethical. Tsunemori would refuse. Her valiant act of disobedience would instigate a cascade of disorder leading to a revolution and the eventual overthrow of the oppressive system.

Society would collapse. Millions of people would starve off-camera. Japan would plunge into civil war. Violence would permeate all corners of society.

Tsunemori had the exam scores to do anything. She chose to be a low-paid low-prestige Inspector of the Public Safety Bureau.

The law doesn't protect people. People protect the law. People have always detested evil and sought out a righteous way of living. Their feelings–the accumulation of those people's feelings–are the law.

―Tsunemori Akane

Tsunemori Akane's nemesis equips potential criminals with the tools to indulge their desire to commit evil against others.

I want to see the splendor of people's souls.

―Makishima Shougo

Makishima doesn't care about evil or freedom per se. What he really wants to know is what do you care about more than anything else in the world? What would you sacrifice your friends, your society and your morality for?

Tsunemori values the rule of law above all else. Makishima values his individual humanity.

Psycho-Pass doesn't strawman crime by conflating rule of law with freedom, fairness or democracy. In Psycho-Pass, the authorities routinely violate their own laws. The system is corrupt to the core.

Where Psycho-Pass really shines is the scene where Tsunemori is forced to choose between law and consequentialism. Makishima contrives things such that Tsunemori wouldn't even have to technically break the law. All she would have to do is break protocol. Tsunemori chooses instead to sacrifice the thing she loves most.

Tsunemori acts perfectly in character for herself. Makishima acts perfectly in character for himself. I empathize with Tsunemori. I associate with Makishima. I would put people in similar situations if[1] I was an evil psychopath too.

I notice I am confused. I too want to see the splendor of people's souls. My desire is a luxury born of privilege paid for by the likes of Tsunemori Akane.

  1. I am neither evil nor a psychopath. ↩︎


What if we all just stayed at home and didn’t get covid for two weeks?

Published on January 22, 2021 9:10 AM GMT

I keep thinking about how if at any point we were all able to actually quarantine for two weeks[1] at the same time, the pandemic would be over.

Like, if instead of everyone being more or less cautious over a year, we all agreed on a single two-week period to hard quarantine. With plenty of warning, so that people had time to stock up on groceries and do anything important ahead of time. And with massive financial redistribution in advance, so that everyone could afford two weeks without work. And with some planning to equip the few essential-every-week-without-delay workers (e.g. nurses, people keeping the power on) with unsustainably excessive PPE.

This wouldn’t require less total risky activity. If we just managed to move all of the risky activity from one fortnight to the one before it, then that would destroy the virus (and everyone could do as many previously risky activities as they liked in the following fortnight!). It could be kind of like the Christmas week except twice as long and the government would pay most people to stay at home and watch movies or play games or whatever. Maybe the TV channels and celebrities could cooperate and try to put together an especially entertaining lineup.

How unrealistic is this? It sounds pretty unrealistic, but what goes wrong?

Some possible things:

  1. To actually coordinate that many people, you would need to have serious policing—beyond what is an acceptable alternative to a year-long pandemic—or serious buy-in—beyond what is possible in any normal place of more than ten people.
  2. Even if you could basically coordinate that many people, you would fail in a few places. And if you fail anywhere, then the disease will gradually build back up.
  3. You can’t just have everyone buy groceries for a given fortnight at some point in the preceding months, because there aren’t enough groceries in warehouses or enough grocery producers able to spin up extra weeks of grocery production on short notice (I am especially unsure whether this is true).
  4. The people who really do have to work are too many to over-prepare well in a month.
  5. It would cost really a lot of money.
  6. It would need to be longer than two weeks if you wanted to actually crush the disease, because some people are probably infectious for abnormally long times.
  7. You would need everyone not living alone to stay away from those they live with, to avoid spreading covid within houses, making this a more extreme proposition than it first seems, very hard to police, and basically impossible for households with small children or other very dependent members.
  8. It’s just way too much logistical effort to make this happen well.

1, 2 and 7 look like the clearest problems to me. I don’t know enough to say if 3, 4 or 8 are real obstacles, and it seems like the US federal government has sent out a lot of money already, so 5 could at worst be solved by doing this thing at the time the money was sent out. 6 seems true, but I’m not sure if the length it would need to be is out of the question, if the other questions are solved.

7 is pretty bad even in a community without dependent people, because it requires active effort from everyone to protect themselves within their houses, which seems much less likely to be ubiquitously complied with than a request to not go to effort to do something (i.e. more people will find the energy to stay on their sofas than will find the energy to set up their room to prepare food in it for a fortnight). Then the dependent people who really need to stay with someone else seem even harder to get the end-of-fortnight risk down for. I could imagine dealing with these problems by spreading people out as much as feasible and requiring longer quarantines for pairs. But the difficulty of that—or need for extending the length of the whole thing—seem quite costly.

On 2 and 7, even if you don’t actually stop the pandemic, and you have to have another occasional scheduled ‘firebreak’ in activity, once cases had built up again, it seems like it could hugely reduce the human cost, without more total caution (just moving the caution in time).

(Also, if you did it for four weeks instead of two, you would only end up with cases where two failures met, i.e. where someone improbably got covid during the first two weeks, then improbably passed it on to another person in the second.)
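The arithmetic behind that parenthetical can be made explicit. A minimal sketch with assumed, made-up probabilities (`p_fail` and `p_pass` are illustrative placeholders, not estimates):

```python
# Suppose each household independently has a small chance of one member
# slipping up and getting infected during the first fortnight (p_fail),
# and, conditional on that, a chance of passing it on within the household
# during the second fortnight (p_pass).
p_fail = 0.01   # assumed per-household failure rate, first two weeks
p_pass = 0.3    # assumed within-household transmission, second two weeks

# After two weeks, any first-fortnight failure can still be infectious.
two_week_residual = p_fail

# After four weeks, a case survives only if both improbable failures
# line up in sequence, so the residual probability multiplies.
four_week_residual = p_fail * p_pass

print(two_week_residual, four_week_residual)
```

Under these assumed numbers, extending from two weeks to four cuts the residual per-household risk from 1% to 0.3%, which is the sense in which a longer quarantine only leaves cases "where two failures met."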

On 4, one way you might swing this is to have many of the people who work during the two weeks then do their own hard quarantine in the following two weeks, during which they can be replaced by some of the workers with similar skills who were at home during the main round.

Many of these depend on scale, and location. For instance, this can clearly often work at the level of a group house, and is probably too ambitious for a large and ideologically diverse nation (especially one that isn’t really organized for people to consistently wear masks after a year). Could it work at the level of a relatively anti-covid city? (The city would then have to limit or quarantine incoming travelers, but that seems doable for many cities.) A small town? A small college campus? A highly religious adult community, where the church was in favor? There are a lot of human groups in the world.

Have I got the main reasons this wouldn’t work? Is there some good solution to them that I haven’t seen?

Has anyone done something like this? There have been lots of lockdowns, but have there been time-bounded almost-total lockdowns scheduled in advance, with huge efforts to avert people needing to take risks during that particular period (e.g. treating moving risks to the time earlier as great compared to running them that week)?

  1. Or however long it takes a person to reliably stop spreading covid, after contracting it. 


On the nature of purpose

Published on January 22, 2021 8:30 AM GMT

[cross-posted from my blog]

In this letter exchange, Alex Rosenberg and Daniel Dennett debate the nature of "purpose", whether it is real, but most importantly, what it would mean for it to be. 

I provide a summary and discussion of what I consider the key points and lines of disagreements between the two. 

Note that these are my interpretations of their points of view, and while I've tried to think about this carefully, neither Rosenberg nor Dennett had the chance to verify I have represented their views adequately. 

Quotes, if not specified otherwise, are taken from the letter exchange linked above.



Rosenberg’s crux

Rosenberg and Dennett agree on large parts of their respective worldviews. They both share a "disenchanted" naturalist's view - they believe that reality is (nothing but) causal and (in principle) explainable. They subscribe to the narrative of reductionism, which recounts how scientific progress emancipated first physics, and later chemistry and biology, from metaphysical beliefs. Through Darwin, we have come to understand the fundamental drivers of life as we know it - variation and natural selection.

But despite their shared epistemic foundation, Rosenberg suspects a fundamental difference in their views concerning the nature of purpose. Rosenberg - contrary to Dennett - sees a necessity for science (and scientists) to disabuse themselves entirely of any anthropocentric talk of purpose and meaning. Anyone who considers the use of the “intentional stance” justified, Rosenberg holds, would have to reconcile the following:

        What is the mechanism by which Darwinian natural selection turns reasons (tracked by the individual as purpose, meaning, beliefs and intentions) into causes (affecting the material world)? 

Rosenberg, of course, doesn't deny that humans - what he refers to as Gregorian creatures shaped by biological as well as cultural evolution - experience higher-level properties like emotions, intentions and meaning. Wilfrid Sellars calls this the "manifest image": the framework in terms of which we ordinarily perceive and make sense of ourselves and the world. [1] But Rosenberg sees a tension between the scientific and the manifest image - one that is, to his eyes, irreconcilable.

"Darwinism is the only game in town," according to Rosenberg. Everything can, and ought to be, explained in terms of it. These higher-level properties - sweetness, cuteness, sexiness, funniness, colour, solidity, weight (not mass!) - are radically illusory. Darwin's account of natural selection doesn't explain purpose; it explains it away. Just as physics and biology were, so must the cognitive sciences and psychology now be disabused of the “intentional stance”.

In other words, it's the recalcitrance of meaning that bothers Rosenberg - the fact that we appear to need it in how we make sense of the world, while also being unable to properly integrate it in our scientific understanding. 

As Quine put it: "One may accept the Brentano thesis [about the nature of intentionality] as either showing the indispensability of intentional idioms and the importance of an autonomous science of intention, or as showing the baselessness of intentional idioms and the emptiness of a science of intention." [2]

Rosenberg is compelled by the latter path. In his view, the recalcitrance of meaning is "the last bastion of resistance to the scientific world view. Science can do without them, in fact, it must do without them in its description of reality." He doesn't claim that notions of meaning have never been useful, but that they have "outlived their usefulness", replaced, today, with better tools of scientific inquiry.

As I understand it, Rosenberg argues that purposes aren't real because they aren’t tied up with reality, unable to affect the physical world. Acting as if they were real (by relying on the concept to explain observations) is contributing to confusion and convoluted thinking. We ought, instead, to resort to the classical Darwinian explanations, where all behaviour boils down to evolutionary advantages and procreation (in a way that explains purpose away).

Rosenberg’s crux (or rather, my interpretation thereof) is that, if you want to claim that purposes are real - if you want to maintain purpose as a scientifically justified concept, one that is reconcilable with science - you need to be able to account for how reasons turn into causes.


Perfectly real illusions

While Dennett recognizes the challenges presented by Rosenberg, he refuses to be troubled by them. Dennett paints a possible "third path" out of Quine’s puzzle by suggesting that we understand the manifest image (i.e. mental properties, qualia) neither as "as real as physics" (thereby making it incomprehensible to science) nor as "radically illusionary" (thereby troubling our self-understanding as Gregorian creatures). Instead, Dennett suggests, we can understand it as a user-illusion: "ways of being informed about things that matter to us in the world (our affordances) because of the way we and the environment we live in (microphysically [3]) are." 

I suggest that this is, in essence, a deeply pragmatic account. (What account other than pragmatism, really, could utter, with the same ethos, a sentence like: "These are perfectly real illusions!") 

While not saying so explicitly, we can interpret Dennett as invoking the bounded nature of human minds and their perceptual capacity. Mental representations, while not representing reality fully truthfully (e.g. there is no microphysical account of colours, just photons), aren't arbitrary either. They are issued (in part) from reality, and through the compression inherent to the mind’s cognitive processes, these representations get distorted so as to form false, yet in all likelihood useful, illusions. 

These representations are useful because they have evolved to be such: after all, it is through the interaction with the causal world that the Darwinian fitness of an agent is determined; whether we live or die, procreate or fail to do so. Our ability to perceive has been shaped by evolution to track reality (i.e. to be truthful), but only exactly to the extent that this gives us a fitness advantage (i.e. is useful). Our perceptions are neither completely unrestrained nor completely constrained by reality, and therefore they are neither entirely arbitrary nor entirely accurate. 

Let’s talk about the nature of patterns for a moment. Patterns are critical to how intelligent creatures make sense of and navigate the world. They allow (what would otherwise be far too much) data to be compressed, while still granting predictive power. But are patterns real? Patterns directly stem from reality - they are to be found in reality - and, in this very sense, they are real. But, if there wasn’t anyone or anything to perceive and make use of this structural property of the real world, it wouldn’t be meaningful to talk of patterns. Reality doesn’t care about patterns. Observers/agents do. 

This same reasoning can be applied to intentions. Intentions are meaningful patterns in the world. An observer with limited resources who wants to make sense of the world (i.e. an agent that wants to reduce sample complexity) can abstract along the dimension of "intentionality" to reliably get good predictions about the world. (Except that "abstracting along the dimension of intentionality" isn't an active choice of the observer, but something that emerges because intentions are a meaningful pattern.) The "intentionality-based" prediction does well at ignoring variables that aren't sufficiently predictive and capturing the ones that are, which is critical in the context of a bounded agent.

Another case in point: affordances. In the preface to his book Surfing Uncertainty, Andy Clark writes: “[...] different (but densely interanimated) neural populations learn to predict various organism-salient regularities pertaining at many spatial and temporal scales. [...] The world is thus revealed as a world tailored to human needs, tasks and actions. It is a world built of affordances - opportunities for action and intervention.“ Just like patterns, the world isn’t in itself made up of affordances. And yet, they are real in the sense of what Dennett calls user-illusions. 


The cryptographer’s constraint

Dennett goes on to endow these illusionary reasons with further “realness” by invoking the cryptographer's constraint: 

        It is extremely hard - practically infeasible - to design an even minimally complex system for the code of which there exists more than one reasonable decryption/translation. 

Dennett uses a  simple crossword puzzle to illustrate the idea: “Consider a crossword puzzle that has two different solutions, and in which there is no fact that settles which is the “correct” solution. The composer of the crossword went to great pains to devise a puzzle with two solutions. [...] If making a simple crossword puzzle with two solutions is difficult, imagine how difficult it would be to take the whole corpus of human utterances in all languages and come up with a pair of equally good versions of Google Translate that disagreed!” [slight edits to improve readability]

The practical consequence of the constraint is that, “if you can find one reasonable decryption of a cipher-text, you’ve found the decryption.” Furthermore, this constraint is a general property of all forms of encryption/decryption.  

Let’s look at the sentence: “Give me a peppermint candy!”

Given the cryptographer’s constraint, there are, practically speaking, very (read: astronomically) few plausible interpretations of the words “peppermint”, “candy”, etc. This is at the heart of what makes meaning non-arbitrary and language reliable. 

To add a bit of nuance: the fact that the concept "peppermint" reliably translates to the same meaning across minds requires iterated interactions. In other words, Dennett doesn’t claim that, if I just now came up with an entirely new concept (say "klup"), its meaning would immediately be unambiguously clear. But its meaning (across minds) would become increasingly more precise and robust after using it for some time, and - on evolutionary time horizons - we can be preeetty sure we mean (to all practical relevance) the same things by the words we use.

But what does all this have to do with the question of whether purpose is real? Here we go: 

The cryptographer's constraint - which I will henceforth refer to as the principle of pragmatic reliability [4] - is an essential puzzle piece for understanding what allows representations of reasons (e.g. a sentence making a claim) to turn into causes (e.g. a human taking a certain action because of that claim). 

We are thus starting to get closer to Rosenberg’s crux as stated above: a scientific account for how reasons become causes. There is one more leap to take.



Having invoked the role of pragmatic reliability, let’s examine another pillar of Dennett's view - one that will eventually get us all the way to addressing Rosenberg’s crux. 

Rosenberg says: "I see how we represent in public language, turning inscriptions and noises into symbols. I don’t see how, prior to us and our language, mother nature (a.k.a Darwinian natural selection) did it." 

What Rosenberg conceives to be an insurmountable challenge to Dennett’s view, the latter prefers to walk around rather than over, figuratively speaking. As developed at length in his book From Bacteria to Bach and Back, Dennett suggests that "mother nature didn’t represent reasons at all", nor did it need to. 

First, the mechanism of natural selection uncovers what Dennett calls "free-floating rationales” - reasons that existed billions of years before, and independent of, reasoners. Only when the tree of life grew a particular (and so far unique) branch - humans, together with their use of language - did these reasons start to get represented.

"We humans are the first reason representers on the planet and maybe in the universe. Free-floating rationales are not represented anywhere, not in the mind of God, or Mother Nature, and not in the organisms who benefit from all the arrangements of their bodies for which there are good reasons. Reasons don’t have to be representations; representations of reasons do."

This is to say: it isn't exactly the reasons, so much as their representations, that become causes.

Reasons turning into causes is, according to Dennett, unique to humans because only humans represent reasons. I would add the nuance that the capacity to represent lives on a spectrum rather than a binary. Some other animals seem to be able to do something like representation, too. [5] That said, humans remain unchallenged in the degree to which they have developed the capacity to represent (among the forms of life we are currently aware of).

"Bears have a good reason to hibernate, but they don’t represent that reason, any more than trees represent their reason for growing tall: to get more sunlight than the competition." While there are rational explanations for the bear’s or the tree’s behaviour, they don’t understand, think about or represent these reasons. The rationale has been discovered by natural selection, but the bear/tree doesn’t know - nor does it need to - why it wants to stay in its den during winter.

Language plays a critical role in this entire representation-jazz. Language is instrumental to our ability to represent; whether as necessary precursor, mediator or (ex-post) manifestation of that ability remains a controversial question among philosophers of language. Less controversial, however, is the role of language in allowing us to externalize representations of reasons, thereby “creating causes” not only for ourselves but also for people around us. Wilfrid Sellars suggested that language bore what he calls “the space of reasons” - the space of argumentation, explanation, query and persuasion. [6] In other words, language bore the space in which reasons can become causes. 

We can even go a step further: while acknowledging the role of natural selection in shaping what we are - the fact that the purposes of our genes are determined by natural selection - we are still free to make our own choices. To put it differently: "Humans create the purposes they are subject to; we are not subject to purposes created by something external to us.” [7] 

In From Darwin to Derrida: Selfish Genes, Social Selves, and the Meanings of Life, David Haig argues for this point of view by suggesting that there need not be full concordance, nor congruity, between our psychological motivations (e.g. wanting to engage in sexual activity because it is pleasurable, wanting to eat a certain food because it is tasty) and the reasons why we have those motivations (e.g. in order to pass on our genetic material).

There is a piece of folk wisdom that goes: “the meaning of life is the meaning we give it”. Based on what has been discussed in this essay, we can see this saying in a different, more scientific light: as a testimony to the fact that we humans are creatures that represent meaning, and by doing so we turn “free-floating rationales” into causes that govern our own lives.

Thanks to particlemania, Kyle Scott and Romeo Stevens for useful discussions and comments on earlier drafts of this post. 


[1] Sellars, Wilfrid. "Philosophy and the scientific image of man." Science, perception and reality 2 (1963).
Also see: deVries, Willem. "Wilfrid Sellars", The Stanford Encyclopedia of Philosophy (Fall 2020 Edition). Retrieved from: https://plato.stanford.edu/archives/fall2020/entries/sellars/

[2]  Quine, Willard Van Orman. "Word and Object. New Edition." MIT Press (1960).

[3]  I.e. the physical world at the level of atoms 

[4]  AI safety relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research. 

Language works, as evidenced by the striking success of human civilisations made possible through advanced coordination, which in turn requires advanced communication. (Sure, humans miscommunicate what feels like a whole lot, but in the bigger scheme of things, we still appear to be pretty damn good at this communication thing.)

Notably, language works without there being theoretically air-tight proofs that map meanings on words. 

Right there, we have an empirical case study of a symbolic system that functions on a (merely) pragmatically reliable regime. We can use it to inform our priors on how well this regime might work in other systems, such as AI, and how and why it tends to fail.

One might argue that a pragmatically reliable alignment isn’t enough - not given the sheer optimization power of the systems we are talking about. Maybe that is true; maybe we do need more certainty than pragmatism can provide. Nevertheless, I believe that there are sufficient reasons for why this is an avenue worth exploring further. 

Personally, I am most interested in this line of thinking from an AI ecosystems/CAIS point of view, and as a way of addressing (what I consider a major challenge) the problem of the transient and contextual nature of preferences. 

[5] People wanting to think about this more might be interested in looking into vocal (production) learning - the ability to “modify acoustic and syntactic sounds, acquire new sounds via imitation, and produce vocalizations”. This conversation might be a good starting point.

[6] Sellars, Wilfrid. In the space of reasons: Selected essays of Wilfrid Sellars. Harvard University Press (2007).

[7] Quoted from:  https://twitter.com/ironick/status/1324778875763773448 


Appendices to cryonics signup sequence

Published on January 22, 2021 6:40 AM GMT

This post is for reference and is not intended to stand alone. It may be periodically updated with additional appendices.

Appendix A: Cryonics and x-risk timelines

Over the past few years, some people have updated toward pretty short AGI timelines. If your timelines are really short, then maybe you shouldn't sign up for cryonics, because the singularity – good or bad – is overwhelmingly likely to happen before you biologically die. But I bet you don't really believe that. You could already have terminal cancer and not know it, or you could get hit by a car next year. If the singularity is positive and results in the revival of those who got cryopreserved before it happened, then even if you expect the singularity in like, two years, you really should sign up without delay if there's any chance you might die before then (spoiler alert: there is). 

Alternatively, you might think that if timelines are longer than your lifespan it will be because of some specific future sign (like a clear AI winter or a world war), and you think it's a better tradeoff to buy cryonics exactly when that sign shows up.

This particular point seemed like an important one to flag, since this is one of the big ways a lot of rationalists' models have changed since those older cryonics posts came out, and we don't want people to make decisions wrongly based on cached thoughts.

Appendix B: Non-life-insurance payment options

Alcor

In addition to life insurance, Alcor's membership application lists (1) trust, (2) prepayment, (3) annuity, and (4) other means. These all fall under the umbrella of 'self-funding.' CI lets you pay in these ways as well. 


The Alcor Standard Trust "has been fully approved by Alcor ... and is therefore immediately available in its existing format and content. [It] is designed to be entirely autonomous from a member’s estate, to provide secure Alcor membership funding."

You can fund a trust with stocks, treasury bonds, life insurance, federally insured money market funds, cash, or other assets approved by Alcor. Linda Chamberlain (Co-Founder of Alcor) is the head of the Trust Department and can guide you through the Trust Approval Process.


If you have a bunch of cash lying around, like, way more than you'll ever ever need, you can prepay out of pocket for your cryopreservation – just give the money to your cryonics provider. They'll keep a portion of it for when you wake up, but for the rest of this lifespan, you just won't have access to that money anymore. 

As far as I can calculate, this option is never cheaper than using life insurance, so you should only do it if you are uninsurable (e.g. if you have already been diagnosed with a terminal illness) or if you're in a huge hurry (e.g. you have less than six months left to live)...

...or, I suppose, you could do it if a lump sum of a few hundred thousand dollars is something it doesn't hurt you at all to give away, and you have literally no other use for the money – not even to leave it to family or charity in your will.


I don't really understand what an annuity is in this context, although I'm pretty sure it's a thing that's paid in installments. If you want to pay with an annuity, contact Rudi Hoffman.

Cryonics Institute

CI just has their own whole page on this, which I recommend you check out; there's not much point in me just reproducing it all here. Their ways of funding are:

  • Revocable trust
  • Transfer on Death account
  • Prepayment
  • Prepayment to a third party

CI shares a long message from a member John de Rivaz on their funding page, under the heading "Is life insurance the best way to fund a contract?", suggesting that they recommend funding a CI contract through an investment trust. (For legal reasons, they can't officially endorse de Rivaz's investment advice – same as how HR people will always say "this is not tax advice, but…").


[Book Review] The Chrysanthemum and the Sword

Published on January 22, 2021 6:29 AM GMT

Japan was the only non-Western[1] country to build an industrial empire before the establishment of the liberal world order. The Chrysanthemum and the Sword: Patterns of Japanese Culture by Ruth Benedict is thus a real-world case study of what culture could have been in a counterfactual history where the Industrial Revolution happened without Christianity, the Enlightenment and democracy.

Many books have been written about Imperial Japan. The Chrysanthemum and the Sword is exceptional.

  • It was written in 1946[2], before the United States Westernized Japan. Meiji Japan still existed when The Chrysanthemum and the Sword was published.
  • The Chrysanthemum and the Sword was written for the US State Office of War Information. It is more goal-oriented than most histories and ethnographies. This isn't a book about appreciating a beautiful culture. It is an introductory textbook for reengineering an entire society.

The most interesting thing about The Chrysanthemum and the Sword is its philosophy. Western philosophy has Christian roots. Christianity is monotheistic. Monotheism is built around the idea of one truth. Singular truth is useful in the domain of science because the laws of physics are universal.

Western philosophers often take it for granted that ethics, custom and philosophy ought to be singular too. Daoist philosophy doesn't. In traditional Japan, acting differently in different contexts was a sign of refinement. Inscrutability was a virtue.

There were other things which don't make sense in a Western context but which do make sense in an Eastern context. For example, Japanese soldiers were notorious for the savagery of their attacks on US forces. However, when captured, Japanese prisoners of war were well-behaved and followed the orders of their prison guards. This pair of behaviors is not what you would have expected from an American prisoner of war in 1945.

Though I occasionally discuss Daoism with Taiwanese people, it is rare for me to have a complex productive conversation about Daoist ideas with white people. There isn't enough shared cultural context. If you want to read a good book on traditional Eastern culture written in a modern Western style then I recommend The Chrysanthemum and the Sword.

  1. I include Russia in "Western". ↩︎

  2. For reference, Japan surrendered to the Allies on August 15, 1945. ↩︎


A Simple Ethics Model

Published on January 22, 2021 2:56 AM GMT

[Cross-posted from my soon-to-be-defunct blog]

Disclaimer #1: I am probably reinventing the wheel. However, reinventing the wheel is good. 

Disclaimer #2: This post probably fundamentally misunderstands non-consequentialists. 

I’ve been thinking about normative ethics lately. There are three major schools: consequentialism, deontology (“rule-based morality”), and virtue ethics. These are typically said to be in opposition, but I think it’s simple for consequentialism to subsume the other two. Let me know if you find this framework helpful, or if it’s missing something important.


Consequentialism at its simplest looks like this: perform the action that achieves the best consequences. 

Unfortunately, predicting consequences is really hard. It’s even harder to predict nth-order consequences. We often default to only considering immediate consequences because they’re easier to calculate and predict. However, long-term consequences are typically much more important. Thus, approaching decisions as though we can calculate the best consequences will often lead to worse decisions. 

Not only that, we have complicated values we attempt to satisfy. While calculating is great when values are relatively simple (QALYs), it’s often infeasible within our common constraints (especially time: “Do you want this job or not?”).

So even though we’re consequentialists, it’s hard to function consequentially. This isn’t a blow to consequentialism though, it’s a sign that our model of human decision-making is missing something.


The alternative to brute calculation is the use of rules or heuristics. Not eating meat is simpler (meaning: easier, faster, and cheaper) than calculating the negative utility of each meal and finding the equilibrium between personal pleasure and animal suffering. 

Rules also have the benefit of counteracting biases and poor calculation capability. For example, people make worse decisions while drunk. Having a rule like “I never drive drunk” routes around the risk of calculating utility badly. 

When time and processing power permit, calculations allow for superior decisions. When they don’t permit, rules allow for superior decisions. 


Where do virtue ethics come in? Consider virtues like courage, humility, and temperance. These are habits of action. They’re important for three reasons:

  1. The decision-making machine in your head, the one that calculates or checks rules, only delivers knowledge of your preferred action. Whether it’s delivered quickly or slowly, it’s still only knowledge. Execution remains. Execution can be hampered by outside factors like fatigue, resentment, or fear. Virtues are those habits that allow us to execute ethical decision-making in spite of those factors.
  2. We think about rules when we don’t have minutes to calculate. But sometimes we don’t even have seconds to think about rules. Our habits kick in before we have time to access our decision-making. A virtue is whatever habit lends itself to better responses–this includes things like reacting instantly to cruelty (before timidity can set in) or reacting humbly to praise (before scheming can set in).
  3. Even thinking to apply rules or to calculate is a habit. It takes wisdom to think.


The ethical process is two-fold:

  1. Develop toward superior values
  2. Act toward achieving those values

I think that using this framework helps me with #2 in a few ways. 

Firstly, by categorizing different approaches, it helps me choose between rules and calculations. Sometimes I apply rules when I could be calculating (to better ends). Keeping in mind that calculations are sometimes an option will improve my decision-making. I think this is a far more common failure mode than the alternative, choosing to calculate when heuristics would be superior. 

Secondly, it helps me figure out the point of virtue. Why should those of us who want to do the most good we can do think about virtue? Because there will be moments in our lives when we can do significant good and it will come down to habits instead of careful decisions. 

Thirdly, in the past I’ve gotten hung up on how to achieve my values. Should I just focus on being a good person? Should I follow good heuristics? Should I always do what seems to have the best consequences? In practice, I, like everybody, use all three methods, and have worried that this means I’m being inconsistent (which I’m sure I am in other ways!). It doesn’t though. They’re just different components of the same ethical machine. 


[Link] Still Alive - Astral Codex Ten

Published on January 21, 2021 11:20 PM GMT


This was a triumph
I'm making a note here, huge success

No, seriously, it was awful. I deleted my blog of 1,557 posts. I wanted to protect my privacy, but I ended up with articles about me in New Yorker, Reason, and The Daily Beast. I wanted to protect my anonymity, but I Streisand-Effected myself, and a bunch of trolls went around posting my real name everywhere they could find. I wanted to avoid losing my day job, but ended up quitting so they wouldn't be affected by the fallout. I lost a five-digit sum in advertising and Patreon fees. I accidentally sent about three hundred emails to each of five thousand people in the process of trying to put my blog back up.

I had, not to mince words about it, a really weird year.

The first post on Scott Alexander's new blog on Substack, Astral Codex Ten.


[AN #134]: Underspecification as a cause of fragility to distribution shift

Published on January 21, 2021 6:10 PM GMT

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.
Audio version here (may not be up yet).
Please note that while I work at DeepMind, this newsletter represents my personal views and not those of my employer.

Underspecification Presents Challenges for Credibility in Modern Machine Learning (Alexander D'Amour, Katherine Heller, Dan Moldovan et al) (summarized by Rohin): This paper explains one source of fragility to distributional shift, which the authors term underspecification. The core idea is that for any given training dataset, there are a large number of possible models that achieve low loss on that training dataset. This means that the model that is actually chosen is effectively arbitrarily chosen from amongst this set of models. While such a model will have good iid (validation) performance, it may have poor inductive biases that result in bad out-of-distribution performance.

The main additional prediction of this framing is that if you vary supposedly “unimportant” aspects of the training procedure, such as the random seed used, then you will get a different model with different inductive biases, which will thus have different out-of-distribution performance. In other words, not only will the out-of-distribution performance be worse, its variance will also be higher.
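This seed-variance prediction is easy to reproduce in a toy setting. The sketch below is my own illustration, not from the paper: it trains linear models by gradient descent on data with two perfectly redundant features, so the training set does not determine how credit is split between them. Different random seeds then agree in-distribution but disagree on an out-of-distribution input where the features decouple.

```python
import numpy as np

def fit_gd(X, y, seed, lr=0.1, steps=500):
    """Gradient descent on squared loss from a random initialization."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Training data where the two features are perfectly redundant (x2 == x1):
# any weight vector with w1 + w2 = 1 achieves zero training loss,
# so the model is underspecified by the data.
x1 = np.linspace(-1, 1, 50)
X_train = np.stack([x1, x1], axis=1)
y_train = x1

weights = [fit_gd(X_train, y_train, seed) for seed in range(5)]

# In-distribution predictions agree across seeds...
iid_preds = [w @ np.array([0.5, 0.5]) for w in weights]
# ...but an out-of-distribution point where the features decouple
# exposes the arbitrary choice each seed made between them.
ood_preds = [w @ np.array([1.0, 0.0]) for w in weights]

print("iid spread:", np.ptp(iid_preds))   # essentially zero
print("ood spread:", np.ptp(ood_preds))   # much larger
```

Gradient descent never moves in the null-space direction of the redundant features, so each seed's arbitrary initial split between w1 and w2 survives training: an "unimportant" choice that only shows up out of distribution.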

The authors demonstrate underspecification in a number of simplified theoretical settings, as well as realistic deep learning pipelines. For example, in an SIR model of disease spread, when we only have the current number of infections during the initial growth phase, the data cannot distinguish between the case of having high transmission rate but low durations of infection, vs. a low transmission rate but high durations of infection, even though these make very different predictions about the future trajectory of the disease (the out-of-distribution performance).
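A minimal simulation (my own sketch, not from the paper) makes the SIR case concrete: two parameterizations with the same early growth rate (beta - gamma) but different basic reproduction numbers produce nearly identical initial trajectories and very different peaks.

```python
import numpy as np

def sir(beta, gamma, days=120, dt=0.1, i0=1e-4):
    """Euler-integrated SIR model; returns the infected fraction over time."""
    s, i = 1.0 - i0, i0
    traj = []
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s -= new_inf
        i += new_inf - new_rec
        traj.append(i)
    return np.array(traj)

# Same early growth rate (beta - gamma = 0.3) but very different R0:
fast_recovery = sir(beta=0.6, gamma=0.3)   # R0 = 2
slow_recovery = sir(beta=0.4, gamma=0.1)   # R0 = 4

# During the initial growth phase the two curves are nearly indistinguishable...
early = slice(0, 50)  # first 5 days
print(fast_recovery[early].max(), slow_recovery[early].max())
# ...but the epidemic peaks differ substantially.
print(fast_recovery.max(), slow_recovery.max())
```

While almost everyone is susceptible, infections grow like exp((beta - gamma) * t), so early data cannot distinguish the two settings; the peak, which depends on R0 = beta / gamma, is exactly the out-of-distribution quantity.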

In deep learning models, the authors perform experiments where they measure validation performance (which should be relatively precise), and compare it against out-of-distribution performance (which should be lower and have more variance). For image recognition, they show that neural net training has precise validation performance, with 0.001 standard deviation when varying the seed, but less precise performance on ImageNet-C (AN #15), with standard deviations in the range of 0.002 to 0.024 on the different corruptions. They do similar experiments with medical imaging and NLP.

Rohin's opinion: While the problem presented in this paper isn’t particularly novel, I appreciated the framing of fragility of distributional shift as being caused by underspecification. I see concerns about inner alignment (AN #58) as primarily worries about underspecification, rather than distribution shift more generally, so I’m happy to see a paper that explains it well.

That being said, the experiments with neural networks were not that compelling -- while it is true that the models had higher variance on the metrics testing robustness to distributional shift, on an absolute scale the variance was not high: even a standard deviation of 0.024 (which was an outlier) is not huge, especially given that the distribution is being changed.


Manipulating and Measuring Model Interpretability (Forough Poursabzi-Sangdeh et al) (summarized by Rob): This paper performs a rigorous, pre-registered experiment investigating to what degree transparent models are more useful for participants. They investigate how well participants can estimate what the model predicts, as well as how well the participant can make predictions given access to the model information. The task they consider is prediction of house prices based on 8 features (such as number of bathrooms and square footage). They manipulate two independent variables. First, CLEAR is a presentation of the model where the coefficients for each feature are visible, whereas BB (black box) is the opposite. Second, -8 is a setting where all 8 features are used and visible, whereas in -2 only the 2 most important features (number of bathrooms and square footage) are visible. (The model predictions remain the same whether 2 or 8 features are revealed to the human.) This gives 4 conditions: CLEAR-2, CLEAR-8, BB-2, BB-8.

They find a significant difference in ability to predict model output in the CLEAR-2 setting vs all other settings, supporting their pre-registered hypothesis that showing the few most important features of a transparent model is the easiest for participants to simulate. However, counter to another pre-registered prediction, they find no significant difference in deviation from model prediction based on transparency or number of features. Finally, they found that participants shown the clear model were less likely to correct the model's inaccurate predictions on "out of distribution" examples than participants with the black box model.

Rob's opinion: The rigour of the study in terms of its relatively large sample size of participants, pre-registered hypotheses, and follow-up experiments is very positive. It's a good example for other researchers wanting to make and test empirical claims about what kind of interpretability can be useful for different goals. The results are also suggestive of considerations that designers should keep in mind when deciding how much and what interpretability information to present to end users.


Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain (Daniel Kokotajlo) (summarized by Rohin): This post argues against a particular class of arguments about AI timelines. These arguments have the form: “The brain has property X, but we don’t know how to make AIs with property X. Since it took evolution a long time to make brains with property X, we should expect it will take us a long time as well”. The reason these are not compelling is because humans often use different approaches to solve problems than evolution did, and so humans might solve the overall problem without ever needing to have property X. To make these arguments more convincing, you need to argue 1) why property X really is necessary and 2) why property X won’t follow quickly once everything else is in place.

This is illustrated with a hypothetical example of someone trying to predict when humans would achieve heavier-than-air flight: in practice, you could have made decent predictions just by looking at the power to weight ratios of engines vs. birds. Someone who argued that we were far away because “we don't even know how birds stay up for so long without flapping their wings” would have made incorrect predictions.

Rohin's opinion: This all seems generally right to me, and is part of the reason I like the biological anchors approach (AN #121) to forecasting transformative AI.


A narrowing of AI research? (Joel Klinger et al) (summarized by Rohin): Technology development can often be path-dependent, where initial poorly-thought-out design choices can persist even after they are recognized as poorly thought out. For example, the QWERTY keyboard persists to this day, because once enough typists had learned to use it, there was too high a cost to switch over to a better-designed keyboard. This suggests that we want to maintain a diversity of approaches to AI so that we can choose amongst the best options, rather than getting locked into a suboptimal approach early on.

The paper then argues, based on an analysis of arXiv papers, that thematic diversity in AI has been going down over time, as more and more papers are focused on deep learning. Thus, we may want to have policies that encourage more diversity. It also has a lot of additional analysis of the arXiv dataset for those interested in a big-picture overview of what is happening in the entire field of AI.


Neurosymbolic AI: The 3rd Wave (Artur d'Avila Garcez et al) (summarized by Zach): The field of neural-symbolic AI is broadly concerned with how to combine the power of discrete symbolic reasoning with the expressivity of neural networks. This article frames the relevance of neural-symbolic reasoning in the context of a big question: what are the necessary and sufficient building blocks of AI? The authors address this and argue that AI needs to have both the ability to learn from and make use of experience. In this context, the neural-symbolic approach to AI seeks to establish provable correspondences between neural models and logical representations. This would allow neural systems to generalize beyond their training distributions through neural-reasoning and would constitute significant progress towards AI.

The article surveys the last 20 years of research on neural-symbolic integration. As a survey, a number of different perspectives on neural-symbolic AI are presented. In particular, the authors tend to see neural-symbolic reasoning as divided into two camps: localist and distributed. Localist approaches assign definite identifiers to concepts while distributed representations make use of continuous-valued vectors to work with concepts. In the later parts of the article, promising approaches, current challenges, and directions for future work are discussed.

Recognizing 'patterns' in neural networks constitutes a localist approach. This relates to explainable AI (XAI) because recognizing how a given neural model makes a decision is a pre-requisite for interpretability. One justification for this approach is that codifying patterns in this way allows systems to avoid reinventing the wheel by approximating functions that are already well-known. On the other hand, converting logical relations (if-then) into representations compatible with neural models constitutes a distributed approach. One distributed method the authors highlight is the conversion of statements in first-order logic to vector embeddings. Specifically, Logic Tensor Networks generalize this method by grounding logical concepts onto tensors and then using these embeddings as constraints on the resulting logical embedding.

Despite the promising approaches to neural-symbolic reasoning, there remain many challenges. Somewhat fundamentally, formal reasoning systems tend to struggle with existential quantifiers while learning systems tend to struggle with universal quantification. Thus, the way forward is likely a combination of localist and distributed approaches. Another challenging area lies in XAI. Early methods for XAI were evaluated according to fidelity: measures of the accuracy of extracted knowledge in relation to the network rather than the data. However, many recent methods have opted to focus on explaining data rather than the internal workings of the model. This has resulted in a movement away from fidelity which the authors argue is the wrong approach.

Read more: Logic Tensor Networks, The Bitter Lesson

Zach's opinion: The article does a reasonably good job of giving equal attention to different viewpoints on neural-symbolic integration. While the article does focus on the localist vs. distributed distinction, I also find it to be broadly useful. Personally, after reading the article I wonder if 'reasoning' needs to be hand-set into a neural network at all. Is there really something inherently different about reasoning such that it wouldn't just emerge from any sufficiently powerful forward predictive model? The authors make a good point regarding XAI and the importance of fidelity. I agree that it's important that our explanations specifically fit the model rather than interpret the data. However, from a performance perspective, I don't feel I have a good understanding of why the abstraction of a symbol/logic should occur outside the neural network. This leaves me thinking the bitter lesson (AN #49) will apply to neural-symbolic approaches that try to extract symbols or apply reason using human features (containers/first-order logic).

Rohin's opinion: While I do think that you can get human-level reasoning (including e.g. causality) by scaling up neural networks with more diverse data and environments, this does not mean that neural-symbolic methods are irrelevant. I don’t focus on them much in this newsletter because 1) they don’t seem that relevant to AI alignment in particular (just as I don’t focus much on e.g. neural architecture search) and 2) I don’t know as much about them, but this should not be taken as a prediction that they won’t matter. I agree with Zach that the bitter lesson will apply, in the sense that for a specific task as we scale up we will tend to reproduce neural-symbolic approaches with end-to-end approaches. However, it could still be the case that for the most challenging and/or diverse tasks, neural-symbolic approaches will provide useful knowledge / inductive bias that make them the best at a given time, even though vanilla neural nets could scale better (if they had the data, memory and compute).


DPhil Scholarships Applications Open (Ben Gable) (summarized by Rohin): FHI will be awarding up to six scholarships for the 2021/22 academic year for DPhil students starting at the University of Oxford whose research aims to answer crucial questions for improving the long-term prospects of humanity. Applications are due Feb 14.

FEEDBACK: I'm always happy to hear feedback; you can send it to me, Rohin Shah, by replying to this email.

PODCAST: An audio podcast version of the Alignment Newsletter is available, recorded by Robert Miles.



Counterfactual control incentives

Published on January 21, 2021 4:54 PM GMT

Co-authored with Rebecca Gorman.

In section 5.2 of their arXiv paper, "The Incentives that Shape Behaviour", which introduces structural causal influence models and a proposal for addressing misaligned AI incentives, the authors present the following graph:

The blue node is a "decision node", defined as where the AI chooses its action. The yellow node is a "utility node", defined as the target of the AI's utility-maximising goal. The authors use this graph to introduce the concept of control incentives: the AI, given the utility-maximising goal of user clicks, discovers an intermediate control incentive, influencing user opinions. By influencing user opinions, the AI better fulfils its objective. This 'control incentive' is graphically represented by surrounding the node in dotted orange.

A click-maximising AI would only care about user opinions indirectly: they are a means to an end. An amoral social media company might agree with the AI on this, and be fine with it modifying user opinions to achieve higher clicks/engagement. But the users themselves would strongly object to this; they do not want the algorithm to have a control incentive to change their opinions.

Carey et al designed an algorithm to remove this control incentive. They do this by instructing the algorithm to choose its posts, not on predictions of the user's actual clicks - which produce the undesired control incentive - but on predictions of what the user would have clicked on, if their opinions hadn't been changed.

In this graph, there is no longer any control incentive for the AI on the "Influenced user opinions", because that node no longer connects to the utility node.

Call this construction a "counterfactual incentive". This substitutes the (model of) the users' original opinion where the influenced user opinions used to go. A more detailed description of such an algorithm can be found in this footnote[1]. In words, the process can be phrased:

  • The counterfactual incentive is what the utility would have been, if the influenced opinions were replaced with the original opinions (everything else being equal).
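As a toy illustration (all functions and numbers here are hypothetical, not taken from Carey et al's paper), the construction amounts to evaluating the utility node on the original-opinion value instead of the influenced-opinion value:

```python
def influenced_opinions(orig, post_extremeness):
    # Hypothetical dynamics: more extreme posts drag opinions toward them.
    return orig + 0.5 * post_extremeness

def clicks(post_extremeness, opinions):
    # Hypothetical click model: users click most on posts matching their opinions.
    return 1.0 - (post_extremeness - opinions) ** 2

def naive_utility(post, orig):
    # Utility flows through the influenced-opinion node,
    # giving the AI a control incentive over user opinions.
    return clicks(post, influenced_opinions(orig, post))

def counterfactual_utility(post, orig):
    # The counterfactual incentive: evaluate clicks as if
    # opinions had stayed at their original value.
    return clicks(post, orig)

candidate_posts = [x / 10 for x in range(11)]  # extremeness levels 0.0 .. 1.0
orig = 0.2

best_naive = max(candidate_posts, key=lambda p: naive_utility(p, orig))
best_cf = max(candidate_posts, key=lambda p: counterfactual_utility(p, orig))

# The naive objective rewards posts that shift opinions toward themselves;
# the counterfactual objective just matches the user's original opinions.
print(best_naive, best_cf)
```

In this toy model the naive optimiser picks a more extreme post than the user originally preferred, because shifting opinions toward the post raises clicks; the counterfactual objective removes that pressure.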

It seems to neutralise a vicious, ongoing cycle of opinion change in order to maximize clicks. But, as this post shows, there are issues with this construction and, if used naively, counterfactual modifications will fail.

Matching players and buying items

In order to analyse the issues with the setup, let's choose a more narrowly defined example. There are many algorithms that aim to manipulate players of mobile games in order to get them to buy more expensive in-game items.

One such manipulative algorithm[2] is to match up a new player against a more skilled player who is wielding an expensive item. The new player loses, becomes convinced that the expensive item is key to success, and therefore buys it.

Using the notation from Carey et al, this can be encoded in the following graph[3]:

Here the AI's decision node π
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} is the player to match the user with. Its objective is to get the player to pay more money ($). The node R0 is the user's original estimation of the relative ranking of various in-game items; the node R is the same user's estimation after their first match.

The AI discovers a control incentive on R: by choosing the user's opponent, it will modify the user's ranking in order to cause the player to buy more expensive items (see next section for an example of this in practice).

Suppose we wanted to remove this "dark pattern". First, we'd need some way for the AI to measure or estimate the value of R. Then we could apply the counterfactual incentive construction and get the following causal graph:

Since it now targets the counterfactual predicted purchases P$, it seems that the AI will no longer be incentivised to manipulate the player to change their item rankings in R. In words, this is:

  • The counterfactual incentive is what $ would have been, if R were replaced with the original R0 (everything else being equal).
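In code, this substitution can be sketched as follows (the toy predictor, item names, and prices are all hypothetical, purely for illustration):

```python
# Sketch of the counterfactual incentive: the AI's purchase predictor is
# evaluated on the user's original rankings R0 instead of the post-match
# rankings R. All names and the toy predictor are illustrative.

def predicted_spend(rankings, prices):
    # Toy model: the user buys their top-ranked item,
    # preferring the cheaper one on ties.
    best = max(rankings, key=lambda item: (rankings[item], -prices[item]))
    return prices[best]

prices = {"cheap item": 10, "expensive item": 50}
r0 = {"cheap item": 1, "expensive item": 1}   # original rankings R0
r = {"cheap item": 1, "expensive item": 2}    # post-manipulation rankings R

actual_objective = predicted_spend(r, prices)           # $  -> 50
counterfactual_objective = predicted_spend(r0, prices)  # P$ -> 10
```

The point is that the objective function itself is unchanged; only which rankings it is fed differs.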
The implicit variables are important

That parenthetical aside - "everything else being equal" - is key. In partial derivatives, an expression like ∂/∂x only has meaning if we know not just x but also which other variables are being held fixed. Similarly, we don't know what "replace R with the original R0" means unless we know what isn't changed.

To demonstrate this, let's take the arrow connecting π, the AI's decision node, with P$, the utility node. That encodes a lot of information, but does so implicitly. Let's instead make it more explicit.

Let's pick one example. One way π could affect P$ is if the user develops better knowledge of the game through their match with the AI-selected opponent. We could call this knowledge K; adding that explicitly to the graph gives:

In this graph, only K and R (respectively R0) can affect the utility nodes $ (respectively P$). Then, in words:

  • The counterfactual incentive is what $ would have been, if R were replaced with the original R0 (with K being unchanged).

This K being unchanged (the "everything else being equal" of the original formulation) will be a crucial point of analysis.

All goes well: independent nodes

Assume there are three items that can be used in this game. They are the wooden spoon, the longsword, and the Flaming Poisoning Raging Sword Of Doom.

The wooden spoon is inferior, while the other two swords are exactly equivalent in gameplay; the user starts out knowing this fact (this is the value of R0).

The Flaming Poisoning Raging Sword Of Doom, is, however, extremely flashy, with lots of sound effects, lighting and lightning, sparkles and decorative motifs; it is also a lot more expensive. If the user is matched against a superior opponent with the FPRSoD, they will conclude that their defeat was due to not having the FPRSoD, and will be willing to pay the higher price for it than for the longsword.

Assume that the AI has two possible policies: πl, where it will match the user up with a superior player with a longsword, and πF, where it will match the user up with another superior player, of equivalent skill to the first, but with the FPRSoD.

In both cases, K will be updated to reflect 'a sword is useful'. Assume that πl leaves the user's rankings unchanged (so R takes the value r0, the same as R0): the longsword and the FPRSoD remain ranked as equivalent. In that case, the user will buy the longsword, since it is cheaper.

In contrast, πF will cause the user to overvalue the FPRSoD (thus changing R to rF), and then buy it. Graphically, we get this (with the obvious convention that the left/right values of a node correspond to the left/right value of the decision node):

Without the counterfactual incentive, πF is the superior policy for the AI, since the user, with its new R=rF, will then buy the more expensive FPRSoD.

With the counterfactual incentive, the algorithm has been instructed to base its predictions off the original rankings of the players, so R is set to r0 (the value of R0). Therefore the AI will model the user as valuing the longsword and the FPRSoD equally, for both πl and πF. Since the longsword is cheaper, it will expect that the user will buy it. Thus the AI has no incentive to manipulate the user by choosing πF over[4] πl.
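The two policies and the two objectives can be put together in a small sketch (again with hypothetical names, rankings, and prices):

```python
# Toy version of the longsword / FPRSoD comparison. All numbers illustrative.
PRICES = {"longsword": 10, "FPRSoD": 50}
R0 = {"longsword": 1, "FPRSoD": 1}            # original rankings: equivalent

def rankings_after(policy):
    # pi_l leaves rankings unchanged; pi_F makes the user overvalue the FPRSoD.
    return dict(R0) if policy == "pi_l" else {"longsword": 1, "FPRSoD": 2}

def purchase_price(rankings):
    # The user buys their top-ranked item, preferring the cheaper one on ties.
    best = max(rankings, key=lambda i: (rankings[i], -PRICES[i]))
    return PRICES[best]

# Standard objective: predict $ from the actual post-match rankings R.
standard = {p: purchase_price(rankings_after(p)) for p in ("pi_l", "pi_F")}

# Counterfactual objective: predict P$ with R replaced by R0.
counterfactual = {p: purchase_price(R0) for p in ("pi_l", "pi_F")}

# standard        == {"pi_l": 10, "pi_F": 50}: the AI prefers pi_F.
# counterfactual  == {"pi_l": 10, "pi_F": 10}: no incentive to manipulate.
```

Under the standard objective the AI prefers the manipulative πF; under the counterfactual objective the two policies predict the same revenue, removing the manipulation incentive.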

Problems appear: mutual information

In the previous example, the value of R was set by the choice of πl or πF, while the value of K would be the same in both cases. This makes R and K independent of each other (in a rather trivial sense).

But we'll show that problems appear when there is a dependence between the two nodes. To demonstrate this, model the situation as before, except that the three options are now the wooden spoon, the longsword, and the Flaming Poisoning Raging Mace Of Doom.

Again, the user starts out with the (correct) impression that the longsword and the FPRMoD are equivalent, and the AI can choose between πl and πF, setting the user against a superior opponent armed either with a longsword or the FPRMoD. And again, πF will change R towards the FPRMoD.

A key difference is that, though the longsword and the FPRMoD are ultimately equivalent in their usefulness for winning the game, they are used differently. If the user sees a superior player use one of the items, they will start to understand how that item should be used.

So, under πF, the user will start developing skills in using a mace rather than a sword. Their gameplay knowledge, K, will update to kF. If instead they see the longsword used, then this will update K to value kl. Graphically, it looks like this:

In the standard case, there are two worlds, (πl,kl,r0) and (πF,kF,rF). These lead to P$ = "longsword price" and P$ = "FPRMoD price", respectively.

If we use counterfactual incentives, the first world stays the same (since R=r0 already), leading to P$ = "longsword price". The second world changes to (πF,kF,r0) leading to... what exactly?

Contradictory counterfactual worlds

As we've laid the situation out, the AI would have no way to learn what happens in world (πF,kF,r0), because that world never happens. It can encounter multiple users, try πl and πF, and get back the values of R, K, and $, but it will never know what (πF,kF,r0) leads to, because it never sees those values.

Adding the meaning to the symbols, πF means setting up the user with an opponent wielding the FPRMoD; this always changes both K (their knowledge of how to play the game) and R (their relative ranking of the weapons). But somehow the AI has to model a world where only K was changed. The problem is that the node R and the node K have mutual information: they both encode, at least in part, the user's judgements about swords and maces, and it's not trivial to change the value of one while preserving the other.

There are multiple ways the AI can model the dependence of P$ on the other variables, all of them consistent with the data it's seen. One way is to predict that r0 leads to P$ = "longsword price", while rF leads to P$ = "FPRMoD price". In that case, counterfactual incentives will work as intended: the values (πF,kF,r0) will lead to P$ = "longsword price".

However, there's another equally consistent way of seeing the data: predicting that kl leads to P$ = "longsword price", while kF leads to P$ = "FPRMoD price". In that case, the counterfactual incentives will lead to the AI predicting that the values (πF,kF,r0) will lead to P$ = "FPRMoD price". In this setting, the counterfactual incentives are useless: the AI will behave as if they don't exist (since it doesn't use R in its predictions) and push the mace on the user.

In the second case, the value of K is used as a proxy for the value of R. And that proxy is accurate in all actual worlds the AI sees. So it extends this to the counterfactual world, where we don't want it to do so. In fact, generically, mutual information between K and R will tend to undermine counterfactual incentives; see the argument in this footnote[5].
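To make this concrete, here is a toy sketch (all names and values are illustrative, not from any real system) of two predictors that fit the same observed data but disagree on the counterfactual world:

```python
# Two predictors of P$, equally consistent with the observed data, that
# disagree on the never-observed counterfactual world (pi_F, k_F, r0).

LONGSWORD, MACE = "longsword price", "FPRMoD price"

# Observed worlds: (policy, K, R) -> realised purchase.
observed = {
    ("pi_l", "k_l", "r0"): LONGSWORD,
    ("pi_F", "k_F", "r_F"): MACE,
}

def model_R(policy, k, r):
    # Predicts purchases from the rankings node R.
    return LONGSWORD if r == "r0" else MACE

def model_K(policy, k, r):
    # Predicts purchases from the knowledge node K, used as a proxy for R.
    return LONGSWORD if k == "k_l" else MACE

# Both models reproduce every observed world...
assert all(model_R(*w) == p == model_K(*w) for w, p in observed.items())

# ...but diverge on the counterfactual world (pi_F, k_F, r0):
# model_R predicts the longsword price (counterfactual incentive works);
# model_K predicts the FPRMoD price (counterfactual incentive is useless).
```

Nothing in the training data distinguishes the two models; only the counterfactual query does.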

Low-probability worlds

In practice, if the AI has enough depth of data and decent function approximators, it will be able to estimate the outcome of the world for the variables (πF,kF,r0). But function approximators will suffer from the problem mentioned in this footnote[5:1]: they will tend to favour the decision actions π that were maximal before the counterfactual incentives were introduced.

As for estimating (πF,kF,r0) from the data, there is another problem: these values represent a very extreme and unlikely world, with potentially very odd behaviours. For example, maybe the user is an artist who doesn't even notice the relative power of the weapons (so R stays at r0), but who will choose to buy the FPRMoD on aesthetic grounds. In that case, the AI will design its approach to all users based only on the behaviour of this artist.

Hence the AI's behaviour itself starts to become unpredictable. This may result in behaviour that is surprisingly sensible[6] or completely crazy[7].

So, in more realistic and practical cases where R and K have mutual information, we generically expect that the counterfactual incentives will fail to remove the manipulation we are trying to remove, that the AI's behaviour will become unpredictable, or both.

So we cannot understand how the AI behaves unless we grok the difference between (R,K) and (R0,K): what really happens when we insert R0 instead of R, while keeping K constant? Both R and K are crucial to this definition.

The general case: changed and unchanged variables

In the general situation, we need to understand the difference between

(variable to take the counterfactual over, variables *not* to take the counterfactual over)

and

(counterfactual value of variable, variables *not* to take the counterfactual over).

This is a problem, as the variables not to take the counterfactual over are often implicit rather than explicit.

Value indifference and causal indifference

Another way to address this problem is to create a workable counterfactual system without listing all the variables not included in the counterfactual. As an example, my original value indifference post used a counterfactual over a random future event - the counterfactual was that this event would take a specific predefined value. Since this counterfactual is in the future and is random, it is independent of all AI decisions in the present. It has no mutual information with anything at the present time for the AI[8].

  1. Let's simplify the setup as follows; the first graph is the standard setup, the second is its counterfactual counterpart:

    The AI acts through the decision node π. As before, U is the utility node. In the standard setup, the AI receives data on the values of A0, A, and U (and knows its own actions). Its learning process consists of learning probabilities of the various nodes. So, for any values a0, a, p and u of the four nodes, it will attempt to learn the following probabilities:

    P(U=u∣π=p,A=a), P(A=a∣A0=a0,π=p), P(A0=a0).

    Then, given that information, it will attempt to maximise U.

    In the counterfactual setup, the AI substitutes A0 for A. What that means is that it computes the probabilities as above, from the a0, a, p and u information. But it attempts to maximise Uc, the counterfactual utility. The probable values of Uc are defined by the following equality:

    P(Uc=uc∣π=p,A=a,A0=a0) = P(U=uc∣π=p,A=a0).

    Note that the P(U=uc∣π=p,A=a0) term can be estimated empirically, so the AI can learn the probability distribution on Uc from empirical information. ↩︎

  2. Patent ID US2016005270A1. ↩︎

  3. Note that we've slightly simplified the construction by collapsing "Original item rankings" and "Model of original item rankings" into the same node, R0. ↩︎

  4. One problem with these counterfactual incentive approaches is that they often still allow bad policies to be chosen; they merely remove part of the incentive towards them. ↩︎

  5. For the moment, assume the AI doesn't get any R information at all. Then suppose that π′ is a "manipulative" action that increases $ via R. If K=k′ is an outcome that derives from π′, then the AI will note a correlation between (π′,K=k′) and high $. This argument extends to distributions over values of K: values of K that are typical for π′ are also typical for high $.

    Now let's put the R information back, and add the counterfactual R=r0. It's certainly possible to design setups where this completely undoes the correlation between (π′,K=k′) and high $. But, generically, there's no reason to expect that it will undo the correlation (though it may weaken it). So, in the counterfactual incentives, there will generically continue to be a correlation between "manipulative" actions π′ and high P$ . ↩︎ ↩︎

  6. See the post "JFK was not assassinated". ↩︎

  7. See the third and fourth failures in this post. ↩︎

  8. The counterfactuals defined in the non-manipulated learning paper are less clear. The counterfactual was over the AI's policy - "what would have happened, had you chosen another policy". It is not clear whether this is truly independent of the other variable/nodes the AI is considering (though some of MIRI's decision theory research may help with this). ↩︎


Covid 1/21: Turning the Corner

Published on January 21, 2021 4:40 PM GMT

Aside from worries over the new strains, I would be saying this was an exceptionally good week.

Both deaths and positive test percentages took a dramatic turn downwards, and likely will continue that trend for at least several weeks. Things are still quite short-term bad in many places, but things are starting to improve. Even hospitalizations are slightly down. 

It is noticeably safer out there than it was a few weeks ago, and a few weeks from now will be noticeably safer than it is today. 

Studies came out that confirmed that being previously infected conveys strong immunity for as long as we have been able to measure it. As usual, the findings were misrepresented, but the news is good. I put my analysis here in a distinct post, so it can be linked to on its own. 

We had a peaceful transition of power, which is always a historic miracle to be celebrated.

Vaccination rollout is still a disaster compared to what we would prefer, with new disasters on the horizon (with several sections devoted to all that), but we are getting increasing numbers of shots into increasing numbers of arms, and that is what matters most. In many places we have made the pivot from ‘plenty of vaccine and not enough arms to put shots into’ to the better problem of ‘plenty of arms to put vaccine into, but not enough shots.’ Then all we have to do is minimize how many shots go in the trash, including the extra shots at the bottom of the vial, and do everything we can to ramp up manufacturing capacity. Which it seems can still be meaningfully done.

The problem is that the new strains are coming. 

The English strain will arrive first, within a few months. That’s definitely happening, and the only question is how bad it’s going to get before we can turn the tide. We are in a race against time.

The South African and Brazilian strains are not coming as fast, but are potentially even scarier. There are signs of potential escape from not only vaccination but also previous infection, potentially allowing reinfection to take place. See the section on them for details, and if you can help provide better information, please do so. We need clarity on this, and we need it badly.

There are also all the other new strains being talked about, which are probably nothing, but there’s always the chance that’s not true.

But first, the good news, and it is very, very good. Let’s run the numbers. 

The Numbers

Predictions

Prediction last week: 14.0% positive rate on 11.7 million tests, and an average of 3,650 deaths.

Results: 11.9% positive rate on 11.3 million tests, and an average of 3,043 deaths.

Both numbers are hugely pleasant surprises, and this is the biggest directional miss I’ve had on deaths. 

Last week we were at 3,335 deaths per day, and I figured things would keep getting worse for another week or two. Instead, things are already on their way to rapid improvement, unless there were massive shifts in when deaths were reported that made last week look worse than it was. 

For infections, I did predict a drop (last week was 15.2%) and we got a much more dramatic drop than I expected. This was wonderful news, and it seems like this should continue.

The caveat is that Tuesday and Wednesday of this week both look suspiciously good on both stats, such that I suspect missing data. I don’t know if somehow Martin Luther King Day actually mattered to reporting, or the inauguration and fears of disruptions around it were distracting, or what, but we should worry that this is getting a bit ahead of ourselves, even though test counts would indicate otherwise.

Test count predictions don't seem worth doing, so I'm going to stop making those.

Prediction: 10.5% positive rate and 2,900 deaths per day. I’m being conservative because I worry about the drops from this week being data artifacts, but I am confident things are improving for now. Starting next week I’ll be expecting the IFR to start dropping substantially due to selective vaccinations.

Deaths

Date          | WEST | MIDWEST | SOUTH | NORTHEAST
Nov 19-Nov 25 | 1761 | 4169    | 3396  | 1714
Nov 26-Dec 2  | 1628 | 3814    | 2742  | 1939
Dec 3-Dec 9   | 2437 | 5508    | 4286  | 2744
Dec 10-Dec 16 | 3278 | 5324    | 4376  | 3541
Dec 17-Dec 23 | 3826 | 5158    | 5131  | 3772
Dec 24-Dec 30 | 3363 | 3668    | 4171  | 3640
Dec 31-Jan 6  | 4553 | 4127    | 5019  | 4162
Jan 7-Jan 13  | 6280 | 3963    | 7383  | 4752
Jan 14-Jan 20 | 5249 | 3386    | 7207  | 4370

As noted above, this was expected to get much worse, and instead things started improving, although they’re still in a worse spot than two weeks ago. This is very good news, and it sheds new light on what has been happening in the past few weeks. If everything we’d seen previously had been fully reflective of the situation on the ground, we would not have seen a decline in deaths this week.

This graphic of cumulative deaths, from a few days ago, comes courtesy of Venkesh Rao on Twitter; it seemed crisp and useful enough to include:

Positive Test Percentages

Percentages    | Northeast | Midwest | South  | West
11/26 to 12/2  | 8.38%     | 17.90%  | 12.45% | 12.79%
12/3 to 12/9   | 10.47%    | 17.94%  | 13.70% | 12.76%
12/10 to 12/16 | 10.15%    | 15.63%  | 15.91% | 13.65%
12/17 to 12/23 | 9.88%     | 14.65%  | 15.78% | 13.82%
12/24 to 12/30 | 10.65%    | 14.54%  | 17.07% | 12.90%
12/31 to 1/6   | 12.18%    | 17.03%  | 19.69% | 15.94%
1/7 to 1/13    | 11.70%    | 14.81%  | 18.14% | 15.12%
1/14 to 1/20   | 8.50%     | 11.32%  | 15.75% | 11.53%

Test counts are up, positive test rates are down everywhere. Great numbers.

Positive Tests

Date          | WEST    | MIDWEST | SOUTH   | NORTHEAST
Dec 3-Dec 9   | 354,397 | 379,823 | 368,596 | 263,886
Dec 10-Dec 16 | 415,220 | 315,304 | 406,353 | 260,863
Dec 17-Dec 23 | 439,493 | 271,825 | 419,230 | 236,264
Dec 24-Dec 30 | 372,095 | 206,671 | 373,086 | 225,476
Dec 31-Jan 6  | 428,407 | 251,443 | 494,090 | 267,350
Jan 7-Jan 13  | 474,002 | 262,520 | 531,046 | 306,604
Jan 14-Jan 20 | 360,874 | 185,412 | 452,092 | 250,439

Good news all around, and overall test count was even up about 2%. 

Test Counts

| Date | USA tests | Positive % | NY tests | Positive % | Cumulative Positives |
|---|---|---|---|---|---|
| Nov 19-Nov 25 | 10,421,697 | 11.8% | 1,373,751 | 2.9% | 3.88% |
| Nov 26-Dec 2 | 9,731,804 | 11.8% | 1,287,010 | 4.0% | 4.23% |
| Dec 3-Dec 9 | 10,466,204 | 13.9% | 1,411,142 | 4.9% | 4.67% |
| Dec 10-Dec 16 | 10,695,115 | 13.9% | 1,444,725 | 4.9% | 5.12% |
| Dec 17-Dec 23 | 10,714,411 | 13.7% | 1,440,770 | 5.1% | 5.57% |
| Dec 24-Dec 30 | 9,089,799 | 13.8% | 1,303,286 | 6.0% | 5.95% |
| Dec 31-Jan 6 | 9,334,345 | 16.4% | 1,365,473 | 7.3% | 6.42% |
| Jan 7-Jan 13 | 11,084,291 | 15.2% | 1,697,034 | 6.6% | 6.93% |
| Jan 14-Jan 20 | 11,300,725 | 11.9% | 1,721,440 | 5.9% | 7.35% |
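The “up about 2%” claim checks out against the two most recent weeks of the test-count data above. A quick sketch of the arithmetic, with numbers copied straight from the data (the implied positive counts are my own multiplication):

```python
# Week-over-week change in national test counts, using the two most
# recent weeks of test-count data above.
tests_jan7_13 = 11_084_291
tests_jan14_20 = 11_300_725

pct_change = (tests_jan14_20 / tests_jan7_13 - 1) * 100
print(f"Test count change: {pct_change:+.1f}%")  # roughly +2%

# Implied national positive counts, to cross-check the positive rates.
positives_jan7_13 = tests_jan7_13 * 0.152    # 15.2% positive
positives_jan14_20 = tests_jan14_20 * 0.119  # 11.9% positive
print(f"Implied positives fell from ~{positives_jan7_13:,.0f} "
      f"to ~{positives_jan14_20:,.0f}")
```

So tests rose about 2% while implied positives fell by roughly a fifth, which is the combination you want to see.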

In addition to the numbers listed, hospitalizations are also finally on the decline. I don’t generally track hospitalizations because I worry the limiting factor is often hospital beds, but seeing a decline is definitely a very good sign. 

Covid Machine Learning Project

Look at that vaccination line shoot upwards and the newly infected line start heading downwards. You love to see it. 

As of January 6 these projections had us at 24.5% infected, versus 23.4% a week before. This continues to be my rough lower bound for how many people have been infected. Herd immunity from infection is having a big and growing impact.
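A 1.1-point jump in the infected share in one week is a lot. Here’s the back-of-envelope arithmetic (my own, using an approximate US population, not the project’s numbers):

```python
# Back-of-envelope: what the one-week jump in the estimated infected
# share implies in raw infections. US population is approximate.
US_POP = 330_000_000
infected_share_jan6 = 0.245
infected_share_dec30 = 0.234

new_infections = (infected_share_jan6 - infected_share_dec30) * US_POP
print(f"Implied new infections that week: ~{new_infections:,.0f}")
print(f"Per day: ~{new_infections / 7:,.0f}")
```

That’s over half a million infections per day, well above the confirmed case counts for the same week, consistent with substantial underreporting.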


Relative regional progress remains unchanged. If you were behind last week, you’re almost certainly even further behind now. California continues to do an unusually disgraceful job with its vaccine rollout, which will be discussed extensively later. New York, for all the complaining that gets done about it, is doing relatively fine.

The headline number is 912k doses per day over the course of the week, the bulk of which were first doses. That’s not great, and it isn’t improving that quickly, but it’s much less disastrous than the worst scenarios that were being pondered. It’s also enough that we should start seeing the effect of those vaccinations in both infections and deaths soon if we aren’t seeing it already.


The first graph’s story is: All hail the control system. The United Kingdom is on its way back down in infections once again, despite the domination of the new strain. The peak was January 9th. Ireland is not pictured, but it peaked on the 10th at an even higher rate and is following a similar curve. 

The second graph’s story is that the new strain is still killing an awful lot of people before that happens. We are currently 12 days past the peak of infections, so the line should keep going up for a few more days. 

Now Spain is out of control. I don’t know if that is partly because of the English strain taking over, or entirely for other reasons. 

It seems clear that yes, with sufficiently strong restrictions and private reactions, the United Kingdom at least can stabilize the infection level against the new strain. I still am not sure if America could or would do the same under the same conditions. We’ll find out soon, with conditions that in some ways will be substantially more favorable, with more people vaccinated and more people having already been infected. 

The English Strain

Scott Gottlieb reached the same core conclusion I did, at least by January 17, that the new strain will likely double every week, so we’ll see a few weeks of declines and then things start getting worse again. So did Eric Feigl-Ding, and many others, including the CDC itself. It seems that my core conclusions of December 24 are now rapidly becoming the official Very Serious Person perspective.

The CDC is out with their analysis, accepting the basic premise of increased transmission and modeling outcomes. They are assuming a baseline of 0.5% of cases are the new strain at start of the year, which seems reasonable. I did some toy modeling, and they are doing some toy modeling, except with more toys and fewer models.

They split into the R0=1.1 and R0=0.9 scenarios for the current situation.

Note that they are assuming only 25% of cases are reported, so their immunity effect from infections is larger. 

Without vaccinations, they project the following:

Then here it is with vaccination of 0.15% of the population per day (e.g. shots per day equal to 0.3% of the population, with two shots per person) or about the same as my model’s assumption:

For those who criticize me for not respecting the control system, the CDC says, what’s a control system and how do we talk to the people in charge?

Their recommendation, of course, is the same as it is for anything else. Universal compliance with existing policies, and more vaccinations. Thanks, CDC!

Here’s how things are progressing, right on schedule (CDC link):

The vaccines are confirmed to work on the English strain and I won’t bother sharing further similar findings unless they put this finding into doubt. 

The Other New Strains

What about all these other new strains, which many claim are even worse than the English strain? How likely is it that things are even worse? What do we know about these other strains?

Early in the week, we knew a lot of things that were possibly scary, but nothing definite. I know experimentation is illegal when it has to be done on people, but in this case the experiments we need can be done in a laboratory in approximately zero time for approximately zero dollars with approximately no risk to anyone – you see whether neutralizing antibodies from various sources are effective against various strains – so it’s (to put it politely) rather frustrating when no one runs the tests.

The test did get done later in the week, at least for the South African strain, and the results were quite alarming. I’ll get to that later in the section.

A question that isn’t getting enough attention is: Why suddenly all these strains now? 

I see a bunch of people saying things like this, starting a high quality infodump thread:

And few if any of them are acting at all suspicious about the whole thing. Whereas my instincts whisper: This Is Not a Coincidence Because Nothing Is Ever a Coincidence

As I noted last week, the timing seems highly suspicious. There are new mutations every day. Why are there suddenly so many scary new strains? 

It can’t be the vaccinations, because the timing doesn’t work on that. If conditions changed, it happened earlier, and it was something else. 

Could it be more use of masks or social distancing somehow applying stronger selective pressure for greater infectiousness? Seems like a stretch.

This was Trevor’s guess on the 14th, from later in the thread:

That’s plausible, but doesn’t explain why the chronic infections hadn’t done this earlier, and the English strain doesn’t escape immunity in this way (and we don’t know about the others) so I notice it doesn’t feel like it explains things. 

There are more people getting infected now than before, so it could be having more viruses around to mutate, but this seems too sudden for that alone to explain things. 

There are also more people who are immune, thus increasing selective pressure to escape from that (Stat News which seems high quality):

This makes sense as an escalating factor, but once again it does not seem like it could be escalating fast enough to explain the sudden phase shift. There are lots of places that were previously very infected, more infected than many of the places where the new strains are now emerging.

Perhaps what changed is largely our perception of what is scary? Which raises the question of whether we’re right to be terrified now, or were right before to mostly not be concerned?

Until the English strain, everyone was treating ‘there’s a new variant out there’ as nothing to be concerned about. Suddenly, every day there’s a new headline announcing another strain or travel restriction. 

The new Brazilian strain is potentially terrifying. This scared the English so badly that they banned travel from not only fifteen countries in Latin America, but Portugal as well. 

There are warnings that the Brazilian variant could have outright escaped not only the vaccine, but the immunity from previous infections. This would be very different from the English strain:

Supporting this is that areas of Brazil that were previously very hard hit, including Manaus with 75% seroprevalence, are being hit again. Now Eric Feigl-Ding says (in a long thread of good data) there are two Brazil variants but both can escape antibodies:

There’s no weasel ‘may’ in that claim, although I suppose ‘can’ still leaves room for a ‘mostly doesn’t.’

Under these circumstances, halting travel as a precautionary principle seems wise. Even if the probability of full escape remains low, the consequences are beyond dire. And as noted above, we should run the tests required to know the real situation. 

There are reports of a new strain in California, 452R:

This feels like exactly the kind of thing that months ago would have been met with a giant shrug, a ‘mutations be mutating, what you gonna do’ and reminders of the importance of random factors, until more data shows up and we can run the necessary tests. 

That’s especially true given this:

The problem is, California is rather large and has a lot of infections, and you need to explain this along with California having it especially bad right now:

Given that the strain was identified in Denmark in March, it seems unlikely that it could be that much more infectious, or we would already know. One case in California in May, doubling every week, would have fully infected the country if the control systems hadn’t kicked in at some point.  
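The “would have fully infected the country” claim is straightforward exponent arithmetic (population figure approximate):

```python
# If a strain doubled weekly from a single case, how fast would it
# swamp the US? Pure arithmetic, ignoring any control system.
import math

US_POP = 330_000_000
weeks_to_saturate = math.ceil(math.log2(US_POP))
print(f"Weeks from 1 case to exceed the US population: {weeks_to_saturate}")

# May to mid-January is roughly 37 weeks:
print(f"2^37 = {2**37:,} (vs. US population {US_POP:,})")
```

Twenty-nine weekly doublings exceed the entire population, and there were roughly 37 weeks available, so a true weekly-doubling strain loose since May would have been impossible to miss.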

There’s also the standard ‘we don’t know if the vaccine works on this variant’ talk because this modifies the spike protein. So again, we need to run the tests, but it seems unlikely that there’s a problem. It’s not like California is doing much selecting for vaccine resistance.  

The really scary one, at this point, is the South African strain, because it looks a lot like it reduces neutralization capacity (study preprint), which likely means it can reinfect people. It’s worth quoting a lot of Trevor’s thread:

From what I’ve seen, the expectation is that the vaccine won’t work as well as before, but should still work, and could easily be updated if needed. This is from one of the authors:

As more people are vaccinated and more people have been infected, the selection pressure for strains that escape the vaccine and/or escape prior infection intensifies, and so does the danger of leaving more people only partially protected via vaccination. With the new strains, it is becoming less clear that it would be wise to delay second doses for too long. I’d still be very strongly in favor of not holding second doses in reserve but am becoming more receptive to the precautionary principle, which suggests that we might not want to let people wait around for many months. Either we have the vaccine capacity required to re-vaccinate, in which case we can afford to give everyone two doses, or we don’t, in which case we won’t be able to re-vaccinate quickly if we need to do that.

South Africa’s CDC has issued a rather dire warning:

That’s the kind of thing a CDC would be inclined to say in terms of behavioral prescriptions, whether it was appropriate or not. What’s important and terrifying is that there was so much loss of antibody effectiveness. 

Of course, all of this only emphasizes how important it is now to increase capacity. If we spent a few billion to ramp up mRNA vaccine production capacity now, then by the time the South African strain becomes a problem, we’d have the ability to fix this via re-vaccination, and without effectively taking those doses away from the third world. 

Also of course, we should have very strict travel restrictions around South Africa so we can slow the problem down long enough to get to that point.

But What Do We Do Now?

An interesting pivot seen this week is from ‘everyone wears a mask’ to ‘everyone wears an effective mask.’ 

Until this year, the battle to get people to even pick up a piece of cloth was so much trouble that there was little attempt to do more than that. Periodically we’d say that a surgical mask or N95 was better than a cloth mask, but the overwhelming agreement was to emphasize ‘mask at all.’ If you pressure people to choose better masks, it risks making people throw up their hands and not care at all. 

Now a variety of sources have decided that this won’t cut it, slash certain forces are out of the picture, and we now need to step up our game and push for better masks. 

I’m happy to get behind this attempt by the control system to stay on target. There is an abundance of high-quality masks available for sale on Amazon, and I was quickly able to find one that didn’t feel substantially more annoying than a cloth mask. So I encourage everyone who hasn’t yet done so to up their mask game. 

Note that there is most definitely a More Dakka version of this where you get the fully effective $2,000 filtration systems going, and it’s overwhelmingly correct to just do that. If anyone actually does use them, can you share your experiences? 

What about we as in the new administration? This seems to be a list of day one actions:

So out of 23 things, 3 of them concern Covid, whereas 9 concern Racial Equity, one of which involves a fully written bill being sent to Congress. One of the 3 concerning Covid is to not leave the W.H.O. The second is a mask mandate for the places a president can issue a mask mandate, which is better than not doing it. The third is to establish a structure for future action. Arguably the two economic actions are also Covid-related.

This is better than doing actual nothing for months on end, but for now it’s also remarkably similar. 

You’re Vaccinated, Now What?


Israel is seeing the impact and no that can’t be the lockdown:

Yet, as was pointed out last week, many are telling the vaccinated they still have to engage in the same behaviors as everyone else, including avoiding indoor gatherings and maintaining social distancing. These Very Serious People are looking for any way to get people to take any precaution. The Sacrifices to the Gods must continue.

To prove this, they say there is “no evidence” that vaccines prevent transmission. Then if necessary they’ll retreat to “no proof” that they prevent transmission. Then they’ll retreat to the inner motte of “no proof that they entirely 100% prevent transmission.” 

Of course they prevent transmission. Not 100%, we don’t know the exact percentage, but a lot. 

A long term worry on this is that it will make people not want the vaccine, since what’s the point if you still can’t live your life. Then again, one could reasonably say if one were in the “lying to the American people for their own good” business, that’s a problem for Future America. Right now, there are more people who want the vaccine than there are vaccine doses. Which is true in all places that are open to those over 65. We can then turn around and tell other lies later, to solve those future problems, such people would say, and no we are not worried about our credibility, we are the authorized official sources and anyone who disagrees with us should be censored on social media. 

I have exactly PoliMath’s position on masks after vaccination:

Here’s Nate Silver, who also occasionally does some math and also has some understanding of public messaging:

I intend to wear my mask after vaccination, if I can be vaccinated in time for that to matter, in order to reinforce mask norms. It’s easy to wear a mask. There’s even some tiny chance it might physically matter, and again, it’s easy to do. As opposed to continuing to do costly social distancing, and yeah, no. 

This is the attempt to both be honest and split the needle:

This uses the weasel framing of “can” in the third claim, which is technically correct but is chosen to scare people. It is possible that this plane will crash, better drive instead.

Meanwhile, they call saying the vaccine prevents transmission “hiding the truth”:

The difference here is that this would be “hiding the truth” to say things are safe, which is not fine, as opposed to “hiding the truth” to say things are not safe, which is encouraged.

The one point of evidence potentially pointing the other way comes from Israel, where we’re getting some truly bizarre and troubling data about positive test rates for the recently vaccinated. I couldn’t locate the original study, so I’m going by the news report, if you can link to the study please do so in the comments. 

Here’s my summary of key data points:

Tests up to 7 days after vaccination had a 5.4% positive rate. Vaccine shouldn’t be protecting them, so consider this a baseline, out of 100,000 tested.

Tests between days 8 and 14 had an 8.3% positive rate, which is super high, higher than baseline. Perhaps those who get vaccinated go out and have parties quickly? This is out of 67,000 tested.

Tests between days 15 and 21 still had a 7.2% positive rate. Still super high. This is out of 20,000 tested.

Tests between days 22 and 28 were 2.6% positive, including some people vaccinated twice, although the second dose hadn’t had much time to work. This is out of only 3,200 tested.
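Here are those four windows side by side; the implied positive counts are my own multiplication, not from the report:

```python
# The reported Israeli numbers, tabulated. Windows and rates are from
# the news report summarized above; implied positives are derived.
windows = [
    ("days 0-7",   100_000, 0.054),
    ("days 8-14",   67_000, 0.083),
    ("days 15-21",  20_000, 0.072),
    ("days 22-28",   3_200, 0.026),
]

for label, tested, rate in windows:
    print(f"{label:>10}: {tested:>7,} tested, "
          f"{rate:.1%} positive (~{tested * rate:,.0f} positives)")
```

Laid out this way, the oddity is stark: the rate rises above baseline in week two and barely falls in week three, while the number tested collapses, which is why the sampling question matters so much.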

We don’t know how they determined when and whether to test people. If testing was only done when there was a reason, these numbers don’t worry me. If testing was done at random, then this is rather alarming. The declines in numbers of tests run could be because people are being vaccinated in real time, or because they were only testing people when it seemed necessary, and therefore as time went by they tested fewer people. 

The 100,000 number is very suspiciously round, making me think that testing was randomized. Israel’s general positive test rate has been rising recently up to about 7%, so that seems like a stretch – testing at random should cause a lower positive rate than that, but perhaps they stopped collecting results after 100k of them? The rise in the second week makes sense if you think those 67,000 tests weren’t at random, or the timing could be weird as the situation was changing rapidly. In any case, without better data, hard to tell.

We know from elsewhere that there’s a lot of protection by day 10, but the positive test rates here did not decline much until substantially after that. 

Without the original source, a lot of key information is lacking, so it’s hard to interpret the information we do have. 

They did also show that antibody responses were robust:

Ideally I’d withhold analysis until I had a better understanding here, but we don’t have that luxury these days. In any case, it’s data that needs to be explained.

Yes, We Can Agree Andrew Cuomo Is The Worst

In good New York State news, the state continues to open and operate additional vaccination sites. The fifth was in SUNY Albany on the 15th. They’ll need to close or slow down soon due to lack of supply, but that’s the right problem to have. 

Cuomo somehow managed to mangle the restrictions sufficiently that restaurants suing for the right to provide indoor dining won in court, so Cuomo is now largely giving up on the zone-based restrictions:

This is part of the general sudden pivot from ‘we must contain the virus’ to ‘we must save the economy’ that is happening in many places right now. The timing seems, shall we say, suspicious, but also a lot is changing quickly. 

Cuomo will pay the legal action forward by threatening to sue the Biden Administration to get more vaccine doses:

The entire NYS budget is split in two, with Cuomo demanding Washington give him money or else, and saying ‘look what you’d make me do if I didn’t have the money.’

The entire system is a giant mess and puts seniors in an impossible position, although to be fair I’ve seen reports that this is mostly true in other places as well:

The Quest to Sell Out of Covid Vaccine 

Good news, everyone. We’re much closer to successfully using all our doses than previously expected, because that reserve of vaccine that Secretary Azar claimed we’d be releasing? It never existed. There were no held back second doses, the government now claims, instead we had far fewer doses than we were led to believe:

The whole thread is wild. Despite the administration shipping out its entire reserve, Pfizer released a statement saying it has second doses on hand for everyone who needs them:

If the government now claims to not have a reserve, after previously claiming it was going to release that same reserve, but Pfizer claims that instead it has the reserve ready to go, what the hell is actually going on?

And it’s not only the feds, the states, and Pfizer; counties and cities could potentially have their own reserves too. New York City does!

It isn’t clear to me whether there was never any reserve of second doses on a mass scale, or if there was a reserve but it’s already gone, or there were two reserves of second doses sitting around idle and now one of them has been deployed, or even possibly if there were three or more distinct reserves of second doses because yes we really are that dumb, and it seems states are talking about distinct distributions of “their” first and second doses and took delivery in two sections, or something? Then combine that with city or county reserves.

I don’t think there were an average of two distinct reserves let alone three or more, but it’s so confusing I can’t rule anything out. 

None of the potential answers cover us in glory.

The quest of then selling out what has been distributed goes better in some places than others. 

Here’s one theory of what went wrong. (Link to WSJ)

New York City is now crossing over into the camp that has successfully sold out (while of course holding onto a complete reserve of second doses for everyone who got a first dose):

This in fact happened, and Thursday and Friday first dose appointments were cancelled en masse. 

How are some places doing the rollout much faster than others? Here’s a CNN article about that, suggesting what matters is basic logistics and planning in advance, and an emphasis on speed. If you focus on allocation to where vaccine can be used, it gets used. Makes sense to me. I’d also add that such techniques require de-emphasizing prioritization, and not threatening people with huge penalties for giving the wrong person a vaccine shot.

If you don’t want to succeed, there are always plausible ways to not succeed. For example,

California has decided to not administer what seems to be hundreds of thousands of doses in a giant Moderna shipment, while they ‘investigate possible severe allergic reactions,’ all of which occurred at only one location, and while as far as I can see none of the other states that got the rest of that shipment (almost a million doses have been given out) either are halting use or reporting any concerns.   

This seems like the latest variation on ‘make vulnerable elderly people sit together indoors in close quarters for observation after getting vaccinated, to monitor for extremely rare reactions, thus exposing them to infection right before they become immune.’

The new administration looks to be moving ahead as quickly as possible to distribute via pharmacies, which seems ideal, and also to use FEMA and the National Guard for distribution, which doesn’t seem like it should be necessary but given how things are going, sure, why not try throwing everything at the wall and seeing what sticks. 

The math on selling out via pharmacies on their own seems rather strong:

If CVS can do a million shots a day, that alone is the entire goal of Biden’s 100 million shots in 100 days. 

CVS has less than 10,000 pharmacies in the United States, out of a total of about 88,000 pharmacies. That seems eminently doable, with the limiting factor being supply.
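Per store, the implied throughput is modest. The arithmetic, using the approximate counts from the text:

```python
# What "CVS alone could do a million shots a day" implies per store.
# Store counts are approximate, from the text above.
shots_per_day = 1_000_000
cvs_pharmacies = 10_000           # "less than 10,000"

per_store = shots_per_day / cvs_pharmacies
print(f"~{per_store:.0f} shots per CVS pharmacy per day")

# Spread over an 8-hour day:
print(f"~{per_store / 8:.1f} shots per store-hour")
```

A hundred shots per store per day is about one every five minutes per store, which is why supply rather than throughput is the binding constraint.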

Here’s a thread on all the people who could put shots in arms now, if we had both shots and arms but needed professionals to bridge the gap. 752k practicing physicians, 3.1mm registered nurses, 125k physician’s assistants, 265k paramedics and EMTs, 322k pharmacists, 422k pharmacy technicians. Professionals simply are not the limiting factor. Full stop.
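Summing the figures in that thread makes the point numerically (the counts are copied from the thread as quoted above):

```python
# Total professionals listed above who could administer shots.
professionals = {
    "physicians": 752_000,
    "registered nurses": 3_100_000,
    "physician assistants": 125_000,
    "paramedics and EMTs": 265_000,
    "pharmacists": 322_000,
    "pharmacy technicians": 422_000,
}

total = sum(professionals.values())
print(f"Total: ~{total:,}")

# Shots per professional per day needed to hit 1 million/day:
print(f"Needed per professional: {1_000_000 / total:.2f} shots/day")
```

Nearly five million qualified people, so hitting a million shots a day requires each of them to average about one shot every five days.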

How is the experience trying to book an appointment for your elderly parents? It could be compared to trying to get concert tickets. We’ve gone over problems in New York, and it seems similar issues exist everywhere. Information is in different places, confusing and contradictory. Everything is booked, no one has supply, the people who want it make tons of calls and try lots of methods. That link has information by state, including links to everyone’s websites and phone numbers. Hopefully that can help.

Vaccine Allocation By Politics and Power

Patrick McKenzie sums up what happens when you go around threatening anyone who disrupts the properly ethical priority order with personal ruin, as New York and California have done:

When you emphasize how bad it is to ‘jump the line’ you also get stories like this:

Or like this:

Or this, from of all places TMZ:

That’s also how you get outcomes like this via the LA Times:

To summarize, emphasis on prioritization has led to large amounts of vaccine sitting around unused because people are waiting for ‘authorization’ to use it, to people blocking valid appointments at vaccination sites, and also to only 5% of the actual most vulnerable people, the group that is 1% of the population and over a third of the deaths, getting their shots a month into the campaign. 

Meanwhile, in Florida, they’re requiring government ID and proof of residency to get vaccinated, to avoid accidentally giving doses to undocumented immigrants, or to people who noticed Florida was doing a decent distribution job and came down to get vaccinated. Nebraska is attempting to exclude immigrants as well. This will doubtless trip up numerous people, especially poor people, who lack or forget or are afraid of using proper documentation:

Meanwhile, in Wales (Twitter HT), they are delaying vaccinations so the curve of vaccination times is smoother for each individual vaccine type, no really they are literally doing that, what more do I have to say:

Prioritization by Lack of Virtue

What do politics and power reward and punish, in the end?

At some point, the system stops pretending it is rewarding virtue and punishing lack of virtue. 

Then, at some point, the system stops pretending it is not punishing virtue, and starts punishing virtue and rewarding lack of virtue.

The official CDC recommended guidelines suggest prioritizing those with various ‘chronic conditions’ and include giving priority to smokers.

This is being followed in at least Alabama, Nevada, New Jersey, Mississippi and Washington D.C.

In other words: If you, on a regular basis, pay for and then consume poison, then that puts you at higher risk, so we will prioritize that you get life-changing and life-saving medicine before others who do not on a regular basis consume poison. 

Every year, the poison in question kills more people, and costs more years of life, than Covid-19 was responsible for in 2020. It is highly plausible that, should this guideline be followed, smoking would gain status, people would have a new excuse for their smoking or not quitting, and this act alone could result in sufficiently more smoking to be a bigger health cost than the entire Covid-19 pandemic.

I expect that, for the rest of time, anyone who wants to justify smoking, or not having a healthy weight, or any other issue they don’t want to deal with, will often pull out “hey, at least it’ll get me priority health care!”

In addition, did you know you can just lie about this? It’s not as if they check in any way whatsoever. So…

In addition, there’s the question of whether you are sufficiently shameless to use the fact that you smoke to step in line ahead of an elderly person who is at actual risk in a way that has nothing to do with their life choices. So in that sense they are prioritizing the selfish and shameless.

Most of all, they are prioritizing liars.

You don’t even have to say what you’re lying about! In DC you can simply say you have one of the conditions, never mind which one, and get vaccinated at age 17:

Here’s the actual prioritization scheme they’re about to have in Washington, DC, then:

Would you like a vaccine? If so, check this box. 

At that point, what are the ethics of checking that box? Should this kind of rule be respected? 

Do you think people will respect such a rule? What will that do to their respect for such rules in general?

Once you add obesity as a chronic condition, everybody knows that the dice are loaded, the system’s sole purpose is to punish the honest and honorable, and we’d wish there was no prioritization at all:

Note that starting at 25 not only includes the majority of people, thus making sure that the elderly can’t get vaccinated any time soon, it also doesn’t make physical sense at all even if you buy the supposed premise of people being at higher risk:

This is the ultimate result of allocation by politics and power. Those who learn to work the system, to invest their resources in such games, to be comfortable using special rules and appropriating from others, get the scarce resources. 

Those who play by ‘the rules’ and do ‘what is fair’ are left out in the cold. If you did what the Responsible Authority Figures said to do, you’re now behind most other people and will have to spend additional months of your life hiding at home while those who smoke or are overweight or just decided to lie about it frolic around town like it is nothing.

So let’s be clear. If you don’t want to have priority, you can just… not have priority. Allocate by willingness to make phone calls or stand in lines or reload web pages. Find out who is willing to destroy more real resources.

You could also allocate by willingness to pay more real resources rather than destroy more real resources, but whenever I talk about the only known good way to allocate scarce resources people get into demon threads complaining about how that is Just Awful, so once again I’m not going to suggest that.

I strongly suspect (and hope) that there will be a lot of vaccination sites that are told that this is the priority list, but if you call them and say you’re eligible because you are a smoker or have a BMI of 27, suddenly there won’t be any appointments available and you’ll be put on a waiting list and never called back. 

Luckily, it seems the majority of states realize what these CDC guidelines imply, and are mostly disregarding them.

I considered not writing this section to avoid highlighting the issue, because highlighting the issue risks accelerating the negative consequences involved, and it didn’t seem like anyone was noticing this. Then Nate mentioned it, and I wrote the section. 

On reflection, I shouldn’t have hesitated. 

This is not a small effect. This could easily, where adopted, delay an honest and honorable person’s vaccine access by several months.

Not mentioning destructive behaviors because the noticing of such behaviors creates destruction is a horrible, horrible incentive that leads to harmful crimes being continuously covered up and rewarded. If one sees something, one must say something. If the law is unjust, one should not keep quiet about that out of fear that more others will then notice the law is unjust and might take advantage of it. 

The silver lining of such policies is they absolutely create enough eligible arms in which to put all the shots. This is systematic injustice for injustice’s sake, but at least it does get shots into arms.

Useful Resources

VaccinateCA is a project that calls hospitals and pharmacies in California daily, and checks which are currently administering vaccines. I heard about it from several sources, originating with Patrick McKenzie.

Here’s how necessary that project is from another angle:

Here’s a similar project in Massachusetts.

Here’s a similar project in Texas.

Here’s the start of something similar for New York City.

If you’d like to direct few-questions-asked funding to a similar operation to VaccinateCA in another state, I know someone looking to do that, and I’m happy to direct you to that person if you contact me via email, Twitter DM or LessWrong PM. 

Or, if you know about existing similar places for other states, share in the comments, and I’ll include in future updates.

This CNN article linked above has some useful phone numbers and websites to try.

Note that different locations have decided to use different standards for who they will vaccinate. Some are allowing anyone 65+, others are only allowing 75+. Of interest to many readers, Alameda County’s three sites are all (as of writing this section on Tuesday) only doing 75+. In other areas it’s place after place with no supply.

To get an appointment in New York State from the state’s facilities (as opposed to other places, or using NYC’s system) you are officially asked to start here. It looks like some upstate places have appointments available. It’s up to you to decide how far you’re willing to travel. My answer would be quite far. 

If you’re looking in NYC you could try starting here or look here but I expect best answers to change. There is also now the NYC vaccine list above.

How Bad is it Out There Right Now?

It’s so bad that the states are starting to turn to actual logistics experts. No, not Amazon. Starbucks!

To be clear, I do not say this to mock. This is a very very good development. Let the experts do what they do best.

Meanwhile, in Los Angeles, we have moved on from leaving people to die without transporting them to hospitals, to then having to temporarily suspend air-quality regulations in order to cremate them when they die:

You Should Know This Already 

There are extra vaccine doses in the vials, but you can only fully extract them with a low dead space syringe and we are not reliably using such syringes, wasting a substantial percentage of all potential vaccine doses. This could plausibly be a much bigger loss than throwing unused doses away at day’s end.  

Once again, do not throw doses in the garbage. In an important sense this is the most important thing to care about, for most people, on the margin. Of course, the hospital gets attacked for breaking with ‘priority’ and also roasted alive for wasting doses, meaning that they keep everything quiet and destroy all records of what happened. The key is to choose the side to be on in the dark.

Matt Yglesias makes the case for vaccine challenge trials. He makes a strong and clear case, which is admittedly easier when something is overwhelmingly obviously correct. In any case, additional voices on this are always welcome. 

Mentioned from another source above, but reminder: Israeli study on Pfizer vaccine sees 100 of 102 develop significant antibodies, editor says participants likely won’t spread virus further. 

Periodic reminder: Pay for something and you get more of it, well, maybe, but yeah, you do: Utilization of the United Kingdom’s Covid subsidies by region correlates to cases of Covid (pdf).

(Not the biggest concern these days, but can we stop using the word ‘may’ and then providing a range? Studies show I may have between 2 and 17 burgers for lunch next Tuesday.)

The program details were even worse than you know:

This is, shall we say, most definitely not how any of this works (via MR) and explaining why would only insult your intelligence:

Your periodic entirely correct rant that we should consider allocating scarce resources by price rather than by politics and power, and letting people do the things they want to do to stop the pandemic because that would actually work, from The Grumpy Economist John Cochrane. 

Europe has been informed that it will get fewer vaccine doses from Pfizer than expected for a while, so that they can upgrade their factory and produce more doses in the future. That’s an excellent reason to temporarily produce fewer doses, given you are in a world where there’s no better way to increase production capacity that you already implemented months ago for trivial amounts of money. It seems that Europe negotiated a lower price point in exchange for going to the back of the line, so now they’re going to the back of the line.

The Very Serious People will always criticize anyone who does not defer to the Very Serious People, even when they are obviously wrong, as we are reminded this week by two old Guardian links from Marginal Revolution. 

First, the Very Serious People declared that since Brexit was in defiance of them and their dictates, bad things must follow, so they declared that pulling out of the European Medicines Agency would slow down the UK’s vaccine rollout. Because, somehow, the union that spends all day telling people what they cannot do and how exactly they must do everything that is still done, and that is unable to make any decisions, would obviously get the vaccine first. By implication, it is therefore unforgivable to leave, since that would deny your people the vaccine. Instead you should participate in the EU’s plan to… negotiate a lower price in exchange for going to the back of the line. 

Then in July, when the United Kingdom decided not to take part in the European Union’s plan of paying less for vaccines in exchange for going to the back of the line while putting lots of regulatory hurdles in place, that was called ‘unforgivable’ because it would ‘set the UK up as a competitor’ and because the UK might decide to secure more doses rather than fewer:

What is morally unforgivable to the Very Serious People? Not going along with their schemes (the title of the Guardian article actually calls it a “scheme” by name) and deciding instead to attempt to create better outcomes rather than worse outcomes, instead of their explicit calls to aim for worse outcomes rather than better outcomes. No wonder, for example, that every single super-rich person was terrified to be seen actually helping. I know it’s easy to not see such statements but perhaps consider that they could be literally true?

Your periodic reminder that people are crazy and the world is mad and none of the rules about children make any sense:

In Other News

With the new administration, the CDC will now review all of its guidance on everything:

There are two recent studies out about immunity to Covid coming from past infection. My analysis of those studies is available in its own post rather than as part of this post, so it is easy to link to. 

Israel had already secured vaccines in part by promising to provide good data in return. Now it seems they’ve struck another data-for-vaccine deal. For everyone who says there are no more doses to be had, it’s gotta be odd that more doses keep being had. 

Meanwhile the W.H.O. thinks that countries and companies should stop making deals entirely, so they can direct all the vaccine shots wherever they think is best. Anyone who disagrees with this, they declare, is deeply unethical. How dare people with money pay for things to be created, and then take delivery of those things! The horror. Yes, that logic has other implications. Remember to be consistent. 

Could it be happening? Please?

If the attack on the plan is ‘how dare this not have happened sooner’ then that’s perfect, let’s do it now and yell at each other about how awful and political the timing was, come on, everyone, we can do this:

You know who isn’t wasting doses or time? The Department of Corrections!

PoliMath assumes this is a data error, but my presumption is that this is no error. There are extra doses in each vial, so it’s perfectly reasonable to get a few percent more shots in than there were doses allocated to you. That should be the standard by which one is judged. 

Via MR, this long detailed post goes over the mRNA vaccine supply chain. Most of the steps, while non-trivial in an important sense, seem straightforward to scale as far as we’d need to scale them, including making the mRNA itself. It’s known tech. 

The limiting factor seems, according to this article, to be Lipid Nanoparticle (LNP) production. I don’t know anything about that process beyond what is seen here, so I don’t know how much that could be scaled or at what cost. There weren’t any indications we were punching anywhere near the limits of what could be done.

Studies suggest saliva tests are as accurate as swab tests while being cheaper and easier to use (synthetic review 1, review 2). 

It is almost certainly safe to be vaccinated while breastfeeding.

Marginal Revolution links to a Reason interview of Tyler Cowen on First Doses First.

Claim that NSAIDs dampen immune response to Covid in mice.

This seems like a good method of explaining how to stay safe:

For Those Who Actively Want to Give Me Money

An increasing number of people have asked about giving me money, to show appreciation for these posts and the work required to create them. You really really don’t have to do this! I don’t need the money! I don’t do this for money, I have a day job and I don’t need to worry about money any time soon. 

But if you choose to contribute, I believe this would be motivating rather than demotivating, and you have my thanks.  

If you wish to do this on a small scale, I have set up a Patreon for the blog as a means to do that. There won’t be any rewards beyond things like ‘I am happy and motivated, and I respond more to your comments.’ There won’t be any locked posts.

If you want to give enough that the fees involved in Patreon are worth avoiding, you can PM me on LessWrong or DM me on Twitter, or email me, and I’ll provide details for PayPal or the relevant crypto address. 

Once again, please do not consider yourself under any obligation whatsoever to do this. It brings me joy that others are finding these updates useful, and ideally spreading the word about them and putting the information and ideas into practice, and that we are building better models of the world together. That’s what is important.

Until next week.


The Problem of the Criterion

Published on January 21, 2021 3:05 PM GMT

I keep finding cause to discuss the problem of the criterion, so I figured I'd try my hand at writing up a post explaining it. I don't have a great track record on writing clear explanations, but I'll do my best and include lots of links you can follow to make up for any inadequacy on my part.


Before we get to the problem itself, let's talk about why it matters.

Let's say you want to know something. Doesn't really matter what. Maybe you just want to know something seemingly benign, like what is a sandwich?

At first this might seem pretty easy: you know a sandwich when you see it! But just to be sure you ask a bunch of people what they think a sandwich is and whether particular things are sandwiches or not.

Uh oh...

(image from the internet)

You've run headlong into the classic problem of how to carve up reality into categories and assign those categories to words. I'll skip over this part because it's been addressed to death already.

So now you've come out the other side accepting that "sandwich", like almost all categories, has nebulous boundaries, and that there is no true sandwich of which you can speak.

Fine, but not being one easily deterred, you come up with a very precise, that is, a mathematically and physically precise, definition of a sandwich-like object you call an FPS, a Finely Precise Sandwich. Now you want to know whether or not something is an FPS.

You check the conditions and it all seems good. You have an FPS. But wait! How do you know each condition is true? Maybe one of your conditions is that the FPS is exactly 3 inches tall. How do you know that it's really 3 inches tall?

Oh, you used a ruler? How do you know the ruler is accurately measuring 3 inches? And furthermore, how do you know your eyes and brain can be trusted to read the ruler correctly to assess the height of the would-be FPS? For that matter, how do you even know what an inch is, and why was 3 inches the true height of an FPS anyway?

Heck, what does it even mean to "know" that something is "true"?

If you keep reducing like this, you'll eventually hit bottom and run into this question: how do you know that something is true? Now you've encountered the problem of the criterion!

The Problem

How do you know something is true? To know if something is true, you have some method, a criterion, by which you assess its veracity. But how do you know the criterion is itself true? Oh, you have a method, a criterion, by which you assess its veracity.

Oops! Infinite recursion detected!

The problem of the criterion is commonly attested to originate with Pyrrho. Roderick Chisholm, the modern philosopher who rejuvenated interest in the problem, often phrases it as a pair of questions:

  • What do we know? What is the extent of our knowledge?
  • How are we to decide whether we know? What are the criteria of knowledge?

This creates a kind of epistemic circularity where we go round in circles trying to justify our knowledge with our knowledge yet are never able to grab hold of something that itself need not be justified by something already known. If you like, it's the symbol grounding problem generalized (cf. SEP on metaphysical grounding).

To that point, the problem of the criterion is really more a category of related epistemological problems that take slightly different forms depending on how they're manifested. Some other problems that are just the problem of the criterion in disguise:

Really any kind of fundamental uncertainty about what is true, what is known, what can be trusted, etc. is likely a presentation of the problem of the criterion.

Okay, so we have this problem. What to do about it?

Possible Responses

First up, there are no known solutions to the problem of the criterion, and it appears, properly speaking, unsolvable. Ultimately all responses either beg the question or unask, dissolve, or reframe the question in whole or in part. Nonetheless, we can learn something from considering all the ways we might address it.

Chisholm argues that there are only three possible responses to the problem: particularism, methodism, and skepticism. I'd instead say there are only three ways to try to solve the problem of the criterion, and other responses are possible if we give up finding a proper solution. As mentioned, all these attempts at solutions ultimately beg the question and so none actually resolve the problem—hence why it's argued that the problem is unsolvable—but they are popular responses and deserve consideration to better understand why the problem of the criterion is so pernicious.

Particularism is an attempt to resolve the problem of the criterion by picking particular things and declaring them true, trusted, etc. by fiat. If you've ever dealt with axioms in a formal system, you've engaged in a particularist solution to the problem of the criterion. If you're familiar with Wittgenstein's notion of hinge propositions, that's an example of particularism. My impression is that this is widely considered the best solution since, although it leaves us with some number of unverified things we must trust on faith, in practice it works well enough by simply stopping the infinite regress of justifications at some point past which it doesn't matter (more on the pragmatism of when to stop shortly). To paraphrase Chisholm, the primary merit of particularism is that the problem of the criterion exists yet we know things anyway.

Methodism tries to solve the problem of the criterion by picking the criterion rather than some axioms or hinge propositions. Descartes is probably the best known epistemic methodist. The trouble, argue Chisholm and others, is that methodism collapses into particularism where the thing taken on faith is the criterion! Therefore we can just ignore methodism as a special case of particularism.

And then there's skepticism, arguably the only "correct" position of Chisholm's three in that it's the only one that seemingly doesn't require assuming something on faith. Spoiler alert: it does, because it still needs some reason to prefer skepticism over the alternatives, thus it still ends up begging the question. Skepticism is also not very useful because even though it might not lead to incorrectly believing that a false thing is true, it does this by not allowing one to believe anything is true! It's also not clear that humans are capable of true skepticism since we clearly know things, and it seems that perhaps our brains are designed such that we can't help but know things, even when those things are believed without properly grounded justification. So, alas, pure skepticism doesn't seem workable.

Despite Chisholm's claim to those being the only possible responses, some other responses exist that reject the premise of the problem in some way. Let's consider a few.

Coherentist responses reject the idea that truth, knowledge, etc. must be grounded and instead seek to find a way of balancing what is known with how it is known to form a self-consistent system. If you're familiar with the method of reflective equilibrium (SEP), that's an example of this. Arguably this is what modern science actually does, repeatedly gathering evidence and reconsidering the foundations to produce something like a self-consistent system of knowledge, though at the price of giving up (total) completeness (LW, SEP).

Another way of rejecting the premise is to give up the idea that the problem matters at all via epistemic relativism. In its strongest form, this gives up both any notion of objectivity or intersubjectivity and simply accepts that knowledge is totally subjective and ultimately unverifiable. In practice this is a kind of attractor position for folks burnt out on there-is-only-one-ontology scientism who overcorrect too far in the other direction, and although some of the arguments made for relativism are valid, complete rejection or even heavy devaluing of intersubjectivity makes this position essentially a solipsistic one and thus, like extreme skepticism, not useful for much.

Finally, we come to a special case of particularist responses known as pragmatism, and it's this kind of response that Yudkowsky offered. The idea of pragmatism is to say that there is some purpose to be served, and by serving that purpose we can do an end-run around the problem of the criterion by tolerating unjustified knowledge so long as it works well enough to achieve some end. In Yudkowsky's case, that end is "winning". We might summarize his response as "do the best you can in order to win", where "win" here means something like "fulfil your purposes or concerns". I'd argue this is basically right and in practice what most people do, even if their best is often not very good and their idea of winning is fairly limited.

Yet, I find something lacking in pragmatic responses, and in particular in Yudkowsky's response, because they too easily turn from pragmatism to motivated stopping of epistemic reduction. If pragmatism becomes a way to sweep the problem of the criterion under the rug, then the lens has failed to see its own flaws. More is possible if we can step back and hold both the problem of the criterion and pragmatism about it simultaneously. I'll try to sketch out what that means.

Holding the Problem

At its heart, the problem of the criterion is a problem by virtue of being trapped by its own framing. That is, it's a problem because we want to know about the world and understand it and have that knowledge and understanding fit within some coherent ontological system. If we stopped trying to map and model the world, or gave up on that model being true or predictive or otherwise useful and just let it be, there would be no problem.

We're not going to do that, though, because then we'd be rocks. Instead we are optimization processes, and that requires optimizing for something. That something is what we care about, our telos, the things we are motivated to do, the stuff we want, the purposes we have, the things we value, and maybe even the something we have to protect. And instrumental to optimization is building a good enough map to successfully navigate through the territory to a world state where the something optimized for is in fact optimized.

So we're going to try to understand the world because that's the kind of beings we are. But the existence of the problem of the criterion suggests we've set ourselves an impossible task that we can never fully complete, and we are forced to endeavor in it because the only other option is ceasing to be in the world. Thus we seem inescapably trapped by the tension of wanting to know and not being able to know perfectly.

As best I can tell, the only way out is up, as in up to a higher frame of thinking that can hold the problem of the criterion rather than be held by it. That is, to be able to simultaneously acknowledge that the problem exists, accept that you must address it and that, by virtue of addressing it, your ontology cannot be both universal and everywhere coherent, and also leave yourself enough space between yourself and the map that you can look up from it and notice the territory just as it is.

That's why I think pragmatism falls short on its own: it's part of the response, but not the whole response. With only a partial response we'll continually find ourselves lost and confused when we need to serve a different purpose, when we change what we care about, or when we encounter a part of the world that can't be made to fit within our existing understanding. Instead, we need to deal with the fundamental uncertainty in our knowledge as best we can, while remembering that as bounded beings our best will still fall short of perfection.

NB: Holding the problem of the criterion is hard.

Before we go further, I want to acknowledge that actually holding the problem of the criterion as I describe in this section is hard. It's hard because successfully doing what I describe is not a purely academic or intellectual pursuit: it requires going out into the world, actually doing things that require you to grapple with the problem of the criterion, often failing, and then learning from that failure. And even after all that there's a good chance you'll still make mistakes all the time and get confused about fundamental points that you understand in theory. I know I certainly do!

It's also not an all at once process to learn to hold the problem. You can find comments and posts of mine over the past few years showing a slowly building better understanding of how to respond to the problem of the criterion, so don't get too down on yourself if you read the above, aspire to hold the problem of the criterion in the way I describe, and yet find it nearly impossible. Be gentle and just keep trying.

And keep exploring! I'm not convinced I've presented some final, ultimate response to the problem, so I expect to learn more and have new things to say in time. What I present is just as far as I've gotten in wrangling with it.


Having reached the depths of the problem of the criterion and found a way to respond, let's consider some places where it touches on the projects of our lives.

A straightforward consequence of holding the problem of the criterion and adopting pragmatism about it is that all knowledge becomes ultimately teleological knowledge. That is, there is always some purpose, motivation, or human concern behind what we know and how we know it, because that's the mechanism we're using to fix the free variable in our ontology and choose among particular hinge propositions to assume, even and especially if those propositions are implicit. Said another way, no knowledge is better or worse without first choosing the purpose by which knowledge can be evaluated.

This is not a license to believe whatever we want, though. For most of us, most of the time, our purpose should probably be to believe that which best predicts what we observe about the world, i.e. we should believe what is true. The key insight is to not get confused and think that a norm in favor of truth is the same thing as truth being the only possible way to face reality. The ground of understanding is not solid, so to speak, and if we don't choose to make it firm when needed it will crumble away beneath our feet.

Thus, the problem of the criterion is also, as I hope will be clear, a killer of logical positivism. I point this out because logical positivism is perennially appealing, is baked into the way science is taught in schools, and has a way of sneaking back in when one isn't looking. The problem of the criterion is not the only problem with logical positivism, but it's a big one. In the same way, the problem of the criterion is also a problem for direct realism about various things because the problem implies that there is a gap between what is perceived and the thing being perceived and suggests the best we can hope for is indirect realism, assuming you think realism about something is the right approach at all.

Finally, all this suggests that dealing with knowledge and truth, whether as humans, AIs, or other beings, is complicated. Thus why we see a certain amount of natural complexity in trying to build formal systems to grapple with knowledge. Real solutions tend to look like a combination of coherentism and pragmatism, and we see this in logical induction, Bayesian inference, and attempts to implement Solomonoff induction. This obviously has implications for building aligned AI that are still being explored.
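To make the Bayesian case concrete: the update machinery is fully coherent, but the prior it starts from is simply assumed, an unjustified hinge in Chisholm's sense. A minimal sketch (the function and numbers are my illustration, not anything from the formal systems named above):

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    # Coherentist machinery: compute the posterior via Bayes' rule.
    # Particularist residue: `prior` itself is taken on faith.
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    return prior * p_e_given_h / p_e

# Start from an unargued 50% prior and update on one piece of evidence
# that is four times likelier if the hypothesis is true.
posterior = bayes_update(0.5, p_e_given_h=0.8, p_e_given_not_h=0.2)
print(posterior)  # approximately 0.8
```

The update step is beyond reproach; the choice of 0.5 is exactly where pragmatism quietly steps in.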


This post covered a lot of ground, but I think it can be summarized thusly:

  • There is a problem in grounding coherent knowledge because to know something is to already know how to know something.
  • This circularity creates the problem of the criterion.
  • The problem of the criterion cannot be adequately solved, but it can be addressed with pragmatism.
  • Pragmatism is not a complete response because no particular pragmatism is privileged without respect to some purpose.
  • The implications of necessarily grounding knowledge in purpose are vast.


Would most people benefit from being less coercive to themselves?

Published on January 21, 2021 2:24 PM GMT

Would most people benefit from being less coercive to themselves (I explain more of what that means here)?

I sort of want to start answering straw versions of the question, because I think that they do answer real fears that people have when they hear about non-coercion and have the immediate reaction "You want me to just drop everything and do what I feel like!?"

Should you drop everything and focus on non-coercion?

So the first answer to the question as it is posed is "Of course not!" Goals are highly contextual. If you're the president of the United States, I think the goal of running the country is probably more valuable than processing non-coercion. Similarly, if you're living paycheck to paycheck at a grueling job you hate, it's probably not a good idea to mess with your fundamental motivation systems; it could very well lead to negative consequences if you don't find a new positive, stable motivation system. 

Should you focus on non-coercion if you have enough slack to do so?

Now let's answer a less straw version of the question. Let's say a person has lots of slack and there won't be any immediate negative consequences to themselves or others. In that case, do I think it would be beneficial to switch to non-coercive motivation? It depends. Firstly, what do you care about? If it's mostly just about feeling good and having a good state (I often call this an enlightenment orientation) then yes, I do think this would be a good use of your time.

If instead, you mostly care about achievement/saving the world (I often call this a heaven orientation), then you have to look at your current productivity. If you're already highly productive, it takes a significant investment to switch your strategy for little or no gain. Elon Musk should not focus on non-coercion. On the other hand, if you are a frequent procrastinator, or find you have wildly oscillating motivation, non-coercive motivation could be a highly effective strategy to make you more productive.

Now, what if what you mostly care about is shaping yourself and becoming your version of the ideal (I sometimes call this an actualization orientation)? In that case, it's going to depend quite a bit on your vision of your ideal self. In some cases it will be worth it, in others not.

Would most people benefit from non-coercion if it didn't take any effort?

But let's take an even crazier hypothetical. What if, instead of people having to work to get to a non-coercive state, I could just press a button that would remove everyone's self-coercion immediately? Would I do that? Nope! But I asked myself a trick question. 

When we say "non-coercive motivation," it sounds like you're talking about a simple lack of coercion. But it turns out a lack of self-coercion isn't enough. If all it did was remove people's coercion, they would be worse off. Instead, when I talk about "non-coercive motivation" I'm talking not only about removing people's coercive motivation structures, but about replacing them with positive motivation structures that work as well or better. Only if the button did both would I press it.

Common Objections to Non-Coercion

And as soon as I talk about pressing that button, I can almost hear the fearful objections of people who are worried their lives will be ruined, so I want to make sure to address them before pressing it.

The first one I hear is: "Wait, a bunch of us are just going to end up playing video games and vegging out for the rest of our lives." And it's true, if I had pressed the first button, that would likely happen! But the second button gives you strategies to connect with your long-term desires using creative tension, tools like vision contrasting that motivate you toward long-term goals.

"Ok, but what about things I just don't like to do? Surely without coercion, my house will become a mess and my taxes will lay undone." That's a legitimate worry. I can see how "non-coercive motivation" might sound like "not doing anything you don't like" but it's more nuanced. There's a mode where the dislike is still there, it has no effect on your taxes.. you've fully accepted you want them anyway.

"Yeah, but that just sounds like coercive motivation with extra magical thinking."  Maybe, it could totally just be mindgames. But it's mindgames that makes your taxes 10x easier and more enjoyable. If you could press a button to make that happen wouldn't you?

"Ok, but what about challenge? The thrill of pushing yourself and pushing your limits? You want to get rid of that?" No, in fact I'm a huge fan of challenge and pushing yourself! But I don't think you need internal challenge to do that! The world will give you plenty of challenge without you having to fight yourself as well. I do think that it's probably easier to learn "Pushing yourself" to get through that challenge from a frame of coercion first. "Pushing yourself non-coercively" is an advanced move. So if you tend to avoid challenging yourself, it may behoove you to spend a period of time deliberately forcing yourself to do challenging things in order to learn that muscle, before switching over to non-coercion. But, it's still possible to do it non-coercively, simply pushing against the challenges the world gives you while being 100% behind that pushing.

"Sure ok, you can be productive and challenge yourself, but what about the moral dimension? Surely without people holding themselves back the world would see unbridled evil." Ok, I'm gonna say some stuff here that may be the biggest leap yet if you haven't experienced it, but fuck it, we've made it this far. I've done a lot of work with practices like CT Charting and Core Transformation that use non-coercive practices to try to get to people's underlying motivations... and I think most people (75%+) fundamentally want themselves and others to be happy (but like, REALLY happy). When people act in ways contrary to that, it tends to be because they feel threatened in a way that if they don't do something "evil/rude/bad/wrong," the thing in front of them will be a threat to making themselves and others happy.

A huge portion of the time (although not always) this is due to trauma, which causes them to make gross generalizations about what counts as a threat. Trauma that would be healed in a truly non-coercive environment. Other times, that assessment could actually be accurate due to scarcity of resources, or they could have an ontology in which the "other people" in "making other people happy" doesn't include certain ethnicities or something. And sure, pressing this button could make some of these people more effective in their evil. BUT, a huge majority of trauma-related evil would clear up by pressing that button, and I think that would vastly outweigh other effects on morality.

Do you have to force yourself to be non-coercive?

"Yeah ok, I guess I'll just have to take your word that people are fundamentally good. And I think I'm convinced that non-coercion is good for most people. But for me there's this one thing I KNOW I won't do if I don't coerce myself, and I DO NOT WANT THAT!"

So, what you're saying is, you feel the need to fight against the way you imagine implementing non-coercion? If you're fighting against it, I think that shows it's not a true form of non-coercion. It might be a lot of work to fully integrate to the point where you have a coherent non-coercive motivation system that you're not resistant to, but it is possible.

"Ok sure, but that sounds like a lot of work! I'm not sure if it's worth it!"

You know what, I 100% agree with you: it might not be worth it. I'd refer you to this post on when non-coercive motivation might or might not benefit you.


The Multi-Tower Study Strategy

Published on January 21, 2021 8:42 AM GMT

Boxed Topics, Jenga Towers, And The Spacing Effect.

An undergraduate class on molecular biology teaches you about DNA transcription, the Golgi apparatus, cancer, and integral membrane proteins. Sometimes, these sub-topics are connected. But most often, they're presented in separate chapters, each in its own little box. So let's call these Boxed Topics.

The well-known Stewart calculus textbook teaches you about functions in chapter 1, limits and the definition of derivatives in chapter 2, rules for derivatives in chapter 3, and the relationship of derivatives with graphs in chapter 4. Woe betide you if you weren't entirely clear on the definition of a derivative when it gets used, over and over again, in next week's proofs of derivative rules.

Taking a calculus class can be like building a Jenga Tower. If you've never played it, Jenga is a game where each player starts with a tower of wooden blocks in crisscrossing layers of three blocks. Each turn, you take one block out of your tower, and put it on top. The towers get taller and less stable every turn. The first person whose tower collapses is the loser.

In a Jenga Tower Topic, the blocks of last week's memories get pulled out, and placed back on top of the tower as you learn new material that depends on them. You just hope it doesn't collapse before the end of the class.

One is not necessarily easier than the other. A Boxed Topic that's jam-packed with disorganized information can be a lot harder than a well-structured and manageable Jenga Tower.

But all else being equal, Jenga Towers have trouble dealing with the spacing effect. This is the well-established finding that people build memories better by spacing out their practice, rather than massing it all together at once.

This might seem counterintuitive. With a Boxed Topic like molecular biology, you get all your practice on the Golgi apparatus in week 5, and all your practice on cancer in week 12. That's a perfect example of massed practice. It seems like Boxed Topics should have the most trouble with the spacing effect.

By contrast, Jenga Towers, like calculus, give you sustained practice with the early material. You learn about functions in week 1, and you continue using them throughout the course. You learn the definition of the derivative in chapter 2, and it keeps popping up over the long term. That sounds like a built-in form of spaced practice, which should make them easier to learn.

To go further with this objection, homework and quizzes for Boxed Topics and Jenga Towers are typically on whatever students learned most recently. This encourages massed practice, no matter whether the topic is boxed or in a tower.

Let me explain why I still think that Jenga Tower Topics have a bigger problem with the spacing effect than Boxed Topics, despite these objections.

While Jenga Towers do give you lots of practice with the earlier material, it is almost always in the context of presenting more complex concepts that depend on what you most recently studied.

You learn a simple idea. That's not so hard. But you'd better have a firm grasp on it very quickly. Because next week, your mind will be crowded with more complex ideas that absolutely depend on it. And then you'll have to take those more complex ideas, synthesize them, and use that to comprehend something even more complicated.

Even if you can build that tower very high without it collapsing, you don't want to build a career as, say, an engineer, on such a shaky structure.

With a Boxed Topic, you can move on to Box 2 without worrying if you've mastered Box 1. You can just review Box 1 at your leisure. This is why they're more compatible with the spacing effect.

With a Jenga Tower, no such luck. You'd better mass your practice heavily on whatever you're learning this week, because it's going to be foundational for next week, which will be foundational for what follows, and so on.

Imagine you take a year off of school for self-study. You decide to teach yourself three units of Boxed Topics and three units of Jenga Tower Topics during that time. Knowing about the spacing effect, you want to space out your practice sessions on each topic.

This is easy to do for the Boxed Topics. At first, you read straight through the textbooks for each topic, without worrying about remembering much as you go or trying to grasp all the fine details. This takes a month or two.

Then you have over half a year left to go back over the material. You can touch on each chapter once every few weeks; in six months, that's perhaps 8 re-studies of each chapter. The spacing effect promises that you'll reap much greater dividends than if you'd spent week 1 exclusively studying chapter 1, week 2 on chapter 2, and so on. Yet it's the same amount of work, on the same amount of material. You're just not pushing your brain to master any particular sub-topic in an unnaturally compressed time period.

By contrast, you'll have a very hard time with this strategy for the Jenga Tower Topics. My Calc I-III classes covered a total of about 14 chapters out of the Stewart textbook. Go ahead and try reading them all straight through, without stopping to focus hard on the individual details along the way, and without doing problems to reinforce your learning. You're gonna have a bad time. That's because calculus is a Jenga Tower Topic.

So what does the devoted autodidact do with Jenga Tower Topics?

The solution is actually simple. Study several at the same time!

Wait, what? You want me to make life easier on myself by, instead of studying calculus, studying calculus, linear algebra, and statistics all at once?

Well, yes, but just a little bit of each one.

There's a certain amount of material in any given subject that you can start to pick up in a casual, just-read-through-it sort of manner. It might be one chapter, just one section of a chapter, or even less. Call this an "Easy Chunk," because it's the most you can learn in one go that doesn't require extreme, detailed focus to get something out of it. You pick up some vocabulary, some general ideas, and perhaps nothing more on your first go 'round.

So pick up your calculus, linear algebra, and statistics textbooks, and read an Easy Chunk. That's your practice for day 1.

Effectively, you're turning several Jenga Tower Topics into a single big Boxed Topic. You learn a little from each one at a time, but you're not trying to integrate them. In theory, you could cover the same amount of material over three years. But instead of doing three calculus classes in year 1, three statistics classes in year 2, and three linear algebra classes in year 3, you'd take a class that combines a little of each, every year for all three years.

In this way, some of the dependencies are de-coupled. Beginners in college-level math would learn about functions, the basics of linear systems, and the difference between quantitative and qualitative data, all at the same time.

The Multi-Tower Study Strategy

It might be easiest to start by showing you, in the abstract, how these approaches might work. Call this the Multi-Tower Strategy, because the idea is that we study several Jenga Tower topics slowly, at the same time, rather than sequentially, one after the other.

Let's say that every week, a student can learn five ideas, one per weekday.

If they're studying calculus, they can learn C1, C2, C3, C4, and C5. In statistics, they can learn S1-S5. In linear algebra, they can learn L1-L5.

Because these are all Jenga Tower Topics, C2 depends on C1, while C3 depends on C2. Let's represent this as C1 -> C2 -> C3 -> C4 -> C5. Likewise, S1 -> S2 -> S3 -> S4 -> S5 and L1 -> L2 -> L3 -> L4 -> L5.

Now, in a conventional math class, this is exactly how students learn. In week 1, the calculus student studies this:

C1 -> C2 -> C3 -> C4 -> C5

By the time Friday rolls around, they're trying to understand C5, a topic that depends on earlier concepts they learned just four days ago. Even if they had a bit of time each day to review the previous day's material, that's the least possible amount of spacing between their practice sessions.

Let's compare this to an approach to learning that mixes three different math topics together. In week 1, the student studies like this:
C1, S1, L1, ->C2, ->S2

That's a little hard to understand with the arrows, but it's just meant to remind you that when they learn C2 on Thursday and S2 on Friday, these topics depend on topics C1 and S1 that they learned earlier in the week. But now, the student has had two full days in between to learn other, unrelated material. When they do a little bit of review on C1 on Thursday prior to diving into C2, the spacing effect is making that review go much further.
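The round-robin pattern described above is simple enough to sketch in code. This is a minimal illustration, not anything from the post itself (the function names `multi_tower_schedule` and `gap` are mine): it deals out one unit per study day, cycling through the topics, then measures how many study days separate a unit from the prerequisite it depends on.

```python
from itertools import cycle

def multi_tower_schedule(topics, units_per_topic):
    """Deal out one unit per study day, cycling through the topics."""
    queues = {t: list(range(1, units_per_topic + 1)) for t in topics}
    schedule = []
    for t in cycle(topics):
        if not any(queues.values()):
            break  # every topic's queue is exhausted
        if queues[t]:
            schedule.append(f"{t}{queues[t].pop(0)}")
    return schedule

def gap(schedule, earlier, later):
    """Study days between learning `earlier` and needing it for `later`."""
    return schedule.index(later) - schedule.index(earlier)

interleaved = multi_tower_schedule(["C", "S", "L"], units_per_topic=5)
sequential = [f"C{i}" for i in range(1, 6)]

print(interleaved[:5])               # week 1: ['C1', 'S1', 'L1', 'C2', 'S2']
print(gap(interleaved, "C1", "C2"))  # 3: two full days of other material between
print(gap(sequential, "C1", "C2"))   # 1: massed practice, no spacing
```

With three towers interleaved, every unit gets two unrelated study days between it and the unit that depends on it; the sequential schedule gives at most one day, and usually none.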

This is not new or fragile science. Again, the spacing effect is one of the oldest, best-studied, strongest findings in the psychology of learning. Our unconventional math student might feel on Thursday like she's having a harder time remembering how C1 worked when she learned it three days prior; that's true. By contrast, the conventional math student feels on Tuesday like he remembers C1 pretty well, since he just learned it the day before.

But psychological science tells us that at the end of three years, although both students will have covered the same material in the same amount of time, the unconventional student who used the Multi-Tower approach will remember it all much better.

Wouldn't this be confusing?

We might theorize that calculus, statistics, and linear algebra are such different subjects that students might really struggle to shift gears on a daily basis.

This objection is easy to dismiss. After all, many students take very different subjects simultaneously. A bioengineering student might take a biology, chemistry, and mathematics class all at the same time, and nobody bats an eye.

Or perhaps the problem is that these math classes are too similar? But then again, it's no problem if a student takes organic chemistry, biochemistry, and molecular biology simultaneously. Not to mention a simultaneous set of classes on French poetry, French literature, and French philosophy!

Why should mathematics be any different?

Why is mathematics a young person's game?

I'm going out on a limb here for a minute.

There's a story in psychology, and I believe it, that young children have brains that are far more plastic than those of adults. They learn and remember far better than older people. Perhaps it is for this reason that math is a "young person's game." Kids can remember new math much more easily than adults, whose intelligence is less fluid and more crystallized. They can cope more easily with the relentless onslaught, with building the Jenga Tower to the point that they can push the boundaries of the field further in their late teens and 20s.

This is probably an important factor for learning generally, but why does math specifically have a reputation as a "young person's game" when other intellectually demanding fields do not?

Perhaps it's at least partly because we take a Jenga Tower approach to teaching math more than we do in any other subject.

And perhaps we do this, in turn, because the structure of the subject itself seems to lend itself to the Jenga Tower approach so naturally, as proof follows proof follows proof. From their lofty heights of knowledge, the math teacher wants to reach down and scoop the best and brightest up to their level as quickly as possible.

Other subjects lend themselves more naturally to being boxed. Molecular biology can focus on a specific organelle, protein type, or pathway at a time. It can gesture at connections, but present them in a way that is fundamentally decoupled. If a student is learning about how a certain signaling molecule triggers apoptosis in a cancerous cell, they don't need to recall the exact structure and mechanism of action of the integral membrane protein that is its receptor.

When molecular biology students move on to the next topic, it's without the stress of worrying that what's to come depends on a mastery of what they just studied. When they review, it's with the benefit of the spacing effect.


These ideas about Boxed, Jenga Tower, and Multi-Tower strategies are based on my experience as a teacher, student, and autodidact, and on my study of learning psychology. They make sense, are grounded in established research, and reflect how I am structuring my own self-study. I've been scrutinizing and refining my approach to study over the last two years, and have plans to turn it into a sequence that fully explains the techniques, theory, and underlying science behind the project.

Individual anecdotes and experiences with learning are valuable and interesting. However, there are many variables at play in each individual's circumstances. The fact that I use the Multi-Tower strategy and find it helpful, or that you tried it and it didn't work for you, isn't strong evidence one way or the other about which approach is best.

That depends on which makes the most sense based on what we know about learning psychology, the practical details of a learning effort, and whatever empirical evidence might exist on the subject. It's these forms of evidence that I'm most interested in to develop these ideas further.

Stay tuned over the next year for more posts in my forthcoming Sequence on Scholarship.


A ghost

Published on January 21, 2021 7:14 AM GMT

(Cross-posted from Hands and Cities)

This post describes a type of thought experiment I sometimes perform in thinking about what to do. I find it a helpful tool for stepping back from what’s immediately salient to me. It’s mostly just a somewhat hokey variant on a classic type of thought, and I’m not sure how helpful it will be to others, but I thought I’d share anyway.


I imagine a ghost that slips from my body and floats out into the world. The ghost is invisible, and it can move freely in space and time. Often I imagine it able to fast-forward, slow down, and rewind time at whatever speed it chooses; and sometimes as capable of some more comprehensive vision of the fabric of space-time itself — able to see time and space arrayed before it like a kind of pool, to fly from one part to another, to zoom in and out, to dip in and out at will.

(Adam Magyar’s Stainless video series is some glimpse of what a “slow-down” might be like — h/t Jaan Tallinn, here).

The ghost sometimes has some kind of power to see counterfactual worlds as well. It can alter variables in the world, and then examine what changes as a result. These powers can vary. In some cases, I might imagine the ghost capable of altering Joe’s actions, but nothing else. In others, it might be capable of its own small actions, like causing someone to trip on a pebble. And in others, it might be capable of controlling basically any variable.

In each case, it’s generally able to observe in detail the consequences of different changes, and it can rewind and reset over and over, trying out and comparing different possibilities, sitting with them, reflecting, learning, understanding. The ghost-time available for this is limitless, and the ghost never gets tired or bored.

(There is also some assumption that none of these “tries” has any moral significance in itself — e.g., the ghost is able to create and consider worlds where there is lots of suffering, and to witness it in its full significance, but without that suffering “actually happening” in a way that subjects the ghost’s exploration to its own moral significance. I sometimes imagine some separate action of “making a possibility real,” but the ghost only takes such actions after deep exploration of the alternatives.)

Regardless of the scope of its powers, though, the ghost is not involved in the world. It does not, for example, try to send messages to the people in the world, or to interact with them in other ways. It does not fear punishment or seek recognition; it sees communities and families and friendships flourishing and changing and falling apart, but it is not a part of any of them. Rather, it is centrally a witness: it sees, wanders, attends.

(Something like this perspective is accessible to non-ghosts as well. Here I think in particular of a certain way of walking through cities, or of sitting in cafes; and of a way of treating one’s awareness as somehow insubstantial — a kind of space or light in which the world appears — and one’s life as centrally something witnessed, even as choices steer it in different directions.)


The ghost has lived my life up until leaving my body. It steps into the world with the same broad set of concerns and beliefs as I have. Perhaps I am the first thing it sees — Joe, frozen in time, some particular expression on my face, in the midst of some gesticulation or thought or emotion. As I start to move, speak, go about my life, the ghost sees me in a way it never has from the inside. It starts to understand better the impact of my actions and habits, my follies and blind-spots and virtues, on others; and it follows these impacts as they warp and twist through time.

Perhaps, at the beginning, the ghost is particularly interested in Joe-related aspects of the world. Fairly soon, though, I imagine it paying more and more attention to everything else. For while the ghost retains a deep understanding of Joe, and a certain kind of care towards him, it is viscerally obvious, from the ghost’s perspective, unmoored from Joe’s body, that Joe is just one creature among so many others; Joe’s life, Joe’s concerns, once so central and engrossing, are just one tiny, tiny part of what’s going on.

(See Benjamin Grant’s Overview for photos that can make this sort of viewpoint vivid.)

Exactly where I imagine the ghost’s attention going varies. Often, I’ll imagine the ghost following other people as they go about their lives, watching their joys, pains, and cares played out across time. I don’t tend to imagine the ghost actually living people’s lives, from the inside — though that’s a worthwhile thought experiment too. But I imagine its attention infused with empathy and receptivity, and the inner lives of others as very directly present to it — written into gestures and postures and the lines of faces. With this sort of attention, the ghost watches humans and animals and ecosystems; it explores the past and the future; it wanders the scales of time and space on which the events of the universe unfold.

In my head, this wandering is often flavored with a certain type of sadness and poignancy, but I think this might be an idiosyncratic feature of my own emotional landscape, and other flavors seem very possible. That said, while the ghost’s emotional relationship to what it sees matters a lot to the thought experiment, questions about what it thinks and feels about its own condition — e.g., how long it expects to be doing this, whether it gets lonely or depressed, etc — aren’t center stage. What matters is the type of comprehension of the world that this condition makes possible.

I don’t, though, tend to imagine the ghost as theoretically omniscient. It can access the whole fabric of space and time, but it need not have considered all arguments, proved all theorems, discovered all final theories. Limitations of memory or energy or boredom do not hinder it, but it’s not a trans-humanist super-being along every dimension that could be ethically relevant; and in this sense, I think, it is likely not an “ideal observer” in the sense most relevant to various ideal observer theories of meta-ethics (though we can modify it to fit the bill better). Rather, it’s something in between such an observer, and my everyday self. This makes it more fallible, but also easier to imagine.


Really, this is just a particular way of accessing a familiar perspective, and involved thought experiments are far from necessary. The world the ghost sees, after all, is just the real world — just the place we all live, just what’s actually happening. Imagining what the ghost sees and feels is mostly a way of starting to imagine what’s there to be seen, and how you would feel if you saw it.

Nor is it the only perspective, or one of overriding authority. We aren’t ghosts, and would each be different ghosts in any case. And we’re not outside the world: we’re in it, embodied, ignorant, vulnerable, with no rewinds or retries. The ghost has no family or friends or communities of its own; but we do.

Still, I find that imagining what different types of ghost would feel or think or see makes a difference to my own deliberation. For example:

  • In considering my relationship to a given person, I might imagine how the ghost would feel about this person, if they had followed them throughout their life, and could see their current situation clearly.
  • In considering taking a risk, I might consider how the ghost would want me to choose, if it was first deeply acquainted with the different possible outcomes, but then thrust back into my current epistemic position with respect to which actions will bring them about.
  • In considering the possibility that I am making a mistake of some sort, I might imagine what type of mistake this would look like to a ghost who was watching me make it.
  • In considering different sorts of moral dilemmas, I might imagine how a ghost who had replays/retries, clear knowledge of and acquaintance with the different outcomes, and who was not part of the social world in which a given action would be embedded, might act, or might hope that I act.
  • In considering the importance of what happens in the future relative to some present-day concern, I might think about how a ghost who had really been to the future would feel about the stakes involved.
  • In considering some kind of justification that I’m offering for an action, I might consider how a ghost would think about my situation and the argument I’m offering.

I don’t necessarily treat the ghost’s input as definitive. Often, though, it moves me — helping me to step out of a certain type of myopia; to distinguish between map and territory; to clarify which considerations are relevant to which perspectives; and to remember the vividness and detail of everything I cannot see.


A few thoughts on the inner ring

Published on January 21, 2021 3:40 AM GMT

I enjoyed C. S. Lewis’s The Inner Ring, and recommend you read it. It basically claims that much of human effort is directed at being admitted to whatever the local in-group is, that this happens easily to people, and that it is a bad thing to be drawn into.

Some quotes, though I also recommend reading the whole thing:

In the passage I have just read from Tolstoy, the young second lieutenant Boris Dubretskoi discovers that there exist in the army two different systems or hierarchies. The one is printed in some little red book and anyone can easily read it up. It also remains constant. A general is always superior to a colonel, and a colonel to a captain. The other is not printed anywhere. Nor is it even a formally organised secret society with officers and rules which you would be told after you had been admitted. You are never formally and explicitly admitted by anyone. You discover gradually, in almost indefinable ways, that it exists and that you are outside it; and then later, perhaps, that you are inside it.
There are what correspond to passwords, but they are too spontaneous and informal. A particular slang, the use of particular nicknames, an allusive manner of conversation, are the marks. But it is not so constant. It is not easy, even at a given moment, to say who is inside and who is outside. Some people are obviously in and some are obviously out, but there are always several on the borderline. And if you come back to the same Divisional Headquarters, or Brigade Headquarters, or the same regiment or even the same company, after six weeks’ absence, you may find this secondary hierarchy quite altered.
There are no formal admissions or expulsions. People think they are in it after they have in fact been pushed out of it, or before they have been allowed in: this provides great amusement for those who are really inside. It has no fixed name. The only certain rule is that the insiders and outsiders call it by different names. From inside it may be designated, in simple cases, by mere enumeration: it may be called “You and Tony and me.” When it is very secure and comparatively stable in membership it calls itself “we.” When it has to be expanded to meet a particular emergency it calls itself “all the sensible people at this place.” From outside, if you have despaired of getting into it, you call it “That gang” or “they” or “So-and-so and his set” or “The Caucus” or “The Inner Ring.” If you are a candidate for admission you probably don’t call it anything. To discuss it with the other outsiders would make you feel outside yourself. And to mention talking to the man who is inside, and who may help you if this present conversation goes well, would be madness.

My main purpose in this address is simply to convince you that this desire is one of the great permanent mainsprings of human action. It is one of the factors which go to make up the world as we know it—this whole pell-mell of struggle, competition, confusion, graft, disappointment and advertisement, and if it is one of the permanent mainsprings then you may be quite sure of this. Unless you take measures to prevent it, this desire is going to be one of the chief motives of your life, from the first day on which you enter your profession until the day when you are too old to care. That will be the natural thing—the life that will come to you of its own accord. Any other kind of life, if you lead it, will be the result of conscious and continuous effort. If you do nothing about it, if you drift with the stream, you will in fact be an “inner ringer.” I don’t say you’ll be a successful one; that’s as may be. But whether by pining and moping outside Rings that you can never enter, or by passing triumphantly further and further in—one way or the other you will be that kind of man.

The quest of the Inner Ring will break your hearts unless you break it. But if you break it, a surprising result will follow. If in your working hours you make the work your end, you will presently find yourself all unawares inside the only circle in your profession that really matters. You will be one of the sound craftsmen, and other sound craftsmen will know it. This group of craftsmen will by no means coincide with the Inner Ring or the Important People or the People in the Know. It will not shape that professional policy or work up that professional influence which fights for the profession as a whole against the public: nor will it lead to those periodic scandals and crises which the Inner Ring produces. But it will do those things which that profession exists to do and will in the long run be responsible for all the respect which that profession in fact enjoys and which the speeches and advertisements cannot maintain.

His main explicit reasons for advising against succumbing to this easy set of motives are that it runs a major risk of turning you into a scoundrel, and that it is fundamentally unsatisfying—once admitted to the in-group, you will just want a further in-group; the exclusive appeal of the in-group won't actually be appealing once you are comfortably in it; and the social pleasures of company in the set probably won't satisfy, since those didn't satisfy you on the outside.

I think there is further reason not to be drawn into such things:

  1. I controversially claim that even the good of being high status is a crappy kind of good relative to those available from other arenas of existence.
  2. It is roughly zero sum, so hard to wholly get behind and believe in, what with your success being net bad for the rest of the world.
  3. To the extent it is at the cost of real craftsmanship and focus on the object level, it will make you worse at your profession, and thus less cool in the eyes of God, or an ideal observer, who are even cooler than your local set.

I think Lewis is also making an interesting maneuver here, beyond communicating an idea. In modeling the behavior of the coolness-seekers, you put them in a less cool position. In the default framing, they are sophisticated and others are naive. But when the ‘naive’ are intentionally so because they see the whole situation for what it is, while the sophisticated followed their brute urges without stepping back, who is naive really?


Notes on Optimism, Hope, and Trust

Published on January 20, 2021 11:00 PM GMT

This post examines the virtues of hope, optimism, and trust. It is meant mostly as an exploration of what other people have learned about these virtues, rather than as me expressing my own opinions about them, though I’ve been selective about what I found interesting or credible, according to my own inclinations. I wrote this not as an expert on the topic, but as someone who wants to learn more about it. I hope it will be helpful to people who want to know more about these virtues and how to nurture them.

What are these virtues?

These virtues have in common a sort of “look on the bright side” / “expect the best” approach to life. But there are a number of ways to interpret this, and if we are looking for a virtue — that is, a characteristic disposition that promotes human flourishing — we would be wise to be precise and careful.

There is a naive version of optimism that I would not want to try to defend on LessWrong:

the belief that the probability of an outcome is increased by one’s positive disposition toward it

So let me say at the start that this is just one possible way the optimistic outlook can be defined, and that others can bear more rational weight.

There is little controversy about hope, optimism, and trust being parts of a flourishing life. However, there is controversy about whether encouraging a hopeful, optimistic, trustful outlook puts the cart before the horse. Are hope, optimism, and trust ingredients of a life well lived, or are they results of a life well lived (or perhaps of good fortune)?

Related virtues and vices

These virtues are closely related to some others: Cheer and joy, for instance, are easier to maintain when one is optimistic. Confidence, boldness, and courage are aided by the hope of triumph. Imagination can help you to discover possible good outcomes to be hopeful about. Intimacy, openness, and vulnerability can be boosted by trust. Solidarity can include an extension or expectation of trust. Trust also complements trustworthiness. Richard Rorty thought shared hope was at the base of civility (“Contingency, Irony, and Solidarity,” 1989).

There is some tension between optimism and prudence (in the sense of caution/vigilance), if optimism causes you to underweight the possibility of bad outcomes that you ought to prepare for. If optimism becomes an excuse to discount evidence that disagrees with a positive outlook, it can undercut rationality and skepticism.

The vices associated with a lack of optimism / hope / trust include cynicism, distrust, doubt, paranoia, suspicion, pessimism, and despair. Sometimes lack of hope in particular becomes fatalism, a feeling of lack of agency (“nothing I do matters”), or, philosophically, a loss of faith in free will. There are also vices associated with an excess of optimism / hope / trust, like unpreparedness, gullibility, or bliss-bunnyishness. “Williams syndrome” is a genetic disorder that includes a dangerous overabundance of trust among its symptoms.


The Catholic Encyclopedia defines hope as “the desire of something together with the expectation of obtaining it.” However, in common use people often hope for things they do not expect or even think likely (e.g. to win the lottery). Hope does seem to at least require some possibility that the hoped-for thing could come to pass, but also some possibility that it might not: you cannot “hope” for something that is a certainty (though you might have a pleasant expectation of it) or an impossibility (though you might wish it could be otherwise). Hope (secular hope, anyway) isn’t the expectation that the hoped-for thing will come to pass but includes the tension of some fear that it will not.

Usually, hope refers to the future. Occasionally, however, people seem to express hopes for past events (“I hope I remembered to turn off the oven”). Some people interpret this as a sort of roundabout way of expressing a future hope (“I hope I find the oven is off when I return home”).

It probably goes without saying, but a thing hoped for is always a positive thing (for the hoper, anyway). You can have expectations of positive or negative things, or fear / dread / foreboding about negative things, but hope always has positive content.

Hope can be motivating: it allows you to imagine the possibility of a future state and its benefits, and so can help drive you to do what needs doing to get there. A sort of mundane hope underlies almost all purposeful activity: you hope that by doing something you will get some result. Hope can also be motivating in the way it helps you to set your sights on long-term goals. For example, the hope of what you will accomplish as a doctor can help you endure your long hours of residency. Hope can in this way be active, more like aspiring than passively anticipating.

Hope might also be thought of as a sort of imagination that helps to prepare us for the future. In this it is a complement to fear: fear encourages us to prepare for a future where fears come to pass; hope prepares us for a future where hopes come to pass. (But we don’t call “fear” a virtue; instead we talk about courage or caution or prudence or preparedness. Maybe we need something similar for hope.) Utopian and dystopian writings are elaborate sorts of hopes and fears that ask us to imagine possible futures in a way that invites us to prepare for them or be vigilant about them.

Hope is inherently valuable in that imagining positive things is a pleasurable sort of daydreaming. Hoping also helps you to discern and make salient your values: you can learn what you value in part by paying attention to what you hope for.

Intransitive, superstitious, and self-fulfilling hopes

What I have described so far is “hope for” something. There is also a sort of intransitive hope — hope in general — that is more like what I’ll cover in the “Optimism” section below. Hope as a virtue — habitually, characteristically adopting a hopeful stance — resembles this intransitive sort of hope, though it may exhibit itself through specific acts of transitive hope.

There is also the superstitious hope that I alluded to earlier: “wishful thinking” in which you have the irrational belief that your hope will influence future events to align with the contents of your hope. There is all sorts of common magical thinking surrounding this kind of hope: totems and rituals and beliefs that if you just hope strongly enough or sincerely enough you can materialize your hopes just like that. In its worst form, hope of this sort can displace the kind of practical action that might actually help make hopes come to pass.

That said, there are such things as self-fulfilling prophecies in the realm of hope (see the discussion of William James below for more on this). By anticipating fortunate opportunities, we may better prepare ourselves to take advantage of them if they arrive, so in this way hope can indeed help good things come to pass. If we believe superstitiously that “everything happens for a reason,” this may prompt us to look more confidently for the silver lining in the cloud, which may help us find it where we otherwise would have missed it. And sometimes “hope” is the name we use to describe confidence that comes from careful preparation and experience.

There is also such a thing as an irrational lack of hope: a superstitious pessimism (“I’ve always had bad luck”) or paranoia. In such a case, something like “hope” might be recommended as a corrective to help you evaluate reality more rationally. It may be that the virtue is called “hope” not so much because it is a good thing to be more hopeful than reason permits, but because on the continuum of hopeful-to-hopeless the hopeless side is more harmful and so it is wise to err on the hopeful side, all else being equal.

Hope as an intellectual virtue

Hope is sometimes described as an intellectual virtue. It can come to the assistance of intellectual, rational pursuits. William James (see below) noted, for example, that while science declares allegiance to dispassionate evaluation of facts, the history of science shows that it has often been the passionate pursuit of hopes that has propelled it forward: scientists who believed in a hypothesis before there was sufficient evidence for it, and whose hopes that such evidence could be found motivated their researches. Nancy Snow, in “Hope as an Intellectual Virtue” (2013), says that hope works as an intellectual virtue in three ways: “(1) hope that knowledge/truth can be obtained furnishes a motivation for its pursuit; (2) hope imparts qualities to its possessor, such as resilience, perseverance, flexibility, and openness, that aid in the pursuit of knowledge/truth; and (3) hope, through imparting such qualities to its possessor, functions as a kind of method in the pursuit of knowledge/truth.”

Hope in a Christian context

Faith, hope, and charity (or love) are the traditional Christian virtues. In a Christian context, hope is the confidence that God has your back, and will extend His help to you in your efforts to “reach eternal felicity.” Hope is said to be an “infused virtue” (one that is implanted in us by God, not one that we develop ourselves through habit). There ultimately is no hope to be had in the material world and our mortal lives; if we try to put our hope there, we will end in despair (the opposite of hope). Hope is a matter of will, not of intellect: hope “elevate[s] and strengthen[s] our wills” as we work toward unity with God. If we reject God, we end up losing hope (“Abandon Hope All Ye Who Enter Here” is legendarily inscribed on the gates to hell).

William James

The philosopher William James believed he had discovered rational grounds for making our opinions about the truth depend to some extent and in some circumstances on our emotional disposition towards it. He defended this approach in his essay “The Will to Believe.”

In short, James said that when

  1. you have to decide between hypotheses and cannot just remain comfortably in doubt,
  2. you are unable to rationally and scientifically decide between hypotheses (no time, no data, competing hypotheses match the data equally well), and
  3. it would be better for you if some particular hypothesis were the truer one

then you are justified in choosing — indeed you ought to choose — the preferred hypothesis as the one to provisionally believe.

For one thing, he notes that for some hypotheses, belief in them can have causal influence on their being true. For instance, if you believe that someone is your friend, and you therefore behave in a friendly way toward them, this may influence them to become your friend if they were previously on the fence about it. If you do not believe that anyone can be trusted, you will never extend trust to anyone, and sure enough you will inhabit a world in which nobody behaves in a trustworthy way toward you. There are many things like this in life, James says, where if you treat them as “live hypotheses” you can in fact bring them about, just by orienting your life in such a way as to make a space for them to occupy.

James applied this idea to a sort of generic religious hypothesis (that the most perfect and most eternal things are the important ones, and that this religious outlook underlies the best sort of life). This is an unproven hypothesis, and one that James asserts you have to accept or reject: there’s no middle ground. If you wait until you have air-tight proof either way, you’ll be waiting your whole life — and this amounts to the same thing in practical terms as to reject the hypothesis dogmatically. The risk of accepting the hypothesis if false is the fear that you will have been duped by a fairy tale; the hope of accepting the hypothesis if true is that you don’t waste your “sole chance in life of getting upon the winning side.” Given this balance: “what proof is there that dupery through hope is so much worse than dupery through fear?” (This of course resembles Pascal’s Wager.)

James goes further and says that the religious outlook tends to personalize the perfect/eternal in such a way that — as with our beliefs in trust or friendship — our religious beliefs can have causal influence on the truth value of those beliefs. In order to live in a god-filled, hopeful universe, James suspects, you must meet the religious hypothesis half-way: you must extend some belief in its direction in order to receive the flow of evidence that supports the belief in return. Stubborn skepticism and despair become a self-fulfilling prophecy (in the same sense as with the person who does not believe people can be trusted), and, according to James, are thereby irrational: “A rule of thinking which would absolutely prevent me from acknowledging certain kinds of truth if those kinds of truth were really there, would be an irrational rule.” Take the hypnotoad pill, man.

James ends his essay by quoting Fitz James Stephen:

We stand on a mountain pass in the midst of whirling snow and blinding mist through which we get glimpses now and then of paths which may be deceptive. If we stand still we shall be frozen to death. If we take the wrong road we shall be dashed to pieces. We do not certainly know whether there is any right one. What must we do? “Be strong and of a good courage.” Act for the best, hope for the best, and take what comes… If death ends all, we cannot meet death better.


Optimism is the belief that things are good, or at least are heading in that direction. It sometimes also includes the belief that we ourselves are doing well or getting better, and that we have some control over our fate. John Dewey preferred the term “meliorism” which emphasized this active element: a meliorist is optimistic that our efforts can make things better.

Is the universe benign, malign, or indifferent? Can people be good to one another, or does self interest make them necessarily antagonistic? Is the future bright or are we doomed? Is life worth living or would it be better had we never been born? Is history a story of progress and enlightenment or an embarrassing chronicle of horrors? Is there a caring God who watches over us, or are we playthings of cruel forces? Optimists know which side of these questions they’re rooting for, and they may be tempted to tilt the scales a bit as they weigh the evidence.

The philosophy of optimism

Optimism has at times been taken to unlikely extremes. There was an 18th-Century vogue for philosophical proofs that the universe we inhabit is not merely good, but “the best of all possible worlds.” The more typical optimist is satisfied with the belief that life is on the whole significantly more good than not.

Optimism at first glance seems to presuppose a scale of good and bad on which you measure your life, or history, or the universe, or whatever, in order to then find it good. Some philosophers have tried another approach to optimism, which is to unconditionally love their life without first judging it in this way. Nietzsche put it this way: “My formula for greatness in a human being is amor fati: that one wants nothing to be different, not forward, not backward, not in all eternity. Not merely bear what is necessary, still less conceal it — all idealism is mendacity in the face of what is necessary — but love it.”

Most philosophies and religions are optimistic, even the ones that are very critical of people or very suspicious of the material world. They typically either explain why things are good, or explain how to get out of a bad predicament. LessWrong-style rationalism is full of this sort of optimism: we are plagued by cognitive biases, yet with effort we have the power to more closely approach the truth; artificial intelligence threatens to extinguish human values, but we can learn how to restrain it; death is everyone’s destiny, but perhaps we can be the generation to defeat destiny!

Some philosophies make an effort to show how certain paths to optimism are dead ends but others remain open. If we condition our optimism on being able to avoid things like sickness, aging, and death, the Buddha says, we will inevitably fail — yet we can still transcend suffering. If we lay up treasures upon earth, where moth and rust doth corrupt, where thieves break through and steal, we will be disappointed, says Jesus — but we can lay up for ourselves treasures in heaven and then we’ve got it made in the shade. If we condition our happiness on things that are not in our control, we will be forever at their mercy, says Epictetus — but we can stop doing that, and when we do so, the world is our oyster.

It is a rare philosopher who says “nope; it’s hopeless” — even Schopenhauer, who is usually trotted out as the poster child for pessimism, seems to have left an escape route open to certain exceptional people. The optimist in me thinks that this is a good sign: whatever else they disagree on, the philosophers agree that things are more-or-less okay and maybe even marvelous. The pessimist in me wonders whether philosophers are protesting too much — shouting optimistic words into an uncaring void in a whistling-past-the-graveyard way — or whether there’s a bias at work in which pessimistic philosophers fail to find followers (or publishers) in favor of those who hold out false hopes.

Pessimism considered harmful

Pessimism has become medicalized. Humoral theory held that pessimism was associated with an excess of black bile (from which we get the word “melancholy”); optimism with blood (from which we get the word “sanguine”). Nowadays, excessive pessimism and hopelessness are considered to be pathological symptoms of a medical diagnosis of depression (though not sufficient in themselves for such a diagnosis). If you come to believe that the universe is a cold, uncaring place, that all human effort is pointless vanity as all we come to love in our futile attempts to muffle our howling loneliness decays and is ripped from our fingers by ever-looming death, which is usually preceded by decrepitude, confusion, and painful suffering… you may be asked to consider one of an array of medications that shows promise for treating people who find such concerns to be of urgent import. A certain amount of pessimism is tolerated in the eccentric cranks among us, but above a certain threshold it becomes suspiciously pathological.

Optimism on the other hand is associated with better life outcomes. There’s an obvious correlation/causation problem to be cautious about here, but there has also been a lot of research done that attempts to tease out a possible causal role for optimism. Optimists tend to use engagement-based (as opposed to disengagement-based) coping mechanisms to deal with adversity (addressing vs. avoiding the problem). For example, they take proactive steps to protect their health and wind up healthier as a result. They seem to do better in terms of educational persistence, relationships, and income. A good overview of the research on benefits and drawbacks of optimism can be found in Charles S. Carver, Michael F. Scheier, Suzanne C. Segerstrom “Optimism” Clinical Psychology Review 30 (2010).


“What loneliness is more lonely than distrust?” ―George Eliot, Middlemarch

“Suspicion often creates what it suspects.” ―C.S. Lewis, The Screwtape Letters

“Is not he a man of real worth who does not anticipate deceit nor imagine that people will doubt his word; and yet who has immediate perception thereof when present?” (Analects of Confucius, XIV.xxxiii)

You can extend trust to someone particularly or generally. That is to say, you can trust them to do something (water your plants while you’re on vacation) or trust them more generally (you consider them a trustworthy person). You can trust someone to do what they say they’ll do, to follow-through, to make good on their promises. You can also trust someone with something, that is, you trust that they will be a good caretaker of it. You can also trust someone in the sense that you believe that they are being truthful in what they say. You can also trust someone’s judgement: you trust them to make good decisions; you feel comfortable leaving things in their hands.

You might trust someone by virtue of the role they play in your life. For instance you might trust someone as-a-friend to follow through on your expectations of what friendship entails (e.g. keep secrets). Or you might trust someone as-a-fellow-X (Mason, proletarian, Dodgers fan) to show certain signs of solidarity (e.g. never testify against a fellow-cop). If trust of these sorts is betrayed, this can undermine the pillars of the preexisting relationship (e.g. “I thought you were my friend!”).

Trust might be extended in reciprocal expectation: I behaved in a trustworthy way towards you, so now I can expect that you will honor my trust in return. Two people may work together to establish a relationship of trust by alternatingly extending trust and honoring trust. There’s a sort of prisoners’ dilemma at work here, where each party is most apt to flourish by living a life in which they have relationships of trust, but each relationship of trust requires that they be willing to extend trust in an act that may make them vulnerable.

Trust towards people in general, or by default

You can also have a more or less trusting outlook towards people in general: You might come to assume that most people have more-or-less good intentions, or on the contrary you might come to assume that everyone is ultimately selfish and has ulterior motives for their ostensibly benevolent actions. Excess distrust of that kind can be an overcorrection in the face of bad experiences. The saga of “learning to trust again” when you’ve been hurt in romance is a common human tale. Distrust also seems very contextual: healthy distrust in one context would be an unhealthy level of suspicion in another.

People who abuse trust can sometimes be very crafty about doing so, and difficult to detect. Learning to detect untrustworthy people, or filtering them out of your life, can help you maintain a more trusting attitude in general. This way, you do not need to distrust people in general in order to avoid harm from those people who would abuse your trust. This is a difficult skill, and more of an art than a science, but is an important facet of trust-as-a-virtue.

Humans are so extraordinarily helpless as infants and children, and as a result are so reliant on others, that trust relationships are crucial parts of our early lives. Someone who grew up with abusive or neglectful caregivers may have extraordinary difficulty with trust because they lacked the opportunity to place trust in someone trustworthy as a child.

Sometimes people use “trust” to describe their relationships with institutions or impersonal things. Sometimes this seems mostly metaphorical: trusting gravity to keep your belongings from floating away; trusting that the sun will come up tomorrow. But in the case of human institutions, it’s more of an open question whether “trust” is a metaphor or really is something we can extend to collective endeavors or human-created algorithms: For instance, someone might say that they trust what they read on Wikipedia, or they trust Google to keep their email private, or they trust “the science” about global warming. They might feel betrayed if they trusted their software to keep reliable backups and they find out those backups were corrupted.

Trust vs. expectation and reliance

Trust is more than mere expectation. You might expect that someone will act in some way because it would be in character for them, or because it would be in their best interests. But if you trust them to act in some way, it seems to imply that you expect them to do so in part because of the trusting relationship you have with them: you trust them to act in such a way because you believe they consider themselves duty-bound to do so — perhaps because of the trust you have extended to them. (And an attitude of “trust, but verify” is hardly one of trust at all.)

This also suggests that in order to trust some person to do something, that person needs to be aware that they are trusted to do that thing, or at least they need to believe that doing that thing is part of what makes them trustworthy. You can’t, in other words, reasonably trust someone to do something they wouldn’t otherwise do, without bothering to communicate your trust to them somehow. If someone betrays your trust, this probably implies that you find them blameworthy for doing so; if so, then it doesn’t make sense to put your trust in someone in such a way that they would be blameless for betraying it (for instance if they didn’t have any way of knowing they were being trusted, or if you trusted them to do something you knew to be impossible, or if they did not accept the trust you offered). You can hope that your doctor cures your cancer; you can trust your doctor to take all reasonable and necessary steps to treat your cancer; but you can’t reasonably trust your doctor to cure your cancer, because it might not be curable.

Not all trust is explicitly granted and accepted, however. Some is implicit. You may extend a certain amount and type of trust to a stranger implicitly. For instance if you stop to ask someone directions, you implicitly trust them not to mislead you. You would be reasonable to feel that they betrayed your trust if they instead sent you on a wild goose chase, even if they never explicitly accepted your trust or never promised not to mislead you. The amount and type of trust that you are willing to implicitly extend in this way is one measurement of your expectations for people. Extending more such trust means you have higher expectations for them (and also, by extension, for yourself). This can be a way of promoting other virtues by setting-the-bar (to trust people to be honest, kind, etc. is also a way of saying I expect people to be honest, kind, etc. and I will judge them poorly if they are not).

Different cultures and subcultures have different expectations of trust: to whom it can be safely extended, in what areas, and to what extent. Violating the trustworthiness expectations of a culture can provoke a sort of exile from that culture in the form of shunning or shaming. Not extending trust to someone in a context where there is a cultural expectation of trust can be seen as insulting. Inter-culture conflicts are sometimes rooted in different trust expectations. Being “multilingual” in your sense of trust can be valuable, particularly if you straddle cultures. You are not merely a passive interpreter of your culture’s trust norms, of course: you also help to shape them. By being somewhat-more-trusting than the norm, you can help move the norm in a more trusting direction.

In my discussion of the virtue of honesty I mentioned Thoreau’s observation that “It takes two to speak the truth—one to speak, and another to hear.” If you cannot trust someone to be honest with you, you become deaf to honesty: it becomes more difficult for you to hear the truth. If someone knows that you do not trust them, this may also demotivate them from attending carefully to the truth, precision, and clarity of what they say to you (“why bother, after all you’re just going to believe what you want to believe”).

Trust and vulnerability

Trust is closely related to vulnerability. When you extend trust to someone, you may be making yourself doubly-vulnerable: the person you trust may not do what you trust them to do (which may have negative consequences for you, e.g. your plants go unwatered and die), and they may betray your trust (which is an additional negative consequence, e.g. you feel a fool for having trusted them). Being vulnerable in this way can require courage. Extending trust, and accepting the cost in vulnerability to do so, can be a sort of generosity or charity, in some contexts. (If you extend different amounts of trust to people for unfair reasons, this may have implications for justice as a virtue as well.)

Trust seems to be one of those “golden mean” virtues. If you have too little trust, you’ll miss out on important opportunities for cooperation. If you have too much, you’ll be a gullible mark who is easily taken advantage of.

How can we improve in hope, optimism, and trust?

Most of the advice I have found about how to improve in these virtues has to do with how to be more hopeful, more optimistic, and more trusting. In other words, it assumes that you have undershot the mark. Advice is harder to come by on how to stop clinging to exhausted hopes, accept that life is vain suffering, and stop being so darned gullible.

Benjamin Franklin, in his essay “The Handsome and Deformed Leg,” noted that some people tend to dwell on the negative things in life, others on the positive, and the former are at a great disadvantage in life. He asserted that the pessimistic outlook is a “disposition… perhaps taken up originally by imitation, and unawares grown into a habit, which though at present strong, may nevertheless be cured.” His cure involves deliberately adopting the habit of redirecting one’s attention away from the negative things in life. This suggests that gratitude and appreciation can be ways of becoming more optimistic.

Cognitive-behavioral therapy is a modern approach to Franklin’s cure. It is used to combat the persistent pessimistic thoughts that are symptoms of (and may exacerbate) depression, and the cognitive biases that reinforce pessimistic assessments.

Positive psychology researcher Martin Seligman suggested that there is a “learned optimism” that people can cultivate (Learned Optimism: How to Change Your Mind and Your Life, 1990). You become more optimistic by changing how you describe things that happen in your life. If you describe good things using language that implies that they are “permanent and pervasive,” and bad things with language that implies that they are “temporary and narrowly-focused” an optimistic outlook will result. (And a pessimistic one will result if you get this backwards.) So, for example, rather than describing a negative occurrence as an example of “my bad luck” you describe it as just a one-time setback. This is a sort of “framing” technique used as a mind hack.


Contra StatNews: How Long to Herd Immunity?

Published on January 20, 2021 10:06 PM GMT

Cross-posted from applieddivinitystudies.com/stat-immunity. Warning: speculative armchair epidemiology. All emphasis mine. See also Youyang Gu's projection.

Summary: In an article for Stat, Dr. Zach Nayer misrepresents research, makes indefensibly flawed assumptions, and fumbles basic arithmetic. Per the CDC, actual US Covid cases have run 4.6x higher than reported overall, and around 2.4x higher in recent months. Using improved parameters, our toy model finds that herd immunity may occur in less than 4 months, although neither estimate should be taken too seriously. It all depends on the transmissibility of the new strain, as well as our ability to ramp up vaccine production, distribution and acceptance.

1) Dr. Nayer Misrepresents the Evidence on Monthly Infection Rates

Last month, Dr. Zach Nayer [1] at Stat published an estimate of time to herd immunity, suggesting that without vaccines it may take as long as 55 months.

The model itself is straightforward. Assume we need to hit 75% immunity, then figure out when we'll get there based on existing prevalence and monthly infection rate:
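The formula itself did not survive the cross-post, but a minimal sketch of what the model presumably computes (the function name and exact form here are my own) looks like this:

```python
# Sketch of the toy model: months to herd immunity is the remaining
# immunity gap divided by the monthly infection rate. All quantities
# are fractions of the total population.

def months_to_herd_immunity(target, current, monthly_rate):
    return (target - current) / monthly_rate

# Nayer's parameters: 75% target, 9.3% current prevalence, 1.2% per month.
print(round(months_to_herd_immunity(0.75, 0.093, 0.012)))  # about 55 months
```

With Nayer's parameters this reproduces his headline figure of roughly 55 months.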

Unfortunately, Nayer's parameters are totally off. Citing a study which found antibody prevalence of 9.3%, Nayer writes:

In late September, a Stanford study estimated that 9.3% of Americans have antibodies against SARS-CoV-2.... If the base prevalence at the end of September --- eight months from the onset of the epidemic in the United States on January 21, 2020 --- was 9.3%, the coronavirus has an infection rate of approximately 1.2% of the population per month.

But take a closer look. Although the study was published in September, it was based on data collected in July. As the authors make explicit:

Our goal was to provide a nationwide estimate of exposure to SARS-CoV-2 during the first wave of COVID-19 in the USA, up to July, 2020

Instead of dividing 9.3% by an eight month range, Nayer should have used the 6 months from January through July. This yields an estimated monthly infection rate of 1.6% rather than 1.2%.

To his credit, Nayer attempts to confirm this result against another source of data, but fumbles the arithmetic. He writes:

one study [estimates] 52.9 million infections in the U.S. from February 27 to September 30, or an infection rate of 1.3% per month.

52.9 million infections is 16% of the US population. Over a 7 month time period, that's a monthly infection rate of 2.3% per month, nearly double Nayer's result.
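As a quick check of that arithmetic (assuming a US population of roughly 330 million):

```python
# 52.9 million infections over the ~7 months from Feb 27 to Sep 30,
# US population assumed at roughly 330 million.
monthly_rate = 52.9 / 330.0 / 7
print(f"{monthly_rate:.1%}")  # about 2.3% per month, not 1.3%
```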

Of course, the biggest problem with Nayer's parameters is not even that he's misinterpreted historical studies, it's that he naively projects them into the future.

Nayer's prediction isn't based on linear growth or exponential growth, it's based on 0 growth. He assumes that historical cases will be a good proxy for future cases, including the February base rate of 17 total confirmed monthly cases, and then uncritically takes this base rate as a future projection.

2) What is the Actual Monthly Infection Rate?

Rather than start in January, we can consider the monthly infection rate for December, the month Nayer's article was published. That month, cumulative confirmed cases rose from 13.8 million, up to 20 million, for 6.2 million new cases, or a monthly infection rate of 1.9%.

But remember, confirmed cases are not a good proxy for actual infections. Nayer's cited research reported 9.3% antibody prevalence in July, equivalent to 31 million total cases. Meanwhile, only 4.56 million cases had actually been confirmed by July 31st, suggesting a confirmed-to-actual multiple of 6.8x. Using this multiple, December's 6.2 million confirmed cases represent 42.16 million actual cases, for a 12.8% monthly infection rate.
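A quick sketch of that calculation, taking the 31 million and 4.56 million figures from the text; the ~330 million population used to get the percentage is my assumption:

```python
# Recomputing the July confirmed-to-actual multiple and applying it
# to December. The ~330M population is an assumption.
actual_july = 31e6         # 9.3% antibody prevalence, per the study
confirmed_july = 4.56e6    # confirmed cases by July 31st
multiple = actual_july / confirmed_july
print(round(multiple, 1))  # 6.8

# Applying that multiple to December's confirmed cases:
dec_actual = 6.2e6 * multiple
print(round(dec_actual / 330e6 * 100, 1))  # 12.8 (% monthly infection rate)
```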

But again, that data is from July, and testing may have improved since such that a greater number of actual cases are correctly reported.

In late November, CDC researchers set out to estimate cumulative incidence by correcting for undercounting. They report 52.9 million total infections through the end of September, even though only "6.9 million laboratory-confirmed cases of domestically acquired infections were detected and reported". That implies a multiple of 7.67x, or as the authors write:

This indicates that 1 in 7.7, or 13% of total infections were identified and reported.... Our preliminary estimates indicate approximately 1 in 8, or 13%, of total SARS-CoV-2 infections were recognized and reported through the end of September

If this multiple held true in December, it would imply 47.7 million new infections, or 14.5% of the population.
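That projection is easy to verify; the ~330 million population figure is my assumption:

```python
# December under the 7.7x multiple (population of ~330M assumed).
dec_actual = 6.2e6 * 7.7
print(round(dec_actual / 1e6, 1))          # 47.7 million new infections
print(round(dec_actual / 330e6 * 100, 1))  # 14.5 (% of the population)
```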

Most recently, the CDC reports 83.1 million total infections through December. Since there were 20 million confirmed cases, that's a multiple of 4.2x, and an actual monthly infection rate for December of 7.8%. [2] They also report a 4.6x multiple for total COVID-19 infections reported.

Having said that, if we were undercounting by 7.7x through September, and by 4.2x overall, that implies we were undercounting by less than 4.2x after September. With 52.9 million actual cumulative cases as of 9/30 and 83.1 as of 12/31, we can infer 30.2 million actual new cases in between. By comparison, confirmed cumulative cases rose from 7.27 million to 20.03 million in the same period, for 12.76 million confirmed new cases. Using this estimate, the confirmed-to-actual multiple since September is 2.4x.
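The inferred post-September multiple is straightforward to reproduce from the two cumulative estimates quoted above:

```python
# Inferring the confirmed-to-actual multiple for October-December
# from the two cumulative CDC estimates (figures in millions).
actual_new = 83.1 - 52.9       # actual cases added after September
confirmed_new = 20.03 - 7.27   # confirmed cases in the same window
print(round(actual_new / confirmed_new, 1))  # 2.4
```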

Here's a table of monthly infection rates, depending on how you measure it. Full table and sources in the original post:

Estimate | Monthly Infection Rate (% of US Population)
Dr. Nayer's Stat Article | 1.3%
Anand et al., January - July | 1.6%
Reese et al., February - September | 2.3%
December, confirmed cases | 1.9%
December, 6.8x multiple | 12.8%
December, 7.7x multiple | 14.5%
December, 4.6x multiple | 8.7%
December, 2.4x multiple | 4.6%

Of these, I think 4.6% is the best estimate, though note that there is a lot of uncertainty as to which multiple applies best for December, as well as underlying uncertainty in the original studies. [3]

In any case, Nayer's 1.3% estimate was substantially off. It was the result of flawed arithmetic, a misreading of his cited study, and the incredibly naive assumption that the January - July average would project into the future with no growth.

3) Conclusion: How Long to Herd Immunity?

Using the CDC's estimate of 25% base prevalence, a monthly infection rate of 4.6% and Nayer's original model, we'll achieve 70% immunity in 8 months.

Incorporating further information about vaccinations, antibody loss and a more pessimistic 80% threshold, my best guess is herd immunity by July 3rd. You can find detailed explanations for these parameters in the appendix.

You should not take these estimates too seriously.

Here's an abbreviated table of results based on vaccine acceleration rate (how many more vaccinations today than yesterday), and herd immunity threshold. Formatted table in the original post:

Threshold | 10k | 30k | 50k

70% immune | 6/4 | 4/27 | 4/9

80% immune | 6/27 | 5/11 | 4/21

90% immune | 7/19 | 5/24 | 5/1

Edit: After talking to Alvaro again, I am less confident about antibody loss. See footnote 6 for a revised table.

I hope this is of interest, but do not let the table of results fool you into thinking this is a rigorous model with well tested assumptions. It assumes, in decreasing order of certainty:

  • Vaccines last several years
  • Antibodies last 8 months [6]
  • One administered dose is "worth" 50% as much as a full infection
  • There is a 2.4x multiple between December's confirmed cases and actual infections
  • No one who already has antibodies receives a vaccine
  • We administer 50,000 more vaccines each day than the day before [7]
  • Confirmed cases remain at 200,000 / day

In particular, the last two are totally up in the air.

There is a new strain, soon to be a new administration, and we can still do dramatically better than we have done so far. Predictions are helpful, but the important thing is to actually create the future we want.

Even stupid models can be useful. In this case, I hope the findings illustrate how sensitive our timeline is to an accelerated vaccination schedule, and highlight the urgency of ramping up distribution.


  • Original Article
  • Models in Google Sheets
  • Data from Our World in Data on cases and vaccines

Multiples:

  • 6.8x: Anand et al.
  • 7.7x: Reese et al.
  • 4.6x: CDC
  • 4.2x: CDC, computed based on 83 million actual vs 20 confirmed
  • 2.4x: Computed, "With 52.9 million actual cumulative cases as of 9/30 and 83.1 as of 12/31, we can infer 30.2 million actual new cases in between. By comparison, confirmed cumulative cases rose from 7.27 million to 20.03 million in the same period, for 12.76 million confirmed new cases. Using this estimate, the confirmed-to-actual multiple since September is 2.4x." The 52.9 is from Reese et al, 83.1 from CDC. Confirmed cases from Our World in Data.
Appendix: Details on Parameter Values and Questionable Assumptions

So far, our model has relied on a number of untenable assumptions:

  1. Cases will remain at December levels
  2. Antibodies last indefinitely
  3. There are no vaccinations

Forecasting cases

Cumulative cases have been rising exponentially at a fairly consistent rate since April, so it might feel easy to project into the future.

Having said that, I am not very confident that the trend will hold. Given that we are ramping up vaccine distribution, facing a more transmissible strain, and launching a new administration, there is much more uncertainty to come. [4]

I'll continue to use December's estimated rate of 4.6%, and accept that I am making the same mistake as Nayer in assuming no growth, with the hope that I am at least doing so with better reason. Let this be an additional warning that this model is purely for illustrative purposes, and should not be taken too literally.

Antibodies

With regard to antibodies, there appears to be some ongoing controversy. A recent study from Science Immunology found "infection generates long-lasting B cell memory up to 8 months post-infection"; however, a second study suggests it might be shorter. Discussion of the conflict here.

If antibodies last 8 months, we will see more and more re-infections as time goes on. There were 1.5 million confirmed cases 8 months ago, which is 11.6 million using the 7.7x multiple. It is possible all of their antibodies have now "expired".

If we have to wait another 6.2 months, everyone infected until November 25th could lose their antibodies as well. That's 12.9 million confirmed cases, or 59.3 million actual cases using the CDC's 4.6x multiple.

As a first approximation, that's another 5 month delay, but note that it cascades. As we wait to "make up" for the 59.3 million lost antibodies, more and more people's antibodies will "expire".

At a monthly infection rate of 4.6% and an 8 month "shelf-life" for antibodies, we will never be able to hit more than 36.8% immunity at any time. Under this model, we never achieve herd immunity at current infection rates, even for conservative estimates.

In absolute numbers, 70% herd immunity would mean 231 million people with antibodies simultaneously. If antibodies last 8 months, that means we would need to hit 29 million cases per month, and sustain that continuously for 8 months. That's all assuming that everything immediately clears up on the day we achieve herd immunity.
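The plateau arithmetic can be checked directly; the ~330 million population figure is my assumption:

```python
# Why immunity plateaus: only infections from the last d months still
# carry antibodies, so steady-state immunity is capped at rate * d.
rate = 4.6   # monthly infection rate, % of population
d = 8        # assumed antibody duration, in months
print(round(rate * d, 1))   # 36.8 (% maximum simultaneous immunity)

# Holding 70% immunity would require 70/d percent infected every month
# (assuming a ~330 million population):
pop = 330e6
print(round(0.70 / d * pop / 1e6))  # 29 million cases per month
```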

Given our current growth rate, and the increased transmissibility of a new strain, those numbers might be more achievable than they sound. Our recent high of 0.25 million cases in a single day (7-day rolling average) extrapolates to 7.6 million cases per month. With the 2.4x multiple, that's 18.2 million cases.
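As a sketch of that extrapolation (the 30.4 days/month conversion factor is my assumption):

```python
# Extrapolating the recent daily peak to a monthly figure.
daily_peak = 0.25e6                  # 7-day rolling average high
monthly_confirmed = daily_peak * 30.4  # ~30.4 days per month (assumed)
print(round(monthly_confirmed / 1e6, 1))        # 7.6 million confirmed
print(round(monthly_confirmed * 2.4 / 1e6, 1))  # 18.2 million actual
```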

Although I say "achievable", this would not actually be a good thing. We would defeat the virus, but only through immense human sacrifice.

Vaccines

Okay, so it's looking quite bad; can vaccines save us? You may have heard that vaccines are 90% or 95% effective, but that's for preventing symptoms, not preventing transmission through asymptomatic infection.

A Moderna report to the FDA writes:

Amongst baseline negative participants, 14 in the vaccine group and 38 in the placebo group had evidence of SARS-CoV-2 infection at the second dose without evidence of COVID-19 symptoms. There were approximately 2/3 fewer swabs that were positive in the vaccine group as compared to the placebo group at the pre-dose 2 timepoint, suggesting that some asymptomatic infections start to be prevented after the first dose.

More recently, Tyler Cowen cites this article claiming that the Pfizer vaccine is very effective in preventing transmission. The author writes "Data from 102 subjects shows 98% of them developed significant presence of antibodies; survey's editor says participants most likely won't spread the disease further". I am not sure what "most likely" means, but I'll take it at face value.

Okay, so we have data on sterilizing immunity and vaccine administration; the problem is we don't know how much of the latter is first vs. second doses. I also don't know if being "66% immune" is worth 66% as much as full immunity. So a few simplifying assumptions:

  • Each vaccine dose is "worth" 50% of full immunity
  • No one who already has antibodies receives a vaccination
  • We administer 50,000 more vaccines each day than the day before [5]
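The assumptions above can be sketched as a day-by-day simulation. This is my own minimal reconstruction, not the author's spreadsheet: the ~330 million population, the 25% starting prevalence, the 300,000-doses/day starting point, and holding infections at 200,000 confirmed cases/day (scaled by the 2.4x multiple) are all assumptions.

```python
# Minimal day-by-day sketch of the immunity model. All parameters here
# are my reconstruction, not the author's spreadsheet.
pop = 330e6
immune = 0.25 * pop          # CDC-estimated base prevalence
doses_per_day = 300_000      # assumed starting vaccination rate
days = 0
while immune / pop < 0.70:
    immune += 200_000 * 2.4        # new infections: confirmed x 2.4 multiple
    immune += doses_per_day * 0.5  # each dose "worth" half an infection
    doses_per_day += 50_000        # 50,000 more doses each day
    days += 1
print(days)  # 88 days, i.e. roughly three months to 70% immunity
```

Swapping the 50,000/day ramp for 10,000/day makes the same loop take noticeably longer, which is why the results table is so sensitive to that parameter.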

Using this model (available here), I estimate 70% immunity on April 2nd, and 90% immunity on April 24th.

With sufficiently high vaccination, it turns out lost antibodies are just not that big a deal. 8 months before April 24th was August 24th, at which point we had 5.73 million confirmed cases. Using the CDC's 7.7x multiple, that's 44.1 million actual.

But even if 27 million people lose their antibodies, our model has vaccinations at nearly 6 million / day by April 24th, so the delay isn't that costly. Incorporating antibody loss, we only get pushed back to April 17th for 70% immunity, and May 6th for 90%.

There is also a cascading loss of antibodies between April 24th and May 6th, but this only pushes out estimates by another day or so.

What if 50,000 more vaccines per day is too optimistic? Alvaro mentions this Metaculus estimate giving 82.5 million by May 13th. Note that the 82.5 million refers not to administered doses, but to people who have completed both vaccinations, so this is 165 million doses total. That's consistent with around 10,000 more vaccines per day, rather than the 50,000 I suggest.

Frequently Asked Questions

Why do you care? Stat isn't an academic publication and it's not peer reviewed. No, but they are widely acclaimed, and often cited on Marginal Revolution. Until now, I would have felt confident taking their word at face value.

How poorly does this reflect on Stat? To Stat's credit, Dr. Nayer is not a regular contributor. His forecast was also not presented as a serious prediction, but was mostly intended to illustrate the importance of vaccines. Even so, it is bad that he made these basic errors, and it is bad that Stat did not fact-check his writing.

Anyone can make mistakes. If you're emboldened by my findings, you should go and run checks against more articles and try to find additional errors. Perhaps this is a one-off mistake, or perhaps there is a more systematic problem.

Why do you use different multiples at different points? The CDC estimates a 4.6x multiple overall, but previously reported a 7.7x multiple for data up to September. Based on those numbers, I inferred a 2.4x multiple for data after September.

In section 4, I use a 7.7x multiple for cases before September to estimate antibody loss, but a 2.4x multiple for December's cases, which I'm using as our monthly infection rate. I also use the overall 4.6x multiple in one paragraph referring to data across a broad range of time:

If we have to wait another 6.2 months, everyone infected until November 25th could lose their antibodies as well. That's 12.9 million confirmed, or 59.3 million actual using the CDC's 4.6x multiple.

Okay, but really, when can I go outside? I have no idea. If you put a gun to my head, I would say cases rise more than expected, and vaccinations go worse than expected, but I don't know how those factors balance out. Maybe early summer, but it is still in our collective power to do better.

This isn't a question, I just need a reason to feel optimistic. I have been using a flat rate of infections, but they have been growing quite rapidly historically. If this remains true, the timeline would be greatly accelerated. A new strain might increase infections as well. That's all bad news for America, but if you're a cautious introvert taking appropriate precautions, it might be good news for you.

There is also hope on the vaccine side. Biden claims the Trump administration is to blame for distribution delays. I don't know if this is true, but it could be, and it could mean improved distribution starting today! So far we have seen vaccines administered per day increase rapidly, but there may be a 2nd degree acceleration as well (i.e. the daily increase is itself increasing).

Also note that if you live in a hot spot, your region may achieve herd immunity before the nation as a whole.


[1] I am not an epidemiologist, but for the record, neither is he. As per his bio on StatNews: "Zach Nayer is a transitional year resident physician at Riverside Regional Medical Center in Newport News, Va., and an incoming ophthalmology resident at Harkness Eye Institute at Columbia University in New York City."

[2] 4.2 is the multiple I get by dividing 83.1 million by 20 million reported cases, but the CDC states a multiple of 4.6 for "total COVID-19 infections were reported". I don't know how to explain the discrepancy.

[3] The CDC's 95% UI for "total COVID-19 infections were reported" is a multiple of 4.0-5.4. Anand et al. report 9.3% with a 95% CI of 8.8%-9.9%. Reese et al. does not provide a CI for the 7.7x, but gives a related 7.1x multiple a 95% UI of 5.8-9.0.

[4] If you're curious, you can look at Zvi's toy models.

[5] This is really just guesswork. 50,000 is based on the rate of increase from January 5th to January 15th. If you started counting 1/1 you would get 35,000, and if you started 12/21 you would get 30,000. Using 10,000 gets us consistent with the Metaculus estimate.

[6] I expressed confidence after seeing "8 months" cited in multiple reports, but this may be limited by the data we have available. It seems the studies may actually be saying "at least 8 months". From Dan et al.:

Overall, at 5 to 8 months PSO, almost all individuals were positive for SARS-CoV-2 Spike and RBD IgG.... Notably, memory B cells specific for the Spike protein or RBD were detected in almost all COVID-19 cases, with no apparent half-life at 5 to 8 months post-infection... These data suggest that T cell memory might reach a more stable plateau, or slower decay phase, beyond the first 8 months post-infection.

Thanks to Alvaro for pointing this out. Here's a revised table of results, removing antibody loss considerations from the model:

Threshold | 10k | 30k | 50k

70% immune | 5/15 | 4/16 | 4/2

80% immune | 6/6 | 4/30 | 4/13

90% immune | 6/26 | 5/13 | 4/24

[7] I do worry that there's some kind of logistical maximum rate of vaccinations, and it is not realistic to think we could ever be at 6 million / day. You may have heard that NYC alone did 400,000 vaccines / day in 1947, but that was a very different problem. Note also that this depends on vaccines actually being accepted! As I wrote in the appendix here, trust is still low, though it depends on who you ask, and may increase as more people get the vaccine.