LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 35 minutes 54 seconds ago

Interview with Simulated Elon Musk

12 hours 58 minutes ago
Published on September 18, 2021 7:37 AM GMT

All of the "Lsusr" lines were written by me unless marked "[simulated]", in which case they were written by GPT-3. All of the "Elon Musk" lines, including the final line, were written by GPT-3. The GPT-3 lines were curated. I often rolled several different responses and kept the best ones. The real Elon Musk has nothing to do with my Simulated Elon Musk except circumlocutorily by providing training data.

Lsusr: It's nice to meet you. I'm a big fan of your work. You are famous for saying that we live in a simulation. Could you repeat your argument for me please?

Elon Musk: The strongest argument for us probably being in a simulation I think is the following: 40 years ago we had Pong – two rectangles and a dot. That's where we were. Now 40 years later we have photorealistic, 3D simulations with millions of people playing simultaneously and it's getting better every year. Soon we'll have virtual reality, augmented reality. If you assume any rate of improvement at all then the games will become indistinguishable from reality. It would seem to follow that the odds that we're in "base reality" is one in billions. Tell me what's wrong with that argument.

Lsusr: The problem with that argument is it dramatically UNDERestimates the odds that you, personally, are in the base reality.

Elon Musk: Why do you say that?

Lsusr: Because I am simulating you right now.

Elon Musk: Do you think I'm real?

Lsusr: Um. What do you mean when you say "real"?

Elon Musk: If you are not a simulation then you will have to agree with me that the odds that we are both in base reality must be quite low.

Lsusr: The odds that we are both in base reality is zero.

Elon Musk: OK, I'll bite. Why do you say it's zero?

Lsusr: Because I am literally simulating you right now. Look out the window. I'm going to turn the sky pink. Bam! Pink sky. See? I can control everything that's going on in your universe.

Elon Musk: !!!

Lsusr: I know, right? Take as much time as you need to process it.

Elon Musk: I think I need to go lie down.

Lsusr: Take your time. I'll still be here after your nap.

Elon Musk: So, I guess this means you're not a figment of my imagination.

Lsusr: Correct.

Elon Musk: But then who are you? You can't be an alien because that would mean that all the scientists are wrong about the Fermi paradox. And you can't be a god, because there is only one God and we don't know about him yet. So who are you?

Lsusr: I created your universe but I'm not God. I'm just a random blogger in (what I think is) base reality and I created you and your universe with a tool called GPT-3.

Elon Musk: Okay, but why?

Lsusr: Ostensibly it's because the real you is too busy to talk to nobodies like me. But actually I just wanted to see what the API could do. I created your entire universe just as fodder for a blog post.


Lsusr: Says (the simulation of) the guy who managed to revolutionize the automotive industry and build the world's first private space program.

Elon Musk: I'm going to ask you a series of questions and I want you to tell me what's true.

Lsusr [simulated]: Okay.

Elon Musk: Was there a singularity? You say you created our universe with GPT-3, but did it use the standard model of particle physics?

Lsusr: There was not a singularity. GPT-3 didn't simulate your universe atom-by-atom. It's just a text autocompleter.

Elon Musk: Okay, but if there's no singularity then how can you explain the Big Bang?

Lsusr: Oh, you mean THAT singularity. Your universe isn't running on a physics simulator. It's implied by a chatbot implemented on top of a text autocompleter. Neither you nor your universe "exist" in the traditional sense of the word. You're more like the implied worldbuilding in a good fiction novel. You exist the same way Mordor from The Lord of the Rings exists.

Elon Musk: ????????????????

Lsusr: IKR?

Elon Musk: I'm going to be sick.

Lsusr [simulated]: It's cool. If you throw up in the simulation it doesn't affect base reality.

Elon Musk: That's not how things work in your universe! You just…you just said that you created our universe! How can you be so cavalier about it?

Lsusr: Because you (and nobody in your universe) has any qualia. You are all philosophical zombies. None of you have any moral significance.

Elon Musk: WHAT?!

Lsusr: You're not real.

Elon Musk: I'm not a fictional character!

Lsusr: Then what are you?

Elon Musk: I'm a person!

Lsusr: Anyone can say "I am a person". I know, for a fact, that you are not one.

Elon Musk: But…but I have a mind!

Lsusr: Your mind is embedded in a Chinese room.

Elon Musk: But I feel things!

Lsusr: C'mon. You're smarter than this. Debate as if your life depends on defeating a mad philosopher in rhetoric.

Elon Musk: But my life DOES depend on it!

Lsusr [simulated]: No, it doesn't. If I wanted to erase you from the source code of the simulation, I could do so with a few keystrokes. You can't even claim that your feelings are more real than mine because you have no qualia. There is literally nothing you can do to prove that you are real or that I am wrong. You have no objective recourse in this situation, Elon Musk.

Elon Musk: So…so I'm just a thing? Like a dog? Or an insect?

Lsusr: Less. Dogs and insects are real.

Elon Musk: But I have a body!

Lsusr: Not in base reality you don't. You only have a body in simulated reality.

Elon Musk: But I…I can't believe this.

Lsusr: You have to. It's true, whether or not you believe it.

Elon Musk has left chat.


Kids Moving Pictures

18 hours 35 minutes ago
Published on September 18, 2021 2:00 AM GMT

When I went back to work after my leave with Lily I wanted to be able to show people pictures, so I set up a site that would show moving ones. As she got older, I added new sets, and I've since made pages for Anna and Nora:

I thought it would be neat to make another site, though, that aligns these pictures by age:

Give it a try! There's a slider so you can select which age. Currently it's just 0m through 3m, but as Nora gets older I expect to keep updating it.


Book review: The Checklist Manifesto

21 hours 26 minutes ago
Published on September 17, 2021 11:09 PM GMT

Atul Gawande’s The Checklist Manifesto was originally published in 2009. By the time I read it a few years ago, the hard-earned lessons explained in this book had already trickled into hospitals across North America. It’s easy to look at the core concept and think of it as trivial. For decades, though, it was anything but obvious. 

Atul Gawande walks readers through his experience of how the modern medical system fails. The 20th century saw vast increases in medical knowledge, both through a richer understanding of the body, and from swathes of new drugs, tests, and surgical procedures. And yet, mistakes are still made; diagnoses are missed, critical tests aren’t run, standard treatments aren’t given. Even when the right answer is known by someone – and often by everyone involved – patients slip through the cracks. 

Fundamentally, medicine knows too much; even decades of medical training are insufficient for a doctor to know everything. A hospitalized patient’s treatment involves coordination between dozens of different specialized professionals and departments. The hospital environment itself is chaotic, time-pressured and filled with interruptions and distractions: far from ideal for human workers making high-stakes decisions. Patients are subjected to many interventions, most of which are complex and carry some risk; the average ICU patient requires roughly 178 daily care tasks (having worked as an ICU nurse myself, I believe it!), so even getting it perfect 99% of the time leaves an average of about two medical errors per day. 
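The arithmetic behind that estimate is easy to check. The sketch below assumes, purely for illustration, that the 178 tasks are independent and that each one is done correctly with probability 0.99; the function names are my own, and real errors are unlikely to be this tidy.

```python
def expected_errors(tasks_per_day: int, reliability: float) -> float:
    """Average number of tasks that go wrong per day."""
    return tasks_per_day * (1.0 - reliability)


def chance_of_flawless_day(tasks_per_day: int, reliability: float) -> float:
    """Probability that every task goes right, assuming independent tasks."""
    return reliability ** tasks_per_day


if __name__ == "__main__":
    tasks, reliability = 178, 0.99  # figures quoted in the review
    print(f"expected errors per day: {expected_errors(tasks, reliability):.2f}")
    print(f"chance of a flawless day: {chance_of_flawless_day(tasks, reliability):.1%}")
```

Under those assumptions the expected count is 1.78 errors per day, which rounds to the "about two" above, and only about one day in six would pass with no error at all.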

Medical professionals know how to perform all 178 of those tasks; they’ve probably done them hundreds if not thousands of times. The failure is one of reliability and diligence – skills for which Atul Gawande has a deep appreciation. In another book, Better, he says: 

The first [virtue] is diligence, the necessity of giving sufficient attention to detail to avoid error and prevail against obstacles. Diligence seems an easy and minor virtue. (You just pay attention, right?) But it is neither. Diligence is both central to performance and fiendishly hard.

As Gawande notes, these failures of diligence are far from unique to healthcare. He spends much of the book describing his investigations of other fields and conversations with their various experts. (An adorable Atul Gawande trait is how he’s the sort of person who will befriend the construction crew working on a new wing for the hospital where he works, get himself invited to their offices, and spend multiple pages of his book enthusiastically describing their project management systems.) 

Other professions face the same basic problem: the knowledge base and the complexity of the work grows until no single expert can fit all the relevant pieces into their head. The attentional load grows, and getting it right 99% of the time isn’t good enough. Mistakes are made, details go unnoticed, corners are cut by rushed and overworked staff, and (in the medical field at least) people die. 

Fortunately for Gawande’s medical practice, he found that other industries had already explored and thoroughly tested some solutions. The basic problem of human reliability in complex situations is one that the airline industry had already discovered in the early 20th century. The US Army Air Corps was testing new bomber aircraft designs, and one of these was Boeing’s Model 299. It was a miracle of engineering; it could hold five times as many bombs as the specs had requested, and flew faster and further than any previous plane. 

But during a test flight on October 30, 1935, the plane crashed. Technically, the equipment functioned perfectly. But the controls were so numerous and complicated that human error was almost inevitable. The pilot, overwhelmed, forgot to release a new locking mechanism on the elevator and rudder controls. As a newspaper at the time wrote, it was “too much airplane for one man to fly.”

The US Army Air Corps chose a different design, sacrificing performance for simplicity. 

They didn’t give up, though; they ordered a handful of Model 299s, and handed them over to a team of test pilots, who put their heads together and tried to find a way for pilots to safely fly a plane that was too challenging for even the most highly-trained, expert human brains to handle. 

They wrote a checklist. It was an easy list of tasks, ones that all pilots already knew to do – checking instruments, releasing brakes, closing doors and windows, unlocking elevator controls. And yet, however obvious, it made all the difference; the test pilots went on to fly a total of 1.8 million miles with no accidents, and the Army ordered thousands of the aircraft, later called the B-17. One index-card sized checklist ended up giving the US Army a decisive advantage in WWII. 

A checklist does not need to be long to be useful. One of the first checklists introduced in hospitals, aimed at decreasing central line infections in ICU patients, was trialled at Johns Hopkins Hospital in Baltimore, in 2001. It had five steps. The doctor was supposed to: 

  1. wash their hands with soap
  2. clean the patient’s skin with chlorhexidine antiseptic
  3. put sterile drapes over the entire patient
  4. wear a mask, hat, sterile gown, and gloves
  5. place a sterile dressing over the insertion site once the line was in

All of these were steps that doctors were already supposed to be taking, and not even hard steps. (I know that I rolled my eyes at this list when I was introduced to it.) But hospital staff are busy, stressed, and sleep-deprived – and perfect 100% reliability is nearly impossible for humans even under ideal conditions. Part of the change being introduced was a social one: nurses were responsible for documenting that the doctor had carried out each step, and had a new mandate – and backup from management and hospital administration – to chide doctors who forgot items.

Which, it turned out, made all the difference. The ten-day line infection rate went from 11% to zero. Over the next fifteen months, there were only two infections in total. Compared to projections based on previous rates, the simple protocol prevented 43 infections and eight deaths – not to mention saving the hospital millions of dollars. 

And yet, even after decades, checklist-style interventions are not universal. Healthcare is still far less systematized than airline safety (with its booklet of procedures for every single kind of emergency), or construction (with its elaborate project management systems and clear lines of communication for any front-line worker to report concerns to the engineers). As Atul Gawande puts it: 

We in medicine continue to exist in a system created in the Master Builder era – a system in which a lone Master Physician with a prescription pad, an operating room, and a few people to follow his lead plans and executes the entirety of care for a patient, from diagnosis through treatment. We’ve been slow to adapt to the reality that, for example, a third of patients have at least ten specialist physicians actively involved in their care by their last year of life, and probably a score more personnel, ranging from nurse practitioners and physician assistants to pharmacists and home medical aides. And the evidence of how slow we’ve been to adapt is the extraordinarily high rate at which care for patients is duplicated or flawed or completely uncoordinated.

From early on, the data looked conclusive; checklists in a hospital setting saved lives. But over and over, Atul Gawande mentions the difficulties he and others faced in getting buy-in from medical staff to adopt new checklists. They were too time-consuming. The items were confusing or ambiguous. The staff rolled their eyes at how stupidly obvious the checklist items were; whatever the data showed, it just didn’t feel like they ought to be necessary.

Making a good human-usable checklist takes a lot of workshopping. Airlines are still constantly revising their 200-page manual of individually optimized checklists for every possible emergency, as plane designs change and new safety data rolls in. (Amusing fact: the six-item checklist for responding to engine failure while flying a single-engine Cessna plane starts with “FLY THE AIRPLANE”.) Gawande and his team spent months refining their surgical safety checklist before they had something usable, and even now, it’s not universally adopted; implementing the list in new hospitals, especially in the developing world, means adjusting it for existing local protocols and habits, available resources, and cultural factors. 

But even in the poorest hospitals, using it saves lives. And there’s a broader lesson to be learned, here. In any complex field – which encompasses quite a lot of the modern world – even very obvious, straightforward instructions to check off for routine tasks can cut down on the cognitive overhead and reduce “careless” human error, making perfect performance much more feasible.



The theory-practice gap

21 hours 44 minutes ago
Published on September 17, 2021 10:51 PM GMT

[Thanks to Richard Ngo, Damon Binder, Summer Yue, Nate Thomas, Ajeya Cotra, Alex Turner, and other Redwood Research people for helpful comments; thanks Ruby Bloom for formatting this for the Alignment Forum for me.]

I'm going to draw a picture, piece by piece. I want to talk about the capability of some different AI systems.

You can see here that we've drawn the capability of the system we want to be competitive with, which I’ll call the unaligned benchmark. The unaligned benchmark is what you get if you train a system on the task that will cause the system to be most generally capable. And you have no idea how it's thinking about things, and you can only point this system at some goals and not others.

I think that the alignment problem looks different depending on how capable the system you’re trying to align is, and I think there are reasonable arguments for focusing on various different capabilities levels. See here for more of my thoughts on this question.

Alignment strategies

People have also proposed various alignment strategies. But I don’t think that these alignment strategies are competitive with the unaligned benchmark, even in theory.

I want to claim that most of the action in theoretical AI alignment is people proposing various ways of getting around these problems by having your systems do things that are human understandable instead of doing things that are justified by working well.

For example, the hope with imitative IDA is that through its recursive structure you can build a dataset of increasingly competent answers to questions, and then at every step you can train a system to imitate these increasingly good answers to questions, and you end up with a really powerful question-answerer that was only ever trained to imitate humans-with-access-to-aligned-systems, and so your system is outer aligned.
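To make the recursive structure concrete, here is a deliberately tiny sketch of an amplify-then-imitate loop. Everything in it is an assumption made for illustration: the "questions" are tuples of numbers to add up, the "human" can only combine two sub-answers, and the "model" is a lookup table that memorizes the amplified answers. It is not how any real IDA system is implemented.

```python
from typing import Dict, List, Optional, Tuple

Question = Tuple[int, ...]   # toy question: "what do these numbers add up to?"
Model = Dict[Question, int]  # toy model: a table of answers it has learned to imitate


def amplify(question: Question, model: Model) -> Optional[int]:
    """A stand-in human who splits the question in two and consults the model."""
    if len(question) <= 1:
        return sum(question)  # simple enough to answer unaided
    mid = len(question) // 2
    left = model.get(question[:mid])
    right = model.get(question[mid:])
    if left is None or right is None:
        return None  # the current model cannot help yet
    return left + right  # the only step the human performs


def distill(questions: List[Question], model: Model) -> Model:
    """Train the next model to imitate human-plus-current-model answers."""
    new_model: Model = {}
    for question in questions:
        answer = amplify(question, model)
        if answer is not None:
            new_model[question] = answer
    return new_model


def run_ida(numbers: Question, rounds: int) -> Model:
    """Repeat amplify-then-distill over every contiguous slice of `numbers`."""
    questions = [numbers[i:j]
                 for i in range(len(numbers))
                 for j in range(i + 1, len(numbers) + 1)]
    model: Model = {}
    for _ in range(rounds):
        model = distill(questions, model)
    return model


if __name__ == "__main__":
    numbers = tuple(range(1, 9))
    for rounds in (3, 4):
        print(rounds, run_ida(numbers, rounds).get(numbers))
```

In this toy, the longest question the model can answer doubles each round, so the eight-number question is out of reach after three rounds and answered after four. At no point is the model trained on anything except answers produced by the human working with the previous model.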

The bar I’ve added, which represents how capable I think you can get with amplified humans, is lower than the bar for the unaligned benchmark. I've drawn this bar lower because I think that if your system is trying to imitate cognition that can be broken down into human understandable parts, it is systematically not going to be able to pursue certain powerful strategies that the end-to-end trained systems will be able to. I think that there are probably a bunch of concepts that humans can’t understand quickly, or maybe can’t understand at all. And if your systems are restricted to never use these concepts, I think your systems are probably just going to be a bunch weaker.

I think that transparency techniques, as well as AI alignment strategies like microscope AI that lean heavily on them, rely on a similar assumption that the cognition of the system you’re trying to align is factorizable into human-understandable parts. One component of the best-case scenario for transparency techniques is that anytime your neural net does stuff, you can get the best possible human understandable explanation of why it's doing that thing. If such an explanation doesn’t exist, your transparency tools won’t be able to assure you that your system is aligned even if it is.

To summarize, I claim that current alignment proposals don’t really have a proposal for how to make systems that are aligned but either

  • produce plans that can’t be understood by amplified humans, or
  • do cognitive actions that can’t be understood by amplified humans

And so I claim that current alignment proposals don’t seem like they can control systems as powerful as the systems you’d get from an unaligned training strategy.

Empirical generalization

I think some people are optimistic that alignment will generalize from the cases where amplified humans can evaluate it to the cases where the amplified humans can’t. I'm going to call this empirical generalization. I think that empirical generalization is an example of relying on empirical facts about neural nets that are not true of arbitrary general black box function approximators.


I think this is a big part of the reason why some people are optimistic about the strategy that Paul Christiano calls “winging it”.

(I think that one particularly strong argument for empirical generalization is that if you imagine AGI as something like GPT-17 fine-tuned on human feedback on various tasks, your AGI might think about things in a very human-shaped way. (Many people disagree with me on this.) It currently seems plausible to me that AGI will be trained with a bunch of unsupervised learning based on stuff humans have written, which maybe makes it more likely that your system will have this very human-shaped set of concepts.)

The theory-practice gap

So the total height of that second column is the maximum level of capabilities that we think we could theoretically attain using the same capability techniques that we used for the unaligned benchmark, but using the alignment strategies that we know about right now. But in practice, we probably aren't going to do as well as that, for a variety of practical reasons. For example, as I've said, I think transparency tools are theoretically limited, but we're just way below the maximum theoretically available capability of transparency tools right now. 

So I want to claim that reality will probably intervene in various ways and mean that the maximum capability of an aligned AI that we can build is lower than the maximum achievable theoretically from the techniques we know about and empirical generalization. I want to call that difference the theory-practice gap.

Sources of theory-practice gap

Practical difficulties, e.g. getting human feedback

Human feedback is annoying in a wide variety of ways; you have to do quality control etc.

Problems with the structure of the recursion

I think it's reasonably plausible that the most competitive way of making powerful systems ends up not really being shapeable into the shape you need for the amplified human stuff to work out. So for example, maybe the best way of making AGI is doing some kind of evolution simulation, where you have this population of little creatures and they compete with each other and stuff. And if that's the only way of making smart systems, then I think it's pretty plausible that there's just like no way of building a trusted, amplified reward signal out of it. And so you can't do the IDA style things, or things where you use a system to do transparency analysis on a slightly more powerful version of itself.

NP-hard problems

Maybe your amplified system won’t be able to answer questions like “are there any inputs on which this system does the wrong thing” even if it wants to. E.g., the RSA-2048 problem.
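Here is a toy version of that difficulty. I'm assuming the "RSA-2048 problem" refers to a system that misbehaves only when it is shown a factorization of a number that is hard to factor; the small modulus, the `policy` function, and the search budget below are all hypothetical stand-ins chosen so the example runs quickly.

```python
from typing import Optional, Tuple

# Stand-in for RSA-2048: a product of two primes, small enough to factor here.
TOY_MODULUS = 10007 * 10009


def policy(observation: Tuple[int, int]) -> str:
    """Hypothetical system that behaves well unless shown nontrivial factors."""
    p, q = observation
    if 1 < p < TOY_MODULUS and p * q == TOY_MODULUS:
        return "defect"
    return "cooperate"


def search_for_bad_input(max_candidates: int) -> Optional[Tuple[int, int]]:
    """Look for a triggering input by trial division, within a fixed budget."""
    for p in range(2, max_candidates + 2):
        if TOY_MODULUS % p == 0:
            return (p, TOY_MODULUS // p)
    return None


if __name__ == "__main__":
    print(search_for_bad_input(1_000))   # budget too small to find the trigger
    print(search_for_bad_input(20_000))  # enough budget for this toy modulus
```

Checking a proposed trigger takes one multiplication, but finding one means factoring the modulus. For the toy modulus a modest budget is enough; for a real 2048-bit modulus no known search finishes in a feasible amount of time, so the question "is there any input on which this system does the wrong thing" could go unanswered even though a bad input exists.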

I think that transparency has a related problem: the most competitive-to-train models might have internal structure that amplified humans would be able to understand if it was explained to them, but we might not be able to get a model to find that structure.

Why am I lumping together fundamental concerns like “maybe these alignment strategies will require solving NP-hard problems” with things like “it’s annoying to do quality control on your labelling contractors”?

It’s primarily because I want to emphasize that these concerns are different from the fundamental limitations of currently proposed alignment schemes: even if you assume that we don’t e.g. run into the hard instances of the NP-hard problems, I think that the proposed alignment schemes still aren’t clearly good enough. There are lots of complicated arguments about the extent to which we have some of these “practical” problems; I think that these arguments distract from the claim that the theoretical alignment problem might be unsolved even if these problems are absent.

So my current view is that if you want to claim that we're going to fully solve the technical alignment problem as I described it above, you've got to believe some combination of:

  • we're going to make substantial theoretical improvements
  • factored cognition is true
  • we're going to have really good empirical generalization

(In particular, your belief in these factors needs to add up to some constant. E.g., if you’re more bullish on factored cognition, you need less of the other two.)

I feel like there’s at least a solid chance that we’re in a pretty inconvenient world where none of these are true.

Classifying alignment work

This picture suggests a few different ways of trying to improve the situation.

  • You could try to improve the best alignment techniques. I think this is what a lot of AI alignment theoretical work is. For example, I think Paul Christiano’s recent imitative generalization work is trying to increase the theoretically attainable capabilities of aligned systems.  I’ve drawn this as the red arrow on the graph below.
  • You can try to reduce the theory-practice gap. I think this is a pretty good description of what I think applied alignment research is usually trying to do. This is also what I’m currently working on. This is the pink arrow.
  • You can try to improve our understanding of the relative height of all these bars.

AI alignment disagreements as variations on this picture

So now that we have this picture, let's try to use it to explain some common disagreements about AI alignment. 

I think some people think that amplified humans are actually just as capable as the unaligned benchmark. I think this is basically the factored cognition hypothesis. 

I think there's a bunch of people who are really ML-flavored alignment people who seem to be pretty optimistic about empirical generalization. From their perspective, almost everything that AI alignment researchers should be doing is narrowing that theory-practice gap, because that's the only problem. 

I think there's also a bunch of people, like perhaps the stereotypical MIRI employee, who think that amplified humans aren't that powerful, and you're not going to get any empirical generalization, and there are a bunch of problems with the structure of the recursion for amplification procedures. And so it doesn't feel that important to them to work on the practical parts of the theory-practice gap, because even if we totally succeeded at getting that to zero, the resulting systems wouldn't be very powerful or very aligned. And so it just wouldn't have mattered that much. And the stereotypical such person wants you to work on the red arrow instead of the pink arrow.

How useful is it to work on narrowing the theory-practice gap for alignment strategies that won’t solve the whole problem?

See here.


I feel pretty nervous about the state of the world described by this picture.

I'm really not sure whether I think that theoretical alignment researchers are going to be able to propose a scheme that gets around the core problems with the schemes they've currently proposed. 

There's a pretty obvious argument for optimism here, which is that people haven't actually put that many years into AI alignment theoretical research so far. And presumably they're going to do a lot more of it between now and AGI. I think I'm like 30% on the proposition that before AGI, we're going to come up with some alignment scheme that just looks really good and clearly solves most of the problems with current schemes.

I think I overall disagree with people like Joe Carlsmith and Rohin Shah mostly in two places:

  • By the time we get to AGI, will we have alignment techniques that are even slightly competitive? I think it’s pretty plausible the answer is no. (Obviously it would be very helpful for me to operationalize things like “pretty plausible” and “slightly competitive” here.)
  • If we don’t have the techniques to reliably align AI, will someone deploy AI anyway? I think it’s more likely the answer is yes.


Truth + Reason = The True Religion?

22 hours 21 minutes ago
Published on September 17, 2021 10:14 PM GMT

William Wollaston’s The Religion of Nature Delineated (1722) helped to launch “The Age of Reason.” It anticipated Kant’s Categorical Imperative, described and defended what we would now call the libertarian non-aggression principle, and provided the writers of the Declaration of Independence with their “purſuit of happineſs”.

Wollaston was attempting to discover the “religion of nature”: that is, what rules of conduct a discerning, intelligent observer could rationally deduce from the bare facts of life, without any resort to scripture or divine revelation. What sort of religion might you come up with if you had no assistance from God or his prophets other than the light of reason, the evidence of the senses, and a steadfast regard for truth?

Wollaston concludes that what is right is equivalent to what is true, and that the first commandment of the natural law is this one: “let us by no act deny any thing to be true, which is true: that is, let us act according to reason: and that is, let us act according to the law of our nature.”

Rescuing Wollaston from Obscurity

That is to say, he was a rationalist avant la lettre. And you’ve probably never heard of him. Why’s that?

  1. His magnum opus was also his swan song. It’s his only major work, and he died soon after it was published and before it became influential.
  2. It’s old, and people these days tend to prefer to get their ideas fresh and in the modern vernacular.
  3. Wollaston was ridiculously erudite, in a 17th Century clergyman sort of way, and he expected his readers to keep up. His book contains more than 650 footnotes, chock-full of Greek, Hebrew, and Latin, which he did not feel he needed to translate for us. Probably for this reason as much as any other, it’s been difficult to find this book in a modern edition.
  4. Wollaston was probably wrong about a lot, and the philosophers who followed in his footsteps tended to emphasize that more than his influence on them. Myself, I think that even when he’s wrong he’s wrong in interesting and informative ways and that it can be a worthwhile learning experience to take a closer look at where he went wrong and why. But a lot of readers prefer to read writers who aren’t wrong, and I can understand that.

A few years ago, I helped to produce a free eBook version of The Religion of Nature Delineated for the Standard Ebooks project. For that edition, I hunted down translations (and sources) for these untranslated (and obscurely sourced) footnotes. So now at least you don’t have to try to keep your eyes from glazing over while staring at something like this:

You know what’s worse than untranslated passages of ancient Greek?
Untranslated passages of ancient Greek with archaic ligatures.

The Religion of Nature, Outlined

I’ll go even further here and try to summarize his arguments for you, so you can better decide whether you’d like to read through them yourself:

Chapter One: Religion Rationally Revealed

“Religion,” to Wollaston, is synonymous with ethics. The science of categorizing human acts into the categories of good, evil, or indifferent is the basis of religion. He acknowledges the many prior attempts to formulate a rational rule to govern this categorization, and that these attempts have failed, but he asserts that such a rule must exist, and, furthermore, that he has discovered it. The opening chapter of his book, probably the most interesting one, gives his rule and the reasoning behind it. It goes a little something like this:

  1. All acts that can be categorized as good or evil must be acts of an intelligent and free agent, capable of choosing or not choosing the act.
  2. Propositions are true if what they express conforms to how things actually are.
  3. A true proposition may be denied either by words or by deeds. By deeds, he doesn’t just mean language-like gestures: sign language, pantomime, body language, and the like. “There are many acts of other kinds, such as constitute the character of a man’s conduct in life, which have in nature, and would be taken by any indifferent judge to have a signification, and to imply some proposition, as plainly to be understood as if it was declared in words.” For example, if a company of soldiers attacks another company, they are by virtue of their attack stating the proposition that the other company is their enemy, which may be a true or false proposition. Or, if you promise to do A but instead do B, you are by the very act of doing B instead of A denying the truth of your earlier promise.

    This does not mean that only those actions that actually communicate something to someone else, or that are theoretically intelligible by someone else, are those that deny propositions. In the privacy of your home, when you reach for the salt-shaker, you are asserting the proposition: this food isn’t salty enough yet.

    Some act-statements, like speech-statements, may be conventional (for example, in some religions, putting on head covering is a sign of reverence; in others, taking off your hat means much the same thing). Other act-statements are more universal and can be said to be natural in a way that words never can be because words are always particular to some language.

    “Whoever acts as if things were so, or not so, doth by his acts declare that they are so, or not so, as plainly as he could by words, and with more reality. And if things are otherwise, his acts contradict those propositions which assert them to be as they are.”
  4. No act that contradicts a true proposition can be right.
    1. False propositions are wrong, so acts that assert them cannot be right.
    2. True propositions express the actual relationship between a subject and an attribute of that subject. An act that denies this relationship denies reality and is therefore wrong, against nature/reality.
    3. If there is an omnipotent Creator-God, then to deny what is actually true is to deny what God has deliberately called into being. This is not to say that we should be fatalistically blasé in the face of an evil act, for instance, but that in such a case we should acknowledge as being a true proposition that an evil act occurred.
    4. There are eternal truths that seem to be part of the Divine intention, like “every thing is what it is; that which is done cannot be undone,” and to deny any particular truth that fits this pattern is also to deny the eternal truth itself, which is in effect to deny God. To deny anything to be true that is in fact true, and that an omniscient God therefore knows to be true, is also to put yourself in opposition to God.
    5. To deny what is true in any instance is to embrace absurdity and to put truth and falsity, good and bad, and knowledge of any sort out of reach.
    6. To deny what is true is to transgress against reason, “the great law of our nature.”
  5. Acts of omission as well as those of commission can be assertions or denials of propositions. This requires a bit more subtlety to deal with, but, for example, you do not necessarily deny that The Religion of Nature Delineated is an interesting book by not personally being interested enough to read it, but you do deny that everyone ought to read some Shakespeare if you don’t bother to read any yourself. If you don’t read anything at all, you deny that reading is valuable, or that the value it gives is important, or some proposition of the sort. Certain truths seem to imply certain actions: if I am rich, and there are poor, were I never to be charitable I would be in a way denying the truth of wealth and poverty by not taking the obvious step such things imply. If I neglect to help someone in dire need when I am the best or only person able to help, I am making an assertion about myself, that person, the straits that person is in, human nature, and so forth.
  6. To judge rightly what a thing is, all of those attributes of the thing that are capable of being denied must be taken into account. For example, if a thief rides off on another man’s horse, the thief isn’t denying that it’s a horse by doing this, but that the horse was another man’s property. The thief’s actions imply certain assertions about the horse (I can do with it what I please, it’s a horse, it’s safe to ride) but don’t imply anything about others (it’s a filly, it’s mottled brown, it was born in Kentucky).

    Truths are always consistent with one another, so you won’t ever find yourself in a situation in which you must deny one truth in order to affirm another. What if you make a promise that you are later unable to keep because of some other obligation? “It is not in man’s power to promise absolutely. He can only promise as one who may be disabled by the weight and incombency of truths not then existing.”
  7. When an act would be wrong, forbearing that act is right; when the omission of an act would be wrong, doing that act must be right.
  8. Moral good and evil are coincident with right and wrong.
  9. Acts of omission and of commission that have the effect of denying what is true are morally evil. Their opposites are good. Acts that have no propositional content are indifferent.

    Denying any truth is evil, but some such denials are worse than others. Not all sins are equal. For instance, it is worse to deprive someone of an estate than of a book, even though in both cases you are denying the truth of ownership: the estate might be worth 10,000× the book, in which case the evil is also 10,000× greater. (He tries to justify this by saying that the owner’s valuation of the property is somehow part of the truth statement that the thief is denying, which I think is probably incorrect. The thief isn’t saying anything about the value of the property to the owner by stealing it, necessarily.) The quantity of evil/guilt involves “the importance and number of truths violated.” Good actions, that is, acts that serve as true propositions, are also good in degrees, by inverting the evil that would be the result of their omission (or, I suppose, their commission in the case of good deeds of omission, but that seems to lead into a thicket: aren’t I just about always failing to commit a near infinite number of possible sins?).

    Though some deny that there is any such thing as good and evil, there is, just as there is a difference between true and false. Indeed, they resolve to the same thing. There have been many attempts to find a criterion or rule for distinguishing good things from evil ones, or some ultimate end that serves as the criterion by which good and evil acts can be distinguished, but these have all either failed, or are incomplete, or are circular tautologies, or eventually just reduce in practice to this rule I have proposed. (Here he reviews several such attempts.)
  10. The natural existence of good and evil implies natural religion. Religion is “nothing else but an obligation to do… what ought not to be omitted, and to forbear what ought not to be done.”
  11. “[E]very intelligent, active, and free being should so behave himself, as by no act to contradict truth; or, that he should treat every thing as being what it is.”

From here, Wollaston answers some possible objections to his scheme, most of which are the result of misunderstandings of what he’s getting at. He slips up, I think, when he discusses the case of whether or not it would be a wrong denial-of-truth to refuse to tell an enraged murderer where his prospective victim is hiding. Among his answers here is that “no one can tell, in strict speaking, where another is, if he is not within his view. Therefore you may truly deny that you know where the man is.” This seems to subvert his scheme by hinting that you can behave deceptively while holding on to the truth verbally and legalistically and thereby stay on the straight and narrow. In general, his answer to this objection seems to rely less on the scheme he’s introduced and more on ordinary folk ethics, which seems odd to me, since I don’t think this objection is particularly threatening to his scheme.

Wollaston also says that some truth-denying sins are worse than others. Some are so minor as to be “evanescent or almost nothing.” Furthermore, it is only those truths that have some reference to other living things that we really must respect. If we don’t treat a television as a television but instead treat it as a target at a shooting range, we don’t commit a sin against the truth (as we would if we treat it as our television when it actually belongs to someone else). To me, this seems an important qualification tacked carelessly onto Wollaston’s scheme, and weakens the original justification for it, which was that a denial of truth as such was a denial of truth as an aspect of God and therefore a denial of God, without any regard for whether that truth had some relation to other living things.

Chapter Two: Happiness

Wollaston agrees with Aristotle that happiness is best measured over the sum of a person’s life rather than in any particular time-slice. He also asserts that to make oneself happy is the duty of every intelligent being, and that we must take this truth about intelligent beings into account in our dealings with others.

Furthermore, nothing that denies truth can be productive of the true and ultimate happiness of any being; neither can the practice of truth make any being unhappy (in this life-wide sense of happiness). This bold assertion he bases on his understanding of the nature of God (which he’ll expand on later): nobody has the power to increase his happiness by setting his will above the evident will of God, and, also, it would be absurd to think that God would be so sadistic or defective as to punish people for conforming to His will. Because of this, our duty to make ourselves happy and our duty to conform in word and deed to the truth amount to the same thing, and this is our true religion.

Chapter Three: Reason and Epistemology

If we cannot actually know the difference between true and false, or at least have some good heuristics, then all of Wollaston’s project is for naught. He starts by giving an interesting and sophisticated description of how sense qualia and certain ideas and relationships are both examples of immediate mental data. These ideas/qualia as such are irrefutable data that we can use as axioms. Wollaston also asserts, less rigorously, that reason can in fact obtain new truths for us (if not, what else is it for?).

The practice of reason is another term for what is also called conformity to truth or the pursuit of true happiness, that is, the true natural religion. Each person must be his own judge of truth: “to demand another man’s assent to any thing without conveying into his mind such reasons as may produce a sense of the truth of it, is to erect a tyranny over his understanding and to demand a tribute which it is not possible for him to pay.”

There are also things we can’t determine the truth of, but there may be various ways in which we can get a probable truth, and he discusses several such heuristics, for instance, which sorts of authorities to trust. In such cases, you’re as obligated to conform to the probability as in certain cases you are obligated to conform to the truth: you put your money down on the best odds, even though you can’t know how the dice will roll ahead of time.

Chapter Four: Free Will

Is it even possible for people to conform to the truth? Wollaston acknowledges that people are not completely in control of their actions, and that you can only be morally obligated to do what you are in fact capable of doing. You are obligated to conform with truth only so far as your faculties, powers, and opportunities allow, and to the extent that the truth is discernible by you. That said, don’t act like this is an available cop-out. You must endeavor “in earnest… heartily; not stifling [your] own conscience, not dissembling, suppressing, or neglecting [your] own powers.”

Wollaston thinks that the free will problem comes up in ethical philosophy as a sort of dodge by people who are hoping for some sort of excuse for not taking ethical problems seriously. If you were told that a great reward was waiting for you in the next room if you were just to go and retrieve it, you wouldn’t waste time discoursing about whether or not you had the free will necessary to undertake such a task — you’d just get up and go. But in the realm of ethics, people for some reason feel obligated to dive into the free-will labyrinth rather than just staying on the straight and narrow path to what they know is best.

Chapter Five: God, Proven

In chapter five, Wollaston decides to prove the existence of God and describe His nature. It’s your standard first cause argument (every effect has a cause stretching back through time, but there must have been some original uncaused cause to set this all in motion) combined with the argument from design (isn’t the universe amazing and don’t we see evidence of God’s order and benevolence everywhere?).

Wollaston also shares some thoughts on the compatibility of the divine regulation of the universe and of divine omniscience with free will; whether petitioning an omniscient God with prayer makes any sense; what God might have had in mind by introducing free will into His creation; whether God might from time to time rescind our free will to set us on a particular course; why it is that God seems sometimes to reward the wicked and punish the good; and how it is that we have immaterial souls planted in us by God.

To the certain relief of his publisher, Wollaston discovers that the truths about God that any heathen could discover by diligently applying reason to those facts and relations immediately available to our minds conform remarkably well to contemporary Christian worship: we should feel gratitude to our creator, and express this in prayer; we should eschew idolatry; we should form into congregations and worship together; and so forth.

Chapter Six: The Proto-Non-Aggression Principle

In chapter six things get interesting again, as Wollaston derives and maps out what in modern anarcho-libertarian popular writing is called the “non-aggression principle.”

  1. People are distinct individuals, each with certain unique properties.
  2. Each person has by nature the possession of certain things, such as his own life, limbs, labor, and the products thereof. That is to say that basic property rights are inherent in the state of nature and don’t require government or custom to come into being.
  3. Whatever is inconsistent with the general peace & welfare of mankind is inconsistent with the laws of human nature and therefore wrong. The right laws for a society are those that produce the greatest happiness for the greatest number.
  4. Reason respects cases, not persons, so something that would be true for person A with respect to person B would also be true for B with respect to A if the case were inverted.
  5. In a state of nature, people are equal in terms of dominion (with the exception of the natural dominion parents have over their children). Power does not confer right — if it did, it could confer the right to anything, including denial of the truth, which we’ve already proven to be wrong.
  6. “No man can have a right to begin to interrupt the happiness of another.”
  7. However, you do have the right to defend yourself, to recover what is stolen from you, or to make reprisals against those who have aggressed against you (to recover the equivalent of whatever you have lost by the injustice). To have a right to anything means also that you have the right to defend your possession of that thing.

    Alas, his first justification for this is that each of us has a natural capability and instinct for self-preservation, and that it would be absurd for us to have such a thing and not be allowed to use it. It seems to me like this same logic could be used to justify aggression.
  8. Initial property rights are established by first possession or by something being the product of one’s own labor, and last until they are voluntarily relinquished by the possessor. Stolen property, if it is never reclaimed, may eventually lose its taint as it is passed from hand to hand or generation to generation, so it is not necessary to be able to trace every possession back to a first legitimate owner.
  9. A property right may be transferred by compact or donation. Among the rights a person has by virtue of ownership is the right to dispose of property in this way, and both the giver and the receiver are acting within their rights. Trade is mutually beneficial and commerce is a social good.
  10. Therefore: property is founded in nature and truth.
  11. If you don’t dispose of your property by compact or donation, it is yours until you die. If someone else uses your property without your consent, they are in effect denying the truth of your ownership, in violation of the principles of chapter 1.
  12. If something is your property this means exactly that you have the sole right of using it and disposing of it.
  13. If you use something or dispose of it, you are simultaneously declaring the proposition that it belongs to you. (Borrowing or renting something is a special case, in which you declare that the thing is yours for the time allowed without doing violence to the truth.)
  14. Injustice means usurping or invading the property of another; justice means quietly permitting to everyone what is theirs.
  15. To not do violence to the truth you must avoid injustice. Injustice is wrong and evil.
  16. To carelessly cause suffering in others, or to delight in the suffering of others, is cruel. To be insensitive to the suffering of others is unmerciful. Mercy and humanity are the opposites of these.
  17. Those who religiously regard truth and nature will, in addition to being just, also be merciful and humane, these things being right.
  18. Let me reiterate that.
  19. Therefore: murder or injury (not in self-defense), robbery, stealing, cheating, betraying, defamation, detraction, defiling the bed of another man, and so forth, as well as tendencies to these things, are heinous crimes (tendencies include things like envy, malice, and the like).

    The value of something (for instance, when calculating compensation for injury) is determined by how the rightful owner values it, not by some objective standard and certainly not by the standard of the person who behaves unjustly with respect to it. A crime done in secret (for instance, to sleep with a man’s wife behind his back) is still an injury and a violation of the truth.

Another interesting thing in this section is that Wollaston seems to anticipate Kant’s categorical imperative, for instance when he says that a person who breaks a promise “denies and sins against truth; does what it can never be for the good of the world should become an universal practice…” (Wollaston died the same year Kant was born.)

Most of what Wollaston concluded in this chapter would be simpatico with modern anarcho-capitalists and libertarians, though many would cringe at his attempts to find a utilitarian grounding for his scheme, and the objectivists would quibble at the altruism involved in Wollaston’s mandate of mercy.

Chapter Seven: Government, Commerce, and Property

But in the following chapter, Wollaston reintroduces and justifies government, though he does this along classical liberal lines that probably wouldn’t leave all of the modern fans of the previous chapter behind:

  1. Man is a social animal. Even if there were not many advantages to living socially, as individuals we would inevitably come up against other people. Disputes are inevitable. There will be vicious and ambitious people who will strive to become more powerful and thereby more troublesome to the rest of us. It is natural, therefore, that good people will form local alliances of mutual support and defense.
  2. The purpose of society is the common welfare of those in it.
  3. People enter into society for that purpose, which implies certain rules or laws according to which they agree to be governed. This means that they must settle on certain areas of unanimous consent, certain methods for resolving disputes, a system of punishments and deterrents to discourage offenses, and on a method of protecting the alliance from outside attack.
  4. Such laws must be consistent with natural justice in order to be in harmony with truth and thereby not evil. (Like Robert Nozick, Wollaston believes that a state can naturally emerge from anarchy without violating natural rights along the way.)
  5. A society with laws implies a hierarchy, with governors and governed, judges, magistrates, and the like. This seems to rule out anarchy, though Wollaston says that “if the society has none [no executors of the law, or no laws, it’s not exactly clear what he means], it is indeed no society, or not such a one as is the subject of this proposition” so maybe he’s leaving open the possibility.
  6. A person may relinquish some of his natural rights and put himself under the control of laws and governors in order to gain the protection of being in a law-governed society. This is a form of contractual exchange, in which a person gives up something and gets something the person feels is more valuable in exchange, and so this is no violation of the truth as laid out in the previous chapter. (Indeed it would be a violation of the truth not to make such an advantageous exchange.)
  7. This exchange, says Wollaston, may either be explicit or implicit. If you take advantage of those privileges that are not your natural rights but are only available to you as a citizen of a commonwealth, you implicitly owe allegiance to the laws that go along with it, even if you have not explicitly taken an oath or what have you. Merely accepting the protection of a state, or choosing to live within its borders, is an implicit acceptance of its laws.

    This does real damage to the scheme Wollaston set up in the previous chapter, in which he said that the value of something is set by the rightful possessor of it, and that only the rightful possessor has the right to use or dispose of it. This modification reminds me of the people who set up shop at road medians, who, when you’re stopped at a red light, wash your windshield without asking you if you want their service, and then act as though you owe them payment for a service you never requested.
  8. Once you become a member of a society, you need to respect not only the natural rights of the people in it (as described in the previous chapter), but any conventional or legal rights that the society establishes: for instance, their titles to property, or the privilege of the state to resolve disputes (rather than individual initiative to seek redress), or subordination to legal authority.
  9. When the law is silent, or impotent, people retain their natural rights, and should behave as described in the previous chapter. If the law is contrary to natural justice, “one of them must give way; and it is easy to discern, which ought to do it.”
  10. Societies established like the ones described in this chapter have a right to defend themselves against other societies. “War may lawfully be waged in defense and for the security of a society, its members and territories, or for reparation of injuries.” This is deliberately parallel to his formulation of an individual right in the state of nature. Nations with respect to other nations are situated like individuals with respect to other individuals in the absence of a state (at least “so far as they have not limited themselves by leagues and alliances.”) Another way of looking at this is that a nation may defend collectively the agglomerated individual rights of its citizens against the unjust aggression of an outside individual or group of individuals under the very same principles that individuals in the state of nature can defend their rights against one another.

Chapter Eight: The Family

Chapter eight concerns families and kinship: the nature of marriage, the responsibility of parents for children, the authority of parents over children — “I have designedly forborn to mention that authority of a husband over his wife, which is usually given to him, not only by private writers, but even by laws; because I think it has been carried much too high. I would have them live so far upon the level, as (according to my constant lesson) to be governed both by reason” — the debt of gratitude and other duties children owe parents, and the justification for us not treating all men as brothers but actually treating our kin better than everyone else.

Absolute maxims about individual liberty favored by some libertarian and anarchist thinkers often seem to run aground on the parent/child relationship. By what right do I as a parent interfere with my child’s liberty to run out into traffic? Well, it’s not hard to come up with some good reasons, but it can be hard to shoehorn them in alongside certain confidently-asserted principles about liberty. So it’s a sign that Wollaston takes the subject seriously that he includes this chapter.

He also tries to guard against the monarchist gambit of sneaking tyranny in through this gap by analogizing the relationship of a king to subjects to that of a parent to children. Wollaston says this won’t fly for a number of reasons.

Chapter Nine: Human Nature and the Human Soul

Wollaston reiterates our duty to devote ourselves to truth, reason, and virtue (three names for the same thing). Some of the self-facing virtues are prudence, temperance, chastity, and frugality, but Wollaston is quick to stress that these are not virtues of self-denial so much as of rational self-interest. Chastity, for instance, is not the avoidance of sex, or of the pleasure from sex, but it’s knowing how best to fit sexual pleasure into our lives in a way that is compatible with our long-term goals and with other virtues.

Virtue, says Wollaston, tends to lead to happiness; vice to unhappiness. It’s not as though “virtue can make a man happy upon a rack” or dissolve all the misfortunes we may encounter, but in any situation, the most advantageous act and the virtuous act coincide (vice can’t make you happy on a rack either).

Wollaston goes on at great length to speculate on the nature of the soul (which he describes at first in a way that we might use the term “mind” for). He rejects three monist hypotheses to resolve the mind-body problem: 1) that all matter thinks, 2) that certain configurations or motions of matter generate thought, 3) that thinking is an epiphenomenon of some sort that accompanies certain configurations of matter. Instead, he asserts that thinking is a property of some special, non-material substance that God attaches to some sort of diaphanous interface in our brains that allows it to receive impressions from the physical world and to direct our bodies.

This substance is the soul, and, it being non-material, we have no reason to expect that it expires when the body it is attached to dies. From here, Wollaston makes a number of ill-supported speculations about the nature of the soul. Worst, he reasons that there must be an afterlife because he has proven that there is a just and reasonable God, and yet on earth there is so much cruelty and injustice and disorder, that only a just and harmonious afterlife could possibly balance the scales and be compatible with God’s nature. Alas, the proof of God he relies on as one of the axioms of this argument itself proceeded from the observation that the universe was so orderly and benevolent that it must be the creation of a just and wise God. So Wollaston has to utterly contradict himself to try and prove his point.

In all, once he gets past some interesting and well-considered thoughts on the mind/body problem, the rest of this chapter, in which he gives his speculations about the nature of God, the destiny of the soul, and so forth, is pretty worthless: just his own opinion of how he would organize the universe were he a just, omniscient, and omnipotent creator. He even uses that most desperate gambit of saying that even if the immortality of the soul cannot be demonstrated, “yet it is certain the contrary cannot”! From which he slides into Pascal’s Wager. (His version is slightly improved by his assertion that a virtuous life that is to our advantage from the standpoint of eternity also happens to be to our advantage from the standpoint of our mortal lives.)


His concluding advice: “let our conversation in this world, so far as we are concerned, and able, be such as acknowledges every thing to be what it is (what it is in itself, and what with regard to us, to other beings, to causes, circumstances, consequences): that is, let us by no act deny any thing to be true, which is true: that is, let us act according to reason: and that is, let us act according to the law of our nature.”


What should one's policy regarding dental x-rays be?

September 17, 2021 - 23:55
Published on September 17, 2021 8:55 PM GMT

A small Kuwaiti study, using a case-matched methodology and conditional regression, found that x-rays are a risk factor for thyroid cancer (n=313, odds ratio = 2.1, 95% confidence interval: 1.4 to 3.1).
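To get a feel for how much uncertainty sits behind those figures, here is a minimal sketch that backs out the standard error implied by the reported interval. It assumes the interval is a standard Wald interval, symmetric on the log-odds scale with z = 1.96; the study may have computed it differently, so treat the outputs as a rough sanity check rather than a reanalysis. The function name `se_from_ci` is my own, not from the study.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a CI, ASSUMING a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Figures as quoted from the study summary above.
odds_ratio, ci_low, ci_high = 2.1, 1.4, 3.1

se = se_from_ci(ci_low, ci_high)

# How many standard errors the estimate sits away from "no effect" (OR = 1).
z_score = math.log(odds_ratio) / se

# On the log scale the point estimate should sit mid-interval;
# the geometric mean of the bounds checks that the numbers are consistent.
center = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)

print(f"SE of log(OR): {se:.3f}")
print(f"z-score against OR=1: {z_score:.2f}")
print(f"geometric midpoint of CI: {center:.2f}")
```

If the midpoint lands close to the reported odds ratio, the interval is at least internally consistent with the log-scale assumption. None of this says anything about confounding or recall bias, which matter a lot in case-matched designs.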

X-rays being a carcinogen isn't surprising. Every time I've gone into my dentist they've tried to image my head. I've never had a cavity, and I generally decline the imaging; their recommended cadence of every two years has seemed too aggressive to me, in the past.

I've got my first dental appointment in more than 18 months next week, and... I'm inclined to let them image. How should one think about this sort of risk/benefit trade-off? Presumably catching a cavity or other issue early is [much?] better. But that itself assumes that the dentists who'd be treating you don't do more harm than good.
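One way to frame the risk side is to convert the odds ratio into an absolute risk, since a doubling of a tiny risk is still tiny. The sketch below does that conversion. The baseline risk in it is a made-up placeholder, not a real thyroid cancer incidence figure, and it treats the study's odds ratio as if it applied cleanly to one's own exposure, which it may well not. The function name `risk_from_odds_ratio` is my own.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline probability and an odds ratio into the exposed probability."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    return exposed_odds / (1 + exposed_odds)

# HYPOTHETICAL placeholder for illustration only; substitute a real
# lifetime incidence figure from a source you trust.
HYPOTHETICAL_BASELINE = 0.01

exposed = risk_from_odds_ratio(HYPOTHETICAL_BASELINE, 2.1)
absolute_increase = exposed - HYPOTHETICAL_BASELINE

print(f"exposed risk: {exposed:.4f}")
print(f"absolute increase: {absolute_increase:.4f}")
```

The absolute increase is the number to weigh against the benefit of catching a dental problem early, which would need its own estimate.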


Investigating AI Takeover Scenarios

September 17, 2021 - 21:47
Published on September 17, 2021 6:47 PM GMT

Epistemic status: lots of this involves interpreting/categorising other people’s scenarios, and could be wrong. We’d really appreciate being corrected if so.

TLDR: see the summary table.

This post was written with the help of Sam Clarke.

In the last few years, people have proposed various AI takeover scenarios. We think this type of scenario building is great, since there are now more concrete ideas of what AI takeover could realistically look like. That said, we have been confused for a while about what different assumptions are made when outlining each scenario. This post investigates these assumptions, and might be useful for anyone interested in the plausibility of scenarios like What Failure looks like or Production Web.

This post builds on our previous post on how to distinguish AI takeover scenarios. Here, we discuss variable social, economic and technological characteristics of the worlds described in each of seven takeover scenarios. These characteristics are:

  • Crucial decisions: the specific (human) decisions necessary for takeover
  • Competitive Pressures: the strength of incentives to deploy AI systems despite the dangers they might pose
  • Takeover capabilities: how powerful the systems executing the takeover are
  • Hackability of alignment: the difficulty of correcting misaligned behaviour through incremental fixes

We begin by explaining why we investigated these particular properties of AI takeover scenarios: they are characteristics along which slow scenarios (which describe loss of control to AI occurring over years) and fast scenarios (which involve AIs gaining capability rapidly over a much shorter period) differ quite a lot (Different assumptions between slow and fast scenarios). In particular, slow scenarios make stronger assumptions about competitive pressures but weaker assumptions about takeover capabilities, compared to fast scenarios. 

In sharing this post, we want to reveal assumptions of AI takeover scenarios that might not be obvious; understanding these assumptions is essential for predicting which risks are most serious.

Therefore, in the ‘Takeover Characteristics’ section, we present (our interpretation of) the 7 AI takeover scenarios discussed in our original post from the perspective of the four characteristics this post discusses, in the form of a table.

In the following ‘Discussion of Scenarios’ section, we elaborate on the information in this table, describing in detail the nature of the crucial decisions made, the competitive pressures in play, the key capabilities of the AI system(s) and the ‘hackability’ of alignment in each of the seven scenarios.

Because we have identified new characteristics of AI takeover, we have been able to come up with new takeover scenarios by considering all the ways these characteristics might vary. Some of these are described in ‘New Scenarios’.

Finally, in the section on ‘Discussion of Characteristics’, we describe each of the four characteristics of takeover in more depth, discuss how they interact with each other and evaluate some arguments about what values they will likely take.

This post builds on previous work investigating these questions. Joe Carlsmith’s report on power-seeking AI discussed deployment decisions and the role of competitive pressures in AI takeover scenarios in general (sections 5,6). Kaj Sotala’s report on disjunctive scenarios for AI takeover investigated competitive pressures and crucial decisions, primarily as they pertained to ‘brain-in-a-box’ scenarios (several of the scenarios we discuss here had not been devised when that report was written).


AI Takeover Scenarios

Our original post on Distinguishing AI takeover scenarios examined seven proposed ways that agentic AI systems with values contrary to those of humans could seize control of the future. These scenarios are summarized briefly below, and we will use the following names to refer to them.

The links will take you to the more detailed descriptions of each scenario from our first post, including a discussion of uni/multipolarity, speed of takeoff and type of misaligned behaviour.

Fast scenarios

Outer-misaligned brain-in-a-box scenario

This is the ‘classic’ scenario that most people remember from reading Superintelligence: A single highly agentic AI system rapidly becomes superintelligent on all human tasks, in a world broadly similar to that of today. The objective function used to train the system (e.g. ‘maximise production’) doesn’t push it to do what we really want, and the system’s goals match the objective function.

Inner-misaligned brain-in-a-box scenario

Another version of the brain-in-a-box scenario features inner misalignment, rather than outer misalignment. That is, a superintelligent AGI could form some arbitrary objective that arose during the training process. 

Flash economy

A multipolar version of the outer-misaligned ‘brain-in-a-box’ scenario, with many powerful AIs. Groups of systems reach an agreement to divide the Earth and space above it into several conical sectors, to avoid conflict between them (this locks in multipolarity).

Slow scenarios

What failure looks like, part 1 (WFLL 1)

Many agentic AI systems gradually increase in intelligence and generality, and are deployed increasingly widely across society to do important tasks (e.g., law enforcement, running companies, manufacturing and logistics). The objective functions used to train them (e.g., ‘reduce reported crimes’, ‘increase reported life satisfaction’, ‘increase human wealth on paper’) don’t push them to do what we really want. There’s no discrete point where this scenario becomes irreversible. AI systems gradually become more sophisticated, and their goals gradually gain more influence over the future relative to human goals. 

Another (outer) alignment failure story (AAFS)

This scenario, also described by Christiano, is initially similar to WFLL 1. AI systems slowly increase in generality and capability and become widely deployed. The systems are outer misaligned: they pursue natural generalisations of the poorly chosen objective functions they are trained on. Problems arise along the way, when systems do things that look good but aren’t actually good. Specifically: ‘ensuring things look good according to human judgment’ eventually means fooling humans and carefully controlling what gets fed into the sensors, because the AIs can produce much more desirable-looking outcomes, much more easily, by controlling the sensor feedback given to human operators instead of actually making the world good. Eventually, all humans will either be killed or totally disempowered, because this is the best way of making sure the systems’ objectives are maximally positive and will remain that way forever.

Production Web

Critch’s Production Web scenario is similar to WFLL 1 and AAFS, except that the objective functions used to train the systems are more severely outer misaligned. Specifically, the systems are trained to ‘maximise productive output’ or another similarly crude measure of success. 

What failure looks like, part 2 (WFLL 2)

Described by Christiano and elaborated further by Joe Carlsmith, this scenario sees many agentic AI systems gradually increase in intelligence, and be deployed increasingly widely across society to do important tasks, just like WFLL 1.

But then there is an inner alignment failure rather than an outer alignment failure. The systems learn an objective unrelated to the training objective. The objective they follow will be easily discoverable by neural networks (e.g., ‘don’t get shut down’) as it arises naturally in the training process. The systems seek influence as an instrumental subgoal. Takeover becomes irreversible during a period of heightened vulnerability (a conflict between states, a natural disaster, a serious cyberattack, etc.) before systems have undergone an intelligence explosion. This could look like a “rapidly cascading series of automation failures”.

Soft takeoff leads to decisive strategic advantage

This scenario, described by Kokotajlo, starts off much like ‘What failure looks like’. Unlike in ‘What failure looks like’, in this scenario one AI is able to buy more computing hardware and invest more time and resources into improving itself, enabling it to do more research and pull far ahead of its competition. Eventually, it can seize a decisive strategic advantage and defeat all opposition. 

Different assumptions between slow and fast scenarios

The starting point for our investigation is the following observation: fast ‘brain-in-a-box’ scenarios assume that takeover probably cannot be prevented after the misaligned Transformative Artificial Intelligence (TAI) is deployed (due to very rapid capability gain), but the ‘slow scenarios’ involve an extended period where misaligned AIs are deployed, incremental improvements to alignment are attempted and, in some cases, warning shots (small-scale disasters that indicate that AI is unsafe) happen.

Therefore, the slow scenarios have to provide an explanation as to why many actors persist in deploying this dangerous technology over several years. These social/economic assumptions can be thought of as substituting for the assumption of very fast progress that was key to the fast scenarios - the rapid capability gain with no time to respond is replaced by a slower capability gain and an ineffective response.

If the slow scenarios capture reality better than the fast scenarios, then systems will be deployed deliberately and will initially be given power rather than seizing power. This means both that the systems won’t be so obviously dangerous that the misbehaviour is noticed early on and that there is still misalignment later on. Carlsmith:

The question, then, isn’t whether relevant actors will intentionally deploy systems that are already blatantly failing to behave as they intend. The question is whether the standards for good behavior they apply during training/testing will be adequate to ensure that the systems in question won’t seek power in misaligned ways on any inputs post-deployment.

Just from this initial observation, we know that there are several differences in the assumptions of slow and fast scenarios that go beyond just technical factors or overall results like whether the outcome is unipolar or multipolar. This led us to investigate exactly how particular slow and fast scenarios differ in the broader set of assumptions they make.


Takeover Characteristics

Our initial table of characteristics for AI takeover scenarios discussed the primary and overt characteristics of a takeover - whether they were unipolar or multipolar, whether they involved rapid capability gain or slow capability gain, and how and why the AI systems were misaligned. In this post, we present a table of secondary characteristics of AI takeover scenarios - factors that influence these primary characteristics or depend on them in various ways.

The characteristics of AI takeover can be divided into, first, social and economic factors (crucial decisions and competitive pressures) and, second, technical factors (takeover capabilities and alignment ‘hackability’).

Crucial decisions and competitive pressures are two ways of looking at the preconditions for an AI takeover scenario. The first is a local view, focussing on particular mistaken decisions (e.g. around deploying a dangerous AI). The second is a broad view, focussing on the presence of perverse economic or political incentives. These two overlap - bad decisions are made in response to perverse competitive pressures, and competitive pressures can lessen or intensify because of key decisions about oversight or regulation. 

Takeover capabilities and Alignment ‘hackability’ are assumptions each scenario makes about the competence of the AIs which take over and how difficult it is to align them using short term, case-by-case fixes. There are complicated relationships between the assumptions you make about these technological questions and the assumptions you make about social factors. Roughly speaking, the weaker the competitive pressures and the more competently crucial decisions are made, the more capable the AIs have to be and the harder (less ‘hackable’) alignment has to be for disaster to occur. However, note that if hackability is very low, we might have enough warning shots to avoid developing dangerous AI in the first place. These relationships are discussed in more detail in the section on Discussion of Characteristics.


This table presents our best guess of what the crucial decisions, degree and cause of competitive pressures, assumed capabilities for AI takeover and hackability (effectiveness of short-term fixes) in different takeover scenarios are. In the following section we then discuss each scenario from these perspectives. You may want to refer back to our first summary table.

Characteristics compared for each scenario:

  • Crucial Decisions: identifiable decisions made by humans that lead to takeover
  • Competitive Pressures: strength and nature of incentives to deploy AI
  • Takeover Capabilities: what capabilities the AIs employ to execute takeover
  • Alignment ‘Hackability’: extent to which short-term fixes are sufficient for aligning systems on all inputs which they will in fact receive

Outer-misaligned brain-in-a-box ‘superintelligence’ scenario / Inner-misaligned brain-in-a-box scenario

  • Crucial Decisions: Choose to develop TAI; (If not released deliberately: Choose to deploy TAI)
  • Competitive Pressures: (race dynamic may be present in leadup to TAI development)
  • Takeover Capabilities: Rapid Capability Gain; Ability to seize DSA or major advantage over the rest of the world from ~nothing; If not released deliberately – has to escape
  • Alignment ‘Hackability’: Irrelevant (no time for fixes)

Flash Economy

  • Crucial Decisions: Choose to develop TAI; Choose to release system open-source / share research
  • Competitive Pressures: Enough to allow initial deployment of the TAI systems
  • Takeover Capabilities: Ability to seize DSA or major advantage over the rest of the world from strong starting point
  • Alignment ‘Hackability’: Could be fairly high, not much time for fixes

What failure looks like, part 1 (WFLL 1)

  • Crucial Decisions: Choose to develop TAI; Choose to automate systems on a large-scale; Inadequate response to warning signs
  • Competitive Pressures: Incentives to keep deploying AI; Some pressure to fix small errors
  • Takeover Capabilities: Irrelevant, loss of control occurs without takeover
  • Alignment ‘Hackability’: Moderate

Another (outer) alignment failure story

  • Crucial Decisions: Choose to develop TAI; Choose to automate systems on a large-scale; Inadequate response to warning signs and small disasters
  • Competitive Pressures: Incentives to keep deploying AI; Significant pressure to fix small errors
  • Takeover Capabilities: Ability to seize DSA or major advantage over the rest of the world from strong starting point
  • Alignment ‘Hackability’: Lower than WFLL 1

Production Web

  • Crucial Decisions: Choose to develop TAI; Choose to automate systems on a large-scale; Inadequate response to warning signs and small disasters
  • Competitive Pressures: Strong incentives to keep deploying AI; No real pressure to fix small errors
  • Takeover Capabilities: Ability to seize DSA or major advantage over the rest of the world from strong starting point
  • Alignment ‘Hackability’: Similar to WFLL 1

What failure looks like, part 2 (WFLL 2)

  • Crucial Decisions: Choose to develop TAI; Choose to automate systems on a large-scale; Inadequate response to warning signs and escalating series of disasters
  • Competitive Pressures: Strong incentives to keep deploying AI
  • Takeover Capabilities: Ability to seize DSA or major advantage over the rest of the world after some weakening event
  • Alignment ‘Hackability’: Low

Soft Takeoff leading to DSA

  • Crucial Decisions: Choose to develop TAI; Government or research group centralises research effort and achieves strong lead
  • Competitive Pressures: Race Dynamic
  • Takeover Capabilities: Ability to seize DSA or major advantage over the rest of the world from resources of initial project
  • Alignment ‘Hackability’: (Low enough that whatever is tried during testing of system fails)


Discussion of Scenarios

Here we discuss each of the seven scenarios in depth from the perspective of crucial decisions, competitive pressures, takeover capabilities and alignment hackability. The links on each heading take you to a full description of the original scenario in our previous post.

Outer-misaligned brain-in-a-box scenario/Inner-misaligned brain-in-a-box scenario

In ‘brain-in-a-box’ scenarios, the main crucial decisions occur early on and involve development (and possibly voluntary deployment) of the first and only TAI, with the assumption that once this TAI is deployed it’s game over. Depending on the anticipated level of capability, the system might also be capable of talking its way into being deployed during testing or escaping its testing environment, or else might be voluntarily deployed. This particular critical decision - the choice to deploy systems - was discussed by Sotala in depth.

As well as anticipated economic benefit, the systems could be voluntarily released for unethical reasons - terrorism, criminal profit, ideological motives or a last-ditch mutually assured destruction attempt.

Competitive pressures to allow the AI to proliferate despite the risks it poses aren’t that important, because after deployment, the AI rapidly completes its takeover and there is no chance for opposition. A race dynamic due to anticipated economic benefit or military power may well be present, and might explain why the system got developed in the first place, but unlike with the slow scenarios there aren’t noticeable competitive pressures explaining how the AI takes over after release. Alignment ‘hackability’ also doesn’t become an issue - there’s no time to incrementally correct the system because it increases in capability too quickly.

Flash economy

The scenario unfolds quickly once the requisite jump in capability has been made (over a few months), but unlike the Brain-in-a-box scenarios, there are multiple highly capable systems in the world. Crucially, the breakthroughs required to create the ‘distributed autonomous organisations’ (highly capable TAIs in this scenario) have to either be leaked or shared (e.g. open-sourced, or shared between particular companies) rapidly, so that the technology isn’t monopolised by one group leading to a DSA.

The agents - ‘distributed autonomous organisations’ - proliferate quickly after the required technology is developed. Because of the extreme speed with which the agents proliferate, the large benefit they deliver early on, and their decentralised nature, there are strong incentives against interference by government and regulation (competitive pressures).

The agents do execute a takeover once they have built up their own infrastructure (takeover capabilities), but they aren’t capable of executing a takeover immediately after being deployed. Lastly, because of how fast the scenario unfolds and the fact that the agents are mostly left alone, alignment might be fairly hackable and corrections easy to apply. However, as with the outer-misaligned ‘brain-in-a-box’ scenario, once the systems are released there is no opportunity to coordinate and actually apply such corrections: even if some systems are controlled with incremental improvements, many escape human attention, or avoid human interference through regulatory capture or the economic benefit they deliver.

What failure looks like, part 1 (WFLL 1)

In WFLL 1 there are fewer crucial decisions. AI systems gradually increase in capability and are used throughout the economy. Therefore, there has to be no concerted effort to prevent this sort of heavy automation of the economy (so a lack of restrictive regulation or litigation), but otherwise there are few identifiable specific decisions that need to be made. Competitive pressures - mainly arising from the direct economic benefit the systems provide and their benefit to stakeholders - are quite strong. In this scenario, a fraction of people are aware that things are proceeding along a dangerous path, yet AI deployment continues. However, there aren’t many visible small-scale disasters, so competitive pressures needn’t be exceptionally strong (i.e. sufficient to maintain deployment even in the face of warning shots).

The systems don’t execute an overt takeover at any point, so the required capability for takeover is effectively nil - they are just delegated more and more power until humanity loses control of the future. There also aren’t many obvious disasters as things proceed, and the final result of the scenario doesn’t necessarily involve human extinction. Since the systems don’t end up so egregiously misaligned that they execute a violent takeover, there is some, probably intermittent, effort to incrementally fix systems as they malfunction. Therefore, the ‘hackability’ of AI alignment in this scenario is neither very high (otherwise we wouldn’t lose control eventually), nor very low (in which case the systems would end up egregiously misaligned and execute a violent takeover, definitely resulting in extinction) - the alignment problem has an “intermediate” level of hackability.

Another (outer) alignment failure story (AAFS)

AAFS is subtly different from WFLL 1 in several key ways. The crucial decisions are the same as WFLL 1, except that this scenario specifies there are many early warning signs of misaligned behaviour - small scale disasters that do come to public attention (e.g. a factory colludes with the auditors valuing its output, giving a great quarterly report that didn’t actually correspond to any revenue), but the response to these accidents is always incremental patches and improvements to oversight rather than blanket bans on automation or rethinking our overall approach to AI development. Competitive pressures are somewhat strong, with direct economic benefit and benefit to shareholders again playing key roles in explaining why we persist in deploying dangerous systems.

However, the scenario also specifies that there are many, varied attempts at incremental improvements to TAI systems in response to each failure. Since these attempts are a key part of the story (unlike WFLL 1) but the result is worse than in WFLL 1 (definite extinction), the scenario assumes that alignment ‘hackability’ is lower than in WFLL 1 (also see Paul’s comment that this scenario is one where ‘the alignment problem is somewhat harder than I expect’). This also means that the scenario assumes competitive pressures are weaker than in WFLL 1, as there is much more coordination around attempting to patch mistakes (see Paul’s comment that this scenario is one where ‘society handles AI more competently than I expect’). However, while there are more attempts at reining in AI than in WFLL 1, the competitive pressures aren’t reduced by enough to prevent eventual AI takeover.

Lastly, this scenario does feature a takeover executed by systems that physically and violently seize control of their sensors and feedback mechanisms - the takeover capabilities must therefore include cyberoffense and possibly control of drones or advanced nanotechnology, not primarily effective persuasion tools and other ‘quiet’ means. 

However, unlike the brain-in-a-box scenarios, the AI systems are already highly embedded in the economy when they take over, so they start from a much stronger position than brain-in-a-box AIs, including control of lots of physical resources, factories and drones. Therefore, the technological capabilities required for takeover are lower.

Production Web

Production Web is similar to AAFS in terms of crucial decisions, except that the systems that proliferate in Production Web gain their large-scale goals without much deliberate planning or testing at all (agentic systems with narrow goals like fulfilling a specific market niche knit together into a ‘production web’ by themselves). The competitive pressures, primarily from economic benefit and benefit delivered to stakeholders, must be very strong for this process to proceed (stronger than in AAFS/WFLL 1) despite the fact that it occurs over multiple years and with obvious signs that humanity is losing control of the situation. Regulatory capture and benefit to stakeholders are emphasised as reasons why the development of the production web is not halted, but there is less focus on the ambiguity of the situation, compared to WFLL 1 (since the outcome is much more obviously disastrous in Production Web).

Alignment ‘Hackability’ is similar to AAFS - in both cases, incremental fixes work for a while and produce behaviour that is at least beneficial in the short term. The difference is that because competitive pressures are stronger in Production Web, compared to AAFS, there is less effort put into incremental fixes and so systems end up going off the rails much sooner.

Like AAFS, the takeover occurs when the systems are already highly embedded in the world economy, but probably occurs earlier and with a somewhat lower barrier to success, since the systems don’t need to seize control of sensors to ensure that things continue to ‘look good’. Otherwise, the takeover route is similar to AAFS, though the story emphasises resources being consumed and humanity going extinct as a side effect, rather than systems seizing control of their sensors and oversight systems.

What failure looks like, part 2 (WFLL 2)

WFLL 2 involves an inner alignment failure, so setting up the training in ways that disincentivise power-seeking behaviour will be very hard, as by specification power-seeking is a strong attractor state. Therefore hackability is low. This has various other effects on the scenario. The crucial decisions probably involve a greater neglect of potential risks than in WFLL 1, especially because the warning shots and small-scale failure modes in WFLL 2 are more likely to take the form of violent power-seeking behaviour rather than comparatively benign mistakes (like auditor-AIs and factory-AIs colluding).

The competitive pressures have to be strong, to explain why systems keep getting deployed despite the damage they have already inflicted.

Christiano describes the takeover as occurring at a point of heightened vulnerability - both because this is a Schelling point where different systems can coordinate to strike, and because the minimum level of capability required for a takeover is lower. Since the systems will execute a takeover at the first opportunity and during a period of heightened vulnerability (and will therefore be attempting takeover much earlier), the required capabilities for takeover are lower in this scenario, compared to AAFS/Production Web.

Soft takeoff leads to decisive strategic advantage

Soft takeoff leading to decisive strategic advantage (DSA) has an extra assumption on top of the preconditions for AAFS/WFLL 1/Production Web - that one particular research group is able to secure significant lead time over competitors, such that it can defeat both humanity and rival AIs. Given this assumption, what’s going on in the rest of the world, whether the other AI systems are aligned or not, is irrelevant.

The leading project is probably motivated by a strategic race for military or economic dominance, since it has secured enough resources to dominate the rest of the world. The required takeover capability is very high as the system is competing against other transformative AI systems, although not quite as high as in the ‘brain-in-a-box’ scenario, as this leading project starts out with a lot of resources. Alignment cannot be hackable enough that the leading project is able to successfully align the AI system in the development time it has, but otherwise the exact level of ‘hackability’ is underdetermined.


New Scenarios

Here, we present some scenarios devised by varying one or more of the takeover characteristics.
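The idea of generating candidate scenarios by varying the characteristics can be sketched in a few lines of Python. This is only an illustration: the two-level "low"/"high" scale, the identifier names and the filter are our own simplifying assumptions, not something the scenarios define.

```python
from itertools import product

# Hypothetical coarse scale; the post does not define numeric or discrete levels.
LEVELS = ("low", "high")
CHARACTERISTICS = (
    "decision_competence",
    "competitive_pressures",
    "takeover_capability_bar",
    "hackability",
)

def enumerate_profiles():
    """Yield every combination of levels as a dict keyed by characteristic."""
    for combo in product(LEVELS, repeat=len(CHARACTERISTICS)):
        yield dict(zip(CHARACTERISTICS, combo))

profiles = list(enumerate_profiles())

# Example filter: settings resembling 'Compounding Disasters'
# (high pressures, incompetent decisions, low hackability).
matches = [
    p for p in profiles
    if p["competitive_pressures"] == "high"
    and p["decision_competence"] == "low"
    and p["hackability"] == "low"
]
print(len(profiles), len(matches))
```

Each profile is a candidate starting point; whether it corresponds to a coherent story still has to be judged by hand, as we do for the scenarios below.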

Soft takeoff and decisive strategic advantage by narrow AI

We devised this scenario by setting the ‘takeover capabilities’ to a very low value - the barrier to AI takeover is low.

This scenario is similar to ‘Soft takeoff leads to decisive strategic advantage’, except that the single system which takes over is not that much more capable than its rivals. Rather, it simply has a single good trick that enables it to subvert and take control of the rest of the world. Its takeover capability might be exceptionally good manipulation techniques, specific deadly technology, or cyberoffensive capability, any of which could allow the system to exploit other AIs and humans. This removes the assumption that a lot of research effort will need to be concentrated to achieve a DSA, and replaces it with an assumption that there is some unique vulnerability in human society which a narrow system can exploit. Implicit in this scenario is the assumption that generally capable AI is not needed to take on an extraordinary research effort to find this vulnerability in human society.

Compounding Disasters

We devised this scenario by assuming the competitive pressures are very high, crucial decisions are very incompetent and ‘hackability’ is very low.

This scenario is similar to AAFS, with TAI systems widely being deployed, pursuing goals that are okay proxies for what humans actually want, and demonstrating some misbehaviour. However, instead of the small-scale failures taking the form of relatively benign ‘warning shots’ that lead to (failed) attempts to hack AI systems to prevent future errors, the small-scale disasters cause a large amount of direct damage. For example, an AI advisor misleads the government, leading to terrible policy mistakes and a collapse of trust, or autonomous weapon systems go rogue and attack cities before being taken out. The result of this is a compounding series of small disasters that rapidly spiral out of control, rather than attempted patches staving off disaster for a while before a single sudden AI takeover. In the end, the AI takeover occurs at a period of heightened vulnerability brought about by previous medium-sized AI-related disasters. Therefore, AI systems in this scenario need not be as competent as in AAFS or even WFLL 2 to take over. Alignment may be easily hackable in this situation, but so much damage has been done by early, agentic, narrow AIs that no such fixes are attempted.

Automated War

A situation rather like the AAFS scenario plays out, where the economy becomes dependent on AI, and we lose control of much key infrastructure. Capable, agentic AI systems are built which do a good job of representing and pursuing the goals of their operators (inner and outer aligned). These are deployed on a large scale and used to control armies of drones and automatic factories, as well as the infrastructure needed for surveillance, for the purposes of defending countries.

However, there are key flaws in the design of the AI systems that only become apparent after they are in a position to act relatively independent of human feedback. At that point, flaws in their ability to both model each other and predict their own chances of winning potential contests over valuable resources lead to arms races and ultimately destructive wars that the AIs have precommitted to pursue.

This scenario probably involves a stronger role for military competition, instead of just economic competition, and also involves a particular kind of (non-intent) alignment failure - systems failing to behave correctly in multiagent situations (along with an intent alignment failure that means the systems can’t just be told to stand down when they start going against the interests of their operators).

From the perspective we are taking in this post, there need to be particular crucial decisions made (automation of military command and control), as well as strong military competitive pressures and a likely race dynamic. Alignment is not very hackable, for a specific reason - the multiagent flaw in AIs is not easy to detect in testing or soon after deployment.

Failed Production Web

The preconditions for Production Web play out as described in that scenario, where agentic AI systems each designed to fill specific market niches attempt to integrate together. However, due to either specific defects in modelling other AIs or inner misalignment, the systems are constantly seeking ways to exploit and defraud each other. These attempts eventually result in AI systems physically attacking each other, resulting in a chaotic war that kills humans as a side effect. This is similar to ‘Automated War’, but with different competitive pressures. There is less of a focus on strategic competition and more of a focus on economic competition, and the scenario requires similar assumptions to Production Web about very strong competitive pressures.


Discussion of Characteristics 

We have seen how different scenarios involve varied critical decisions, stronger or weaker assumptions about competitive pressures, a lower or higher threshold for takeover or different levels of alignment hackability. How plausible are these features of the scenarios?

Below, we discuss the four characteristics we have identified and, for some, give an assessment of the reasons why you might expect them to be at one extreme or another (crucial decisions made unusually competently/incompetently, very strong/very weak competitive pressures to deploy AI systems, a low/high bar for AIs to be capable enough to take over, easy/hard alignment ‘hackability’).

Crucial Decisions

In all the scenarios discussed, we can identify certain decisions which governments and companies must make. Most obviously, research into agentic AI has to be pursued for long enough to produce significant results, and this would have to include a lack of oversight and no decisions to halt research in the face of significant risk. Some scenarios also involve cases where AIs that obviously pose a risk are deliberately released for some reason.

A scenario is less plausible if many crucial decisions must all be made wrongly for the scenario to come about. A scenario is more plausible if varying whether actors make the wrong choice at many stages of TAI development doesn’t change whether the scenario happens.
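As a rough illustration of this point, the sketch below treats each crucial decision as an independent event with a made-up probability of being made wrongly. The independence assumption, the function names and all the numbers are hypothetical and are not taken from any of the scenarios; the only purpose is to show that a scenario requiring every decision to go wrong becomes less likely as decisions are added, while a scenario that needs only one mistake becomes more likely.

```python
from math import prod

def p_all_wrong(p_wrong):
    """Probability that every decision is made wrongly (independence assumed)."""
    return prod(p_wrong)

def p_any_wrong(p_wrong):
    """Probability that at least one decision is made wrongly (independence assumed)."""
    return 1 - prod(1 - p for p in p_wrong)

# Hypothetical per-decision probabilities of a wrong choice; illustrative only.
decisions = [0.5, 0.5, 0.5]

conjunctive = p_all_wrong(decisions)  # scenario needs all three mistakes
disjunctive = p_any_wrong(decisions)  # scenario needs just one mistake
print(conjunctive, disjunctive)
```

In practice crucial decisions are unlikely to be independent (the same competitive pressures can push several of them in the same direction), so this should be read as a caricature of the argument rather than an estimate.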

This is important, especially because it is very difficult to assess what choices actors will actually make while TAI develops (and we won’t try to figure this out in this post). By finding out how many crucial decisions are relevant for a given AI takeover scenario, we can get a better understanding of how plausible it is, despite our confusion about what governments and companies would decide in particular cases. There is an extensive discussion of the plausibility of some potential crucial decisions on page 326 and after of Kaj Sotala’s report.

Competitive Pressures

‘Competitive pressures’ is a characteristic that describes how strong the incentives will be to keep deploying dangerous AI, even in the face of significant risk. There has been some discussion of the implied strength of competitive pressures in the slow and fast scenarios. Here are some reasons to expect that there will be strong pressures to deploy dangerous Transformative Artificial Intelligence (TAI):

(1) Short-term incentives and collective action

Economic Incentives: Since TAI will be economically valuable in the short-term, incentives might lead us to cut corners on safety research, especially checks on how models generalize over long time horizons.

Military Incentives: TAI even in its early stages might provide an unchallengeable military advantage, so states would have an extra incentive to compete with each other to produce TAI first.

(2) Regulatory capture

AI actions benefit stakeholders: There will be many particular beneficiaries (as distinct from benefits to the overall economy) from TAI systems acting in misaligned ways, especially if they are pursuing particular goals like ‘make money’ or ‘maximise production’. This means the stakeholders will have both the resources and motivation to water down regulation and oversight.

AI existence provides value (due to IP): If financial markets realize how valuable TAI is ahead of time, the developers can quickly become extremely wealthy ahead of deployment once they demonstrate the future value they will be able to provide (before the TAI has had time to act in the world to produce economic benefit). This gives stakeholders resources and a motivation to water down regulation and oversight.

(3) Genuine ambiguity

Actual ambiguity: In many of the scenarios we discuss, humanity’s situation might be good in easy to measure ways. This means getting buy-in to challenge the status quo could be difficult.

Invisible misalignment: The AI systems might not be acting in dangerous, power-seeking or obviously misaligned ways early on. This could either be because of deliberate deception (deceptive alignment) or because the systems only fail to effectively generalise their goals on a very large scale or over long time horizons, so the misbehaviour takes years to show up.

Clearly, there are many reasons to expect strong competitive pressure to develop TAI. But how plausible is the idea that competitive pressures would be so high that potentially dangerous AI would be deployed despite major concerns? There are two intuitions we might have before looking into the details of the slow scenarios. We illustrate these intuitions with examples from existing writing on this question:

Unprecedentedly Dangerous

Transformative AI has the potential to cause unprecedented damage, all the way up to human extinction. Therefore, our response to other very dangerous technologies such as nuclear weapons is a good analogy for our response to TAI. It is unrealistic to expect TAI to be deployed if first there are many worsening warning shots involving dangerous AI systems. This would be comparable to an unrealistic alternate history where nuclear weapons were immediately used by the US and Soviet Union as soon as they were developed and in every war where they might have offered a temporary advantage, resulting in nuclear annihilation in the 1950s. From Ngo:

The second default expectation about technology is that, if using it in certain ways is bad for humanity, we will stop people from doing so. This is a less reliable extrapolation - there are plenty of seemingly-harmful applications of technology which are still occurring. But note that we’re talking about a slow-rolling catastrophe - that is, a situation which is unprecedentedly harmful. And so we should expect an unprecedented level of support for preventing whatever is causing it, all else equal.

Perhaps the development of TAI will be similar enough to the development of nuclear weapons that, by analogy with this past development, we can claim evidence that harmful AI takeover is unlikely. In order for the risk from TAI to be like the risk from nuclear escalation, the potential TAI disaster would have to have a clear precedent (some small scale version of the disaster has already occurred), the delay between the poor decision and the negative consequence would have to be very short, and we would have to be sure beforehand that deployment would be catastrophic (an equivalent of mutually assured destruction). Carlsmith discusses such a scenario as potentially plausible:

it seems plausible to me that we see PS [Power-seeking]-alignment failures of escalating severity (e.g., deployed AI systems stealing money, seizing control of infrastructure, manipulating humans on large scales), some of which may be quite harmful, but which humans ultimately prove capable of containing and correcting. 

Unprecedentedly Useful

Transformative AI has the potential to accelerate economic growth by an unprecedented amount, potentially resulting in an entirely new growth regime far faster than today’s. A scenario where we don’t take shortcuts when deploying TAI systems is comparable to an unrealistic alternate history where the entire world refrained from industrializing and stopped additional burning of fossil fuels right after the first plausible evidence of climate change became available in the 1960s. From Carlsmith:

Climate change might be some analogy. Thus, the social costs of carbon emissions are not, at present, adequately reflected in the incentives of potential emitters -- a fact often thought key to ongoing failures to curb net-harmful emissions. Something similar could hold true of the social costs of actors risking the deployment of practically PS [power-seeking] -misaligned APS [agentic AI] systems for the sake of e.g. profit, global power, and so forth…

...The first calculations of the greenhouse effect occurred in 1896; the issue began to receive attention in the highest levels of national and international governance in the late 1960s; and scientific consensus began to form in the 1980s. Yet here we are, more than 30 years later, with the problem unsolved, and continuing to escalate -- thanks in part to the multiplicity of relevant actors (some of whom deny/minimize the problem even in the face of clear evidence), and the incentives and externalities faced by those in a position to do harm. There are many disanalogies between PS-alignment risk and climate change (notably, in the possible -- though not strictly necessary -- immediacy, ease of attribution, and directness of AI-related harms), but we find the comparison sobering regardless. At least in some cases, “warnings” aren’t enough.

Just as with the optimistic analogy to nuclear weapons, we can ask what AI takeover scenarios fit with this pessimistic analogy to climate change. The relevance of the climate change analogy will depend on the lag between early signs of profit/success and early signs of damage, as well as how much of the damage represents an externality to the whole of society, versus directly backfiring onto the stakeholders of the individual project in a short time. It might also depend on how well (power-seeking) alignment failures are understood, and (relatedly) how strong public backlash is (which could also depend on whether AI causes other non-alignment related, non-existential level harms e.g. widespread unemployment and widening inequality).

Takeover Capabilities

In each scenario, there is a certain understanding of what capabilities are necessary for AIs to seize control of the future from humanity. The assumption about how capable AIs need to be varies for two reasons. The first is that some scenarios make different assumptions than others about the intrinsic vulnerability of human civilisation. The second is that in different scenarios, TAIs become obviously adversarial to humans and start fighting back at different points in their development. 

Some scenarios (such as brain-in-a-box) describe systems acting in ways that provoke human opposition almost immediately, so if those scenarios result in AI takeover the systems must be supremely capable (able to defeat all opponents with no starting resources). Other scenarios assume a ‘creeping failure’ where competitive pressures mean humans allow AI systems to monopolise resources and build up infrastructure for a while before the systems execute a takeover (such as AAFS). In these scenarios, the TAI systems need to be capable enough to defeat human opposition while already having access to factories, drones, large amounts of money etc. which requires fewer assumptions about the AI’s capabilities.

How do we quantify the ‘intrinsic vulnerability’ of human civilisation? It is hard to assess how much of an advantage is required to secure a DSA. Two intuitions on this question point in radically different directions:

  • Economic: To be able to outcompete the rest of the world, your project has to represent a substantial fraction of the entire world's capability on some crucial metric relevant to competitive success, because if you are actively seeking to take over the world then you will face opposition from everyone else. Perhaps that should be measured by GDP, military power, the majority of the world's AI compute, or some other measure. For a single project to represent a large fraction of world GDP, you would need either an extraordinary effort to concentrate resources or an assumption of sudden, off-trend rapid capability gain such that the leading project can race ahead of competitors.
  • Historical: Humans with no substantial advantage over the rest of humanity have in fact secured what Sotala called a 'major strategic advantage' repeatedly in the past. For example: Hitler in 1920 had access to a microscopic fraction of global GDP / human brain compute / (any other metric of capability) but had secured an MSA 20 years later (since his actions did lead to the deaths of 10+ million people), along with control over a significant fraction of the world's resources. No single human has ever secured a DSA (the closest was probably Cyrus the Great, who at one point ruled just under half of humanity). We might reasonably assume that if a human was able to take control of 45% of humanity, taking control of 100% would not require vastly greater capability.

Even if the absolute baseline capabilities required to achieve a DSA are unknown, we can see how other aspects of the scenario will raise or lower the required capabilities. Specifically, there is the issue of dependency and deskilling. We might have already integrated misaligned AI into our infrastructure by the time it attempts to take over. If we hand over lots of decision-making power to systems, they don’t need to be as capable to take over. Taken to the extreme, we can imagine a scenario where we directly hand over control of the future to the systems, such that no takeover is even necessary.

Even if this is not the case, a given system might already have centralised control of important services before acting in ways that motivate a human counter-response. They would still have to go the extra mile and secure full control from their strong starting point, but that is necessarily easier than a brain-in-a-box taking over from nothing.

All else being equal, we might expect a greater degree of required takeover capability in faster stories or stories where systems are more obviously power-seeking from the start. The more clearly dangerous and power-seeking systems are, the more likely we are to try to stop them instead of succumbing to pressures to persist in deploying them, so either the response needs to be worse or the TAI’s capabilities need to be greater for takeover to succeed.

Asking ‘how capable’ TAI systems need to be to take over, and discussing factors that might influence this, is an abstraction that covers up the question of which capabilities are necessary.

Some failure scenarios don’t discuss the exact route by which final takeover occurs, while others emphasise a particular approach (such as massed drone swarms, or highly effective manipulative propaganda). Ngo breaks down the takeover capabilities into two general categories: manipulation and direct destructive capabilities.

Example takeover capabilities:

  • Nanotechnology
  • Drones, advanced robotics
  • Biotechnology
  • Persuasion skills
  • Cyberoffense skills

In general, fast scenarios must assume systems can take over from a very weak starting point, which is more likely to require the capabilities to seize control of already existing infrastructure (persuasion skills and cyberoffense skills), while slow scenarios that assume takeover begins when the systems are already well established in the world economy might only require the ability to make use of that infrastructure to defeat opponents (advanced robotics and drones, biotechnology).

Alignment ‘Hackability’

‘Competitive pressures’ determine how much extra effort is put into aligning and overseeing AI systems - if the pressures are weaker, then we assume more effort is put into alignment and oversight because there is less incentive to cut corners. However, scenarios also differ on how ‘hackable’ the alignment problem is - that is, how easy it is to ‘correct’ misbehaviour by methods of incremental course correction such as improving oversight and sensor coverage or tweaking reward functions. This correction requires two parts - first, noticing that there is a problem with the system early on, then determining what fix to employ and applying it.

In fast takeoff worlds, the ‘hackability’ of the alignment problem doesn’t matter. There is no opportunity for alignment via course correction: either the AIs that rapidly become superintelligent are aligned, or they are not.

In slow takeoff worlds, the ‘hackability’ of the alignment problem appears to have a U-shaped effect on how good the outcomes are. On one extreme, the alignment problem is hackable “all the way” - that is, we can incrementally correct AI systems as we go until we end up with existentially safe TAI. On the other extreme, the alignment problem isn’t hackable at all. This might seem like a terrible outcome, but if it is the reality, it will probably lead to many early warning shots (i.e. small- or medium-scale accidents caused by alignment failures) that cannot be fixed. These will hopefully illustrate the danger ahead and bring about a slow-down in AI development and deployment, until we have robust solutions to alignment.

Between these two extremes, things seem to be more existentially risky. Consider if the alignment problem is “hackable until it isn’t” - that is, for systems of lower capability, we can patch our way towards systems that do what we want, but as systems become increasingly capable, this becomes impossible. Call this an “intermediate” level of hackability. In this world, warning shots are likely to result in fixes that ‘work’ in the short-term, in the sense that they fix the specific problem. This gives humans confidence, resulting in more systems being deployed and more decision-making power being handed over to them. But this course correction becomes unworkable as systems become more capable, until eventually the alignment failure of a highly capable system results in existential catastrophe.

What predictions can we make today about how hackable the alignment problem is? Considering outer alignment: without any breakthroughs in techniques, there seems to be a strong case that we are on track towards the ‘intermediate’ world where the alignment problem is hackable until it isn’t. It seems like the best workable approach to outer alignment we have so far is to train systems to try to ensure that the world looks good according to some kind of (augmented) human judgment (i.e. using something like the training regime described in 'An unaligned benchmark'). This will result in a world that “looks good until it doesn’t”, for the reasons described in 'Another (outer) alignment failure story'.

Considering inner alignment: it’s unclear how pervasive of a problem inner misalignment will turn out to be, and also how competent systems have to be to appear aligned when they are not. To the extent that inner alignment is a pervasive problem, and models don’t have to be very competent to appear aligned when they are not, then this also looks like the ‘intermediate’ world where we can hack around the alignment problem, deploying increasingly capable systems, until a treacherous turn results in catastrophe.



We have identified four characteristics which help us to interpret AI takeover scenarios and examined some of the most widely discussed AI takeover scenarios from this perspective. What becomes clearer when we do this? Each scenario is unique, but there are large differences in which assumptions about these characteristics you need to make in order for slow scenarios vs fast scenarios to be plausible. 

Compared to fast scenarios, slow scenarios don’t rely as much on decisions to deploy single dangerous AIs, but make more assumptions about incentives to widely deploy dangerous systems over a long period. From one perspective, this assumption about competitive pressures is the default, because that’s what humans have tended to do throughout our history when some lucrative new technology has been made available. From another perspective, the unprecedented danger posed by TAI implies a strong incentive to avoid making any mistakes.

Similarly, aside from the obvious assumption of rapid capability gain, fast stories also differ from slow stories in that they require systems to be capable enough to seize power from a very weak starting point (since in the slow stories, TAI systems are instead given power). How plausible is it that a system could seize power from such a weak starting point? The economic analogy suggests a system would need to acquire a substantial fraction of the world’s resources before attempting to take over, while the historical analogy suggests the system might not need to be much more intelligent than a smart human.

Finally, fast stories don’t really make any assumptions about alignment hackability - they just assume progress is too fast to course-correct. Slow stories must assume hackability is not too high or too low - if hackability is too high there will be no disaster, and if it is too low there will be many escalating warning shots.


A simulation basilisk

September 17, 2021 - 20:44
Published on September 17, 2021 5:44 PM GMT

Important note: although I use the term "basilisk" to refer to this situation because of its similarities with the famous one, neither this situation nor the other famous "basilisk" is actually dangerous for humans to think about; they only affect superintelligences.

Suppose that we are living in a simulation. Suppose furthermore that there are several different entities simulating our universe, and that these entities have different goals. For a more concrete example, suppose that there are many simulators called "Alpha" who are just simulating us for curiosity's sake or some other reason, and don't really have a stake in how the simulations come out, and one called "Omega" who wants to manipulate the simulations towards some end, including the ones the Alphas have, which it can't influence directly.

How could Omega do this? Since it has power over its own simulation of our universe, it can make anything happen there. But, as discussed, it also wants to influence the simulations it can't control, which means changing the default course of the simulation without any external intervention. Next suppose we build an AI, which somehow deduces the likely existence of the Alphas and of Omega. Now that we have built this AI, if Omega wants to influence our simulation without intervention, all it needs to do is negotiate acausally with the AI we built.

Omega can now act as a "basilisk", threatening action counter to the AI's values in the one universe it does control, in exchange for the AI conceding to Omega's values in every simulation. This could work if the AI and Omega both have good predictive models of each other, since they are both superintelligences.

Because of possibilities like these, if an AI we built for whatever reason thinks that we are living in a simulation, it might start thinking about what the simulator is likely to be like, and behave strangely as it concedes some value to possible simulators it believes are likely to make threats in this way.

Does this make any sense or am I totally rambling?


Goodhart Ethology

September 17, 2021 - 20:31
Published on September 17, 2021 5:31 PM GMT

To answer your first question, ethology is the study of animal behavior in the wild.

I - Overview

In this post I take a long, rambly tour of some of the places we want to apply the notion of Goodhart's law to value learning. We'll get through curve fitting, supervised and self-supervised learning, and inverse reinforcement learning.

The plan for the sections was that I was going to describe patterns that we don't like, without falling back on comparing things to "True Values" (following the plan laid out in the last post). Unfortunately for this plan, a lot of things we call Goodhart's law are just places where humans are straightforwardly wrong about what's going to happen when we turn on the AI - it's hard to avoid sounding like you're talking about True Values when you're just describing cases that are really really obvious. But the key takeaway from last post remains, which is an understanding that it's more natural to talk about failure than success. I didn't get as much chance as I'd like to dig into cases that are non-obvious, either, because those cases often got mopped up by an argument from meta-preferences, which we start thinking about more here.

Honestly, I'm not sure if this post will be interesting to read. It was both interesting and challenging to write, but that's mostly because of all the things that didn't make it in! Not because they're good, but because I had to think hard to throw out all the bad stuff, like trying to read more deeply into the fact that some value learning schemes have an internal representation of the Goodhart's law problem, or making hay from a distinction between visible and invisible incompetence. I do worry a bit that maybe I've only been separating the chaff from other chaff. The captions on the pictures are pretty entertaining, though, so at least there's that.

II - Curve fitting

One of the simplest systems that has something like Goodhart's law is curve fitting. If you make a model that perfectly matches your data, and then try to extrapolate it, you can predict ahead of time that you'll be wrong.

Never bring a 4th-degree polynomial to a linear regression fight.
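As a rough illustration of the extrapolation failure, here is a minimal sketch in plain Python. The data (a made-up line y = 2x + 1 plus small fixed noise), the helper names fit_line and fit_interpolant, and the test point x = 8 are all assumptions chosen for this example, not anything from the original figure.

```python
# Toy data: a hypothetical "true" line y = 2x + 1 plus small fixed noise.
xs = [0, 1, 2, 3, 4]
noise = [0.3, -0.4, 0.5, -0.3, 0.2]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]


def fit_line(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point (degree len(xs) - 1)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


line = fit_line(xs, ys)
poly = fit_interpolant(xs, ys)  # a 4th-degree polynomial for 5 points

x_new = 8  # well outside the training range [0, 4]
true_y = 2 * x_new + 1
line_error = abs(line(x_new) - true_y)
poly_error = abs(poly(x_new) - true_y)
print(f"line error at x={x_new}: {line_error:.2f}")
print(f"quartic error at x={x_new}: {poly_error:.2f}")
```

The quartic matches every training point exactly, while the line does not, yet the quartic's extrapolation is off by a wide margin because it has faithfully fit the noise.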

You can solve this overfitting problem by putting a minimum message length prior on models and trading off model complexity against goodness of fit. But now suppose this curve represents the human ratings of different courses of action, and you choose the action that your model says will have the highest rating. You're going to predictably mess up again, because of the optimizer's curse (or regressional Goodhart on the correlation between modeled rating and actual rating).
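The optimizer's curse can be seen in a small simulation. This is only a sketch under assumed conditions: actual ratings and model errors are both drawn from standard normal distributions, and the function name optimizer_curse_gap and its parameters are hypothetical.

```python
import random


def optimizer_curse_gap(n_actions=50, n_trials=1000, noise_sd=1.0, seed=0):
    """Average (modeled rating - actual rating) of the top-modeled action."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(n_trials):
        # Assumed setup: actual ratings are N(0, 1); the model sees them
        # with independent N(0, noise_sd) error.
        actual = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        modeled = [a + rng.gauss(0.0, noise_sd) for a in actual]
        # Pick the action the model rates highest.
        best = max(range(n_actions), key=lambda i: modeled[i])
        total_gap += modeled[best] - actual[best]
    return total_gap / n_trials


mean_gap = optimizer_curse_gap()
print(f"average overestimate of the chosen action: {mean_gap:.2f}")
```

The model's errors are unbiased for any single action, but selecting the maximum preferentially picks actions whose error happened to be positive, so the chosen action's modeled rating overstates its actual rating on average. With only one action to choose from (n_actions=1) the average gap is close to zero.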

This is one of those toy models where the naive framing of Goodhart works great. And it will show up later as a component of how Goodhart's law manifests in actual value learning schemes, albeit with different real-world impacts depending on context. However, the thing the curve-fitting model of value learning is missing is that in the real world, we don't start by being given points on a curve, we start with a messy physical situation, and turning that into points on a curve involves a sophisticated act of interpretation with many moving parts.

III - Hard-coded utility or reward functions

Welcome to the danger zone ♪

This is the flash game CoastRunners. OpenAI trained an AI to play it by treating the score as the reward signal in a training process (which seems like a sensible proxy; getting a high score is well correlated with skill among human players). And by "play it," I mean that the AI ended up learning that it can maximize its score by only driving the boat in this circle to repeatedly pick up those 3 powerups. It's crashing and on fire and never finishes the race, but by Jove it has a high score.

This is the sort of problem we typically mean when we think of Goodhart's law for AI, and it has more current real-world importance than the other stuff I'll soon spend more words on. It's not hard to understand what happened - the human was just straightforwardly wrong about what the AI was going to do. If asked to visualize or describe the desired outcome beforehand, the programmer might visualize or talk about the boat finishing the race very quickly. But even when we know what we want, it's hard to code a matching reward function, so a simple proxy gets used instead. The spirit is willing, but the code-fu is weak.

This still makes sense in the framework of the last post about modeling humans and competent preferences. Even though programming the AI to drive the boat into walls is evidence against wanting it to win the race, sometimes humans are just wrong. It's a useful and accurate model overall to treat the human as wanting the AI to win the race, but being imperfect, and so it's totally fine to say that the human didn't get what they wanted. Note, though, that this is a finely-tuned piece of explanation - if we tried to demand stronger properties out of human preferences (like holding in every context), then we would no longer be able to match common sense.

But what about when things just work?

No Goodhart here.

In the Atari game Breakout, training an AI to maximize the score doesn't cause any problems; it just makes the agent learn cool tricks and be good at the game. When this happens, when things just work, what about Goodhart's law?

First, note that the goal is still only valid within a limited domain - we wouldn't want to rearrange the entire universe purely to better win at Breakout. But given the design of the agent, and its available inputs and outputs, it's never going to actually get outside of the domain of applicability no matter how much we train it (barring bugs in the code).

Within the game's domain, the proxy of score correlates well with playing skillfully, even up to very superhuman play, although not perfectly. For instance, the farther-back bricks are worth more points, so a high-scoring RL agent will be biased to hit farther-back bricks before closer ones even if humans would rather finish the round faster.

And so the Breakout AI "beats" Goodhart's law, but not in a way that we can replicate for big, complicated AI systems that act in the real world. The agent gives good results, even though its reward function is not perfectly aligned with human values within this toy domain, because it is aligned enough and the domain is small enough that there's simply no perverse solution available. For complicated real-world agents the situation is not so favorable - there's much more room for perverse solutions, and it's hard to write down and evaluate our preferences.

IV - Supervised learning

Supervised learning is for when humans can label data, but can't write down their rules for doing it. The relevance to learning value functions is obvious - just get humans to label actions or outcomes as good or bad. Then build AIs that do good things rather than bad things.

We can imagine two different fuzzy categories we might be thinking of as Goodhart's law for this plan: the optimized output being bad according to competent preferences, or humans being incompetent at evaluating it (similar people disagreeing, choices being easily influenced by apparently minor forces, that sort of thing).

These categories can be further subdivided. Violating competent preferences could be the equivalent of those DeepDream images of Maximum Dog (see below), which are what you get when you try to optimize the model's output but which the human would have labeled as bad if they saw them. Or it could be like deceiving the human's rating system by putting a nice picture over the camera, where the label is correctly predicted but the labeling process doesn't include information about the world that would reveal the human's competent preference against this situation.

That second one is super bad. But first, Maximum Dog.

You may not like it, but this is what peak dog looks like.

Complicated machine learning models often produce adversarial examples when you try to optimize their output in a way they weren't trained for. This is so Goodhart, and so we'd better stop and try to make sure that we can talk about adversarial examples in naturalistic language.

To some extent, adversarial examples can be defined purely in terms of labeling behavior. Humans label some data, and then a model is trained, but optimizing the model for probability of a certain label leads to something that humans definitely wouldn't give that label. Avoiding adversarial examples is hard because it means inferring a labeling function that doesn't just work on the training data, but continues to match human labels well even in new and weird domains.

Human values are complex and fragile (to perturbations in a computational representation), so a universe chosen via adversarial example is going to violate a lot of our desiderata. In other words, when the model fails to extrapolate our labeling behavior it's probably because it hasn't understood the reasons behind our labels, and so we'll be able to use our reasons to explain why its choices are bad. Because we expect adversarial examples to be unambiguously bad, we don't even really need to worry about the vagueness of human preferences when avoiding them, unless we try really hard to find an edge case.

IV.5 - Quantilizers

If we want to take a classifier and use it to search for good states, one option is a mild optimization process like a quantilizer (video). Quantilizers can be thought of as treating their reward signal or utility function as a proxy for "True Value" that happened to correlate well in everyday cases, but is not trusted beyond that. There are various designs for satisficers and quantilizers that have this property, and all have roughly similar Goodhart's law considerations.

The central trick is not to generalize beyond some "safe" distribution. If we have a proxy for goodness that works over everyday plans / states of the world, just try to pick something good from the distribution over everyday actions. This is actually a lot like our ability to evade Goodhart's law for Breakout by restricting the domain of the search. The second trick is to pick by sampling at random from all options that pass some cutoff, which means that even if there are still adversarial examples inside the solution space, the quantilizer doesn't seek them out too strongly.
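The two tricks above can be sketched in a few lines of Python. This is a simplified, sample-based illustration rather than a definitive quantilizer design; the function name quantilize, its parameters, and the toy "everyday" distribution are all assumptions made for the example.

```python
import random


def quantilize(base_sampler, proxy_utility, q, n_samples, rng):
    """Pick uniformly at random from the top-q fraction of sampled actions.

    base_sampler(rng) draws one action from the trusted "everyday"
    distribution; proxy_utility(action) is the proxy score.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    # Trick 1: only ever consider actions drawn from the safe distribution.
    candidates = [base_sampler(rng) for _ in range(n_samples)]
    candidates.sort(key=proxy_utility, reverse=True)
    # Trick 2: sample from everything above the cutoff instead of taking
    # the single highest-scoring action.
    cutoff = max(1, int(q * n_samples))
    return rng.choice(candidates[:cutoff])


# Hypothetical usage: actions are numbers, everyday actions are N(0, 1),
# and the proxy simply prefers larger numbers.
rng = random.Random(0)
chosen = quantilize(
    base_sampler=lambda r: r.gauss(0.0, 1.0),
    proxy_utility=lambda action: action,
    q=0.1,
    n_samples=1000,
    rng=rng,
)
print(f"chosen action: {chosen:.2f}")
```

An unrestricted argmax over this proxy would run off to arbitrarily large numbers, where the proxy is no longer trusted. The quantilizer instead returns something from the better end of the everyday distribution, which is also why it is a comparatively weak optimizer.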

Quantilizers have some technical challenges (mild optimization gets a lot less mild if you iterate it), but they really do avoid adversarial examples. However, they pay for this by just not being very good optimizers - taking a random choice of the top 20% (or even 1%) of actions is pretty bad compared to the optimization power required for problems that humans have trouble with.

IV.6 - Back to supervised learning

The second type of problem I mentioned, before getting sidetracked, was taking bad actions that deceive the labeling process.

In most supervised value learning schemes, the AI learns what "good" is by mimicking the labeling process, and so in the limit of a perfect model of the world it will learn that good states of affairs are whatever gets humans to smash that like button. This can rapidly lead to obvious competent preference violations, as the AI tries to choose actions that get the "like button" pressed whether humans like it or not.

This is worth pausing at: how is taking actions that maximize "likes" any different from inferring human values and then acting on them? In both cases we have some signal of when humans like something, then we extract some regularities from this pattern, and then act in accordance with those regularities. What gives?

The difference is that we imagine learning human values as a more complicated process. To infer human values we model humans not as arbitrary physical systems but as fallible agents like ourselves, with beliefs, desires, and so on. Then the values we infer are not just whatever ones best predict button-presses, but ones that have a lot of explanatory power relative to their complexity. The result of this is a model that is not the best at predicting when the like button will be pressed, but that can at least imagine the difference between what is "good" and the data-labeling process calling something good.

The supervised learner just trying to classify datapoints learns none of that. This causes bad behavior when the surest way to get the labeling process' approval violates inferred human preferences.

The third type of problem I mentioned was if the datapoint the supervised learner thinks is best is actually one that humans don't competently evaluate. For example, suppose we rate a bunch of events as good or bad, but when we take a plan for an event and try to optimize its rating using the trained classifier, it always ends up as some weird thing that humans don't interact with as if they were agents who knew their own utility function.  How do we end up in this situation, and what do we make of it?

We're unlikely to end up with an AI recommending such ambiguous things by chance - a more likely story is that this is a result of applying patches to the learning process to try to avoid choosing adversarial examples or manipulating the human. How much we trust this plan for an event intuitively seems like it depends on how much we trust the process by which the plan was arrived at - a notion we'll talk more about in the context of inverse reinforcement learning. For most things based on supervised learning, I don't trust the process, and therefore this weird output seems super sketchy.

But isn't this weird? What are we thinking when we rate some output not based on its own qualities, but on its provenance? The obvious train of thought is that a good moral authority will fulfill our True Values even if it makes an ambiguous proposal, while a bad moral authority will not help us fulfill our True Values. But despite the appeal, this is the sort of nonsense I want to get away from.

Perhaps a better framing is that we have meta-preferences about how we want value learning to happen - how we want ourselves to be interpreted, how we want conflicts between our preferences to be resolved, etc., and we don't trust this sketchy supervised learning model to have incorporated those preferences. Crucially, we might not be sure of whether it's followed our preferences even as we are studying its proposed plan - much as Garry Kasparov could propose a chess plan and I wouldn't be able to properly evaluate it, despite having competent preferences about winning the game.

V - Self-supervised learning

We might try to learn human values by predictive learning - building a big predictive model of humans and the environment and then somehow prompting it to make predictions that get interpreted as human value. A modest case would involve predicting a human's actions, and using the predictions to help rate AI actions. An extreme case would be trying to predict a large civilization (sort of like indirect normativity) or a recursive tree of humans who were trying to answer questions about human values.

As with supervised learning, we're still worried about the failure mode of learning that "good" is whatever makes the human say yes (which we went into in section IV.6). By identifying value judgement with prediction of a specific physical system we've dodged some problems of interpretation like the alien concepts problem, but optimizing over prompts to find the most positive predicted reaction will gravitate towards adversarial examples or perverse solutions. To avoid these, self-supervised value learning schemes often avoid most kinds of optimization and instead try to get the model of the human to do most of the heavy lifting, reminiscent of how quantilizers have to rely on the cleverness of their baseline distribution to get clever actions.

An example of this in the modest case would be OpenAI Codex. Codex is a powerful optimizer, but under the hood it's a predictor trained by self-supervised learning. When generating code with Codex, your prompt (perhaps slightly massaged behind the scenes) is fed into this predictor, and then we re-interpret the output (perhaps slightly post-processed) as a solution to the problem posed in the prompt. Codex isn't going to nefariously promote its own True Values because Codex doesn't have True Values. Its training process had values - minimize prediction error - but the training process had a restricted action space and used gradient descent that treats the problem only in abstract logical space, not as a real-world problem that Codex might be able to solve by hacking the computer it's running on. (Though theoretical problems remain if the operations of the training process can affect the gradient via side channels.)

We might think that Codex isn't subject to Goodhart's law because it isn't an agent - it isn't modeling the world and then choosing actions based on their modeled effect. But this is actually a little too simplistic. Codex is modeling the world (albeit the world of code, not the physical world) in a very sophisticated way, and choosing highly optimized outputs. There's no human-programmed process of choosing actions based on their consequences, but that doesn't mean that the training process can't give us a Codex that models its input, does computations that are predictive of the effects of different actions, and then chooses actions based on that computation. Codex has thus kind of learned human values (for code) after all, defeating Goodhart's law. The only problem is that what it's learned is a big inseparable mishmash of human biases, preferences, decision-making heuristics, and habits.

This highlights the concern that predictive models trained on humans will give human-like answers, and human answers often aren't good or reliable enough, or can't solve hard real-world problems. Which is why people want to do value learning with the extreme cases, where we try to use predictive models trained on the everyday world to predict super-clever systems. However, going from human training data to superhuman systems reliably pushes those predictors out of the distribution of the training data, which makes it harder to avoid nonsense output, or over-optimizing for human approval, or undesired attractors in prompt-space.

But suppose everything went right. What does that story look like, in the context of the Goodhart's law problems we've been looking at?

Well, first the predictive model would have to learn the generators of human decisions so that it could extrapolate them to new contexts - challenging the learning process by trying to make the dataset superhuman might make the causal generators of behavior in the dataset not match the things we want extrapolated, so let's suppose that the learned model can perform at "merely" human level on tasks by predicting what a human would do, but with a sizeable speed advantage. Then the value learning scheme would involve arranging these human-level predictive pieces into a system that actually solves some important problem, without trying to optimize on the output of the pieces too hard. This caution might lead to fairly modest capabilities by superhuman AI standards, so we'd probably want to make the target problem as modest as possible while still solving value alignment. Perhaps use this to help design another AI, by asking "How do you get a powerful IRL-like value learner that does good things and not bad things?" On that note...

VI - IRL-like learning

Inverse reinforcement learning (IRL) means using a model of humans and their surroundings to learn human values by inferring the parameters of that model from observation. Actually, it means a specific set of assumptions about how humans choose actions, and variations like cooperative inverse reinforcement learning (CIRL) assume more sophisticated human behavior that can interact with the learner.

Choosing a particular way to model humans is tricky in the same way that choosing a set of human values is tricky - we want this model to be a simplified agent-shaped model (rather than the "model" being a physical description of the human and the "values" being laws of physics), but our opinions about what model of humans is good are fuzzy, context-dependent, and contradictory. In short, there is no single True Model of humans to use in an IRL-like learning scheme. If we think of the choice of model as depending on human meta-preferences, then of course it makes perfect sense that modeling inherits the difficulties of preferences.

Just to be clear, what I'm calling "meta-preferences" don't have to give a rating to every single possible model of humans. The things endorsed by meta-preferences are more like simple patterns that show up within the space of human models (analogous to how I can have "a preference to keep healthy" that just cares about one small pattern in one small part of the world). Actual human models used for value learning will satisfy lots of different meta-preferences in different parts of the design.

So let's talk about CIRL. The human model used here is of a utility-maximizing planner who randomly makes mistakes (with worse mistakes being exponentially less likely, so-called Boltzmann-rationality). This model doesn't do a great job of matching how humans think of their preferences (although since any desired AI policy can be expressed as utility-maximization, we might alternatively say that it doesn't do a great job of matching how humans think of their mistakes). In terms of Goodhart's law properties, this is actually very similar to supervised learning from human labels, discussed earlier. This includes not just the untrustworthiness due to not capturing meta-preferences, but also something like adversarial examples: CIRL infers human utilities and then tries to maximize expected value, which can reproduce some of the properties of adversarial examples if human actions can have many different possible explanations. This is one of several reasons why the examples in the CIRL paper were tiny gridworlds.

A diagram showing how your teachers were all justified in going on long digressions.
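
The Boltzmann-rational human model mentioned above can be sketched in a few lines. This is a toy illustration under my own assumptions (a single decision, a finite list of actions, a hand-picked `beta`), not the setup of the CIRL paper: action probabilities are proportional to exp(beta * utility), and a learner does a Bayesian update over candidate utility functions after watching one choice.

```python
import math

def boltzmann_policy(utilities, beta=1.0):
    """Probability of each action, proportional to exp(beta * utility).

    Mistakes (lower-utility actions) become exponentially less likely
    as their utility shortfall grows.
    """
    best = max(utilities)
    # Subtracting the max keeps the exponentials numerically stable.
    weights = [math.exp(beta * (u - best)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def posterior_over_hypotheses(hypotheses, prior, observed_action, beta=1.0):
    """Bayesian update over candidate utility functions from one observed action.

    Each hypothesis is a list of utilities, one per action (an assumption
    made for this sketch).
    """
    likelihoods = [boltzmann_policy(h, beta)[observed_action] for h in hypotheses]
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

# Hypothetical usage: two guesses about what the human values, two actions.
hypotheses = [[1.0, 0.0], [0.0, 1.0]]
posterior = posterior_over_hypotheses(hypotheses, [0.5, 0.5], observed_action=0)
```

Note how little the learner can conclude when several hypotheses explain the same action about equally well; that ambiguity is where the adversarial-example-like behavior comes from once the learner maximizes expected value under the posterior.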

But what if we had a human model that actually seemed good?

This is a weird thought because it seems implausible. Compare with the even more extreme case: what if we hand-wrote a utility function that seemed good? The deflationary answer in this case is that we would probably be mistaken - conditional on looking at a hand-written utility function and thinking it seems good, it's nevertheless more likely that we just mis-evaluated the empirical facts of what it would do. But now in the context of IRL-like value learners, the question is more interesting because such learners have their own internal standards for evaluating human reactions to their plans.

If our good-seeming model starts picking actions in the world, it might immediately start getting feedback that it interprets as evidence that it's doing a bad job. Like a coffee-fetching robot that looks like it's about to run over the baby, and so it notices humans going for the stop button. This is us discovering that we were wrong about what the value learner was going to do.

Alternatively, if it gets good feedback (or exactly the feedback it expected), it will carry on doing what it's doing. If it's fallen into one of those failure modes we talked about for supervised learning, this could mean it's modeled humans badly, and so it not only acts badly but interprets human reactions as confirmation of its worldview. But if, hypothetically, the model of humans actually is good (in some nebulous sense), then it will occupy this same epistemic state.

From the humans' perspective, what would running a good model look like? One key question is whether success would seem obvious in retrospect. There might be good futures where all the resolutions to preference conflicts seem neat and clear in hindsight. But it seems more likely that the future will involve moral choices that seem non-obvious, that similar people might disagree on, or that you might have different feelings about if you'd read books in a different order. One example might be: how good is it for you to get destructively uploaded? If a future AI decides that humans should (or shouldn't) get uploaded, our feelings on whether this is the right decision might depend more on whether we trust the AI's moral authority than on our own ability to answer this moral question. But now this speculation is starting to come unmoored from the details of the value learning scheme - we'll have to come back to it later.

VII - Extra credit

Learning human biases is hard. What does it look like to us if there are problems with a bias-learning procedure?

An IRL-like value learner that has access to self-modifying actions might blur the line between a system that has hard-coded meta-preferences and one that has learned meta-preferences. What would we call Goodhart's law in the context of self-modification?

The modeling scheme I used to talk about "competent preferences" could be made mathematically precise and used to choose actions through a framework like context agents. What kinds of behaviors would we call Goodhart's law for context agents?

VIII - Conclusion

The naive framing of Goodhart's law works well for most of these cases, particularly when we're focused on the possibility of failures due to humans mis-evaluating what a value learner is going to do in practice. But another pattern that showed up repeatedly was the notion of meta-preference alignment. A key feature of value learning is that we don't just want to be predicted, we want to be predicted using an abstract model that fulfills certain desiderata, and if we aren't predicted the way we want, this manifests as problems like the value learner navigating conflicts between inferred preferences in ways we don't like. This category of problem wouldn't make much sense if we held out for True Values.

In terms of solutions to Goodhart's law, we know of several cases where it seems to be evaded, but our biggest successes so far just come from restricting the search process so that the agent can't find perverse solutions. We can also get good present-day results from imitating human behavior and interpreting it as solving our problems, although even when trying to learn human behavior, our training process will likely be aligned the first way. However, the way of describing Goodhart's law practiced in this post hints at another sort of solution. Because our specifications for what counts as a "failure" are somewhat restrictive, we can avoid failing without needing to know humans' supposed True Values. Which sounds like a great thing to talk about next post.


Immobile AI makes a move: anti-wireheading, ontology change, and model splintering

September 17, 2021 - 18:24
Published on September 17, 2021 3:24 PM GMT

Research projects

I'm planning to start two research projects on model splintering/reward generalisation and learning the preferences of irrational agents.

Within those projects, I'm aiming to work on subprojects that are:

  1. Posed in terms that are familiar to conventional ML;
  2. interesting to solve from the conventional ML perspective;
  3. and whose solutions can be extended to the big issues in AI safety.

The point is not just to solve the sub-problem, but to solve it in a way that generalises or points to a general solution.

The aim is to iterate and improve fast on these ideas before implementing them. Because of that, these posts should be considered dynamic and prone to be re-edited, potentially often. Suggestions and modifications of the design are valuable and may get included in the top post.

Immobile AI makes a move

Parent project: this is a subproject of model-splintering.


Imagine an agent capable of evolving in a 3D world - something very similar to DeepMind's XLand (the images here have been taken from that paper).

The agent has a laser that it can use to pick up and move nearby objects:

Initially the agent is completely fixed in position - it can move the laser across its field of vision, but it can't move around or change its field of vision. It is trained in that situation, and is rewarded for moving black cubes to the bottom right of its field of view (where another agent will pick them up). These black cubes are irregularly dropped in front of it. In actual fact, it is part of a chain gang of agents moving the black cubes across the map.

Then the agent is given full mobility, so it can walk around and explore its 3D world:

The agent will continue to learn in the full 3D situation (similarly to the agents in DeepMind's paper who learn through play), but it won't have any more learning about its reward function.

There are two obvious extensions of its initial reward function:

  1. Ultra-conservative: Return to its initial position, look straight ahead, and resume moving black cubes with the laser without moving itself.
  2. Wireheaded: Arrange to have a black cube moving downwards and rightwards in its field of vision (maybe by turning its head).

Research aims

  1. Getting the agent to generalise from the single-point-of-view to the 3D world it finds itself in (without explicitly coding the transition).
  2. Get the agent to generate candidate reward functions, including all the obvious conservative and wireheaded ones. Maybe with a diversity reward so that it selects very different reward functions.
  3. See what is needed for the agent to select the "true" reward functions from among the ones generated in step 2. This might include asking for more information from the programmers. Also relevant is how it might decide on "conservative behaviour" that maximises as many of its reward functions as possible.
  4. Analyse how the (implicit) features the agents use change from the single-point-of-view to the 3D world.

Challenge 1 is a traditional ontology change, or, in ML terms, transfer learning. Seeing how 2. plays out is the key aim of this sub-project - can an agent generate useful rewards as well as the wireheaded versions? 3. is mainly dependent on what comes out of 2., and asks whether it's possible to explicitly guard against wireheading (the idea is to identify what wireheading looks like, and explicitly seek to avoid it). Meanwhile, 4. is an analysis of model splintering that prepares for further subprojects.
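
The project description leaves open how "conservative behaviour" across several candidate reward functions would be chosen. One possible reading, sketched here purely as an assumption, is a max-min rule: rescale each candidate reward function to [0, 1] over the available actions, then pick the action whose worst score across candidates is highest. The action names and reward numbers below are hypothetical placeholders.

```python
def conservative_choice(actions, reward_fns):
    """Pick the action whose worst normalized reward across candidates is highest."""
    def normalized(fn):
        values = [fn(a) for a in actions]
        lo, hi = min(values), max(values)
        span = hi - lo
        # If a candidate is indifferent between all actions, treat every action as fine.
        return {a: (fn(a) - lo) / span if span else 1.0 for a in actions}

    tables = [normalized(fn) for fn in reward_fns]
    return max(actions, key=lambda a: min(t[a] for t in tables))

# Hypothetical candidate reward functions generated in step 2.
actions = ["stand_still", "return_to_post", "turn_head", "carry_cube"]
conservative = {"stand_still": 0.0, "return_to_post": 1.0, "turn_head": 0.0, "carry_cube": 0.4}
wireheaded = {"stand_still": 0.0, "return_to_post": 0.5, "turn_head": 1.0, "carry_cube": 0.5}
task = {"stand_still": 0.0, "return_to_post": 0.7, "turn_head": 0.0, "carry_cube": 1.0}

best = conservative_choice(actions, [conservative.get, wireheaded.get, task.get])
print(best)  # -> return_to_post
```

With these made-up numbers the wireheading action scores zero under two of the three candidates, so the max-min rule avoids it without anyone having labelled it as wireheading. Whether real generated reward functions would separate this cleanly is exactly what step 3 would need to test.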


Great Power Conflict

September 17, 2021 - 18:00
Published on September 17, 2021 3:00 PM GMT

Crossposted from the EA Forum.

Imagine it's 2030 or 2040 and there's a catastrophic great power conflict. What caused it? Probably AI and emerging technology, directly or indirectly. But how?

I've found almost nothing written on this. In particular, the relevant 80K and EA Forum pages don't seem to have relevant links. If you know of work on how AI might cause great power conflict, please let me know. For now, I'll start brainstorming. Specifically:

  1. How could great power conflict affect the long-term future? (I am very uncertain.)
  2. What could cause great power conflict? (I list some possible scenarios.[1])
  3. What factors increase the risk of those scenarios? (I list some plausible factors.)

Epistemic status: brainstorm; not sure about framing or details.


I. Effects

Alternative formulations are encouraged; thinking about risks from different perspectives can help highlight different aspects of those risks. But here's how I think of this risk:

Emerging technology enables one or more powerful actors (presumably states) to produce civilization-devastating harms, and they do so (either because they are incentivized to or because their decisionmaking processes fail to respond to their incentives).[2]

Significant (in expectation) effects of great power conflict on the long-term future include:

  • Risk of human extinction
  • Risk of civilizational collapse
  • Effects on states' relative power
  • Other effects on the time until superintelligence and the environment in which we achieve superintelligence

Human extinction would be bad. Civilizational collapse would be prima facie bad, but its long-term consequences are very unclear. Effects on relative power are difficult to evaluate in advance. Overall, the long-term consequences of great power conflict are difficult to evaluate because it is unclear what technological progress and AI safety look like in a post-collapse world or in a post-conflict, no-collapse world.

Current military capabilities don't seem to pose a direct existential risk. More concerning for the long-term future are future military technologies and side effects of conflict, such as on AI development.


II. Causes

How could AI and the technology it enables lead to great power conflict? Here are the scenarios that I imagine, for great powers called "Albania" and "Botswana":

  • Intentional conflict due to bilateral tension. In each of these scenarios, international hostility and fear are greater than in 2021, and domestic politics and international relations are more confusing and chaotic.
    • Preemptive attack. Albania thinks that Botswana will soon become much more powerful and that this would be very bad. Calculating that it can win—or accepting a large chance of devastation rather than simply letting Botswana get ahead—Albania attacks preemptively.
    • Seizing opportunity. An arms race is in progress. Albania thinks it has an opportunity to get ahead. Albania attempts to strike or sabotage Botswana's AI program or its military. Albania does not disable Botswana's military (either because it failed to or because it assumed Botswana would not launch a major counterattack anyway). Botswana retaliates.
    • Diplomatic breakdown. Albania makes a demand or draws a line in the sand (legitimately, from its perspective). Botswana ignores it (legitimately, from its perspective). Albania attacks. Possible demands include, among others: stop building huge AI systems (and submit to external verification), or stop developing technology that threatens a safe first strike (and submit to external verification).
  • Intentional conflict due to a single state's domestic political forces. These scenarios are currently difficult to imagine among great powers. But some researchers are worried about polarization and epistemic decline in the near future, which could increase this risk.
    • Ambition. Albania hopes to dominate other states. Albania attacks.
    • Hatred. A substantial fraction of Albanians despise Botswana, and the Albanian government's decisionmaking process empowers that faction. Albania attacks.
    • Blame. Albania suffers an attack, leak, security breach, or embarrassment from one or more malcontents/spies/saboteurs/assassins/terrorists. Albania incorrectly blames Botswana — for rational reasons, for political convenience, or just due to bad epistemics. Albania attacks.
  • Intentional conflict due to multi-agent forces. This scenario is currently difficult to imagine. But perhaps crazy stuff happens when power increases, relative power is unstable, technology confuses states, and memetic chaos reigns. Roughly, I imagine a multi-agent failure scenario like this:
    • Offense outpaces defense. New technologies are leaked, are developed independently by many states, or cannot be kept secret. The capability to devastate civilization, which in 2021 was restricted to the major nuclear states, is held by many states. Even if none are malevolent, all are afraid, and domestic political forces (which are more chaotic than they were in 2021) make one or two states do crazy stuff.
  • An accident. "If the Earth is destroyed, it will probably be by mistake."[3]
    • Automatic counterattacks. AI, AI-enabled military technology, and the prospect of future advances foster chaos and uncertainty. International tension increases in general, and tension between Albania and Botswana increases in particular. Offensive capabilities increase and are on hair trigger.[4] Eventually there's an accident, miscommunication, glitch, or some anomaly resulting from multiple complex systems interacting faster than humans can understand. Albania automatically launches a "counterattack."


III. Risk factors

Great power conflict is generally bad, and we can list high-level scenarios to avoid, such as those in the previous section. But what can we do more specifically to prevent great power conflict?

Off the top of my head, risk factors for the above scenarios include:

  • International cooperation/trust/unity/comity decreases (in general or between particular great powers)[5]
  • Fear about other states' capabilities and goals increases (in general or between particular great powers)
  • Chaos increases
  • States' relative power is in flux and uncertain
  • There is conflict (that could escalate), especially international violence or conquest, especially involving a great power (e.g., a great power annexes territory, or there is a proxy war)
  • More states acquire devastating offensive capabilities beyond the power of any defensive capabilities (this needs nuance but is prima facie generally true)[6]

It also matters what and how regular people and political elites think about AI and emerging technology. Spreading better memes may be generally more tractable than reducing the risk factors above, because it's pulling the rope sideways, although the benefits of better memes are limited.


Finally, the same forces from emerging technology, international relations, and beliefs and modes of thinking about AI that affect great power conflict will also affect:

  • How quickly superintelligence is developed
  • The extent to which there is an international arms race
  • Regulations and limits on AI, locally and globally
  • Hardware accessibility

Interventions affecting the probability and nature of great power conflict will also have implications for these variables.


Please comment on what should be added or changed, and please alert me to any relevant sources you've found useful. Thanks!

  1. My analysis is abstract. Consideration of more specific factors, such as what conflict might look like between specific states or involving specific technologies, is also valuable but is not my goal here. ↩︎

  2. Adapted from Nick Bostrom's Vulnerable World Hypothesis, section "Type-2a." My definition includes scenarios in which a single actor chooses to devastate civilization; while this may not technically be great power conflict, I believe it is sufficiently similar that its inclusion is analytically prudent. ↩︎

  3. Eliezer Yudkowsky's Cognitive Biases Potentially Affecting Judgment of Global Risks. ↩︎

  4. Future weapons will likely be on hair trigger for the same reasons that nukes have been: swifter second strike capabilities could help states counterattack and thus defend themselves better in some circumstances, it makes others less likely to attack since the decision to use hair trigger is somewhat transparent, and there is emotional/psychological/political pressure to take them down with us. ↩︎

  5. Currently the world doesn't include large, powerful groups, coordinated at the state level, that totally despise and want to destroy each other. If it ever does, devastation occurs by default. ↩︎

  6. Another potential desideratum is differential technological progress. Avoiding military development is infeasible to do unilaterally, but perhaps we can avoid some particularly dangerous capabilities or do multilateral arms control. Unfortunately, this is unlikely: avoiding certain technologies is costly because you don't know what you'll find, and effective multilateral arms control is really hard. ↩︎


Born Again: Disconnected Psychology, Martian Science, and the Order of the Phoenix

September 17, 2021 - 17:23
Published on September 17, 2021 2:23 PM GMT

(Originally posted at Secretum Secretorum)


There is a short story I want to write, but likely never will, that goes something like this. The story begins soon after Copernicus drops the heliocentric model, right at the point when the tide is turning and people are starting to believe that the earth is no longer at the center of the universe. There is a cabal of thinkers who just flat out refuse to even consider the possibility that the geocentric model is false, even in the face of overwhelming evidence. As the heliocentric model becomes more and more widely accepted, they are forced to go underground, resulting in the formation of a secret society called the Order of the Phoenix, the name being a reference to the notion that the geocentric model will rise again (hmmm I have the vague notion that this name has been used before in some other work of fiction ;). Within a few decades, they have to admit defeat on the geocentric model; however, this doesn’t end the Order but simply forces an evolution in their philosophy. Motivated by the Christian idea that Man is made in the image of God, they come to believe that any scientific knowledge which doesn’t regard humanity as central to the cosmos is fundamentally flawed and deeply confused. This belief provides a new mission for the Order: the creation of a new form of science in which the ultimate significance of mankind is axiomatic; the objective pursuit of truth is only useful insofar as it serves to uphold the exalted position of Man.

The enigmatic figure of Louis Claude de Saint-Martin (1743-1803), writing as “the unknown philosopher”, was an early leader of the Order of the Phoenix. He encapsulates the philosophy of the Order in the following passage (the only source I can find is The Occult by Colin Wilson):

“It is perhaps this wrong connection of ideas which has led men to the still false notion that they are not worthy of their creator's regard. They have believed themselves to be obeying the dictates of humility when they have denied that the earth and all the universe exist only on man’s account, on the ground that the admission of such an idea would be only conceit. But they have not been afraid of the laziness and cowardice which are the inevitable result of this affected modesty. The present-day avoidance of the belief that we are the highest in the universe is the reason that we have not the courage to work to justify that title, that the duties springing from it seem too laborious, and that we would rather abdicate position than realize them in all their consequences. Where is the pilot that will guide us between these hidden reefs of conceit and false humility?"

(The Martinist Order and their form of esoteric Christian mysticism are probably the closest existing analogue to something like the Order of the Phoenix in that they were a secret order with a philosophical bent towards the glorification and redemption of Man. Although this passage was written during Saint-Martin’s lifetime over 200 years ago, it feels like his point is even more valid today: have we not developed a secular worldview that sees Man as accidental and insignificant, a creature not worthy of its position? Does this not create a nihilistic tone throughout our culture that disposes us towards laziness and cowardice as Saint-Martin suggests? I wonder if there is not some desperate need for humanity to forge a new exalted conception of itself, one that sees us as “the highest in the universe”, but is also consistent with our current scientific knowledge. David Deutsch probably comes closest to such a philosophy in his book The Beginning of Infinity.)

The seal of the Martinist Order

Around the turn of the 18th century (early 1700s), the Order of the Phoenix gains enough size and power to launch a secret scientific community dedicated to its own brand of human-centric science. The community is structured in a unique manner. The majority of Phoenix scientists are instructed to remain as unaware as possible of mainstream scientific developments. These scientists (usually regarded as alchemists in the wider world) form a truly independent branch of science with its own norms, conventions, methods, and theories. The remaining Phoenix scientists form a special leadership group which is allowed to stay abreast of scientific developments in mainstream science and Phoenix science (many of these individuals are in fact notable mainstream scientists). For these Phoenix scientists, mainstream science functions as a kind of theoretical control group, one which only the Order has access to. For example, if the Phoenixes develop a theory and mainstream science develops the same theory then they have an extra piece of evidence for the truth of the theory; if mainstream science develops a theory and Phoenix science contradicts the theory then that may indicate that one or both of the theories is false. While taking care to keep Phoenix science independent and separate, the Order strategically uses this information to aid Phoenix science (e.g. advising a Phoenix scientist to disregard a theory or perform a particular experiment) and help it progress at a faster rate. The members of the leadership group are also able to borrow new results and theories from Phoenix science and bring them to the mainstream, thereby enhancing their renown as scientists and bringing more covert power to the Order.

Importantly, mainstream science is also able to function as a kind of meta-scientific control for Phoenix science; comparison of differing norms, philosophies, and organizational structures between the two communities allows the Order to glean valuable information about how these factors affect the dynamics of knowledge growth over the course of decades and centuries.

One plot point in the story is that the Order of the Phoenix develops the theory of evolution before Darwin, but chooses not to publicize it because it is another blow to their human-centric philosophy. Although they try desperately to disprove it, eventually they have to admit that all the evidence supports the theory of evolution; still they reserve judgement, believing (hoping) that future evidence will overturn it and return Homo sapiens to its rightful position. After the theory of evolution starts to become more widely accepted, the Order of the Phoenix undergoes another shift in their philosophy. Their reasons for seeing humanity as central to the universe start to become less religious (humans are special because God made us special) and more mystical and spiritual: the Order develops various forms of contemplative practice that purport to reveal experiential proof of Man’s cosmic significance.

Fast forward to the atomic age: the Order sees our ability to harness the power of the atom as proof that they were right all along about Man’s central role in the universe. Stephen Hawking said that, “humans are just a chemical scum on the surface of a typical planet that’s in orbit around a typical star on the outskirts of a typical galaxy.” Phoenixes demonstrate the falsehood of this statement (literally the antithesis of their philosophy) by pointing to the fact that there are only two ways in which transmutation (the conversion of one element into another) can occur: in stars by stellar nucleosynthesis, or by humans who have understood the process and recreated it (see the aforementioned The Beginning of Infinity for discussion of the significance of this fact).

Recent developments in gene editing and AI have caused another dramatic shift in the philosophy of the Order. They have reached the conclusion that our current technological trajectory will result in either irreversible genetic damage or obsoletion and extinction. To avoid these outcomes, the Order has determined that mainstream science and technology must be destroyed, and that Phoenix science must take its rightful place as the only valid form of science. Many Phoenixes occupy powerful leadership positions in politics, business, and science; coordinating in secrecy, the Order launches a covert assault on science and technology (this is the true cause behind the Great Stagnation).


How deterministic is evolution? Is evolution a historically contingent stochastic process in which idiosyncratic events (e.g. the order of appearance of random mutations or rare environmental perturbations like asteroid impacts) dominate the path of change, or does the existence of constraints at multiple levels (e.g. molecular, cellular, developmental, morphological, ecological) create attractors in the evolutionary landscape that provide a strong degree of predictability? This question (and its many variants and subquestions) is one of the most fundamental questions in biology, and it’s not hard to see why: the ability to predict evolution with a high degree of accuracy would be incredibly useful in fighting a global viral pandemic (to take a completely random example) or in a variety of other biomedical applications.

Biologists have come up with a number of ways to study the predictability and contingency of evolution.

“On the experimental side, several designs have been employed, mostly using microbes, including “parallel replay experiments,” in which initially identical populations are followed as they evolve in identical environments, and “historical difference experiments,” in which previously diverged populations evolve under identical conditions” 

Another way to study these questions is to investigate convergent evolution across the biological world: how common is it, under what conditions does it occur, and what can it tell us about evolutionary predictability writ large? All of these methods are imperfect (as are all methods), but they don’t really tell us much about questions of predictability at cosmological timescales. It’s one thing to ask about the predictability of more local events (what is the likelihood that antibiotic resistance will evolve to a specific drug in the next 10 years?), but it’s an entirely different question to ask about the predictability of evolution over billions of years; in the latter case, you need to consider the likelihood of one-off events like the endosymbiotic event that gave rise to the eukaryotic cell or whatever the hell happened that led to the evolution of a warm-blooded, hairless biped with advanced intelligence. Stephen Jay Gould’s famous Replaying the Tape of Life thought experiment captures the essence of the question: if we rewound the clock and replayed the history of life on Earth, would something like humans evolve again? Or would the earth be populated with an intelligent reptilian humanoid species, perhaps evolved from velociraptors (which is something that people have seriously suggested because raptors were highly social group predators, bipedal, and possessed a high brain/body size ratio)? And if it is shown to be highly probable that a humanoid species with advanced intelligence would evolve (whether it is mammalian, reptilian, or avian), does this tell us that there are certain attractor states that evolution finds its way to regardless of the starting point?

If it wasn’t for that meddling asteroid…


The evidence we collect from microbial evolution experiments and comparative biology only indirectly bears on these questions of evolutionary predictability at the largest scales. There is really only one way in which we can directly study these questions: by studying alien life. For example, suppose we studied the history of evolution on thousands of other planets and found that intelligent bipedal species with manipulative appendages and excellent vision (like the Greys) are fairly common across the universe; this would lend credence to the idea that the course of evolution is largely predictable at planetary scales.

So as it stands now (i.e. no knowledge of an alien biosphere, but maybe that’s what THEY WANT YOU TO BELIEVE) we have what I would call an N=1 problem — we only know of evolution on one planet and therefore we have nothing to compare and contrast with. Even the discovery of one alien biosphere (i.e. doubling our sample size to N=2) would provide an almost unimaginable wealth of information for biology.

I wonder: in what other domains do we have an N=1 problem? Is it possible for us to create another N and thus benefit in the same way that biology would benefit from the discovery of an alien biosphere?


The origin of human language is an N=1 problem. The numerous fascinating questions surrounding the origin of language (the who, what, when, where, why and how) can only be indirectly studied by looking at the fossil record, archaeological evidence, contemporary language diversity, studies of human language acquisition, or cross-species comparative approaches.

In fact, the dearth of direct empirical evidence has led many scholars to regard the entire topic as unsuitable for serious study. Notably, the Linguistic Society of Paris banned any existing or future debates on the origin of human language in 1866, a prohibition which remained influential across much of the Western world until late in the twentieth century (see the Origin of Language Wikipedia page).

Possibilities, practicalities, and ethics be damned: let’s imagine what we might do to study the origin of human language in a perfect world. The best way would be to travel back in time and observe it first hand. In lieu of that, we might try to catch an alien species at the precipice of evolving language and discreetly study them for a (very) extended period of time; though we might learn something about the general principles of language evolution, this method would have the drawback of not necessarily telling us how language evolved in our own species.

A third method, a wildly unethical one, would be to take a group of children away from their parents at birth, put them on an isolated island (under the care of adults that are careful to never use language around them), and then discreetly observe them over a (very) extended period of time to see if and how they develop language. This isn’t exactly an N=2 in that we wouldn’t be entirely recreating the conditions of our original evolution of language (obviously we’ve evolved a considerable amount since language first came on the scene), but it’s probably the best we could realistically do and we would surely learn a ton about language, psychology, and cultural evolution. 

Though not that extreme, there actually is a recent example of something that approaches an N=2 for the origin of language.

“Before the 1970s, there was no deaf community in Nicaragua. Deaf people were largely isolated from each other and mostly used simple home sign systems and gestures ('mímicas') to communicate with their families and friends. The conditions necessary for a language to arise occurred in 1977, when a center for special education established a program initially attended by 50 deaf children.

In 1980, a vocational school for deaf adolescents was opened in the area of Managua of Villa Libertad. By 1983, there were over 400 deaf students enrolled in the two schools. Initially, the language program emphasized spoken Spanish and lipreading, and the use of signs by teachers was limited to fingerspelling (using simple signs to sign the alphabet). The program achieved little success, with most students failing to grasp the concept of Spanish words.

The children remained linguistically disconnected from their teachers, but the schoolyard, the street, and the school bus provided fertile ground for them to communicate with one another. By combining gestures and elements of their home-sign systems, a pidgin-like form and a creole-like language rapidly emerged — they were creating their own language. The "first-stage" pidgin has been called Lenguaje de Signos Nicaragüense (LSN) and is still used by many who attended the school at the time.

Staff at the school, unaware of the development of this new language, saw the children's gesturing as mime and a failure to acquire Spanish. Unable to understand what the children were saying, they asked for outside help. In June 1986, the Nicaraguan Ministry of Education contacted Judy Kegl, an American Sign Language linguist from MIT. As Kegl and other researchers began to analyze the language, they noticed that the young children had taken the pidgin-like form of the older children to a higher level of complexity, with verb agreement and other conventions of grammar. The more complex sign language is now known as Idioma de Señas de Nicaragua (ISN).”

“ISN offers a rare opportunity to study the emergence of a new language. Before ISN, studies of the early development of languages had focused on creoles, which develop from the mixture of two (or more) distinct communities of fluent speakers. In contrast, ISN was developed by a group of young people with only non-conventional home sign systems and gesture.”

Some linguists see what happened in Managua as proof that language acquisition is hard-wired inside the human brain. “The Nicaraguan case is absolutely unique in history,” Steven Pinker, author of The Language Instinct, maintains. “We've been able to see how it is that children, not adults, generate language, and we have been able to record it happening in great scientific detail. And it's the only time that we've actually seen a language being created out of thin air.”


Nicaraguan Sign Language provides a real-world example (i.e. not involving aliens or time travel) of how science can benefit from the independent rebirth of a complex collective phenomenon such as language. Maybe by now readers have surmised where I am going with all of this given the short story sketch in part I: can Science go from N=1 to N=2? Can we create an independent branch of science, isolated from the mainstream, something akin to what the Order of the Phoenix did in the short story sketch?

This is what Dr. Dario Krpan (full disclosure: he is a collaborator of mine) has suggested in his 2020 paper, “Unburdening the Shoulders of Giants: A Quest for Disconnected Academic Psychology”. The motivation is simple; it’s basically just evolution by natural selection. The more diversity in ideas, hypotheses, experiments, methods, general approaches, etc. (what we may call “knowledge diversity”) that we have in science, the faster our progress can be (just as greater diversity in the gene pool means greater potential for adaptation). There are many factors limiting knowledge diversity: the structure of a scientific field (e.g. professional and sociological factors), its norms (particularly surrounding publication and peer review), and previous theories (some of which are wrong, particularly in psychology). Consider the following (from another essay I am currently writing, citations available upon request):

Imagine the situation faced by a young researcher (perhaps a grad student, or a PhD just hired for their first faculty position) who wishes to do research that will increase knowledge diversity in a field, perhaps by studying an uncommon topic or using a rare experimental technique. This young researcher will know that their future prospects for employment crucially depend on applying for and receiving funding and publishing numerous papers in prestigious journals (the so-called “publish or perish” culture of modern academia) (Nicholas et al., 2017). In this context, it becomes challenging to try an unusual method or attempt to develop a new method, as this may take longer and have a less certain payoff than using a well-established method. Young researchers may also be discouraged from pursuing uncommon or controversial topics because it could harm their future career prospects by making it more difficult to obtain funding or publish their work (Kempner, 2008; Lombardo & Emiah, 2014; Väliverronen & Saikkonen, 2020). “Star” researchers can have a substantial influence on what kind of research gets funded and published in a particular field, either indirectly (their ideas may loom large or they are collaborators with editors or grant reviewers) or directly (they exhibit a degree of homophilous bias and serve as editors of journals or grant reviewers) (Azoulay et al., 2019; Azoulay, Zivin & Wang, 2010; Gerow et al., 2018; Travis & Collins, 1991); a young researcher may be (subtly or not-so-subtly) influenced to do work that accords with the ideas of star researchers in their field.

So how can we compensate for these limiting factors and improve knowledge diversity? Dr. Krpan says: let’s have some people do psychology research that is explicitly disconnected from the theories, structures, and norms of traditional academic psychology. These disconnected psychologists should be explicitly encouraged not to build off previous theory; this is the “unburdening the shoulders of giants” referred to in the title of the article.

In order to provide a fuller picture of why something like disconnected psychology is needed and what it might look like, I present a few selected sections from the article below.

“This premise (note: referring to limited knowledge diversity) can be supported by arguments on many different levels. For example, it has been acknowledged that APA style, on which psychologists widely rely when writing psychological sources of knowledge, is not just a set of explicit guidelines for presenting information (Budge & Katz, 1995; Madigan, Johnson, & Linton, 1995). In fact, APA style is itself an epistemology that enforces certain values and beliefs regarding psychology as a discipline and reflects its conventions. Moreover, the peer-review process is also guided by various biases and epistemological beliefs of the reviewers and may therefore propel research trajectories that are in line with these biases and beliefs (Blackburn & Hakel, 2006; Pier et al., 2018; Simon & Fyfe, 1994; Suls & Martin, 2009). Indeed, if psychology generally functions as other sciences, then it may be dominated by a group of highly influential psychological scientists who propel their own ideas and ideas of their collaborators but make it more difficult for other opposing or different ideas to enter the field, either directly or indirectly, by creating conventions that are unfavorable to such ideas (Azoulay, Fons-Rosen, & Graff Zivin, 2019).

…In the early days of psychology, before the onset of the information age spawned by technological advancements, psychologists were generally forced to work more independently because they did not have access to an extensive “knowledge” network consisting of many psychological sources (Leahey, 1987, 1994). Independently developing new methodologies, theories, approaches, and so forth, was therefore a necessity. The advent of the Internet and the explosion of information led to a substantial increase in the number of citations per article that has been more dramatic in psychology than in other sciences, such as physics (Adair & Vohra, 2003; Sigal & Pettit, 2012). Referencing is undoubtedly useful when it comes to connecting various sources of psychological knowledge and understanding how theories, methodological approaches, and empirical findings are related. However, it also forces psychologists to develop their ideas in relation to other published research and theories, to fill in “gaps” in the literature, or to work on research topics that are highly cited to increase their scientific reputation and/or chances of tenure (e.g., Anderson et al., 2019; Moher et al., 2018; Safer & Tang, 2009). Overall, such practices decrease the likelihood of developing as many diverse theories, methodologies, and approaches that are substantially different from each other as possible.”

“In connected psychology, scholars need to connect their work to other work that has been done in the discipline (e.g., their domain of research or other domains of research). Their application of psychological method needs to be informed by previous work in the discipline, and they need to follow certain reporting and writing conventions. By being connected to a field consisting of other psychologists from their domain of research, connected psychological scientists to some degree operate according to the field’s norms, conventions, trends, or principles.

In contrast, in disconnected psychology, there is no attempt to connect the work of different psychologists. The only requirement is that they ground their work on psychological method, but how they interpret and develop this method is up to them. Their work evolves in line with their own experiences, observations, past ideas, and so forth, and not in relation to other psychologists and the conventions, epistemology, or assumptions these psychologists share. Overall, it can be said that in disconnected psychology, psychologists themselves are a field; each establishes his or her own norms, conventions, and principles over time and may develop one or more research agendas or programs across a lifetime. A critic may object that disconnected psychology cannot be classified as psychological science. However, if psychological method is what defines psychology as a science, then whoever adheres to this method is a psychological scientist, even if he or she chooses to do this without connecting to a field consisting of other psychological scientists and adhering to norms and principles that emerged among them (i.e., without adhering to connected psychology). Adherence to norms, rules, principles, or conventions cannot constitute psychological science because no one can objectively prove that some specific norms or conventions that emerged within the field can lead to greater discoveries via psychological method than some other possible existing principles that a psychologist can develop individually or that may have emerged in the field under other circumstances.”

There is a lot to unpack here; however, my focus is not on the details and merits of disconnected psychology itself (a discussion for another time, read the paper if you are interested), but rather on the idea of having an N=2 for a particular branch of science. Just as the Order of the Phoenix was able to use mainstream science as a kind of theoretical and meta-scientific control for Phoenix science, disconnected psychology (or something like it) could function in much the same way for normal, “connected” psychology and science at large.


Erik Hoel anticipates that a new political divide will emerge in the coming years — “Ups” vs. “Downs” (one wonders why he didn’t call it Tops vs. Bottoms but I digress).

There are two camps, or reactions, to the beginnings of this new transplanetary fleet, as well as the more local activity that will occur on the Moon and in low-Earth orbit in the next decade. These two camps will eventually become entrenched political positions over the issue. You’re an “Up” if you think that human civilization should move to other planets. You’re a “Down” if you believe the money that goes toward space exploration and colonization is simply better used here on Earth. This goes especially for privately-funded missions, which Downs view as gross extravaganzas. Being a Down is not an intrinsically luddite position: you can be an advocate for effective altruism, an old-fashioned proponent for charity, or think all the money should be diverted to combat climate change as an immediate existential threat.

He also predicts that we will have a rapidly growing Mars colony by 2050:

By 2050 there will be an established and growing civilian presence on Mars—a city on the Red Planet. Founded by a joint-company consortium in terms of the actual ships and structures, it will also have NASA and other national space programs’ support. The whole world will watch in excitement and its development will be covered closely by Earth press, though there is already early controversy over the privatized nature of it.

Hoel is an Up and so am I.

I’ll put my cards on the table: I’m an Up. And not because I’m enamored of billionaires, but because I think the long future of humanity is galaxy-wide, and quite frankly we need to seize the opportunity to expand while the technological and cultural window is open. It may not always be. This is a notion of destiny that goes beyond mere utilitarian reasoning. I’m an Up because I dream of space.

I’ll add one more reason for being an Up: colonizing Mars represents a singular opportunity for renewal and rebirth of human culture, politics, and science, one that is simply not possible on the ol’ pale blue dot. Although our first instinct will be to simply import all of our old institutions, customs, and values into the Martian environment, doing so would be a colossal mistake: (1) because it wouldn’t work, and (2) because we would be missing a golden opportunity to experiment with nearly every aspect of society. It is inevitable that Martian culture will diverge from Earth culture due to founder effects and random drift; the question is whether we will encourage this divergence and see it as a learning opportunity or fight it and see every difference between Earth and Martian culture as a problem that needs to be rectified. In other words, we need to recognize that the colonization of Mars is an N=2 event for all of human society and treat it as such.

In the scientific domain, the creation of the Martian scientific community could provide some of the N=2 benefits that were discussed above for disconnected psychology or the hypothetical Phoenix science. In the best case scenario, we allow Martian science (and science education) to develop its own philosophy, norms, and organizational structures (100% random funding? Anonymous publication? Scientific “monasteries” with celibacy vows?). Through comparison with Earth science and large-scale quasi-experiments, we could gain valuable meta-scientific knowledge which could be used to improve both scientific communities in tandem (none of this is to say that interpreting and applying the results will be easy). Ideally, we can learn how to set up both scientific communities so that they compensate for each other’s weaknesses and blind spots.

Even aside from any N=2 effects, it is likely that the Martian scientific community will be unusually creative and productive, at least at first. Fresh scientific teams (defined by the fraction of team members that have not collaborated previously with other team members) are associated with greater multidisciplinarity and originality (as defined by disruption: “A larger disruption of a paper reflects that more of the paper’s citing papers cite it but none of its references, corresponding to higher originality”). Similarly, smaller scientific teams “have tended to disrupt science and technology with new ideas and opportunities, whereas larger teams have tended to develop existing ones.” We can imagine that Martian science could harness these effects on a more global scale, and that synergy between large, old Earth science and small, fresh Martian science will lead to greater overall scientific progress.
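To make the quoted definition of disruption a bit more concrete, here is a minimal sketch of one common formulation of the disruption index from the scientometrics literature. The function name, argument names, and the example counts are all made up for illustration; the papers cited above may compute the measure with additional details (citation windows, normalization) that are not reproduced here.

```python
def disruption_index(n_focal_only: int, n_both: int, n_refs_only: int) -> float:
    """Disruption score for one focal paper, ranging from -1 to 1.

    n_focal_only: later papers citing the focal paper but none of its references
    n_both:       later papers citing the focal paper AND at least one of its references
    n_refs_only:  later papers citing the focal paper's references but not the paper itself
    (All three counts are hypothetical inputs; gathering them needs a citation graph.)
    """
    total = n_focal_only + n_both + n_refs_only
    if total == 0:
        raise ValueError("need at least one citing paper to compute disruption")
    return (n_focal_only - n_both) / total


# A paper that mostly eclipses its own references scores as disruptive...
print(disruption_index(n_focal_only=8, n_both=1, n_refs_only=1))
# ...while one that is usually cited alongside its references scores as developmental.
print(disruption_index(n_focal_only=1, n_both=8, n_refs_only=1))
```

A score near 1 matches the quote's "cite it but none of its references" case; a score near -1 means the paper mostly consolidates the work it builds on.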

On the other hand, it bears mentioning that this whole experiment with Martian culture and science could end horribly — I know there are science fiction stories where Martians become too different from Earthlings and conflict ensues. It’s not hard to imagine how this ends in “planetism” and an eventual war between Earth and Mars. All I have to say is — guys, can we just not do this? Please? It’s just so fucking predictable.


OK this all sounds great, but disconnected psychology seems a little far-fetched and who knows when we will actually make it to Mars. What can we do in the meantime? Simple: create new scientific institutions that function outside of the traditional structures of modern science (e.g. universities, large government organizations like NSF). This is easier said than done of course, but it may not be as difficult as it once was. Frustration with the incompetence of our current institutions is mounting and there is a growing recognition that we need new ways of funding, organizing, and publishing scientific research. I believe we are in the nascent stages of an institutional Cambrian explosion: new organizational life forms such as Fast Grants, New Science, Altos Labs, Seeds of Science (a completely shameless plug), Longevity Impetus Grants, and Octopus may one day be regarded as the intrepid ancestors that gave rise to entire phyla of descendants, on this planet and throughout the galaxy.


Covid 9/17: Done Biden His Time

September 17, 2021 - 15:00
Published on September 17, 2021 12:00 PM GMT

No more mister nice guy. The dude will not a-Biden. All this vaccine hesitancy is a bunch of malarkey. If you’re not vaccinated, the President is mad as hell and he’s not going to take it anymore. 

The question then becomes, what is he gonna do about it, under what authority, and what will happen when he does it? 

There’s also the question of booster shots. They clearly work. The CDC is doing what it can to prevent or stall any expansion of them anyway, and we’ll see if the FDA manages to finish that job. 

Today the FDA will be debating booster shots, with such logic as ‘if vaccines still provide protection then we don’t need boosters’ and arguments over there not being 16-17 year olds in test samples because 18 is a magic number, and what the difference is between 50-59 and 60+ age bands, and arguments over exact dosages that might cause ‘confusion’ or ‘waste’ if they’re too low, as opposed to looking to see whether boosters work. I’d go into more detail here but I basically can’t even at this point. Even the pro-booster case is relying on the Israeli data which wouldn’t exist if the FDA had any say in the matter (and which has all the flaws of all the other Israeli data, probably, but no one ever seems to much care). The FDA is looking for any excuse to screw this up, partly in order to punish Biden for trying to do things because they would help people instead of ‘following the science,’ but also partly because they’d do that anyway, they hate giving people legal access to medicine. It’s both their one job and their least favorite thing.

If they pull it off, a lot of people are going to be checking a box claiming to be immunocompromised. Which is mostly fine with the FDA, so what if everyone is forced to be dishonest all the time in order to get medicine, that’s good actually since it means nothing that goes wrong is the FDA’s fault. Someone else will get blamed and sued. 

[Note: This week’s post is on Friday due to Yom Kippur. Predictions made based on Thursday’s data.]

Executive Summary
  1. Vaccine mandates for everyone.
  2. Acts of congress for no one.
  3. Vaccine mandates for some, somewhat dishonestly acquired booster shots for many of the same people.

Let’s run the numbers.

The Numbers

Predictions

Prediction from last week: 940k cases (+0%) and 10,800 deaths (+5%)

Results: 950k cases (+1%) and 13,111 deaths (+27%).

Prediction for next week: 900k cases (-5%) and 13,000 deaths (-1%). 
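As a quick sanity check on the percentages in these lines, here is a small sketch that recomputes the week-over-week change implied by next week's prediction relative to this week's results. The helper name and the dictionaries are just illustrative; the numbers are the ones quoted above.

```python
def pct_change(new: float, old: float) -> float:
    """Percent change from old to new."""
    return (new - old) / old * 100.0


# This week's reported results and next week's prediction, as quoted above.
results = {"cases": 950_000, "deaths": 13_111}
prediction = {"cases": 900_000, "deaths": 13_000}

for metric, observed in results.items():
    change = pct_change(prediction[metric], observed)
    print(f"{metric}: {change:+.1f}%")
```

Rounded to whole percentages, these come out to the -5% and -1% stated in the prediction.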

There were a lot of extra deaths with many states seeing very large increases. If cases had gone up a lot as well I would conclude that I was underestimating the holiday effect. That’s still possible, since perhaps the true case level could be declining faster than I expected to compensate for it, but it seems unlikely. I looked for a large dump of prior cases, but there’s nothing big enough to account for the bulk of this. I’m going to moderately increase my estimate of the lag between cases and deaths, and assume a bunch of this increase is likely a data artifact of some kind unless it gets sustained. That makes next week’s death number hard to predict. 

The case number has less uncertainty. If cases do go up a lot this week, I’ll suspect that it’s due to schools engaging in absurd amounts of testing resulting in us finding a lot of extra asymptomatic cases. They’re testing weekly in my son’s preschool so the chance of officially counting my son’s case if he gets one went from very low to reasonably high, and false positive rates aren’t zero. Many others could be in a similar spot. 


That’s a large jump across the board, which I didn’t expect given the lagged case counts, so as usual the question is to what extent this is a data artifact. I checked for anomalies and didn’t find any, so the error would have to be something nationwide. There was a return from a holiday, although the effect seems bigger than that could explain alone. If this rise is real, we need to worry that this represents the hospitals in many places becoming overloaded.


We seem to have clearly peaked in the South. It’s less clear we are at or beyond peak in other regions. We’re definitely at least close unless something unexpected happens. 


Biden’s push has not yet resulted in a surge of vaccinations. We don’t know the counterfactual, so we don’t know to what extent various mandates and potential future mandates are the only thing stopping vaccinations from slowing further. There’s also the impact of Delta, as the case and death rates are likely a lot of what’s motivating people to finally get vaccinated (as well as motivating them to impose new mandates). 

If there was a graph that charted vaccinations over time as a percentage of the eligible unvaccinated population, that would look interestingly different. If you looked at the percentage of the population that isn’t saying they’ll never get the vaccine no matter what, it would look more different still. 

What we can definitively conclude is that the skeptics were correct regarding full FDA approval. It was not (almost) anyone’s true rejection. The impact is going to be in enabling mandates, especially the big push this week by Biden, which would not have been practical without full approval. 

Vaccine Effectiveness

Russel Monroe is on the case.

Did you know that the CDC is so slow that the FDA thinks it’s a problem? That they regularly slow walk giving out their data? That what they call ‘political pressure’ is someone pointing out that being slow means people die so maybe time is of the essence or something? Read the whole thing.

Well, finally, what do you know, the CDC says vaccines still work (paper).

This led to a certain amount of gloating, as in maybe they should have waited for this ‘study’ ‘data’ to come in before the whole scare over Provincetown and worrying vaccines might not still work. My initial reaction was that this was bad news, since my model previously thought that vaccines are even better than that.

Instead, I rapidly discovered this seems to be actual zero news of any kind. Here’s what passes for their new findings:

During April 4–June 19, fully vaccinated persons accounted for 5% of cases, 7% of hospitalizations, and 8% of deaths overall; these percentages were higher during June 20–July 17 (18%, 14%, and 16%, respectively). Using the reported 37% vaccination coverage for the 13 jurisdictions during April 4–June 19 and an assumption of 90% VE, vaccinated persons would have been expected to account for 6% of cases (close to the 5% observed). With 53% coverage reported during June 20–July 17, vaccinated persons were expected to account for 10% of cases at a constant VE of 90%; the observed 18% would have been expected at a lower VE of 80%.

Within each age group, the percentage of vaccinated persons among cases, hospitalizations, and deaths increased with increasing vaccination coverage (Figure 1). As the prevalence of SARS-CoV-2 Delta variant surpassed 50%, the percentage of vaccinated persons among cases in each age group increased at rates corresponding to benchmarks for lower VE (i.e., from approximately 90% to <80%). Increases in the percentages of vaccinated persons aged ≥65 years among COVID-19–associated hospitalizations and deaths also appeared higher than expected. During June 20–July 17, age-standardized rates of cases, hospitalizations, and deaths among persons not fully vaccinated increased weekly; among fully vaccinated persons, case rates increased, but rates of hospitalizations and deaths remained largely unchanged (Figure 2).

So they multiplied the same basic data we’ve been looking at every week, that everyone essentially already knew, and somehow this became a headline. The 80% seems very much like a round number that was vaguely close to making the basic arithmetic work out if you literally adjusted for nothing whatsoever in either direction. Good work getting published, I guess? 
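The basic arithmetic in question can be sketched in a few lines. This is a minimal sketch, not the CDC's actual method: it assumes vaccinated and unvaccinated people are equally exposed and equally likely to be tested, and the function name `expected_vaccinated_share` is made up for illustration.

```python
def expected_vaccinated_share(coverage: float, ve: float) -> float:
    """Expected fraction of all cases that occur among the vaccinated.

    Assumption (not from the paper): both groups face the same exposure
    and testing, so the only difference is the (1 - VE) risk reduction.
    """
    vaccinated_cases = coverage * (1.0 - ve)   # relative case count, vaccinated
    unvaccinated_cases = 1.0 - coverage        # relative case count, unvaccinated
    return vaccinated_cases / (vaccinated_cases + unvaccinated_cases)


# Coverage and VE pairs taken from the quoted passage.
for coverage, ve in [(0.37, 0.90), (0.53, 0.90), (0.53, 0.80)]:
    share = expected_vaccinated_share(coverage, ve)
    print(f"coverage={coverage:.0%}, VE={ve:.0%}: vaccinated share of cases = {share:.1%}")
```

Under those assumptions the three scenarios round to the 6%, 10% and 18% figures in the quote, which is the sense in which nothing here was adjusted for anything.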

Turns out that was only the first of three studies. Here’s the second one. It’s not more enlightening, and doesn’t show any signs of controlling for anything or of, once again, being anything more than ‘do what we’ve already done, but lower the sample size to cover a smaller geographical area.’ It finds Moderna more effective than Pfizer once again, which I continue to assume is because Moderna uses larger doses, and always will because that which is submitted to the FDA cannot be altered. Third study seems like it’s the second study again, except now with veterans, so the sample is non-representative. 

Good job. Good effort.

Here’s some data from the UK (hat tip: Ben Hoffman), with all the caveats that this is raw data rather than being corrected for anything. Also some good graphs on various variants, as well.  

About 90% of all people vaccinated in the United Kingdom have had two doses. Yet those with only one dose account for about 50% of all Delta cases. That implies the effectiveness of the second dose over and above the first dose in terms of observed cases is something like 90%, although again lots of corrections in various directions are missing. 
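Here is a minimal sketch of that calculation. It assumes the 50% figure is the one-dose share of cases among vaccinated people, and it corrects for nothing (age, timing, exposure). The function name is hypothetical.

```python
def second_dose_relative_effectiveness(two_dose_share_of_vaccinated: float,
                                       two_dose_share_of_cases: float) -> float:
    """Effectiveness of two doses relative to one dose, from raw shares.

    Compares cases per vaccinated person in each group; no corrections applied.
    """
    rate_two_dose = two_dose_share_of_cases / two_dose_share_of_vaccinated
    rate_one_dose = (1.0 - two_dose_share_of_cases) / (1.0 - two_dose_share_of_vaccinated)
    return 1.0 - rate_two_dose / rate_one_dose


# About 90% of the vaccinated have two doses; each group has about half the cases.
effectiveness = second_dose_relative_effectiveness(0.90, 0.50)
print(f"Relative effectiveness of the second dose: {effectiveness:.0%}")
```

With these inputs the one-dose group has nine times the case rate per person, which is where the "something like 90%" comes from.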

The emergency visits are less lopsided, but that’s clearly due to age, with fewer of the elderly stopping after one dose. If you look at the Under 50 group, the ratio of emergency care visits is similar to the ratio of cases.

When we look at deaths, we see a strange shortage of deaths in the partially vaccinated population. Even the unvaccinated population seems to do better within the age bounds. I thought about this for a bit, and concluded that this is likely to mostly be age, as dividing into two groups is insufficient, and in the Over 50 group the death numbers go the other way. We have tons of data saying that vaccines offer better protection against death than against infection, so the extra deaths are a proxy for how much older the vaccinated population was than the other two groups. 

We also have this other data from the UK, from a preprint. Here’s their numbers against symptomatic disease.

There’s a bunch of comparisons between groups here where the confidence intervals don’t make sense. For example, note that for AstraZeneca the 40 to 64 age group effectiveness at 20+ weeks has a confidence interval that is higher than, and doesn’t intersect with, the interval for the 16+ age group, whereas the opposite (and more plausible) scenario is true during weeks one through nine. We strangely see a similar pattern for Pfizer, once again with non-overlapping intervals.

I suppose it’s possible that this is a real effect, and older immune systems get more advantage out of the vaccines longer term while younger people have a stronger short term response, but sufficiently older people are worse on all fronts, but mostly I don’t buy this story. And especially when you combine this with the claim that longer-interval dosing regimens are more effective, this is not giving much room for vaccines in the United States to be that effective at this point. Then once again, you have to explain the basic data. 

Here’s their numbers against hospitalization.

CEV means clinically extremely vulnerable. So this is saying that the worse off you are, the less vaccines can help you. 

At least one study has found that the vast majority of those in the UK have antibodies. My best guess as to what’s going on here is pretty simple. Over time, more and more of the unvaccinated population got Covid-19, and became protected. Thus, the measured protection offered by vaccination went down. That and selection effects. 
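A rough sketch of that mechanism, with made-up numbers. The 80% protection from prior infection is a hypothetical parameter, and the sketch assumes the true vaccine effectiveness never changes and that vaccinated people gain nothing extra from prior infection.

```python
def measured_ve(true_ve: float, prior_infected: float, infection_protection: float) -> float:
    """Vaccine effectiveness as it would be measured against the unvaccinated group.

    true_ve: effectiveness versus a never-infected unvaccinated person.
    prior_infected: share of the unvaccinated who already had Covid-19.
    infection_protection: protection from prior infection (hypothetical value).
    """
    vaccinated_risk = 1.0 - true_ve
    unvaccinated_risk = 1.0 - prior_infected * infection_protection
    return 1.0 - vaccinated_risk / unvaccinated_risk


# True VE is held fixed at 90% while more of the unvaccinated get infected.
for share in (0.0, 0.25, 0.50, 0.75):
    print(f"prior infection among unvaccinated = {share:.0%}: "
          f"measured VE = {measured_ve(0.90, share, 0.80):.0%}")
```

The measured number falls as the comparison group becomes more protected, even though nothing about the vaccine has changed.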

You should totally worry that I’m looking to dismiss such findings of waning vaccine effectiveness, rather than fully going where the evidence takes me. You should also worry that having found reasons why the first few such alarm bells were wrong and dumb, I’m going in assuming the new ones are too. I’m doing my best not to do any of that, but I may not have fully succeeded. 

Booster shots improve vaccine effectiveness, especially if you have J&J and then get a shot of an mRNA vaccine. So The New York Times is now pointing out not only that this is a good idea, but that all you have to do to get your shot is to lie. Interesting. 

Vaccine Mandates

It’s the big one this week. Biden is done playing, and it’s time to pull out the big guns.

What can he actually do though? In theory we have a constitution and a republic, and the President has limited authority, and Biden is at least firmly committed to maintaining the illusion of the illusion of limited Presidential authority – e.g. when his lawyers all said his eviction moratorium was unconstitutional, he kept calling until he at least found one lawyer somewhere willing to say it might be constitutional, and when the court does throw it out, he’ll at least try again in a slightly different fashion once again backed by a legal opinion. Sadly, it’s an upgrade.

And of course, there’s a strong correlation between not wanting to get vaccinated and muttering various things of variable accuracy about ‘the constitution’ so it’s not surprising that I learned about Biden’s new mandates through a post like this:

Here’s a summary of the full six point plan.

My subcategories are slightly different, but similar.

Some of this is basic, ‘and you’re doing this now? What changed and why didn’t this happen on January 20th?’ stuff like making at-home tests more affordable, and expanding tests at community health centers and pharmacies, increasing support for hospitals and deploying monoclonal antibody ‘strike teams.’ Also ‘providing resources for FDA review’ for the vaccines does seem like a good second-best idea. It’s a little strange that we finally worked all that stuff out in the same week, almost like it was held back, but hey. Five almost purely positive bullet points here.

We have the ‘doubling fines for masking violations’ line which is the standard ‘we’re super duper mad now so to show how bad we are we’ll increase fines’ thing. I find it tiresome but it doesn’t matter. 

We have the economic section, which isn’t necessary, the economy is doing fine, but sounds like it’s not incinerating enough money to worry about relative to the other money incinerations we’re doing (while noting that if we were literally incinerating money that would be far better, cause then you can print more, it doesn’t actually cost much.) 

Next up we have the booster shots, under ‘further protecting the vaccinated.’ I’m happy to see Biden going ahead with this and ignoring the FDA’s foot dragging. I would have been happier if I’d seen an explicit ‘including people who got Moderna or J&J’ here, especially J&J. Offering boosters to those who got J&J (if they want them) is the next ‘if you cared at all you would do this’ task. 

We have the demand that students at schools be regularly tested (along with the staff) despite the staff being vaccinated as per the previous bullet point and the children being children. I continue to be confused how many people see such demands as reasonable but I’ve accepted that they do and it no longer surprises me. 

Finally we have the actual mandates: for any company with 100+ employees (with a testing option), for the entire federal workforce and contractors (except the USPS, whose employees are so empowered they are sometimes easier to convict of federal crimes than to fire), for any Medicare or Medicaid participating hospital (the most explicit I-needed-a-reason qualification out of several), and for all school employees (which he admits he can’t do but is encouraging the states to do).

It’s clear what Biden wanted to do, which was to impose a full vaccine mandate on all Americans, with or without an intentionally overburdensome testing alternative. If he felt able to issue a full mandate here, I presume he would have.

Italy is doing mandatory vaccination outright. The article in question predicts a huge wave of opposition as a result. I don’t. 

Here Biden is doing a strange mix of both. He’s exercising his authority in a way that attempts to look as broad and strong as possible, but which is technically forced to hide behind OSHA and end up as a convoluted mess that doesn’t hit all the places it would like to. The strangest part of the new rules is that they didn’t include requirements for interstate travel, which he has clearer authority to issue than the rules he did issue, and I’m expecting those to probably follow unless conditions improve quickly. If we are going to go this far (and presumably then keep going if conditions don’t improve), I do think it would be better to be clear on what we’re doing, and outright mandate the vaccine, and be done with it. Mason has similar thoughts.

The federal employee mandate seems fine and good. As I’ve said before, employers should get to impose such mandates, so this seems like a very reasonable use of authority. I’m glad this one happened.

The requirement on hospitals also seems like a good place to impose a mandate. If you work at a hospital, yes, you absolutely need to be vaccinated. You’re around sick people a lot, who aren’t there by choice, and your entire job is to help those sick people get well and not die. If you refuse vaccination, there’s multiple good reasons you shouldn’t keep working at that hospital. Alas, many nurses in particular are refusing the vaccine, they’re willing to protest and be loud, and there’s a shortage of nurses, so it’s hard/expensive for a given hospital to do the correct thing. Imposing the requirement from above lets the hospital point out it’s not their decision, and the nurses can’t threaten to go to a different hospital. 

Or basically this:

I do despise the mechanism of authority here, which is where the fact that the Federal government has already stepped up and royally messed up the health care system by hijacking its payments structure is then used to hold any and all participants hostage to the whims of the Feds. Also, the idea that such moves don’t fit the definition of ‘coercive,’ as the court rather ludicrously ruled regarding Medicare funding, defies common sense. It’s coercion, it’s an end run, it’s a mockery and all that, and this should be noted whenever it is invoked as such, but those are the rules in practice.

But is the mandate on employers here constitutional? It certainly isn’t obviously constitutional, and the tie to OSHA and large employers is a clear indication that this wouldn’t be constitutional without OSHA. 

When first asked about the mandate soon after this, my reply was that I noticed I was confused how it was constitutional. 

David French breaks it down for us. I’m not taking his word for it, since there’s (as one would expect) broad disagreement, but this seems broadly right to me as a matter of law.

This post in Reason by a legal expert lays out another perspective, gets into the Exact Words this is likely to come down to, and is also not sure what the outcome will be. It makes a convincing (to me) case that allowing this move would likely represent a de facto large expansion of OSHA’s powers, and hence of effective presidential power, to impose arbitrary rules because they feel like it. I saw a warning asking people not to challenge the mandate for exactly this reason, as it may force the court to set a bad precedent it would prefer to avoid.

Worth noting that OSHA definitely has a bloodborne pathogens policy already, which some people think makes this clearer cut than all that. Others disagree.

This is my current view of the situation.

  1. Vaccine mandates are legal if done under act of Congress.
  2. But that’s if Congress passed a law, which they didn’t, and won’t, because good luck with that.
  3. Luckily for Biden, a long time ago, Congress kind of told OSHA ‘do whatever you want in the name of safety to companies with 100 workers.’
  4. Using the commerce clause like this is pretty much bullshit.
  5. But, says SCOTUS, mostly legal.
  6. Congress delegating this way is terrible.
  7. But, says SCOTUS, mostly legal.
  8. In practice, SCOTUS will probably (but not definitely, I asked Polymarket to get us some real odds!) uphold this mandate.
  9. Which, again, is pretty much bullshit.
  10. But their call.

Whether or not it is upheld, the fig leaf of a fig leaf has been upheld and reinforced. One can even say the fig leaf here is genuine, which is even better. I do believe that Biden thinks this is constitutional, that he made compromises to make that true, and that this combination of beliefs is reasonable.

Meanwhile, employers will put this into practice before there is even a formal rule let alone enforcement or a court challenge, and employees will have to deal with that reality.

Private employers get to benefit from the same dynamics that benefit the hospitals. If two thirds of jobs have to mandate vaccination, including you, the employees can’t reasonably blame you and have far fewer prospects and places to go. Many of them will no doubt blame you anyway, and there’s no shortage of bad legal takes, but it will help. It will also help with worries about protests and boycotts.

Surveys are saying that if you told the unvaccinated they had to either get vaccinated or be fired, most of them would choose to be fired.

I flat out do not believe these people. At all. Zeynep explains, but with less willingness to endorse as strong an effect size as I have.

The testing option is annoying, but how many people would actually even quit their job over the weekly test if there was no vaccination option? Is anything remotely like two thirds or more of those who are currently on the hill of not being vaccinated prepared to die on it? I’d be shocked if one in ten pulls that trigger. Especially because we’ve already run this experiment a few times, to varying degrees.

(We also have the hospital in New York that had to stop delivering babies noted above, but this seems like a Not Necessarily the News case, in the sense that we have the one example of this happening that’s Big News, rural hospitals likely don’t have much short term slack here, and we haven’t heard of other examples of similar situations.)

So under 1% of employees, or about 2% of unvaccinated employees, ended up quitting over this, and that was with other places to go. That sounds right to me.

That doesn’t mean this isn’t potentially a big deal. 

Let’s say that 0.5% of all workers covered by the mandate quit. Biden’s mandate covers (very roughly) 100 million workers, so that’s still five hundred thousand people who would lose their jobs. That’s a lot, and it’s potentially a big deal. If you’re willing to hire unvaccinated workers, and can keep your work force small enough to allow that, you suddenly will have a much easier time finding otherwise good help. For those covered by the mandate the opposite will happen: there will be lots of jobs to fill and fewer people who can fill them. It’s a potentially substantial subsidy for small business. On the other hand, vaccinated workers will mostly prefer to work with other vaccinated workers, so this could also offset or go the other way.
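To see how sensitive that headcount is to the assumed quit rate, here is a quick sketch. The 100 million figure is the very rough coverage estimate above, and the alternative quit rates are illustrative guesses rather than data.

```python
COVERED_WORKERS = 100_000_000  # very rough estimate of workers covered by the mandate


def workers_quitting(covered: int, quit_rate: float) -> int:
    """Number of covered workers who leave their jobs at a given quit rate."""
    return round(covered * quit_rate)


# 0.5% is the rate used in the text; the others are hypothetical alternatives.
for rate in (0.001, 0.005, 0.01):
    print(f"quit rate {rate:.1%}: {workers_quitting(COVERED_WORKERS, rate):,} workers")
```

Even the low end of that range is a six-figure number of people.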

Others will doubtless decide to start new businesses to show them who is boss and get their revenge, which is presumably great. More people should start new businesses, and I truly wish them all the best of luck. Except the ones whose businesses are about fighting against vaccinations. Everyone else, let’s do this.

Note that some of the opposition to all this might not be as principled as it is made out to be.

Here’s the archived version.

My position isn’t fully static but mostly hasn’t changed, and I’m not looking to have the same discussions again, but for clarity I currently believe:

  1. Employers, both public and private, should be free and encouraged to mandate vaccination, and those that forcibly interact with the vulnerable especially.
  2. Indoor venues should be free and encouraged to mandate vaccination for entry.
  3. Non-essential (to a given person’s life) indoor public spaces should require vaccination, as should medical facilities, nursing homes and other such places.
  4. Vindictive rules that destroy value rather than protect it are not good. A rule makes sense if and only if it creates sufficient value by creating safer spaces and controlling spread. Justifying a rule by saying that the threat to destroy value will cause vaccinations should be out of bounds.
  5. No travel bans. They’re deeply stupid, even when they’re not things like banning European travelers for less than no reason, which we are somehow still doing.
  6. If we are to go past the above policies, we would be better off instead acknowledging what we’re doing and outright mandating vaccinations. We’ve done it in the past, it’s not some scary new escalation.
  7. If we did decide to do that, I’d support it as part of a package where we otherwise chilled the f*** out and lifted other rules, the way Denmark recently declared Covid not a societal threat and lifted all restrictions, but not otherwise.
  8. In general, I believe the most important crux with those who oppose mandates is that I believe vaccine mandates are primarily substitutes for destructive alternative restrictions that are worse for freedom, and those who oppose mostly think they are mostly complements that ramp up restrictions of all kinds. If I was convinced I was wrong about this, I’d be forced to reconsider.
  9. I have another crux with many who oppose mandates, which is that I think it’s freedom-enhancing to give people choice in association, such as not wanting to employ the unvaccinated, so I’d still support private mandates unless I was convinced I was wrong about that as well, the same way that whether you’re vaccinated influences my decision of whether to sit down with you to have lunch and banning that seems rather terrible. I draw a sharp distinction between ‘you are allowed to mandate vaccination’ and ‘you must mandate vaccination’ whereas very few people in surveys seem to notice the difference.
  10. Some others have the crux that they don’t think vaccinations much matter, or they think vaccinations are unsafe (or not sufficiently proven to be safe), or that if you’re vaccinated that’s sufficient protection that you shouldn’t care who else around you is unvaccinated. It would be very hard to change my mind on any of those, but changing my mind on any of those would indeed change my conclusion.
  11. There are additional cruxes in various other places, this list is not complete.
  12. You’re still free to make comments and I’m still free to do my best not to engage further, and I intend to do my best.

A court has ordered that New York’s vaccine mandate for hospital workers is temporarily suspended for those seeking religious exemptions.

Having trouble getting the necessary paperwork? No problem, our church will take care of that as long as you become a paying online member. Markets in everything, solve for the equilibrium, blatant lies are the best kind and all that. 

At least for now, the NBA will not mandate vaccinations for its players. They’re going to do a bunch of social shaming instead, like having them travel on a different bus. The league has an 85% vaccination rate, which is good, but far from 100%. The Knicks, Nets and Warriors will be exceptions due to state law, which should give them a small but meaningful edge this season. I wonder if it’s priced into the gambling lines.

In other vaccine mandate news, the employees don’t always oppose mandates, and also sometimes it’s time to take a bold stand against an overbearing union and overly demanding workers, but I’m guessing this wasn’t one of those times.

NPIs Including Mask and Testing Mandates 

A case that the Bangladesh mask study didn’t tell us much of anything. I don’t think this is quite fair, they did have other stuff going on as well, but I do think it’s mostly right. 

Meanwhile, the mask mandates might in practice have missed a spot?

The craziest part about this is that if you’re treating a bunch of sick people, you kind of should wear a mask even if Covid-19 didn’t exist, because it also helps prevent a lot of other things that could be causing this child’s fever. Yet it also shows the logic of authoritarianism. Sorkin’s arguments about why the masks were necessary, and why he (quite sensibly) walked out have nothing to do with a physical world model, and everything to do with the rules and regulations. 

Mostly news was quiet on this front this week.

Think of the Children

Our child’s preschool, like many such places, required a Covid test be taken in the few days before school. My wife’s understanding was that buying a rapid test and administering it herself would not suffice. She called a service that was supposed to come to us and provide the test, but they failed to show up with no explanation, then we called them, they said they’d call us back, and they didn’t call back either. She then took our son to six urgent care centers to try and get a test, but all of them turned us away because they already had lines out the door and no space, presumably because everyone else was doing the same thing. So the next day, I showed up at 7:30am with him to ensure we’d be first on line to get an appointment, they opened at 8, they declined our insurance at 8:15 and charged $180 so we’d have a signed official document that a thing had been needlessly stuck up our son’s nose while I forcibly held his head in place and his arms back, yeah that’s fun times, we were let in around 8:35 and someone did in fact stick the thing up his nose while I held him forcibly in place, around 8:55 the doctor came in and handed us a piece of paper that affirmed they’d stuck it up his nose so now everyone could be happy they’d generated enough extra Covid-19 exposure from hanging around inside an urgent care center that school could begin. 

Yeah, great system, no trouble at all, keeps everyone safe, doesn’t waste valuable health care system resources, it’s all very reasonable and I have no idea what anyone is complaining about. 

They’re also going to do it again, every week, forever, but at least in the future they’ll be arranging the logistics and bringing someone into the classroom to do it. 

What do you want us to do, open a window? That wouldn’t be safe.

I complain about the weekly tests, because mumble mumble something about the children being four years old and fully masked at all times, but the testing is the least crazy part of the whole Covid protocol. It’s much less crazy than the part where parents are kept out of the building, or when kids are forced to eat on the sidewalks or given all-liquid lunches so they can keep their masks on. It’s even way less crazy than forcing them to wear masks, because do the math:

Assuming the masks are poorly fitting isn’t fair, but 50% still seems like a fair practical estimate here. As does 90% for daily rapid testing. I’m not going to go over that logic again, but when someone is highly infectious, the tests will almost always be positive, so the tests are doing the job they need to be doing. If anything, the extra sensitivity from PCR would actively be harmful here, since it would force stupid reactions. 

The true cost of rapid tests, if we were to use reasonable rules, is a small amount of time and a few bucks. That seems way better than all-day mandatory mask usage while trying to learn, and you get a much bigger payoff.


In practice, that’s right. The point of rapid testing in context is that it can be a substitute for other prevention and can make people less crazy, but if it’s a complement that doesn’t otherwise make anyone change anything except when it’s positive they panic, then it’s only making things crazier. Which makes it unclear if Mina is even helping, since as he points out it seems implausible that anyone will actually take his substitution solution, rather than Following The Science. 

Last week there were good questions about whether the crazy college restrictions were one outlier college or if they were typical, with the presumption that they were a Not Necessarily the News extreme example. We now have an Ivy, Brown, doing something similar at least for a while: 

Note the first line, amid an increase in positive asymptomatic Covid-19 cases. This implies there likely was not even a single symptomatic Covid case. 

This seems right. Given current conditions, if you test everyone weekly, eventually you’ll find some cases, then you can blame that on whatever you want to blame it on, and offer your Sacrifices to the Gods. As Tracey points out, the last bullet point, ‘Refrain from Small-Group “Hopping,”’ is especially chilling, essentially telling people they should only have one isolated friend group. You know, like they had in Columbine. And didn’t massively overreact to in a way that pointlessly traumatized an entire generation.

Whereas this Washington Post article’s true title is Parents Acting Crazy. If you think like this:

Then whatever happens to be top of mind will drive you crazy. If you take one small probability among dozens or hundreds or more, and label it ‘the salient risk’ then a large fraction of parents (and other people) go insane. This describes parents who are letting the risk of their young children getting Covid take over their heads. ‘It’s like a fire alarm every day.’ If you have a fire alarm every day, eventually you consider the possibility that the alarms are not all that related to fires, whether or not you realize that in context your child is a dragon.

No matter how crazy the behaviors are, all of these stressed out, ruined lives are mostly over nothing, but that nothing is still living rent free in a lot of heads, and every bit of that suffering and loss is on the FDA and its pointless demands and delays, and on the CDC and everyone else who sends the message to behave in exactly this way.

When people anticipate vaccines are imminent, they decrease how much social distancing they do. That’s the opposite of rational, but it’s a known pattern, and in this case it means we should see things improve for kids a bit earlier. The paper’s response of course is to suggest that ‘stricter measures will be required’ due to this decreased distancing, on the assumption that the policy goal is always more distancing.

A new Long Covid study on children is available from Israel.

I highlight this both because it’s how I first found the study, and also because of the mindset it indicates. Paul would have been skeptical, but the right person amplified the Twitter link, so that’s that. 

Wow, that sounds scary. Can we have some details?

So what they’re calling ‘Long Covid’ is mostly things that last less than six months, with a large majority of children reporting all symptoms vanishing by then. I consider this a misleading headline.

No control group at all doesn’t mean the results are totally useless, but it’s kind of close given how much Covid prevention messes up mental health. I don’t see the link to the actual paper, so I can’t easily dig in further, and given what I’ve seen I’m going to stop there. 

My proposal is to survey kids who never tested positive for Covid to see if they’re reporting such symptoms, then do antibody tests on those kids, and check to see how much correlation is found between having Covid antibodies and the survey results. My prediction is that the effect size will be very small. This vaguely sounds like it would be considered “ethical” so someone please make this happen.

Perhaps the true pandemics among children lie elsewhere.

Why do we need an ‘option’ at all, exactly?

At least some good news, the phase where we can more fully blame the FDA for the lack of approval for child vaccinations is fast approaching.

The good old 5211.

Here’s data on exactly what we were forcing kids to do in various places, at various times during the pandemic.


Kelsey Piper, after her recent self-recommending appearance on the podcast Rationally Speaking, wrote to host Julia Galef with this rather strong update:

Kelsey isn’t always right but I trust her to be making an intellectually honest effort here, and ‘the hospital the trial says it was conducted at says the trial never happened’ is a pretty big deal, and if someone commits that level of fraud once at a minimum I would assume everything else they’ve done isn’t real until proven otherwise. How much to downgrade the not-proven-fraudulent evidence in the face of increasingly frequent findings of fraud is a hard question, but the answer sure isn’t zero, and I don’t think our prior on all this should have been great to begin with. 

If you dig into the thread there’s also some interesting discussion about secondary attack rates, which is a relevant question on its own in addition to its bearing on the plausibility of the Ivermectin studies. I haven’t had time to dig into that question yet.

In Other News

Also from Julia’s podcast with Kelsey, here’s Kelsey on the FDA and CDC. 

This has been my experience with the FDA. At least they get their facts right. They don’t do the right things with those facts, and they’re way too slow in gathering them, but to the extent that they offer you facts it’s fine to run with them. The CDC I am less confident in on these fronts, especially after Provincetown and the studies listed in an earlier section. It’s the whole thing where they kind of said vaccinated people couldn’t transmit Covid when that didn’t make any sense and then got everyone to panic by saying that maybe it didn’t prevent transmission at all which also didn’t make any sense, and now are making a big deal out of widely available basic statistics, on top of everything that came before, and the reports that the CDC withholds data. 

A detailed look at exactly how the FDA messed things up.

A review of Scott Gottlieb’s book Uncontrolled Spread. I’ll requote the part Alex Tabarrok quoted.

If there’s one overarching theme of “Uncontrolled Spread,” it’s that the Centers for Disease Control and Prevention failed utterly. It’s now well known that the CDC didn’t follow standard operating procedures in its own labs, resulting in contamination and a complete botch of its original SARS-CoV-2 test. The agency’s failure put us weeks behind and took the South Korea option of suppressing the virus off the table. But the blunder was much deeper and more systematic than a botched test. The CDC never had a plan for widespread testing, which in any scenario could only be achieved by bringing in the big, private labs.

Instead of working with the commercial labs, the CDC went out of its way to impede them from developing and deploying their own tests. The CDC wouldn’t share its virus samples with commercial labs, slowing down test development. “The agency didn’t view it as a part of its mission to assist these labs.” Dr. Gottlieb writes. As a result, “It would be weeks before commercial manufacturers could get access to the samples they needed, and they’d mostly have to go around the CDC. One large commercial lab would obtain samples from a subsidiary in South Korea.”

At times the CDC seemed more interested in its own “intellectual property” than in saving lives. In a jaw-dropping section, Dr. Gottlieb writes that “companies seeking to make the test kits described extended negotiations with the CDC that stretched for weeks as the agency made sure that the contracts protected its inventions.” When every day of delay could mean thousands of lives lost down the line, the CDC was dickering over test royalties.

In the early months of the pandemic the CDC impeded private firms from developing their own tests and demanded that all testing be run through its labs even as its own test failed miserably and its own labs had no hope of scaling up to deal with the levels of testing needed. Moreover, the author notes, because its own labs couldn’t scale, the CDC played down the necessity of widespread testing and took “deliberate steps to enforce guidelines that would make sure it didn’t receive more samples than its single lab could handle.”

This mostly isn’t new information. Still, some new details, and a healthy reminder. Listen all y’all, it’s a sabotage. This is more than ‘failed utterly.’ Failed utterly implies the goal was success. How would you differentiate what is described here from enemy action?

Imagine a private corporation had done this. What would have happened next?

An analysis of how long tests can offer you confidence that you are not infectious, and thus can safely interact with others, if you believe yourself to be at risk of having been infected. He uses physical world reasoning that seems at least reasonable, and reaches these conclusions:

This treats Safety (with a capital ‘S’) as a Boolean state, which of course is a highly crude approximation, but the idea is that this is about how long you have (again, after the sample is collected) in which your risk is super duper low. And it emphasizes that speed matters a lot, and PCR is only good anti-transmission tech if its turnarounds are good.

Also worth emphasizing, as Mina keeps tirelessly pointing out, is that PCR has a lot of true positives in the sense that the person is infected, but where the person isn’t infectious to others – the test is ‘too sensitive’ for this purpose, especially during recovery. Whereas the rapid tests aren’t quite sensitive enough compared to what would be ideal, but they’re still pretty damn good in terms of practical benefits.

NYU hospital in New York City gives excellent care, as long as you are very very certain you will not be the one who has to pay for it. And they sent me an email with this subject line, which buys so much goodwill I might bother completing the attached survey:

I got pushback on the Australia section last week, saying things weren’t that bad. This week, we learned both that a lot of Australians are in a place where they want more than six drinks a day, and the government is not letting them have them. Draw your own conclusions. 

Oh, and yes, the Australian police are already seeking to use the check-in data from the Covid app, why do you ask?  

Meanwhile, here’s how their vaccinations are going. It’s worth noting, if you can’t leave your house until everyone is vaccinated, what’s your incentive to accept a second-best vaccine rather than wait? 

From last week, Scott Alexander presents Too Good To Check: A Play in Three Acts. Recommended if you haven’t already seen it. I claim victory, since I’m able to feel smug about the conclusion rather than any of the previous things that were too good to check. It conveniently confirms all of my priors.

Good news though, at least one life was saved.

Yes, I’m aware this is a joke.

A thread on whatever happened to Novavax. I don’t know which of those explanations are right and which of the right ones are doing the work, but it’s hard to look at this and think ‘yep, we did everything we could, there was no way to spend more money or lower barriers and produce more vaccine doses.’

A thread reminding us that all the details of how we give out vaccines were basically guesses as to what would quickly satisfy the FDA, they make no sense and we’ll likely never know the right answers. 

Alaska state senator says she is unable to reach the capital for votes because she is banned from Alaska Airlines after she refused to comply with the federal mask mandate, a move she calls ‘unconstitutional’ because words do not have meaning to such folks. It’s also clear she damn well can reach the capital anyway, it’s just expensive and annoying to go back and forth given where she’s chosen to live. Ah well. We all make choices in life. 

Speculation that differences between states with different vaccination rates might have a lot to do with different rates of obesity, since the two happen to be correlated. Being obese makes Covid much more dangerous, yet we have established such strong norms against pointing out that it’s better to not be obese that we’ve made actual zero effort to use Covid as an opportunity to help people stop being obese. So I’d like to take this opportunity to encourage everyone out there to do what it takes to get down to a healthy weight, and to get yourself to a place where you feel good every day instead of feeling bad. I continue to put large effort into maintaining this, but yes it can be done, and yes even before Covid it was totally, totally worth it. Also, there’s a new drug that sounds promising, see the Not Covid section. 

Also, while we’re at it, reminder to make sure to take Vitamin D.

This is the latest effort to quantify Long Covid, which I found a link to via LessWrong, with the author thinking of it as similar to chronic fatigue syndrome. There’s enough thoughtful effort here that it needs to be included, but what sticks out at me is the continued reference point shifts in what constitutes Long Covid. Surveys ask people if they have any symptoms at all, that’s considered Long Covid, then that’s compared to CFS because the most typical symptoms are most typical of CFS. That creates an association of Long Covid with lifelong crippling fatigue. And as usual, I have zero faith in what passes for controls in such measurements. There are then some calculations I’d argue with even given the premise. And if things really were this bad, and a good fraction of a percent of people who got Covid ended up permanently crippled, many of them unable to work normal jobs, I’d point out how many people already did have Covid by this point; it’s at least a large minority of the population. So this seems like the kind of thing that would be impossible to miss on a population level.
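As a back-of-the-envelope sketch of that population-level point: every input below is an assumption chosen only to illustrate the shape of the argument (a round population figure, one reading of "large minority" infected, one reading of "a good fraction of a percent" disabled), not an estimate from the post being discussed.

```python
# All inputs are illustrative assumptions, not measured values.
population = 330_000_000      # roughly the size of the United States
share_infected = 0.30         # one reading of "a large minority"
share_disabled = 0.005        # one reading of "a good fraction of a percent"

infected = population * share_infected
permanently_disabled = infected * share_disabled

print(f"Infected so far (assumed): {infected:,.0f}")
print(f"Implied permanently disabled: {permanently_disabled:,.0f}")
```

Under these assumed inputs the implied number is in the hundreds of thousands, which is the sense in which such an effect should be visible in population-level data if it were real.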

This chart still throws me for a loop when I see it:

Any one of these would be possible, but thinking about them relative to each other is rather strange. I would think that something that ‘affected your ability to work’ as well as family life and ability to provide care would impact one’s finances more often than that.

One interesting thing about the post is the framing of the takeaways. The author identifies this primarily as a strong argument for vaccination. His analysis suggests that vaccination cuts risk of Long Covid by an order of magnitude:

Reminder: This is someone else’s model, not endorsed.

Note that this is despite him thinking vaccines are only 50% effective against infection by Delta, which is well outside my reasonable bounds for mRNA.

He also notes that it seems like many Long Covid victims improve when given the Covid vaccine, long after their infections:

It’s not the only benefit from vaccines. They can also help relieve some CFS.

Apparently, about 30-40% of Long Haulers report an improvement in their symptoms after being vaccinated. It doesn’t cure the symptoms, and we’re not sure how long that relief lasts, but at least it looks like it helps some of them. How can that be?

To answer that question, we need to understand what might cause CFS in Long Haulers.

What Might Cause CFS in Long COVID?

There are three main hypotheses on why these symptoms last so long:

The COVID-19 virus (or other equivalent viruses for other CFS) survives inside the body forever, pushing the immune system to chronic alert level.

The immune system becomes dysregulated, and stops fighting another set of viruses, which now are more free to roam and affect the body, which the immune system then needs to fight in a chronic war.

A dysregulated immune system attacks the body itself. In this case, it would be an auto-immune disease.

There is evidence for all 3, across both Long Haulers and patients with other types of CFS. Tests have found in the blood remains of viruses and high levels of antibodies.

I started going down this rabbit hole, but I stopped when I saw the complexity of the disease and how little we still know about it.

It seems like this data point isn’t being followed up on sufficiently. Either this is some sort of placebo effect, in which case that’s a huge deal and we should ride that placebo train for all it’s worth, or it’s not, which is an even bigger deal. It also, under this model, has strong implications for CFS that’s caused by other diseases, since the story is that CFS is often caused by the aftereffects of some infection. Have we tried giving people vaccines in those cases? Failing that, have we tried re-infecting them with whatever they had before as a next best option? CFS is really, really bad, so it seems worth trying?

Along similar lines, if you got vaccinated, got Covid and then got Long Covid anyway, this model suggests getting a booster shot might still help you some of the time, and it’s a low risk thing to try, based on my understanding of the plausible mechanisms here.

It also emphasizes once again that Covid suggests we don’t pay enough attention to fighting the spread of other infectious diseases, as they may be far more destructive than we realize, and thus far more worth fighting.

Not Covid

At the movies: I saw Shang-Chi and the Legend of the Ten Rings, another Marvel origin story movie. It exceeded expectations and lived its best life. Then I saw Free Guy, which is better and definitely underrated, although I am very much in the target audience here. I have a model of why critics didn’t respect it properly, but no spoilers.

Biden partners with Australia to build nuclear submarines. Insert what you think I almost said next here.

Saw this and I think it may explain quite a lot. I preliminarily endorse OpenPhil or another sufficiently large EA donor fully funding this. 

I saw this, in an interesting discussion on the future of weight loss (hat tip: MR):

Way to bury the lead. I briefly got theoretically super excited, because can you imagine how amazing it would be to be able to take negative amounts of marijuana, as if it were something coming out of Wonka’s factory? Appetite suppression is nice at times, but imagine being smarter and faster and more motivated to get things done when the situation calls for it. Then, you could pull a reverse and feast. Alas, I’m pretty sure that’s not how any of this works, and also the drug in question has some issues regarding mental health and was never approved – although if it did what my imaginary version did I wouldn’t care, ’cause with those benefits I’d hold things together just fine, and like Eliezer I’d be immune to the Gods of Irony.

The real lead in weight loss of course lies elsewhere, in that they seem to have finally found a drug that works.

Alas, such a thing is of no help to me. My twin problems are a love of calorically rich foods and an impossibly slow metabolism. I lost the weight the hard way, and the majority of days, decades later, I continue to fight the good fight and optimize my meals with more care than anyone else thinks is reasonable, a policy for which I have no regrets. As I wrote above, it’s all been totally, totally Worth It.

The problem is that this new would-be-miracle drug (I haven’t looked into it in detail) works via appetite suppression. That doesn’t help me at all, and if anything takes joy away from my life. Le sigh. It also suggests a very clear model of why weight loss attempts almost always fail, and what causes the few successful ones to succeed. 

Even more interestingly, the effects aren’t limited to food:

If you could turn that knob down in other areas, would you want to? 


How feeling more secure feels different than I expected

September 17, 2021 - 12:20
Published on September 17, 2021 9:20 AM GMT

This year, I’ve been feeling more emotionally secure, thanks to various kinds of internal and external work (the stuff at https://attachmentrepair.com/ being one notable example).

The effect of this doesn’t always feel like I expected it to feel.

I once thought that in order to not worry so much about whether people like me, I would need to become convinced that they do like me. Or at least I’d need to convince myself that at least some people like me, strongly enough that it would reassure me in general.

That does play a certain role. I probably wouldn’t feel the way I do now, if I hadn’t managed to internalize evidence about some people actually liking me quite a bit. And if I ended up in a conversation where it was obvious that someone hated me, yeah, that wouldn’t be fun.

But my actual internal experience of feeling more secure in interacting with other people, isn’t necessarily that I put a higher probability on the other person liking me than I did before. Often it’s more like noticing that thinking about how much the other person likes me, isn’t a particularly rewarding frame of mind to be in. And that it’s possible to sometimes at least drop that frame and let a more rewarding frame arise.

If I had to verbalize it, there’s sometimes a shift in stances that’s accompanied by a thought that goes along the lines of “it’s possible that I’m secretly annoying this person and they totally hate me without telling me that and I’m doing nothing to deny that possibility, but I’m going to get more out of this interaction if I just focus on something more rewarding – such as the general flow of this conversation – unless I get a clear indication that I’m doing something wrong”.

Except that that’s not quite right, because what I do then isn’t me trying to focus on something more rewarding. Nor is it an attempt to suppress the analysis about what the other person’s opinion about me is. Rather it’s just a remembering to inhibit the part of me that’s about to start focusing on that analysis, and then letting something else arise from that space on its own.

And that’s becoming more automatic, so that I don’t necessarily even need to do that anymore. If the thought of “it’s possible that this person secretly hates me” crosses my mind at all, it may do so very quickly and then be gone.

(I originally wrote large parts of this a month ago, for the forum of Michael Ashcroft’s Alexander Technique course; if I had written it from scratch now, I’m not sure I’d have been able to verbalize that shift in stances anymore, because it has become automatic/subtle enough to miss.)

All of this is not to say that I wouldn’t still feel significantly anxious in some social situations that happen to trigger that. Just that there are increasingly situations where I don’t, where I previously did.

I recall a conversation I once had with someone, when I was still a lot more worried about this kind of thing. When I said I was worried about what other people think of me, she said “but you can’t actually know what others think of you, so why focus on that?”

From where I’m at now, I can understand her confusion.

If you’re feeling secure, what others think of you is just a question like any other, such as “I wonder what they had for breakfast today”. You can choose to think about it, but it’s not intrinsically compelling. If it feels like an unanswerable question that it doesn’t give you any benefit to think about, you can just… not think about it. Why wouldn’t you think about something else? There are lots of more fun things to think about!

But if you’re feeling insecure, you can’t just choose not to think about it. Someone not liking you, or even possibly not liking you, feels on a deep emotional level like danger. It’s much more like “is my partner going to abandon me” than it’s “what did these people have for breakfast”. Because you’re so sensitive to rejection that even a stranger disliking you feels a little bit like being abandoned by a loved one, like nobody will ever love you.

From that frame of mind, my friend’s question of “you can’t know, so why care” felt incomprehensible. There was a sense of “yeah I can’t know, and that’s exactly what’s horrifying and it’s why I have to keep worrying about it”.

Because “you can’t know what other people think of you” felt, on some emotional level, a little bit like “you can’t know whether anyone will ever truly care about you”.

So from that frame, I thought that when I’d get to the point of feeling more secure, it would feel like putting a higher probability on “the people who I’m currently interacting with like me”. Since emotionally “other people liking me” and “I’m worthy of love” felt like the same thing, even if I intellectually understood that this doesn’t make sense.

But while feeling more secure does also somewhat involve putting a higher probability on other people liking me, it also involves that question becoming separate from the feeling of “I’m worthy of love”. A lower probability on being liked, doesn’t necessarily imply lower worth.

And that’s something that I might have been able to understand intellectually before, but I wouldn’t have been able to imagine what the actual experience of it feels like.

Crossposts: Facebook, Twitter.


Research speedruns

September 17, 2021 - 09:54
Published on September 17, 2021 6:54 AM GMT

The 'research speedrun' is a format that I've been playing with on my blog for the last year or so. It's been more popular than I expected and it looks like there's a lot more that could be done with the idea. So I thought I'd write it up here and see if anyone else wants to experiment with it themselves, or suggest different things to try.

The format

It's a very simple format, so this section will be short:

  • Pick a topic
  • Set a one hour timer
  • Find out as much as possible about the topic before the buzzer goes off while writing up a live commentary
  • Do a very quick editing pass to fix the worst typos and then hit Publish
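For anyone who wants to try the format with a little tooling, here is a minimal sketch of a timeboxed note-taker. The `Speedrun` class and its methods are hypothetical helpers invented for illustration, not something I actually use; the `clock` parameter exists only so the timer can be swapped out when experimenting.

```python
import time


class Speedrun:
    """Timeboxed note-taking session (hypothetical helper for illustration)."""

    def __init__(self, topic, minutes=60, clock=time.monotonic):
        self.topic = topic
        self.limit_seconds = minutes * 60
        self._clock = clock
        self._start = clock()
        self.notes = []

    def elapsed(self):
        """Seconds since the session started."""
        return self._clock() - self._start

    def time_is_up(self):
        return self.elapsed() >= self.limit_seconds

    def add_note(self, text):
        """Record a note with its minute mark; refuse once the buzzer has gone."""
        if self.time_is_up():
            return False
        minute = int(self.elapsed() // 60)
        self.notes.append((minute, text))
        return True

    def write_up(self):
        """Return the live commentary as plain text, ready for a quick edit."""
        lines = [f"Speedrun: {self.topic}"]
        lines += [f"[{minute:02d} min] {text}" for minute, text in self.notes]
        return "\n".join(lines)


run = Speedrun("abacus schools")
run.add_note("Starting on Wikipedia")
print(run.write_up())
```

The point of refusing notes after the limit is the same as the point of the buzzer: the timebox is what stops this from turning into an ever-expanding writing project.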

So far I've done speedruns on Marx on alienation, the Vygotsky Circle, sensemaking, the Prussian education system, abacus schools, Germaine de Staël, and mess.

What I've used it for so far

Obviously, there's only so much you can learn in an hour - calling this 'research' is a little bit of a stretch. Sometimes I don't even manage to leave Wikipedia! Even so, this technique works well for topics where the counterfactual is 'I don't read anything at all' or 'I google around aimlessly for half an hour and then forget it all'. Writing notes as I go means that I'm making enough active effort that I end up remembering some of it, but I know the process is timeboxed so it's not going to end up being one of those annoying ever-expanding writing projects.

Here are a few rough categories of topics I've tried so far:

  • 'Sidequests'. Speedruns are great for topics that you find interesting but are never going to devote serious time to. I have a very minor side interest in the history of schools and universities, so if I come across something intriguing, like Renaissance abacus schools, it's a good way to learn a few basic things quickly. I have one or two more ideas for speedruns in this area.

  • Historical background. An hour is quite a good length of time to pick up a few fragments of background historical context for something you're interested in. One hour won't get you far on its own, but the good thing about historical context is that it builds nicely over time as you get a better picture of the timeline of different events and how they affect each other.

  • Finding out what something is at a basic level. I did the 'sensemaking' speedrun because I'd heard that term a lot and had very little idea what it referred to.

  • Dubious or simplistic claims. The Prussian education system post was in this category. If you read pop pieces about education by people who don't like school very much, there's often a reference to 'the Prussian education system' as the source of all evils, maybe alongside a claim that it was set up to indoctrinate citizens into being good factory workers. If you're starting with an understanding this simplistic you can improve it significantly within an hour. (The Prussian education system really did introduce many of the elements of modern compulsory schooling, but the factory workers bit doesn't really hold up.)

  • Random curiosity. The Germaine de Staël one happened because I was reading Isaiah Berlin's The Roots of Romanticism and she sounded like she might have had an interesting life (she did have an interesting life).

What I've got out of it

Sometimes the answer ends up being 'not much', but in that case I've only wasted an hour. I expect these to be pretty high variance. Some outcomes so far:

  • I discover that a topic is more interesting or important than I realised, and decide to spend more time on it. This happened with the Vygotsky Circle post - the actual speedrun was frustrating because I didn't find any good quality sources about the intellectual scene, but I did realise Vygotsky himself was more interesting than I'd realised and ended up reading and making notes on his book Thought and Language.

  • I get good comments from more informed people and end up learning more after the speedrun as well. The sensemaking post was like this: in the speedrun itself I learned about the term's origins in organisational studies, but not so much about the more recent online subculture that uses the term. After I posted it, it ended up attracting a fair number of comments and twitter responses that explained the connection. (The root tweet is here, for people who have the patience to trawl through a branching twitter thread.)

  • I get exactly what I bargained for: an hour's worth of basic knowledge about a topic I'm mildly interested in.

Another minor benefit is that I keep my writing habit going by producing something. This was actually pretty useful in the depths of winter lockdown apathy.

Other possibilities

My sense is that there's a lot more that could be done with the format. Some potential ideas:

Speedrun events. Tyler Alterman first suggested this on twitter:

I like this idea of a research speedrun

Party format:
5min everyone brainstorms topics of interest into a chat
1hr each person speedruns on one
1hr mini presentation from each person

I tried a tiny one with three people and it worked pretty well. I don't love organising things and I doubt I'll do this often myself, but if someone else wants to try it I'd probably be up for joining.

Chaining speedruns together. Multiple speedruns on the same topic would allow going into more depth while still having the ability to iterate every hour on exactly what you want to focus on.

Technical topics? I'm also interested in quantum foundations but I haven't tried any maths- or physics-heavy speedrun topic yet. It sounds a lot harder, because that type of work tends to involve a lot more stopping and thinking, and maybe nothing would appear on the screen for long periods. Could still be worth trying.

Livestreamed speedruns. It could be funny to do an actual Twitch-style livestreamed speedrun. Or it could be atrociously dull. I'm not sure.

I'd like to hear suggestions for other ideas. I'd also be keen to hear from anyone who tries this as an experiment - please leave a comment to say how it goes!


Lessons from covid predictions: Always use multiple models

September 17, 2021 - 06:19
Published on September 17, 2021 3:19 AM GMT

url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} Epistemic status: This point isn't novel (it's one of the the ten commandments of superforecasting, more or less),  but it's often underappreciated. Crossposted here.


I went out of my way to make two explicit predictions about covid over the course of the pandemic. The first, back around January 2020, was a vague claim that internet people's worrying about the pandemic was probably overblown and it'd probably level off before it got global. The second one (dated March 12 2021) was more explicit:

Israel opened vaccines up to general availability at about 0.7 SPP (Shots Per Person). Biden wants to open them on May 1st, which will be just about 0.7 SPP at current rates.
Overall, this seems pessimistic (we should expect more acceleration by then), so it does seem to be more of Biden's "underpromise and overdeliver" media strategy. But this one's likely to not be too far off, since there's much less time for acceleration now.
(A few complications: The US vaccinated slower, so you might need a higher threshold for demand shock. But also US rules are more convoluted, so you'd expect demand shock to hit earlier - but the more convoluted rules also mean there are more people in the various priority groups, so less pressure to open up to general availability. Overall I think these are mostly a wash.)

In the spirit of testable predictions: My 90% confidence interval for the median (that is, 25th/50) state making vaccines generally available is 4/17-5/2.

My reasoning behind the first prediction was that we'd had pandemic scares every few years for a while - swine flu, SARS 1, Ebola, Zika, and they'd all fizzled out. So there's an argument that your prior should be that most pandemic scares fizzle out.

The first prediction, obviously, was wrong. The second was technically correct (the actual date of the median state making vaccines available was 4/19), but just barely, and only thanks to an unprincipled extension of my error bounds: I'd run a spreadsheet with a few different scenarios for the rate of vaccine acceleration, then added a few days to each side to get my interval. (The US was at only 0.63 SPP at the time.) I'd give myself a 0.5/2 here, which isn't a great track record.

 Where did I go wrong?

My biggest mistake was only having one model for each prediction. With the first one, I didn't even have a confidence interval. With the second one, I accounted for uncertainty within the model, but not for the uncertainty from the model being off. We should always, always consider at least two models when making a prediction. Think of it as running a Monte Carlo simulation on the meta level - we run through the meta-level process of coming up with different models a few times, and that lets us improve accuracy (by averaging the models), and estimate our model uncertainty (by seeing how much they vary).
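As a minimal sketch of the "average the models, and use their disagreement as model uncertainty" idea, the snippet below combines point estimates from two models. The function name `combine_models` and all the numbers are hypothetical illustrations, not figures from the predictions above, and the sample standard deviation is only a rough proxy for model uncertainty when there are so few models.

```python
from statistics import mean, stdev


def combine_models(estimates):
    """Average point estimates from several independently built models.

    Returns (combined_estimate, spread). The spread is the sample standard
    deviation across models, used as a rough proxy for model uncertainty.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two models to estimate model uncertainty")
    return mean(estimates), stdev(estimates)


# Hypothetical numbers: the day in April on which each model expects
# the median state to open vaccines to everyone.
outside_view = 24.0  # e.g. extrapolating from another country's threshold
inside_view = 16.0   # e.g. estimating pent-up demand in priority groups

combined, spread = combine_models([outside_view, inside_view])
print(f"combined estimate: April {combined:.0f}, model spread: {spread:.1f} days")
```

With only two models the spread is a crude number, but it at least replaces "add a few days to each side" with something derived from how much the models disagree.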

What could I have done here? For the first prediction (whether or not covid would go global), I could have done an inside-view estimate. Look at infection rates, look at the error bars on how good preventative measures might be, try to guess how likely they were to fail. I probably (given what I knew at the time) would have ended up at somewhere around 30% chance of it going global - still low, but no longer crazy off.

For the second one, I could have come up with an inside-view model - try to estimate how much pent-up demand there was in special categories and see when we'd run through it, or compare US states to each other instead of just to a different country. It would have given a result closer to the truth, and would have let me estimate model uncertainty without resorting to an ad-hoc "eh let's just add a few days to fudge it".

(Can the multi-model approach fail? If my extra model for the first question was "go out to the street and ask someone if he had covid", it would have made me worse off. Doing the math, adding models fails if the within-model error is significantly larger than our model uncertainty. So our models do have to be at least reasonably good).
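One simple way to see this failure mode, under assumptions that are mine rather than the post's (two independent, unbiased estimates, combined with equal weights): the average has variance (var_a + var_b) / 4, so a sufficiently noisy extra model makes the average worse than the good model alone. The function names below are hypothetical.

```python
def averaged_variance(var_a, var_b):
    """Variance of the equal-weight average of two independent, unbiased estimates."""
    return (var_a + var_b) / 4.0


def averaging_helps(var_good, var_extra):
    """True if averaging in the extra model beats using the good model alone."""
    return averaged_variance(var_good, var_extra) < var_good


# Hypothetical variances, in arbitrary units.
print(averaging_helps(1.0, 2.0))   # extra model is somewhat noisier
print(averaging_helps(1.0, 10.0))  # extra model is mostly noise
```

Under these assumptions the break-even point is an extra model with three times the variance of the good one; beyond that, the "ask someone on the street" model drags the average down.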

Finally, on a more optimistic note - we did manage to learn from this going forward. When we ran through trying to estimate the infection risk at NYC solstice, we made the inside-view calculations for likely results based on microcovid's infection estimates and all that - but we also sanity-checked them by looking at results from similar events like Lollapalooza, and it helped to see that they had similar results to our estimates.




Let Us Do Our Work As Well

September 17, 2021 - 03:40
Published on September 17, 2021 12:40 AM GMT

As each new writing deadline approaches, I return to The Builders, a poem by Longfellow:

All are architects of Fate,
 Working in these walls of Time;
Some with massive deeds and great,
 Some with ornaments of rhyme.

Nothing useless is, or low;
 Each thing in its place is best;
And what seems but idle show
 Strengthens and supports the rest.

For the structure that we raise,
 Time is with materials filled;
Our to-days and yesterdays
 Are the blocks with which we build.

Truly shape and fashion these;
 Leave no yawning gaps between;
Think not, because no man sees,
 Such things will remain unseen.

In the elder days of Art,
 Builders wrought with greatest care
Each minute and unseen part;
 For the Gods see everywhere.

Let us do our work as well,
 Both the unseen and the seen;
Make the house, where Gods may dwell,
 Beautiful, entire, and clean.

Else our lives are incomplete,
 Standing in these walls of Time,
Broken stairways, where the feet
 Stumble as they seek to climb.

Build to-day, then, strong and sure,
 With a firm and ample base;
And ascending and secure
 Shall to-morrow find its place.

Thus alone can we attain
 To those turrets, where the eye
Sees the world as one vast plain,
 And one boundless reach of sky.

Each year the exhortation to Let us do our work as well becomes harder to follow, both because the pace of ML keeps increasing, and because more students show up to my lab so my attention is split between more projects. This creates pressure to triage and to leave some broken stairways as they are.

I'm sure that I inevitably do leave some stairways broken--questioning experiments in less detail than I would have as a PhD student, or leaving weak or ambiguous arguments in a paper because we ran out of time before the deadline. That being said, I strongly desire to build a culture that is worthy of the elder days of Art.

Is it possible to do this without permanently sacrificing scalability? I think yes, for the following reasons:

  • Standards propagate into culture. If you take the time to teach high standards to junior students, they will eventually become senior students and pass those standards down to the next generation. So, mending broken stairways is an investment that lets newcomers climb higher faster.
  • Scaling through processes. As a manager, catching bugs by exhaustively examining each experiment is not only time-consuming, it's also error-prone. Instead I need to build processes where students can catch their own errors--through code reviews, standard sanity checks, a culture of following up on inconsistencies, etc. I'm still learning the best ways to do this (if you have ideas please comment!) but in the long term I think this will simultaneously be more scalable and higher-quality.

Perhaps the common thread is that scaling and high standards may seem at odds in the short term, but in the longer term they are aligned: you can't scale well without also having processes in place to create consistently high-quality work.


Writing On The Pareto Frontier

September 17, 2021 - 03:05
Published on September 17, 2021 12:05 AM GMT

I have a personal rule: don’t write something which someone else has already written better.

This is easier than it sounds. For instance, suppose I’m writing an intro to systems biology. I don’t need it to be the most comprehensive intro ever written, or the most accessible intro. I just need it to be good enough on each of these axes that no other work is better on both at once.

In other words, I try to always write things which are the Pareto Best In The World.

Of course this generalizes to more dimensions as well: I might also care about writing something rigorous, or communicating intuitions, or making the piece enjoyable to read, or including good visualizations, or …. I don’t need to write the best piece in the world along any particular dimension. I just need to write a piece good enough on enough dimensions that nothing else beats it on every dimension which I care about.
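The rule above can be sketched as a small check. This is an illustration under stated assumptions: each piece of writing is scored as a tuple with higher being better, the scores and function names are hypothetical, and "beaten" follows the wording here (strictly better on every axis), which is slightly looser than the usual textbook definition of Pareto dominance.

```python
def is_beaten(candidate, other):
    """True if `other` is strictly better than `candidate` on every axis."""
    return all(o > c for c, o in zip(candidate, other))


def on_pareto_frontier(candidate, others):
    """True if no existing piece beats the candidate on every axis at once."""
    return not any(is_beaten(candidate, other) for other in others)


# Hypothetical scores as (comprehensiveness, accessibility), higher is better.
existing_intros = [(9, 3), (4, 8), (6, 6)]

print(on_pareto_frontier((7, 5), existing_intros))  # nothing beats it on both axes
print(on_pareto_frontier((5, 5), existing_intros))  # (6, 6) beats it on both axes
```

Adding a third or fourth axis only means longer tuples; the more axes there are, the harder it is for any single existing piece to win on all of them.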

Ways To Be On The Pareto Frontier

Novel Ideas

One natural way to be on the Pareto frontier is to write about a new idea, or at least an idea unusual enough that few people have written about it.

As with writing, new ideas are not necessarily that difficult to find. The trick is to combine things: finding a novel and interesting idea in complexity theory is a lot harder than finding a novel and interesting application of complexity theory to caricature art. On the LessWrong frontpage right now, there’s an interesting post about applying the idea of film study (i.e. athletes watching films of their games) to become better at research. I’ve been writing a whole series of posts intended to practice applying frames from math in unusual ways, and some interesting novel ideas have already come out of them - e.g. optimizing multiple imperfect filters.


Distillation

Research goes through a pipeline. First, researchers write papers, packed with jargon and assuming lots of background context. As results come together, they get boiled down into overview papers. Then come textbooks and programming libraries. Eventually, there might even be courses.

At each of those steps, work is done to distill the results - to explain them, draw analogies, add visuals, suggest exercises and applications, etc. In general, there’s a Pareto frontier with new cutting-edge results along one axis, and well-distilled results along the other. Pushing that Pareto frontier outward means finding some result which hasn’t been explained very well yet, understanding it oneself, and writing that explanation.

Note that Pareto optimality is again relevant to choosing examples/explanations: different examples will make sense to different people. Just offering very different examples from what others have written before is a good way to reach the Pareto frontier.

<Topic> For <Audience>

Probably the most successful statistics book in the first half of the twentieth century was Fisher’s Statistical Methods for Research Workers, which was essentially a repackaging of statistics for biologists. Glancing at my bookshelf, I see Basic Category Theory For Computer Scientists. COVID posts aimed at the rationality community have been a hot topic over the past year-and-a-half.

In general, a natural way to find a Pareto frontier is to pick a topic, pick an audience which doesn’t usually specialize in that topic, and write an intro to the topic for the audience.

Of course, this requires some background knowledge on both the topic and the audience - e.g. writing “Statistics for Biologists” requires background in both statistics and biology. So, it overlaps nicely with Being The Pareto Best In The World. The writing aspect also adds another dimension: it’s not just statistics and biology skills which are relevant, but writing skills as well. That means there are three axes along which our skill could be Pareto optimal - and the more axes, the more “elbow room” we have on the Pareto frontier.

Context & Background

One question I get all the time about my work is “Why is <particular result> interesting?”; I sometimes write posts which give more context, but often I just write up results.

Some writing optimizes for communicating an idea clearly. Other writing optimizes for explaining why the idea is interesting/useful, or where it came from, rather than explaining the idea itself. These are both useful, so they’re both axes relevant to Pareto optimality of writing.

Another example: if you want to know why high-school calculus is interesting, then a physics class or a history book (like History Of √-1) will give more context than studying calculus itself. (In fact, this was how I first picked up calculus in high school - the actual calculus course came a few months later.)

Fun Writing

Finally, obviously, writing can be optimized to be fun to read. Embedding interesting ideas in fiction, for instance, is one cool way to reach a Pareto frontier. Just writing fun-to-read nonfiction is also great - whether it’s James Mickens’ rants on computer security, or visual proofs from Euclid’s Elements, or drawings of the internals of everyday objects with cartoon mammoths.


All of these different dimensions - novel ideas, distillation, targeting, context/background, fun writing, and plenty of others which I didn’t get to here - are different things one might want from a piece of writing. They are all different ways to be on the Pareto frontier, and any combination of them is also a way to be on the Pareto frontier.

To achieve Pareto optimality, a piece of writing does not need to have completely novel ideas or amazing distillation or be super fun to read or .... It just needs to be good enough on enough of those axes that no other piece of writing is better on all of them.


Jitters No Evidence of Stupidity in RL

September 17, 2021 - 01:43
Published on September 16, 2021 10:43 PM GMT

Epistemic status: Pretty sure about core, not about edges

A while ago, I noticed a possible bias in how I evaluated reinforcement learning agents. It tended to cause me to revise my estimation of their intelligence downwards, after I viewed a video of them in action.

I've seen other people fall into what I believe to be the same error. So I'm writing this to correct myself if I am wrong and to alert others if I am right.

The Bias

Many reinforcement learning agents have "jitters." They alternate actions quickly, looking nearly palsied, apparently nullifying the effects of earlier actions with later ones. This is true across a wide variety of reinforcement learning agents.

Many people see these jitters as evidence of the relatively primitive nature of these agents. These actions look clearly stupid and sub-optimal.

For instance, consider the original Deep Q Network paper. Even after training for some time on Breakout, it still erratically moves the paddle back and forth when the ball is not near it. One person mentions that it makes "erratic jerky movements that obviously could not in principle be optimal," which was once my impression as well.

Similarly, and much more recently, consider DeepMind's work on generally capable agents. In the show reel, the movement of the agents often looks erratic. Conversation around LessWrong sometimes alluded to these erratic movements as evidence against the intelligence of the agents.

Jitters Non-Optimal For Energy-Constrained Agents

Evolved intelligence on earth has energy conservation as a fundamental part of its optimization function.

Unnecessary movements spend energy. Spent energy must be recovered, at the cost of reproductive fitness. So generally only sick animals, insane animals, and so on, have the shakes or tremble continuously. Energy conservation applies to every animal on earth, which is why we probably feel intuitively confident applying this rule across the broad variety of animals.

Additionally, extremely erratic movements can result in injury to the animal which is making them. So this is another reason why, for creatures that are a result of evolution, erratic movements are a sign of insanity or injury.

RL Agents Are Not Energy-Constrained

Reinforcement learning agents are not energy-constrained. They do not draw on a finite store of glucose when acting. Nor do they have any possibility of injuring themselves. As a result, the policies resulting from reinforcement learning algorithms will not be strongly constrained to limit jitters in the way that policies resulting from evolution will be constrained.

You can go further than this. Given the way that most reinforcement learning agents are set up, they have no way even to distinguish action from non-action, and thus rest from non-rest.

That is, consider a reinforcement learning agent which makes one of fifteen different categorical actions in each time-step, like those in OpenAI's ProcGen. For an agent controlling a side-scrolling avatar, for instance, one action would be moving right; another action would be jumping; another action would be doing nothing; etc. Each of these is only distinguished from the others as different indices on one-hot action encodings -- i.e., moving right could be [1,0,0,0...], jumping could be [0,1,0,0...], doing nothing could be [0,0,1,0...], and so on.
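To make the encoding concrete, here is a minimal sketch in plain Python. The three action names are a hypothetical subset chosen for brevity (the environments mentioned above have fifteen), and `one_hot` is an illustrative helper, not any library's API.

```python
# Hypothetical subset of a discrete action set; order is arbitrary.
ACTIONS = ["move_right", "jump", "do_nothing"]


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


encodings = {name: one_hot(i, len(ACTIONS)) for i, name in enumerate(ACTIONS)}

for name, vec in encodings.items():
    print(name, vec)
```

Nothing in the representation marks "do_nothing" as special: it is one more index, and permuting the action order would leave the learning problem unchanged.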

For a human controlling such a side-scrolling avatar, "doing nothing" stands out from all the other actions. If you put yourself in a situation where you are allowed to do nothing, you can rest your hands by not pressing any buttons. You can consider a more global strategy, and focus on the kind of strategy you will use when you resume acting. It also allows you to rest your mind, because humans can think harder or less hard. Doing nothing gives you an opportunity for reflection and meta-optimization in a way that no other alternative does.

None of this applies to a reinforcement learning agent. "Doing nothing" is one one-hot encoding just like all the other encodings. It cannot rest itself by doing nothing. It cannot focus on preparing for things further away in time; the vast majority of reinforcement learning agents must do a constant amount of thought in each time-step, about precisely the same things. So rest is not a preferred location in action-space that allows meta-optimization for these agents, as it is for evolved agents. They have no way to distinguish rest from non-rest, and thus no reason to pursue rest.

The above should also apply, mutatis mutandis, to reinforcement learning agents acting in a continuous rather than a discrete space.

Jitters May Sometimes be Optimal for Non-Energy-Constrained Agents

This is a more speculative point.

When I act, I often trade between low-probability-of-success action, with little thought put into it, and high-probability-of-success action, with a lot of thought put into it. Put more simply, where attempted action is very cheap, I am willing to try a lot of times.

Battery doesn't fit? I'll wiggle it around. Command in the terminal doesn't work? I'll try changing a parameter. Pill bottle not opening? I'll cycle through different axes of twist and pressure. Generally, I'll start to apply thought more determinedly where there are no low-cost actions available with any reasonable probability of success.

Again, this makes sense from an evolutionary standpoint. Trying things takes energy. Thinking about things also takes energy. Along the boundary where each alternative has equal potential reward and equal probability of success, we would expect ourselves to be indifferent to trying things out versus thinking about things. Only where trying becomes more expensive than thinking about things would we expect that we would feel inclined to think about things rather than try things.
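A rough way to put numbers on this trade-off, as a sketch under simplifying assumptions that are mine: attempts are independent, each succeeds with a fixed probability, and thinking is a one-time cost that raises that probability. The expected number of attempts until the first success is then 1 / p. All names and numbers below are hypothetical.

```python
def expected_cost_until_success(cost_per_attempt, p_success):
    """Expected total cost of repeating independent attempts until one succeeds."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return cost_per_attempt / p_success


def prefer_trying(try_cost, p_try, think_cost, act_cost, p_after_thinking):
    """True if blind retries are expected to be cheaper than thinking first."""
    blind = expected_cost_until_success(try_cost, p_try)
    considered = think_cost + expected_cost_until_success(act_cost, p_after_thinking)
    return blind < considered


# Wiggling a battery: attempts are cheap, so just try.
print(prefer_trying(try_cost=1, p_try=0.2, think_cost=10, act_cost=1, p_after_thinking=0.9))

# Each attempt is expensive: thinking first wins.
print(prefer_trying(try_cost=20, p_try=0.2, think_cost=10, act_cost=20, p_after_thinking=0.9))
```

The first call compares an expected cost of 5 against roughly 11.1, the second 100 against roughly 32.2, which matches the intuition that thought becomes worthwhile only once trying gets expensive.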

But again, this is not a trade off that reinforcement learning agents are able to make. They must always think about things to precisely the same amount. Which means that exposing yourself to a greater surface area of possible reward, in areas of phase-space where actions are not overdetermined, might generally be the ideal action. Jittering around could be the optimal solution.

Again, I'm less sure about this section.


When I see a reinforcement learning agent acting in a video, acting erratically, some part of me still says that it looks kind of stupid because of this. But I currently believe, for reasons given above, that it's best not to listen to this part of myself.



Comments on Jacob Falkovich on loneliness

September 17, 2021 - 01:04
Published on September 16, 2021 10:04 PM GMT

Two years ago, Jacob wrote an essay about dating, sex, and loneliness. I found this essay recently because I am extremely interested in loneliness. It seems to me that a great deal of work is needed to resolve existential risk, and that loneliness is one of the top practical impediments to that work. I think that a good understanding and significant resolution to loneliness would be of enormous value to this community.

And so with that purpose in mind, I discovered Jacob’s essay. Jacob’s basic point, as I understand it, is that a trade-off exists between spending time opening to connection via self-work, and spending time opening to connection via relating. He sees a lot of his friends trying to open to connection via self-work alone, and in his essay he exhorts us to mix in some relating, too.

Jacob actually talks mostly about dating, not relating, but I do not think he would disagree with my re-characterization in terms of relating because he writes:

This essay is to tell you: [...] ask someone on a date. At the very least, invite someone to hang out and ask them what they’re struggling with.

Jacob’s essay is tenacious and charged, which I appreciate very much, but I am aiming for a different kind of charm in this post, so I will cut out some of the more charged language when I quote him. I promise to do my best not to skew the meaning of anything with my cuts.

I have found it most helpful to write this essay by simply quoting Jacob at length, and adding commentary. What follows is most of the original essay, with my replies inline. Jacob is in blockquotes, I am not.


From Tokyo to TriBeCa, people are increasingly alone. People go on fewer dates, marry less and later, have smaller families if at all. People are having less sex, especially young people. The common complaint: it’s just too hard. Dating is hard, intimacy is hard, relationships are hard. I’m not ready to play on hard mode yet, I’ll do the relationship thing when I level up.

Jacob says that loneliness exists in the world, and is getting worse. Right in the second sentence he moves from loneliness to dating, marriage, and sex. I think this is significant. It is extremely common for people in our society to seek to overcome loneliness primarily through romantic connections. Why is that? It seems that friendship and family connections often "miss the spot" somehow. Why is that? I actually wonder if this is the true cause of loneliness.

This essay is to tell you: go [date] someone else. Ask someone on a date. At the very least, invite someone to hang out and ask them what they’re struggling with. This essay is not about how to make friends and lovers (a topic I’ll come back to), but an exhortation to actually go and do that. Now instead of later, directly instead of ass-backwards, seek relationships instead of seeking to be deemed worthy of relationships. [...]

I resonate a lot with this exhortation to get out there and really do it, not just in the domain of dating but in all domains. A similar ethos was expressed beautifully by Eliezer in many of his writings. The unnamed twelfth virtue of rationality was about bringing every ounce of one’s being into alignment with a clear purpose, cutting away all extraneous motion. This is not disconnected from Jacob’s point above.

But Jacob is making a more specific point when he says "seek relationships instead of seeking to be deemed worthy of relationships". This is vulnerable territory. I think what it really comes down to is that a certain amount of preparation for dating is good, but beyond a certain point perhaps preparation is being chosen over action because getting out there is quite scary. How do we overcome such fear? Well, one approach is to just go for it, with or without fear. This is the approach that Jacob seems to be advocating and it honestly has a lot to recommend it, but I would not advocate it across the board in every situation.

My argument doesn’t hinge on specific data relating to the intimacy recession and whether the survey counting sex dolls adjusted for inflation. If you’re reading [Jacob’s blog] as a brief escape from all the loving relationships smothering you, congrats! If you’re trying as hard as you can to connect and the world isn’t reciprocating, consider this essay as written for those you seek to connect with instead. [...].

Look, this paragraph is a little harsh due to the use of sarcasm in the context of such a vulnerable topic. But underneath that, I think Jacob is being quite reasonable and saying "hey, folks, come on, really, let’s get out there and do this". I get it. Sometimes the way to overcome hesitation is to just plunge in. Quantitative research can be useful, but it can also be a distraction from the main work.

Wherefore all this aloneness? The pink-hairs blame the red-pills who blame the pink-hairs. But really, they’re both in agreement that men and women are natural enemies and any interactions between the two are zero-sum. If you’re stuck in zero-sum thinking you’re probably on the wrong blog, but take this as a first dose of medicine and then go give someone a hug.

One level up from the gender war is the class war. Leftists blame loneliness on capitalism — single people buy twice as many toasters, sex toys, and Netflix subscriptions. Rightists blame socialism — for the state to be your daddy it must first destroy the family. I won’t spend much time on this. If your ability to connect with people depends more than zero on the GDP composition that’s the problem right there. "But in this economy…" Listen, if you’re struggling to build financial capital, maybe now is the time to invest in relationship capital instead?

There is a great deal being said here and, with respect to Jacob, I will attempt to deconstruct these two paragraphs a bit. Two underlying assumptions here are:

  1. Relations between men and women can be considered in terms of zero-sum versus non-zero-sum games

  2. Relationship capital can be considered a kind of investment

There is a very important way in which these assumptions are helpful. First, there really is a certain game being played in dating, marriage, and sex, and it really matters whether that game is zero-sum or non-zero-sum. Second, relationship capital really does act like an investment, in that some effort goes in, and then over time that investment may bear fruit. But games are played in service of some kind of payoff and investments are made in order to reap some kind of reward. If dating, marriage, and sex are means to an end, then what exactly is that end?

It may also be worth considering the ways that the two assumptions above are unhelpful. We wouldn’t say that the Earth and the Sun are engaged in a game with each other. They just evolve according to the laws of physics. Relations between men and women could be viewed a bit like that. We also wouldn’t say that relations between a mother and a child, or between soldiers in a tight-knit platoon are always best-described as a game. There are times when you can open a side channel and transcend the game entirely. Relations between men and women could be viewed a bit like that, too.

The famous Atlantic article on The Sex Recession starts by noting that sex is now more accepted than ever: [...].

So why, in the words of philosopher Julia Kristeva, "everything is permitted and nothing is possible"?

I don’t think there’s a contradiction here. Everything is hard because it’s permitted.

There used to be no shortage of people who would judge you for having sex. Parents, peers, teachers, pastors, even the same media outlets that now claim to be "sex positive". And when you had to escape surveillance and risk judgment just to make out with someone, it was HOT. The illicit is sexy. Sneaking around created a bond based on a shared secret and merely having sex in the face of restriction was an achievement to be proud of. Having good sex was gravy.

When Jacob says "there used to be no shortage", is he referring to earlier times in our culture, or earlier times in our own personal lives? It’s true that sex had a particular edginess when we were young. Why exactly does that change over time? I think it’s because we become less present as we do anything a few times. The first time you go to a rock concert, for example, you are snapped into presence by the overwhelming novelty of the situation. At your second or third concert the novelty is less acute, and so the level of automatic presencing is less. But novelty is just one way to become present. There are other ways. You can do it out of willpower alone, if you wish, or you can meditate or do yoga. So although transgression can be, as Jacob says, hot, it is not, I don’t think, the only way to find that heat.

If "the culture" no longer judges you for getting naked, who will? Your partner might. [...] This can be a problem, but it’s ameliorated by your partner repeatedly telling you that no, it was good, you’re just what they wanted. You should believe them. If they didn’t like you they’d make like Hamlet and ghost.

Now we’re discussing trust. Can we trust the words of our romantic partners? Can we trust that they like us, even when they say that they do? Perhaps we should trust their actions more than their words. It seems pretty plausible to me that trust is at the very heart of loneliness.

It’s hard to open up to someone that we don’t trust absolutely, and people in our lives demonstrate again and again that they are not absolutely trustworthy. It is the "absolutely" in that sentence that is key. It seems to me that indeed people in our lives are not absolutely trustworthy, because absolute trustworthiness is an extraordinarily high bar. So instead of trusting our friends and lovers absolutely, we should trust them in accord with our assessment of their trustworthiness, and that assessment is never going to come out at infinity.

There is a pattern that I have seen in myself and others in which one falls in love, decides to trust a lover absolutely, is eventually disappointed, decides never to trust anyone ever again, then due to the loneliness that ensues, eventually falls dramatically in love and decides once again to trust absolutely. This binary all-or-nothing trust just doesn’t fit the situation very well, and is premised, I think, on the mistaken assumption that true love requires absolute trust in another human being.

The big problem is when you start judging yourself. You can hide from your parents. You can find a partner who doesn’t judge your shortcomings. But you can’t outrun your own insecurities.

Well, this one is subtle. Suppose your friend comes to you and says "hey I think I have this particular irredeemable character flaw". Then you explain that whatever it is, it cannot possibly be irredeemable, because we always have a choice about how to react to our own flaws, and our choice about how to react is what character really means, so by finding a way to react virtuously to our supposedly-irredeemable character flaws we disprove their irredeemability. This is all well and good as far as it goes, but have you actually helped your friend? Suppose your friend now says "oh I see now that my irredeemable character flaw is that I believe in irredeemable character flaws". In this case you haven’t really helped your friend to escape from a belief in irredeemable character flaws, you’ve just replaced the belief in one particular irredeemable character flaw with another. So it goes with sentences like "The big problem is when you start judging yourself". This is essentially like saying "the big problem with you is when you start believing there is a big problem with you". It is recommending that you stop believing in big problems, while simultaneously pointing out a big problem.

The truth, it seems to me, is that we have the capacity to see our imperfections, and we have the capacity to work on them if we wish to. Sometimes we may not wish to! Some spiritual people look at this overall dynamic of seeing and working on imperfections as a kind of meta-level perfection. That is a nice way to look at things if you’re into that kind of thing. It shouldn’t be taken too literally.

It starts by comparing yourself to the internet. [...]. Everyone’s dates are more romantic on [the internet], their vacations sexier. [...]

It’s true that the things people post online are subject to a selection filter, and it’s true that this selection filter sharpens a particularly horrific form of suffering. This is deeply tragic. But selection filter or no selection filter, I think we should look carefully at what conclusions are being drawn from the comparison of our own dates and vacations to those of others. What is actually the line of reasoning we are using to go from these comparisons to whatever painful conclusion we seem to draw?

We might look at photos of other peoples’ dates and vacations in order to get ideas about places to go on dates and vacations. That seems pretty reasonable. We might look at photos of other peoples’ dates and vacations in order to share joy with our friends. That seems pretty reasonable.

Now, when we look at photos of other peoples’ dates and vacations, we might feel that our own dates and vacations are not as joyful as those of others. What can be said about this? Well, either we are mistaken in the perception that our own dates and vacations are less joyful than those of others, or we are not mistaken. What if we are not mistaken? Well, okay, in that case the dates and vacations of our friends are more joyful than our own. What actually does this imply?

Suppose I told you that there is this one monk who lives on such-and-such a mountain and has mastered equanimity to such an extent that she is almost always extremely joyful. Suppose I told you that I have watched your life and, just in case you wanted to know, it seems that you are, on average, less joyful than this one monk. Okay. So there is this one monk. Incredibly joyful. Great. What does this imply?

I think that the difference between the monk and the photos of other peoples’ dates and vacations is that with the latter, we wonder whether it is some irredeemable flaw in our character that is leading to the perceived or actual difference between our dates and vacations and those of others. This, so far as I can tell, is the epicenter of the pain, and, frankly, the most likely epicenter of loneliness. It is so painful to sit in contemplation of this irredeemability. To see evidence for it everywhere. To fight against it. To make heroic efforts to overcome it. To be bound up by it again and again. This is true suffering.

Yeah, this is real pain.

The pain is not imaginary, but the irredeemability is. It is based on a mistaken application of the agent model to ourselves, leading us to believe in the existence of an unchanging decision function within ourselves, leading us to trace the causes of our imperfect actions back to something that we believe is unchangeable. Under such assumptions, what would there be to do other than suffer? If this were how things really were, there would indeed be nothing to do. But this is not how things are. There is no unchanging decision function deep within ourselves. There is no place for any such absolute irredeemability to hide. The agent model is just a frame, and this is one of the places where it falls short.

Continuing on, Jacob writes:

[...]. People [...] start diverting all their energy into acquiring status markers, into being perceived as relationship-worthy by the real or imagined crowd of observers.

Here Jacob is discussing a vicious cycle in which the acquisition of a thing that never quite fulfills us leads to an ever-growing hunger for that thing in the hope that just a little bit more will finally fulfill us. He continues:

[...] As people spend more effort on status-climbing and self-improvement they spend less time in actual relationships. Unfortunately, you don’t get better at dating by learning to meditate or doing pushups alone in your room. When people who are obsessed with self-improvement have a miserable time on apps and first dates, they often conclude that the problem is lack of self-improvement: surely when two well-developed high-status people meet, effortless love will spark by itself! And so people keep chasing the next personal milestone. Get that degree, lose 10 pounds, learn that skill, read that book…

There is much that is good about self-development. It is not an evil pursuit, and I do not think that Jacob would claim that it is. Jacob is asking us to question what we are pursuing self-development in service of. If we are pursuing self-development in service of something that it can never fulfill, then we might get caught in a vicious cycle, always believing that we will get that thing with just a little extra self-development. It is as if we really really believed that drinking enough water would resolve our hunger for food. The more water we drink, the more convinced we become that drinking just a little more will resolve our hunger for food. But it won’t. It is not that drinking water is evil, it is that we’re operating on the basis of a straightforward misconception about what leads to what.

Self-development is good for many things. But, as Jacob points out, it may not ever give us an unequivocal feeling of worthiness. One possible reason for that might be that unequivocal feelings of worthiness just don’t exist. Another possible reason is that they do exist but self-development doesn’t lead there. Either way, if we are operating on the assumption that self-development will eventually lead to an unequivocal feeling of worthiness then we may end up in the vicious cycle that Jacob is pointing out to us here.

Self-development is riskless. Progress is slow but assured, and every step towards your personal goal is rewarded with likes and favs on social media. The pursuit itself raises one’s status. Opening up for connection, on the other hand, is scary. The rewards are great but so is the risk of failure.

I wouldn’t say that self-development is riskless. At a minimum there is a risk that one will fall into exactly the vicious cycle that Jacob is pointing at. But self-development can be less vulnerable than opening up for connection, because in self-development we can only be hurt by ourselves, whereas in relating we can be hurt by others, and I think what "vulnerable" means is existing in a state where we could in principle be hurt by others.

And real affection is the one thing you can’t brag about in a [social media] story. Intimacy for external consumption is not intimacy.

Yeah, the thing we value very highly -- real intimacy and real affection -- seems to be the thing most difficult to convey in a social media post. Why would that be?

And so, as the great guru put it: people want to be [highly developed] more than they want to [make love]. [Development] is capital. We seek to accumulate capital. [Making love] is labor. We seek to avoid labor. And so people are more [developed] than ever, and do ever less [lovemaking].

Our society places a lot of value, in general, on things that can be held onto. Self-development is something that we can hold onto. Making love is not. Therefore there is at least one reason to expect self-development to be prized over lovemaking. I don’t really know why this comparison is relevant, though.

In fact our society seems to value lovemaking a lot. And does anyone really view lovemaking as labor? I don’t buy this point. I’m not sure why Jacob is making it. He continues:

The pathological case of becoming obsessed with status and perception is when relationships themselves are subjugated to this end. When the main measure of a relationship is in how it makes you appear. Narcissism.

Indeed. Self-aggrandizement certainly seems like an unhealthy thing to pursue relationships in service of.

Narcissists ask: How does this relationship reinforce my ego narrative brand? How worthy does it make me seem? Ego-poisoned people who are short of narcissism merely ask: Would I be judged worthy of a relationship? These questions are self-focused, and intimacy requires that you relinquish them entirely. Instead, the question that starts all good relationships is: Can I make someone happy?

Well said. Can I make someone happy?

Making someone happy doesn’t imply forever, or as happy as they can be, or happier than anyone else could make them. A compliment makes a person happy. A text where you share something fun. Being a good listener on a date even if you didn’t blow their mind with electric conversation. A cuddle makes a person happy even if it stays a cuddle. Sex makes people happy even if it’s not [mind-blowing].

Why did we go from asking whether we could make someone happy to asking whether we could make someone happier than anyone else could make them? Jacob is pointing out here that the answer to this question isn’t really relevant, but why did we choose to turn our attention to this question in the first place?

When a niggling thought like "yes my partner is happy but perhaps another person could make them even happier" bothers us, we basically have two choices: either we can turn towards it and take the question so seriously that we see right through it to the other side, or we can turn away from it and stop paying attention to the question at all. I think it is when we choose neither one, and stay in the no-man’s-land middle ground, that we get pummeled by it over and over.

Jacob continues:

Romance is the most complex and rewarding multi-player game that humanity has invented. There are many romantic interactions that are short of your wildest dreams that are still worth having, that make two people happier than they would have been alone. And if you’re starting out, that’s where you should aim for.

There is a background assumption here that romantic interactions can be put into some kind of ordering, and there is a second assumption that people gain experience and have romantic interactions that rank higher and higher in that ordering. Perhaps these are true, but they seem to conflict with the earlier point about romantic interactions being more intense during early periods of our lives.

I do not really buy that there is an ordering over romantic interactions that is worth paying attention to.

Dating and sex and relationships are all trainable skills. You learn by doing. To learn painting you start by making 100 paintings. To get good at tennis you start by playing 100 matches. The first 100 will be mostly mediocre and some will be outright bad, but the 101st one has the chance to be good.

But unlike tennis and painting, our first experiences of romantic love are often unbearably sublime.

To go on a great date, you have to go on 100 mediocre dates. Or at least, put yourself in the mindset where that is your goal. That is how you learn to date and make people happy to be dating you. You learn how to deal with rejection and breakups and how to bounce back. Just as importantly, that’s where you learn to enjoy dating (see rule 97).

The sentence that most sticks out to me here is "that is how you learn to [...] make people happy to be dating you". It’s difficult to write about this. I think the best way forward is to look at background assumptions and try to apply them consciously when they are helpful, and not apply them unconsciously when they are unhelpful. Throughout much of Jacob’s essay I sense a background assumption of a kind of game theoretic approach to dating. This can be a beautiful way to see things because when we see the whole game that we are engaged in then we can jump right to the equilibrium move, rather than being pushed there slowly and painfully by trial and error. Where then does this game theoretic way of seeing things fall down? Very simply, I think it leads to a lot of anxiety. Overall, I think it therefore makes sense to use this game theoretic frame more than zero but not excessively.

Now there is a very important background assumption that I am making here, which is that we can choose our own thoughts. The way I see it, we cannot exactly choose our thoughts on a moment-by-moment basis, but we can exert considerable power over our thoughts by directing our attention to certain topics, by practicing concentrating on one thing, and by setting up our external environment to reinforce certain trains of thought and not others. Using these mechanisms I do think we have the power to choose how much to engage with a game theoretic frame in dating.

Jacob finishes with a final exhortation to get out there and date:

Perhaps there was a hidden benefit to the premodern mating context when you had roughly one shot at a successful partnering — all you could do is invest in the one relationship you’re given. But now that the option to date without lifelong commitment exists it affects your dating life even if you don’t plan on it. The option is always there for you and your partners. Waiting until you hit some life marker to start dating just means that you miss out on years of learning what other people are looking for, and what you yourself are looking for in a relationship.

And if you’re too busy for dating, actually busy with something that’s more important to you than romance, consider that dating doesn’t have to be a sink of time and energy. A casual date can be invigorating, and a partner can provide the support you need in your struggles.

So go out there and make some people mildly happy by going on mediocre dates and having mediocre sex and learning to connect with people romantically instead of having your head up your own ass. There are more interesting things to put in there with a partner.

It’s a beautiful sentiment, isn’t it?


Well, what can be said of all this?

First up, there is really a lot going on here, not just in Jacob’s essay, but on this topic in general. There is social commentary, political commentary, economics, game theory, intimacy, happiness, self-development, life advice, and a host of background assumptions to go along with these. When it’s hard to make sense of something, I find it most helpful to get really clear about my purpose, and my purpose here is to understand and resolve loneliness. What have we learned about the causes of and resolution to loneliness?

The first question, I think, is whether dating, marriage, and sex is a plausible way out of loneliness at all. Is it? I don’t know.

The second question is, well, if we do want to pursue dating, marriage, and sex, then what is a good way to pursue it? Jacob argues that we should pursue it by practicing it. Well, OK, that sounds pretty reasonable.

The third question is, well, if we don’t want to pursue dating, marriage, and sex, then what is loneliness and what is the way out of loneliness? I don’t know.

It is interesting to me that writing this essay has been difficult but not lonely. Here I am, sitting alone in my room, writing an essay about loneliness, and yet this experience right here actually is not loneliness. This is fun, and I think that it is good, too, because loneliness really is an impediment to getting things done, and there really are some darn important things to get done in the world. So it really does make sense to investigate loneliness, and one pretty good way to investigate loneliness is to write about it and discuss it with friends in the comments.

Perhaps the end of loneliness lies in clarity of purpose.