RSS feed aggregator

Academic meetup

Events at Kocherga - October 24, 2021 - 14:00
A rationalist meetup with talks from community members. The list of talks will be announced a little later.

ACX Montreal Oct 30th 2021

LessWrong.com News - 5 hours 28 minutes ago
Published on October 22, 2021 2:06 AM GMT

Meetup for fans of ACX/SSC and rationality. Friendly group eager to meet new people.

Exact location: Parc Jeanne-Mance at the corner of Duluth and Esplanade

If you would like to sign up for email notification of upcoming events, you may do so here: https://tinyletter.com/acxmontreal


Rapid Antigen Tests for COVID

LessWrong.com News - 5 hours 28 minutes ago
Published on October 21, 2021 3:26 PM GMT


Home antigen tests for COVID are an imperfect but useful tool. In this post I’ll discuss the four scenarios where I think they’re most useful, share a few thoughts about using them correctly, and finish by taking a deep look at the data on accuracy.

If you don’t already understand concepts like sensitivity and positive predictive value, you might want to read this first.

I’ll focus on the Abbott BinaxNOW test because I think it’s overall the best and most available home antigen test in the US as of October 2021 (the situation is different in other countries). Sensitivity varies somewhat between different tests, but they are all roughly comparable and have the same strengths and weaknesses.

Epistemic status

This is a complex topic that is evolving quickly and is only partly understood. My analysis is grounded in hard data but necessarily involves a certain amount of extrapolation.

I have no relevant credentials but this writing has been reviewed by a medical epidemiologist who works full time on COVID.

Application 1: risk reduction

I consider antigen tests to be most useful for reducing the risk of asymptomatic transmission at social events. In that context, I believe a negative BinaxNOW test reduces the probability that you are infectious by about 75% for the 12 hour period immediately after taking the test. (There's no hard data behind the 12 hour cutoff—it's just a reasonable extrapolation based on what we know about viral load during the early stages of infection).

When I host social events, I calculate the microCOVID risk of attending the event and include it in the invitation. At events where everyone tests at the door, I reduce the calculated risk by 75%. (Note that this is a rare case where you care about the sensitivity of a test, not the PPV).
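The adjustment described above is a single multiplication. Here is a minimal sketch, assuming the 75% figure from the text; the function name and the 200 microCOVID event are hypothetical, and this is not the microCOVID Project's own calculator.

```python
def residual_risk(event_microcovids: float, test_sensitivity: float = 0.75) -> float:
    """Scale an event's microCOVID estimate when everyone tests negative at the door.

    Assumption taken from the text: a negative test removes a share of the
    risk equal to the test's sensitivity for infectious people (about 75%).
    """
    if not 0.0 <= test_sensitivity <= 1.0:
        raise ValueError("test_sensitivity must be between 0 and 1")
    return event_microcovids * (1.0 - test_sensitivity)


# Hypothetical event estimated at 200 microCOVIDs before testing.
print(residual_risk(200.0))  # 50.0
```

Sensitivity is the right input here because the question is what fraction of infectious guests the test catches, not how much to trust a positive result.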

Application 2: if you have symptoms

Home antigen tests have limited value for testing yourself when you have symptoms because their sensitivity is fairly low (probably about 70% for people with symptoms). I agree with current CDC guidance for people with symptoms (the guidance is in the middle of changing and as of mid October some documents are out of sync with others):

Option 1 is to take a home antigen test. If the results are positive, you probably have COVID: isolate and consider seeking medical advice. If the results are negative, you should still get a PCR test because of the substantial chance of a false negative.

Option 2 is to skip the home antigen test and get a PCR test right away.

A reasonable but sub-optimal third option is to isolate immediately and take multiple antigen tests, spaced 36-48 hours apart.
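To see why a positive result is convincing while a negative one still calls for a PCR test, here is a rough sketch using Bayes' rule. The 70% sensitivity and 99% specificity come from this post; the 30% prior chance that a symptomatic person has COVID is purely an illustrative assumption, as is the function name.

```python
def posterior_given_result(prior: float, sensitivity: float,
                           specificity: float, positive: bool) -> float:
    """Probability of having COVID after seeing one test result."""
    if positive:
        true_pos = prior * sensitivity
        false_pos = (1.0 - prior) * (1.0 - specificity)
        return true_pos / (true_pos + false_pos)
    false_neg = prior * (1.0 - sensitivity)
    true_neg = (1.0 - prior) * specificity
    return false_neg / (false_neg + true_neg)


# Hypothetical prior of 30% for someone with symptoms.
after_positive = posterior_given_result(0.30, 0.70, 0.99, positive=True)
after_negative = posterior_given_result(0.30, 0.70, 0.99, positive=False)
print(f"after a positive test: {after_positive:.1%}")  # about 96.8%
print(f"after a negative test: {after_negative:.1%}")  # about 11.5%
```

Under these assumptions a negative result only lowers the probability from 30% to roughly 11%, which is why a follow-up PCR test is still recommended.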

Application 3: testing after exposure

Current CDC guidance is that if you’re vaccinated and have been in close contact with someone who has COVID, you should get a PCR test 5-7 days after your exposure. Until then, you should wear a mask when you’re around other people indoors.

As with testing when you develop symptoms, antigen tests have limited value when testing after a known exposure. A positive test indicates you likely have COVID and further action is warranted, but a negative test is not super informative. (By the way, I like The microCOVID Project's blog post about negative test results).

My personal inclination (which is shared by my epidemiologist consultant) is to quarantine and perform serial antigen testing (see below) after a mild exposure and to get a PCR test after a serious exposure.

Application 4: serial testing

The final and somewhat niche application for antigen tests is serial testing, which is typically used by people who have a high degree of ongoing exposure or are highly risk intolerant. It typically involves testing every three days. Testing every day is not unreasonable, but testing more than once per day has very little value.

The idea behind serial testing is that if you’re testing regularly, one of your tests will occur soon after your viral load increases, warning you about an infection before you get severely ill or spread it to many people.

Serial testing is far from perfect, but early data suggest it can substantially reduce forward transmission and can achieve total sensitivity almost comparable to PCR testing. I’m not aware of any data or modeling of how much serial testing reduces forward transmission: if you know of any, I’d love to see it.

Using the BinaxNOW test

The Abbott BinaxNOW is a home antigen test for COVID that is widely available without a prescription and costs about $12 per test. It yields results in 15 minutes.

Using the test isn’t rocket science but it’s easy to make mistakes that significantly affect test accuracy. I recommend reading the instructions carefully the first time you use one (you might also watch this video). If you’re testing multiple people (at a dinner party, for example), you might consider having a designated person help everyone test and watch for any mistakes.

Common mistakes

Based on one published study and my own experience helping with numerous tests, I recommend you pay particular attention to:

  • Getting 6 reagent drops in the top hole
  • Swabbing for a full 15 seconds per nostril
  • Inserting the swab correctly into the card
  • Rotating the swab three full 360° rotations after inserting it

A sample protocol

Here’s my protocol for events like dinner parties:

  • Wear a mask when you arrive
  • Ideally, conduct your test under supervision if you’re not familiar with the process
  • When it’s time to swab your nostrils: remove your mask, step back, and turn away from other people (in case you sneeze)
  • Put your mask back on as soon as you’re done swabbing and wear it until your test is done
  • Label your test with a marker and start a timer
  • Don’t be alarmed by the initial rush of pink dye across the test strip
  • Check your test when the timer goes off, remembering that even a very faint line indicates a positive result

How accurate is BinaxNOW?

Short answer: BinaxNOW has an excellent specificity of 99%. Sensitivity is middling, with wide variation depending primarily on viral load. Those characteristics make it more useful for some applications than others: in particular, it’s more useful for determining whether someone is likely to be infectious than it is for determining whether someone has COVID at all.

Unfortunately, there is a lot of data but there isn’t a lot of high quality data that answers exactly what we want to know.

If you’re testing to find out if you have COVID, the overall sensitivity of BinaxNOW based on meta-analysis is:

All patients: sensitivity = 62%

All the antigen tests perform much better in people with symptoms. I haven’t found a meta-analysis of this for the BinaxNOW specifically, but my best extrapolation of multiple data points is:

Symptomatic people: sensitivity = 67%
Asymptomatic people: sensitivity = 48%

If you’re testing to find out if you’re infectious, it’s a little more complicated. My best guess is:

Testing to see if you’re infectious: sensitivity = 75%

But you probably wouldn’t be here if you just wanted the short answer.

Data sources

I’ve found three papers to be most useful: this meta-analysis (paper 1) from August 2021 provides the most comprehensive review of the available data, while this one (paper 2) and this one (paper 3) include subgroup analyses which are helpful for understanding what factors affect test accuracy.

Researchers generally determine accuracy by comparing BinaxNOW results to PCR results (the “gold standard”). Most studies used real-world testing of actual patients, but there is some data that uses lab-prepared samples (which is useful for understanding the underlying processes).

Subgroup analysis

The meta-analysis found an average sensitivity of 62%, but that varied substantially between different subgroups. Three different subgroup analyses all suggest that sensitivity is highly dependent on how much virus is present. Taken together, they strongly suggest BinaxNOW will be pretty good at detecting people who are currently infectious, but not so good at detecting low-grade infections or infections before or after peak viral load.

Symptomatic vs asymptomatic

Many studies have found better sensitivity in symptomatic people than asymptomatic. The meta-analysis (paper 1) found that for antigen tests in general, sensitivities were 72% and 52% in symptomatic and asymptomatic individuals, respectively. Extrapolating from other data about the BinaxNOW specifically, I’d guess:

Symptomatic people: sensitivity = 67%
Asymptomatic people: sensitivity = 48%

67% sensitivity isn’t great if you’ve just developed symptoms and you want to know if you have COVID or not.

Culture-positive vs culture-negative

Paper 2 performed a very interesting subgroup analysis: they tried to culture virus from each specimen and compared the sensitivity of culture-positive specimens to culture-negative ones. They found:

All specimens:
Sensitivity = 64% (symptomatic) vs 36% (asymptomatic)

Culture-positive specimens:
Sensitivity = 93% (symptomatic) vs 79% (asymptomatic)

Sensitivity of 79% in culture-positive specimens is quite good: if I had to pick a single metric of how sensitive BinaxNOW is for detecting asymptomatic but infectious cases, it would be this one. Viral culture is complicated to perform (especially for nasal samples), but many epidemiologists consider it to be the gold standard for detecting infectious individuals.

Ct values

Subgroup analysis based on Ct values provides strong evidence for the importance of viral load in determining test accuracy and is roughly consistent across multiple papers.

Some background: PCR tests work by detecting viral nucleic acid in a specimen. The process involves multiple cycles of duplicating nucleic acid: with each duplication cycle, any nucleic acid in the specimen gets copied. This results in an exponential increase in the amount of nucleic acid. The test keeps going until either there’s enough nucleic acid to be detectable, or enough cycles have been performed that there clearly isn’t any nucleic acid to be found.

The number of cycles performed is referred to as Ct (Cycle Threshold). A lower Ct indicates much more nucleic acid was present in the original sample, so fewer duplication cycles were needed to reach the detection threshold. Ct is a very useful indicator of how much virus was present in a specimen. Unfortunately, however, Ct values are not standardized across labs: there’s no standard Ct value that indicates someone is probably infectious.
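Because each cycle roughly doubles the nucleic acid, a gap in Ct translates into an exponential gap in starting material. Here is a small sketch of that arithmetic, assuming idealized perfect doubling; the `efficiency` parameter and function name are illustrative assumptions, and real assays differ between labs, as noted above.

```python
def fold_difference(ct_low: float, ct_high: float, efficiency: float = 1.0) -> float:
    """Estimate how many times more starting nucleic acid the lower-Ct sample had.

    efficiency=1.0 is the idealized case where every cycle exactly doubles
    the nucleic acid; real reactions are usually somewhat less efficient.
    """
    return (1.0 + efficiency) ** (ct_high - ct_low)


# A sample at Ct 20 versus one at Ct 30, under perfect doubling.
print(fold_difference(20, 30))  # 1024.0
```

So a 10-cycle difference corresponds to roughly a thousandfold difference in the amount of virus in the specimen, which helps explain why sensitivity drops so sharply at higher Ct values.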

Multiple studies have found that sensitivity depends strongly on Ct. From the meta-analysis of all antigen tests:

Sensitivity = 94% (Ct <= 25)
Sensitivity = 38% (Ct > 25)

From paper 3, for BinaxNOW specifically:

Sensitivity = 100% (Ct 13-19.9)
Sensitivity = 79% (Ct 20-24.9)
Sensitivity = 13% (Ct 25-29.9)
Sensitivity = 8% (Ct 30-35)

These results provide very strong evidence that sensitivity depends strongly on viral load (and therefore that sensitivity will be high when someone is infectious).

Putting it all together

So there’s lots of data, and it’s all pretty consistent: multiple lines of inquiry strongly suggest that BinaxNOW sensitivity is strongly dependent on viral load. So what’s the actual sensitivity?

If you’re testing because you’ve developed symptoms, have had an exposure, or are conducting serial testing, you should use a sensitivity of 67% if you’re symptomatic or 48% if you’re not. Those numbers are extrapolated from overall BinaxNOW sensitivity (from meta-analysis), asymptomatic vs symptomatic sensitivity across all antigen tests (from meta-analysis), and a study that measured asymptomatic vs symptomatic sensitivity in BinaxNOW specifically.

What if you’re testing to see if you’re infectious? That’s more complicated. The data are all pretty consistent, but nobody has directly measured what we want to know (because that would be very hard). The most directly relevant number is from paper 2, which found 79% sensitivity in asymptomatic people with culture-positive specimens.

So I’m gonna pick 75% because it’s a round number—my gut says the real number might be a little higher, but for this application I think it’s appropriate to be a bit conservative.

Other bits and pieces

Performance with Delta and other variants

Paper 3 found that BinaxNOW seems to perform equally well with the Alpha and Delta variants, which isn’t terribly surprising. The paper found comparable performance across variants based on Ct values: given that Delta produces much greater viral loads, one could speculate that sensitivity with Delta might actually be superior (but without data, that’s purely speculation). Note, though, that (as with most COVID data), we still have limited data that is specific to Delta.

Stacking tests

People sometimes wonder if they can get better sensitivity by taking multiple tests at the same time. There is limited data on this, but multiple same-day tests seem to add almost no sensitivity.

The primary determinant of test sensitivity seems to be viral load: if you’re shedding a lot of virus the test is quite sensitive, and if you aren’t shedding much virus the test isn’t very sensitive at all. So if you’re not shedding much virus, the test isn’t very sensitive no matter how many tests you take in a row.

A minor factor in test accuracy is user skill, but rather than trying to correct for that by taking multiple tests, I’d recommend just reading the instructions carefully and making sure you’re doing it right.

BinaxNOW versus other tests

All the home antigen tests seem to have roughly comparable accuracy: variations between studies of the same test seem to be about as large as variations between tests.

Paper 1 found overall sensitivity of 71% for all antigen tests and 62% for the BinaxNOW specifically. And paper 3 found similar results between BinaxNOW, Quidel Sofia2, and BD Veritor.

As of October 2021, I think your choice of test should be driven by cost, availability, and ease of use more than accuracy.

Interpreting test results using Bayes factors

I like mayleaf's post on interpreting COVID test results using Bayes factors. Maybe you will too.
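As a rough illustration of the odds-form approach, here is a sketch that derives Bayes factors from the overall BinaxNOW figures quoted in this post (62% sensitivity, 99% specificity). The 5% prior and the function names are assumptions for illustration, and the numbers in mayleaf's post may differ.

```python
def bayes_factors(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (factor for a positive result, factor for a negative result)."""
    positive = sensitivity / (1.0 - specificity)
    negative = (1.0 - sensitivity) / specificity
    return positive, negative


def update_probability(prior: float, bayes_factor: float) -> float:
    """Convert the prior to odds, multiply by the Bayes factor, convert back."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


pos_bf, neg_bf = bayes_factors(0.62, 0.99)
print(f"positive result multiplies the odds by about {pos_bf:.0f}")   # about 62
print(f"negative result multiplies the odds by about {neg_bf:.2f}")  # about 0.38

# Hypothetical 5% prior, then a negative test.
print(f"{update_probability(0.05, neg_bf):.1%}")  # about 2.0%
```

The asymmetry is the point: a positive result moves the odds a great deal, while a negative result cuts them by less than a factor of three.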

Other sources

There are lots of papers on this topic. Here are a few of my favorites:

The authors of paper 1 maintain a website that tracks all papers about antigen tests. It’s a great source if you want to do a comprehensive overview.

This September 2021 paper is a great meta-analysis of antigen tests but unfortunately doesn’t break out BinaxNOW specifically. Its results are roughly in line with the findings of paper 1, however.

An interesting modeling study that concludes test frequency and turnaround time are more important than accuracy.

A study of serial testing that finds good overall performance.


Distributed research journals based on blockchains

LessWrong.com News - 6 hours 59 minutes ago
Published on October 21, 2021 5:54 PM GMT

I know what you're thinking (I mean, I probably don't but I'm going to pretend that I do for a minute): Blockchains are synonymous with cryptocurrencies at this point so I'm probably talking about creating some sort of coin and using it to pay academics.

Neat, but no. What I like about blockchains is that they're:

  • Immutable
  • Distributed
  • Organized into a fixed chronological order

These all seem like features that would be great for some sort of distributed research journal:

  • Immutable: Once some academic work is published you don't want it to change. Even if later it turns out to be wrong, it's a record of your progress as a field and no one should be able to sneak in and tweak it after the fact.

  • Distributed: You want teams of researchers, academic organizations and individuals to be able to work together over long distances without doubting that they're all sharing the same base of knowledge.

  • Chronological order: Early work should be early and later work should be later, and later work should be able to refer to earlier work in a static way without worrying about things being moved around.
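To make the immutability and ordering properties concrete, here is a minimal hash-chain sketch. It is only an illustration under simplifying assumptions: the `Block` fields, `append`, and `is_valid` are hypothetical names, and it leaves out everything that makes a real blockchain distributed (peers, consensus, signatures).

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    index: int       # position in the chain, which fixes chronological order
    prev_hash: str   # digest of the previous block
    paper: str       # title or content of the published work
    cites: tuple     # indexes of earlier blocks this work refers to

    def digest(self) -> str:
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash,
             "paper": self.paper, "cites": list(self.cites)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain, paper, cites=()):
    """Return a new chain with one more block; only earlier work can be cited."""
    if any(c < 0 or c >= len(chain) for c in cites):
        raise ValueError("can only cite work already in the chain")
    prev_hash = chain[-1].digest() if chain else GENESIS_HASH
    return chain + [Block(len(chain), prev_hash, paper, tuple(cites))]


def is_valid(chain) -> bool:
    """Check that every block points at the digest of the block before it."""
    expected = GENESIS_HASH
    for i, block in enumerate(chain):
        if block.index != i or block.prev_hash != expected:
            return False
        expected = block.digest()
    return True


chain = append([], "Original result")
chain = append(chain, "Replication attempt", cites=[0])
print(is_valid(chain))  # True

# Quietly editing the first paper breaks the link stored in the second block.
tampered = [Block(0, GENESIS_HASH, "Edited result", ())] + chain[1:]
print(is_valid(tampered))  # False
```

Because each block stores the digest of its predecessor, changing an earlier paper invalidates every later block, and citations by index stay stable because blocks are never reordered.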

These features seem like they could solve two persistent issues in academic publishing. The first is the cost of access. Journals tend to cost a lot, which means that unless you're associated with some academic organization, you're not going to be able to afford them. The second is that research which attempts to reproduce existing results or disprove some previous work isn't interesting to academics (trying to build careers) or journals (trying to sell access), which has led to a replication crisis.

Distributed journals would be free by default (I could imagine some sort of pay-to-access scheme, but it seems like a reach), which would reduce barriers to entry for individual researchers. The cost of hosting the journal blockchain could be shouldered by anyone (or any organization) who wants always-up-to-date access to the latest research, or who just wants to contribute. Linux distributions, software and source code are often mirrored by .edu servers for similar reasons.

Distributed journals would allow research to be reviewed by peers drawn from a very large pool (everyone who is active in the journal), which would work in combination with the free-by-default point above to diminish the system's bias toward novel results. You could also measure the precise impact that your work has had on the field through automated citation mapping, which might encourage attempts at replication.
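One way to picture automated citation mapping is as a walk over the citation graph. The sketch below is only an assumption about how such a metric might work: the input format (each work mapped to the works it cites) and the function name are hypothetical.

```python
from collections import defaultdict


def downstream_works(citations):
    """Map each work to every work that cites it, directly or indirectly.

    `citations` maps a work's identifier to the list of works it cites.
    """
    cited_by = defaultdict(set)
    for work, refs in citations.items():
        for ref in refs:
            cited_by[ref].add(work)

    def collect(work, seen):
        # Follow "cited by" links, skipping works that were already visited.
        for citer in cited_by.get(work, ()):
            if citer not in seen:
                seen.add(citer)
                collect(citer, seen)
        return seen

    return {work: collect(work, set()) for work in citations}


# Hypothetical journal: B and D cite A, and C cites B.
impact = downstream_works({"A": [], "B": ["A"], "C": ["B"], "D": ["A"]})
print(sorted(impact["A"]))  # ['B', 'C', 'D']
print(sorted(impact["B"]))  # ['C']
```

Counting indirect citations as well as direct ones is a design choice here; a replication study could then show how much later work rests on the result it confirmed or overturned.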

It's easier to work on non-cutting-edge research, I imagine, if you can present convincing metrics showing that you've forever altered the course of scientific inquiry.

So I have some ideas on how something like this could be made, but I wanted to validate the basic idea first. Is there something I'm missing here, something I haven't considered?


Coordination Motivation: The Pandemic

LessWrong.com News - 9 hours 37 minutes ago
Published on October 21, 2021 9:57 PM GMT

I first started thinking about meta-coordination 4 years ago, in the context of rationalists arguing about community norms. It seemed to me that people were getting into fights that involved a lot of wasted motion, and failing to accomplish what seemed like obvious shared goals.

For a few years, the bulk of my thought process was a vague, dissatisfied "surely we can do better than this, right?". Many of the people arguing eventually went off to focus on their individual orgs and didn't interact as much with each other. Maybe that was the right solution, and all this worrying about meta-coordination and norm arguments was just a distraction. 

Then a pandemic hit. Coordination became much more practical and important to me, and the concept of coordination pioneering became more directly relevant.

Here were some issues that felt coordination-shaped to me. In this post, I’m speaking largely from my experiences with the Bay Area rationality community, but I think many of the issues generalize.

  • Negotiating policies and norms within a single household. Do you lock down? If so, how do you go about it? What do you do if people disagree on how dangerous covid is, what practices are effective, or what’s worth trading off for safety?
  • Community contract tracing. If someone at a party later gets covid, are people entitled to share that information? How do we negotiate with each other about sharing that information? This includes concerns about privacy, public safety, and how to socially navigate trading those off against each other during a crisis.
  • Maintaining social connection. This might involve negotiation with your housemates over covid policy, or the housemates of your friends. Even if you and a friend each live alone, figuring out what kind of contact to have during a pandemic is at least a two-player game.
  • Housemate swapping/matchmaking. Housemates hadn't generally been selected for "having similar preferences of how to handle pandemics". There were several reasons people might have wanted to relocate. But people also had reason to not necessarily want to advertise that they were looking for new housemates – they might risk antagonizing their current roommates, or airing drama that was still unfolding. Switching houses is also an effortful, high cost decision that was difficult during an already stressful time.
  • Allocation of labor (intellectual and otherwise). There was a lot of stuff to figure out, and to do. There was an initial flurry of activity as everyone scrambled to orient. I think there was a fair amount of duplicate labor, and a fair amount of labor allocated to "figure out wtf is up with the pandemic?" that could have been spent on people's day job or other non-pandemic personal projects.
  • Maintaining organizational sync. Most organizations went remote. I think some organizations can do a decent job working remote, but I think it comes with costs. Some forms of communication translate easily to zoom, and some are much harder when you can’t bring things up briefly without scheduling a call being A Whole Deal. This prompts questions like “What were the best ways to shift to remote?”, “Was it actually necessary to shift to fully remote? Could better coordinated orgs have found ways to stay in person without undue risk?”, and “Were there third options?”

From my perspective, these all feed into two primary goals:

  • The physical and mental health of my social network.
  • The capacity of the rationality and EA communities to continue doing important work. (In particular, this could have been a year where AI safety research made differential progress relative to AI capabilities research. But my sense is that this didn’t happen)

I think all the previous bullet points are meaty topics, that each warrant at least one blogpost worth of retrospective. I’m not sure which topics I’ll end up deep diving into. In this post, I wanted to give a broad overview of why coordination innovation feels so important to me.

“Coordination” is a somewhat vague word to cluster all those topics together with. I think, ultimately, it’s helpful if you can taboo “coordination”, and focus on individual problems and processes. But as I write this, I’m still in the process of thinking through exactly what went wrong, or what could have been improved, and how to cluster those problems/solutions/concepts. In some cases I think the issue was more like "actually making use of existing good practices for coordination (at the object level)", and in some cases I think metacoordination, and the coordination frontier, are more relevant.

What all of those items share is that they are multiplayer games. In each case, individuals made choices, but some good outcomes required multiple people to agree, or to make synergistic choices in tandem.

This blogpost is the first of a few posts for helping me organize my own thoughts.

There are a few frames that stand out to me to look at the situation:

  • Skills that could have helped.
  • Outlooks and orientation that could have helped.
  • Systems that could have helped.
  • Organizational structures or leadership that could have helped.

And then maybe a fairly different frameset around "Who's 'we', exactly?". I think there's multiple scales that it's worth looking at through a coordination lens – a couple individual people, a loose network of friends and colleagues, particular organizations, the vaguely defined "rationality community", and the broader structure of different cities, states, and countries.

Analogies to future crises

I expect to learn many things from a Pandemic Coordination Case Study that I wish I'd known in 2020. But the most important question is "whether/how will this be relevant to future crises?"

It's possible there will literally be another pandemic in our lifetimes, and that many lessons will directly transfer. 

My biggest current worry is "accelerating AI technology will disrupt the economy and create situations of high-stakes negotiation, where some of the lessons from the pandemic transfer." There are different ways that this could play out (a few individuals within an organization, negotiations between leaders of organizations, government regulation, industry self-regulation, intergovernmental treaties).

And then, of course, there could be entirely novel crises that aren't currently on my radar.


An Idea for a More Communal Petrov Day in 2022

LessWrong.com News - 9 hours 43 minutes ago
Published on October 21, 2021 9:51 PM GMT

(This post is roughly based on a memo I wrote for a Lightcone Infrastructure team meeting on the topic of Petrov Day.)

The main thing I want with Petrov Day is a sense of community, trust, and the respect of the principle of taking responsibility for the ultimate consequences of your actions.

I think the current format for Petrov Day has lots of room to grow. I spent an hour or two thinking about what a better Petrov Day would look like; here is a pointer to something we could do next year.

An Idea for a More Communal Petrov Day Ritual

Next Petrov Day, we host a public online ceremony that hundreds of people attend. It is based around the Ceremony Readings Jim Babcock has put together. Lots of people take turns reading quotes: basically everyone who signed up ahead of time to do a reading and could confirm that they had a sane AV setup. Anyone is welcome to watch.

After the ceremony, we run an online Gather Town party for 100s of people. Perhaps it's for LWers only, or perhaps it's open-access for everyone.

During the day, a large red button is on the site. Several months in advance, we open a sign-up to be trusted with codes for the day and encourage people to participate, and most people are given the codes. If the button is pressed, the online ceremony is ended / the party shuts down after 10 minutes.

There is an online record of the day. How many people showed up, their names, and who spoke. This is a web-page designed for purpose, somewhat more in the style of the www.dontdoxscottalexander.com site that Jacob and I made.

Possible further ideas

  • Possible idea: A member of Petrov’s family is invited to attend and to give a comment at the end of the ritual.
  • Possible idea: Every year some speaker is invited to give a short talk about what the day means to them. Similar to how there’s an annual moment-of-darkness speech at Solstice.
  • Possible idea: a bit of singing, using Bucket Brigade.
  • Possible idea: we encourage lots of local groups to do their own ceremonies.
  • Possible idea: Somehow a planned false alarm? I would like the red button to have a serious game to it, but I don’t know how to do it every year for multiple decades. Some actual uncertainty every year?
  • Possible idea: To build up the numbers, if you apply and we give you codes, you can have codes every year (though you can remove yourself from the pool if you wish).

Longer Term

The goal with this is to get lots of people to be involved (e.g. 100s each year, eventually 1000s each year) in a communal ritual to respect the day and the principles. It would be an active effort on the part of the LW team to attract lots of people to participate.

One of the things I am motivated by is the desire to have better online ways to build a communal commemoration for the day. Recently I've been thinking that the format we have for publishing ideas on AI and rationality is not ideally suited for rituals (e.g. comment sections encourage critique and disagreement, whereas a ritual is more meant to be shared acknowledgment). I'm interested in suggestions for webpages that would allow a lot of people to feel connected to the other people commemorating Petrov Day.


Petrov Day Retrospective: 2021

LessWrong.com News - 9 hours 44 minutes ago
Published on October 21, 2021 9:50 PM GMT

I apologize for not posting this closer to Petrov Day. It’s been a busy month and there was much to think about.

You can view the EA Forum’s retrospective here.

This year was the third Petrov Day celebration on LessWrong in which the site was endangered, and the first year we joined together with the EA Forum. In case you missed it, neither site was taken down, despite 200 people being issued codes that would allow them to do so [1][2]. Huzzah!

Although neither site went down (and thus there's no need for a blow-by-blow analysis of whodunit and why), there are some interesting things to review. In particular, there were some substantial criticisms of the Petrov Day ritual this year and last year that I want to address.

Why Petrov Day

The annual Petrov Day post recounts the basic story of Petrov Day, yet given the questions that were asked this year about what Petrov Day should be, I think it’s right to first revisit why we celebrate Petrov Day in the first place. The following is my own personal take, the one from which I’ve acted, but it is Not Official.

We find ourselves at what may be one of the most critical periods in the history of humanity and the universe. This is kind of crazy–though I’ll refer you to the writings of Holden Karnofsky for a compelling argument for why believing anything else is equally crazy. In the next few decades, we might go extinct (or worse), or we might commence an explosion in progress and productivity that propels us to the stars, allowing us to take the seemingly barren universe and fill it with value.

Petrov Day is a celebration of not going extinct. It’s a commemoration of not taking actions that would destroy the world. It’s about how Petrov chose not to follow policy and relay his alarm because, in his personal estimation, it was probably a false alarm. If he had relayed the alarm, there’s a chance his superiors would have chosen to launch nuclear missiles at the US, and history would be very different.

We can identify two virtues worth applauding in the story:

  1. Choosing actions that don’t destroy the world
  2. Even in the face of pressures otherwise, using one’s judgment to not destroy the world

On September 26th, we celebrate these virtues and attempt to enshrine them in our community. We say to ourselves and others, “I accept the virtue of not destroying the world, even when there’s pressure to do it!” We don’t do this for idle spiritual fulfillment–we do it because there’s a real chance that we or our community may soon face actual choices that resemble Petrov’s. Be it AI, bio, or general policy, our community is represented and our influence is real. As such, the values we take as our own matter.

In addition to the virtues directly displayed by Petrov, we can add others that are important for not destroying the world:

  3. Not unilaterally taking large (and irreversible) actions
  4. Cooperating / being the kind of person who can cooperate / being the kind of community that cooperates with itself, especially when the stakes are high

Virtues 2 and 3 are in some tension and there’s probably a meta-virtue of judging which to apply. The default principle might be something like “use your own judgment to avoid destructive actions; don’t rely on your judgment alone to take [potentially] destructive actions.”


Eliezer first posted about Petrov Day in 2007, and in 2014 Jim Babcock wrote a ritual guide for a ceremony that people could conduct in small gatherings. At some point, a red button that would end the ceremony was introduced to the tradition. You’d be a real jerk to press it, thereby ending the Petrov Day celebration for everyone.

In 2019, the LessWrong team decided to create a Petrov Day ritual for the entire community by doing something with the website.

I wasn’t involved in Petrov Day that year, but I believe the team then wanted to celebrate all four of the virtues I listed above (and maybe others too) as part of a general “let’s celebrate the virtues involved in not ending the world.” Unfortunately, it’s quite tricky to symbolize 2. (using your own judgment against incentives) within a game.

In addition to celebrating the four virtues above, LessWrong organizers wanted to further use Petrov Day as an opportunity to test (and hopefully prove) the trustworthiness and ability to cooperate of our community. Symbolism is powerful and it’s meaningful if you can get a large group of people to go along with your ritual. From that arose the challenge of finding N people who wouldn’t press the button. The higher the N we could find who don’t press the button, the more people we would have who are bought into our community–all of them treated the value of the trust-building symbolic exercise as more important than having fun or objecting or financial incentive or anything.

I feel pride and reassurance if I imagine truthfully saying “we have 1000 people that if we give them the chance to be a troll or a conscientious objector or a something–they don’t take it, they hold fast in not taking a destructive action”. The LessWrong frontpage is a big deal to the LessWrong team, and putting it on the line was a way of buying some gravitas for the ritual.

It’s because having N people who don’t press the button is such a powerful idea that people regard the ritual seriously and look poorly upon anyone who’d damage that. We succeeded in 2019 with no one pressing the button, yet failed in 2020. 2021 was to be a high-stakes tie-breaker involving another community.

Although the button(s) wasn’t pressed this year, I actually feel that we failed. We were unable to find 200 people (100 for each forum) who wanted to be part of our community of people who don’t take destructive actions. I don’t know that we failed by a lot, but I think we did. This is our failure as organizers as much as anyone else–we were responsible for choosing people and for designing the ritual.

An Aside About Community Buy-In

There has been criticism that the LessWrong team unilaterally designed and deployed the community Petrov Day ritual, deciding for the community at large what was going to be celebrated and how. I think this is a fair charge.

There are historical explanations for why the Petrov ritual evolved the way that it did, and, separately, principles and policies that can speak to whether that's good or bad.

Historically, building A Big Red Button That Takes Down The Site felt like a pretty straightforward evolution of the tradition people were already enacting in their homes and parties. It didn't seem like the sort of step that required public discussion or vetting, and that still seems like the correct decision for 2019.

Additionally, the team prepared its Petrov Day ritual somewhat at the last minute, and found itself in a position where a big discussion wasn't really a viable option.

Given the choice between a LessWrong team (and an overall community) where people are willing to try ambitious and potentially-cool things on their own judgment, or one where people err toward doing nothing without discussion and consensus, it seems clearly better for 2019 LW to have forged bravely ahead.

(This is actually a good place to distinguish the Petrov Day moral of "don't take irreversible and destructive actions on your own authority" from a more general moral of "don't do anything on your own authority."  The latter is no good.)

That being said, though: community rituals are for the community, and LessWrong is closer to being something like a public utility than it is to being the property of the LessWrong team.  At this stage, it feels right and proper for the community to have greater input and a greater say than in 2019, and without having specific plans, I expect us to put real effort into making that happen well in advance of Petrov Day 2022.  This feels especially important given both that Petrov Day now seems like it's going to be an enduring piece of our subculture, and also that we want it to be.

Not Getting Opt-In

Speaking of consulting the community, the 2021 ritual consisted of making people part of the game involuntarily by sending them launch codes. I see a few different complaints here.

The first is that launch codes are hazardous. Because the Petrov Day ritual is treated seriously (more on this below), someone who enters them (or just talks about entering them!) is subject to real social sanction, up to and including it affecting their job prospects. Our community takes character judgments seriously, and it's not at all clear what aspects of something like Petrov Day are "off limits" when it comes to evaluating people's cooperativeness, trustworthiness, impulsiveness, and general judgment.

In a world where the letter containing the codes was unambiguous about the cultural significance and the stakes of the Petrov Day ritual, I think receiving the launch codes would only endanger the highly impulsive and those with poor reading comprehension (and those should reasonably affect your job prospects). However, I think the way I wrote this year’s letter could be interpreted as a “Murder Mystery” invitation by someone not aware of the Petrov Day context. Plus, the letter didn’t explain the cultural significance to people who hadn’t been following along with the LessWrong Petrov Day celebrations in the last two years, which especially seems like a misstep when reaching out to a whole new subculture (i.e., the EA Forum).

I screwed up on that account and I’m sorry to anyone I put at risk. If you had pressed the button, it would have been on me.

The second–and I think more serious–complaint around lack of opt-in is that it leaves people who object to the ritual with no good option. If you don’t press the button, you are tacitly cooperating with a ritual you object to; if you do press it, you’ll have destroyed value and be subject to serious social sanction. 

Moreover, the organizers (me, EA Forum staff) have declared by fiat what the moral significance of people’s symbolic actions are. This goes beyond just deciding what the ritual is and into deciding what’s good and bad symbolic behavior (with strong social consequences). While the Petrov Day ritual might be innocuous, it is a scary precedent if LessWrong/EA Forum organizers freely shape the moral symbolic landscape this way, without the checks and balances of broader community discussion.

I think this is fair, and this makes me realize that the LessWrong team has more power (and therefore more responsibility) than we previously credited ourselves with. We set out to build culture, including ritual and tradition, but it’s another matter to start defining the boundaries of good and bad. I think possibly this should be done, but again probably with more community consultation.

Why So Serious

Related to both complaints is the fact that Petrov Day has been treated increasingly seriously. It’s because it’s serious that people will sanction you if you press the button. And it’s because you believe it’s too serious that you might want to object/boycott the ritual (well, that’s one reason).

I think the degree of seriousness that the ritual is treated with is one of the questions that should be reconsidered next year in consultation with the community.  It's possible, for instance, that Petrov Day should be a place where some amount of mischievousness is considered fair game, and not representative of someone's global character.

Notwithstanding, I personally want to defend the position that a very high degree of seriousness is appropriate: a serious ritual for a serious situation. The stakes we find ourselves facing in this century are genuinely high–astronomical value vs extinction–and it makes sense to me to have a ritual that we treat with reverence, to celebrate and encourage values that we treat as somewhat sacred. Or in short, things matter, so let’s act like they do. I don’t know that this argument will win out on net, but I think seriousness should be considered.

Aside from a general position that Petrov Day should not be serious, some have argued in particular that the most recent Petrov Day ritual should be lighthearted because the only thing at stake is the LessWrong/EA Forum page going down. My response to that is sadness. There is understandably inferential distance between the LessWrong team and others about how valuable LessWrong is and what it means to take the site down for a day. As I wrote in the Petrov Day post:

One of the sites [LessWrong, EA Forum] going down means hundreds to thousands of people being denied access to important resources: the destruction of significant real value. What's more it will damage trust between the two sites...For the rest of the day, thousands of people will have a hard time using the site, some posts and comments will go unwritten.

LessWrong is not a mere source of entertainment. It’s a site whose content shapes how people think about their lives and make major decisions. If there was a person who was going to have their life changed by LessWrong (and this happens to many) who fails to because the site is down, that’s a tragic loss. 

LessWrong is also used as a major communication tool between researchers. LessWrong being offline is not so different from removing a day from a major research conference. Or, to change tack: the operating budget of the LessWrong website has historically been ~$600k, and this budget is artificially low because the site has paid extremely below-market salaries. Adjusting for the market value of the labor, the cost is more like $1M/year, or $2,700/day. If I assume LessWrong generates more value than the cost required to run it, I estimate that the site provides at least $2,700/day in value, probably a good deal more.
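The per-day figure above is plain division. As a minimal sketch of that arithmetic, assuming a 365-day year and taking the roughly $1M/year market-adjusted estimate from the paragraph above as given (the variable names are only illustrative):

```python
# Market-adjusted annual cost of running the site, as estimated in the text.
ANNUAL_COST_USD = 1_000_000
DAYS_PER_YEAR = 365  # assumption: leap years are ignored

cost_per_day = ANNUAL_COST_USD / DAYS_PER_YEAR

# Round to the nearest hundred dollars, the precision the text uses.
rounded_per_day = round(cost_per_day, -2)

print(f"${cost_per_day:,.2f}/day, or about ${rounded_per_day:,.0f}/day")
```

This only reproduces the cost side of the estimate; the claim that the site's value exceeds its cost remains the assumption stated in the text.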

Still, if we want stakes for the ritual/exercise/game, it’s probably better to use something with lower inferential distance. It was my mistake as an organizer to think that just because I consider something valuable, that value will be transparent to others. Given that, I accept that it’s on me that not everyone thought the last Petrov Day iteration should be a big deal.

I could imagine it being better if there’s $5-10k that simply gets burned if someone presses the button rather than going to some worthy cause. Either way, this debate has clearly not properly taken place.

For an idea of what next year could look like, see these notes from Ben Pace.

An Aside: Repeating Mistakes

Many of the issues pointed out this year were pointed out last year. It’s a real failure to not have addressed them. This is my (Ruby’s) fault. I took over organizing Petrov Day this year (inviting the EA Forum to join LessWrong) but didn’t go back and re-read through the previous year’s comments. Had I done so, I could have avoided repeating some of the previous mistakes.

I do think that repeating mistakes is quite bad and am quite sorry for that.

Wrapping Up

Stanislav Petrov was on duty at a particularly fraught time in history. I think we are, too. This makes it imperative to think about the kinds of decisions we might face and prepare ourselves for them. It makes it crucial that we know and practice our values and principles, so that we can rely on them even when temptations are strong or matters are unclear.

Rituals and traditions are what keep people true to their values. Having them or not might be the difference between us being a community that can succeed at its ambitious goals vs not–the difference between colonizing the stars and annihilation.

I regret the flaws of Petrov Day rituals so far, but I’m excited to keep iterating and innovating so we can make these essential values part of our community, cultures, and selves.


Feature idea: Notification when a parent comment is modified

LessWrong.com News - October 21, 2021 - 21:15
Published on October 21, 2021 6:15 PM GMT

Not sure how many people would consider this feature useful: Imagine that you reply to someone else's comment, and the person edits their comment later. I think it might be useful (perhaps depending on circumstances) to get a notification.

The notification "XY edited a comment you replied to" should appear in the same place as when you get a reply. Ideally, the tooltip would highlight the difference between the original and the updated comment.
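To make the tooltip idea a bit more concrete, here is a rough sketch of a word-level diff using Python's standard `difflib` module. The function name `highlight_edit` and the `[-removed-]` / `{+added+}` markers are my own assumptions for illustration; nothing here reflects how the LessWrong codebase actually stores or renders comments.

```python
import difflib

def highlight_edit(original: str, updated: str) -> str:
    """Return a marked-up diff of two comment versions.

    Removed words are wrapped as [-word-], added words as {+word+}.
    """
    old_words = original.split()
    new_words = updated.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(new_words[j1:j2])
            continue
        # "replace", "delete" and "insert" can carry removed and/or added words.
        if i2 > i1:
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j2 > j1:
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)

print(highlight_edit(
    "I think X is true",
    "I think X is true [EDIT: Actually, B makes a good argument against]",
))
```

Splitting on whitespace is a simplification: it ignores formatting and treats a word with changed punctuation as a different word.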

Use cases that I imagine:

  • Person A makes a comment. Person B makes a reply disagreeing with the original comment. Upon reading this, person A changes their mind and updates the original comment like this: "[EDIT: Actually, B makes a good argument against]". This feature would show this information in person B's inbox, without A having to write a separate reply to their comment.
  • Person A makes a comment. Person B makes a disagreeing reply. Person A silently updates their original comment to make B's response seem silly.

This feature would apply only to comments that reply to comments, i.e. not to the top-level comments, because I assume that minor modifications of articles are sometimes too frequent (and would flood the inboxes of top-level commenters), and because more people would notice a substantial stealthy article edit.

An argument against this feature is that some people can make frequent insubstantial edits to their comments (e.g. fix typos), which also could flood the inboxes of the repliers. Then this feature would be annoying. Possible solutions:

  • Multiple unread notifications for the same comment are merged into one.
  • Some heuristic (e.g. only punctuation or isolated words are modified) to detect insubstantial edits.
  • Or an opposite solution (covering only the first use case), where a heuristic would identify substantial edits (e.g. where the added text contains a word like "edit", "update", "change", "modify", "remove", "delete").
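The three possible solutions above could be sketched roughly as follows. This is only an illustration under several assumptions: the function names, the `comment_id` field, and the decision to ignore letter case are all mine, and the keyword stems are derived from the example words listed above so that forms like "edited" or "modified" also match. The "isolated words" part of the second heuristic is left out.

```python
import string

# Stems of the example keywords from the post (illustrative, not exhaustive).
EDIT_STEMS = ("edit", "updat", "chang", "modif", "remov", "delet")

def _normalize(text: str) -> list[str]:
    """Lowercase the text, strip ASCII punctuation, and split into words."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

def is_insubstantial(original: str, updated: str) -> bool:
    """True if the versions differ only in punctuation, case, or spacing."""
    return _normalize(original) == _normalize(updated)

def looks_substantial(original: str, updated: str) -> bool:
    """True if a word added by the edit starts with one of EDIT_STEMS."""
    old = set(_normalize(original))
    added = [w for w in _normalize(updated) if w not in old]
    return any(w.startswith(stem) for w in added for stem in EDIT_STEMS)

def merge_notifications(pending: list[dict]) -> list[dict]:
    """Keep only the most recent unread notification per comment."""
    latest = {}
    for note in pending:  # assumed to be ordered oldest to newest
        latest[note["comment_id"]] = note
    return list(latest.values())
```

Both heuristics are easy to fool: the keyword check flags a comment that merely mentions an "editor", and it would not catch the second use case at all, since a silent rewrite carries no "EDIT" marker.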

Your opinions?


Stoicism vs the Methods of Rationality

LessWrong.com News - October 21, 2021 - 21:07
Published on October 21, 2021 6:07 PM GMT

Crossposted from spacelutt.com

Determining Control

The only things that are really out of your control are things that happened in the past, since time really only flows forward.

The “Challenging the Difficult” sequence in The Sequences is about how often you’ll be wrong in labelling something “impossible” (which is of course synonymous with “outside of your control”, except that “outside your control” is even more retreat-y than “impossible”, as it implies another human can do it, just not you).

Your body will be ultimately destroyed, but this is not seen as bad, since it is out of your control.

Even if you can’t control it, it still seems bad.

It seems wrong to be changing your definition of bad based on what’s in your control or not.

Care, but don’t turmoil

Just realised: this is particularly bad when it comes to things you don’t have a personal stake in. If you were to say “I don’t have control over my fatigue problem” too soon, then you’re still gonna want to think about it like it’s in your control and still go wild with attempts and ideas, because it still affects you; there’s no way of getting out of your sleep problem mattering in your world. You still feel it, whether or not it’s in your control.

But on a larger scale…

Eg. if some random kid in Africa is starving to death and someone tells you this and helping would be inconvenient, you could very well say “it’s not in my control” and totally forget about it and have it lose its effect on you. The kid still starves, but the kid is not in your world. Thus you’re not motivated to really, really try before you actually declare something is impossible.

This is even more true when discussing abstract threats to the future.

I have seen many people struggling to excuse themselves from their ethics. Always the modification is toward lenience, never to be more strict. And I am stunned by the speed and the lightness with which they strive to abandon their protections. Hobbes said, “I don’t know what’s worse, the fact that everyone’s got a price, or the fact that their price is so low.” So very low the price, so very eager they are to be bought. They don’t look twice and then a third time for alternatives, before deciding that they have no option left but to transgress—though they may look very grave and solemn when they say it. They abandon their ethics at the very first opportunity. “Where there’s a will to failure, obstacles can be found.” The will to fail at ethics seems very strong, in some people.

Self Deception

If you’re able to manufacture indifference to anything, then you’ll be motivated to get more false negatives on things outside of your control so that you don’t have to go through the hard, hard work of changing them, since you have no emotions to spur you.

No Time-Wasting

Marcus Aurelius once wrote “You could leave life right now. Let that determine what you do and say and think.” That was a personal reminder to continue living a life of virtue NOW, and not wait. It was a form of memento mori – an ancient practice of meditating on your mortality.

This seems true and a great point. Especially since we don’t actually know death’ll be solved when the singularity comes.

The Memento Mori part seems solid. *Nods in approval*.

Even if death is solved, I wouldn’t exactly want to be putting things off all the time. The one who doesn’t wait and puts finishing touches on their life and character all the time (the best they can at the time, knowing more growth is to come) I do think will have life better, will get more satisfied, will waste less time, and will regret less if something does happen.

Negative Visualisation

I have often mentioned how the phenomenon of Hedonic Adaptation means that we constantly get used to the things we have and then begin to take them for granted. Negative visualization is a simple exercise that can remind us how lucky we are. The premise is simple, just imagine that bad things have happened, or that good things have not. You decide the scale of the catastrophe:

  • Losing all your possessions
  • Never having met your spouse
  • Losing a family member
  • Losing a sense such as your sight or your hearing.

You can also imagine how situations that you are about to embark on will go wrong.

While you may think that this type of pessimism is not conducive to a happy and fulfilling life, it can actually turn your life into pure gold by making you realise that all these bad things have not happened to you.

Seems solid, gratitude in general seems pretty scientifically backed up.

Eliezer literally on stoicism: https://www.lesswrong.com/posts/vjmw8tW6wZAtNJMKo/which-parts-are-me


Noticing the Value of Noticing Confusion

LessWrong.com News - October 21, 2021 - 21:04
Published on October 21, 2021 6:04 PM GMT

Crossposted from spacelutt.com

Your strength as a rationalist is your ability to be more confused by fiction than by reality. Either your model is wrong or this story is false.

Your model of the world is how you understand the world to work. If I think ice is frozen water, and ice is frozen water, then my model of the world is right. If I’m six and I think Santa Claus is the one who brought me presents, when really it was my parents, then my model of the world is wrong, whether I know it or not (and the whole point is that you don’t, because if you knew your model was wrong it wouldn’t be your true model anymore).

Confusion is not understanding. If you are never confused, it means you understand everything all the time. It means your model of the world is exactly perfectly right.


Looking at it this way, it’s extremely obvious why over the past year as I’ve been practicing the skill of noticing my confusion more there’s been more and more opportunities until it’s just a constant, never-ending flow. So many micromysteries.

And it’s not only lying that it protects you against. I’ve caught out lots of people in semi-plausible lies, with the thought pattern being “If x, then almost definitely y, and I see no y so more likely you’re lying” and then I’m right and I feel really good.

One of the key aspects is knowing that your model is supposed to forbid things. People lying to you is a very obvious time when the story you’re being fed is wrong. But also you telling yourself an inaccurate story is the same as a lie for these purposes, as it’s another opportunity to say “hey wait, this doesn’t quite fit” and then say “I’m wrong!”.

Like when you’re eating dinner with someone, and they go to get up and you don’t know why they left. 99% of the time people get up, it’s to go and get water. So you assume they went to get water, and think you have understood what is going on.

But their cup is still sitting on their tray.


And after you’re good at noticing when you don’t understand, you get to play one of the funnest games available to man: trying to figure out what the hell is actually going on. And this is the bit where your brain is forever warped, this is why Eliezer says this is your strength as a rationalist. Did you notice your confusion when Eliezer said that just this measly tidbit of a brain pattern was your strength as a rationalist?

It’s because once you start noticing all the times you don’t understand something, you can start actually trying to understand. You can start throwing out hypotheses, and weighing them based on their probabilities!

That’s right!

Like Harry James Potter-Evans-Verres!!!

And you know when you’ve done it right because every puzzle piece clicks together. It all makes sense now.

Or, and this is almost equally fun, you find that your model actually, legitimately, has no explanation for wtf just occurred.

Why would someone stand up at dinner in the school cafeteria, leaving their tray, phone, and cup?

  • To fill more water
    • Unlikely, they would’ve taken their cup
  • To get more food
    • Unlikely, they wouldn’t have gone to get more if they hadn’t finished what was already on their plate
  • To talk to one of their friends
    • Unlikely, they don’t have that many friends
  • To get dessert
    • Unlikely, why wouldn’t they have taken their bowl?

What the FUCK is going on?

It’s important not to settle for one hypothesis just because it’s the best you have even if it doesn’t make sense. Even though I had no better explanation, I insisted that the hypothesis that the person I was with went with – that they went to get more water – didn’t make sense because they would’ve taken their cup.

So I sat in excitement, waiting to find out what really happened. Waiting to actually learn something new about the world.

You shouldn’t be able to explain things that your model of the world doesn’t explain. If you don’t bring these moments of small confusion to the forefront of your brain, you can never truly learn because you can never really feel the gaping hole in your understanding!

Noticing confusion is really fun, and pretty important. Being able to throw out hypotheses and weigh them based on probability is really fun.

Can you guess why they actually got up from dinner?

It was to speak to their teacher.


What's Stopping You?

LessWrong.com News - October 21, 2021 - 19:20
Published on October 21, 2021 4:20 PM GMT


This post is about the concept of agency, which I define as ‘doing what is needed to achieve your goals’. As stated, this sounds pretty trivial - who wouldn’t do things to achieve their goals? But agency is surprisingly hard and rare. Our lives are full of constraints and defaults that we blindly follow, and going past these to find a better way of achieving our goals is hard.

And this is a massive tragedy, because I think that agency is incredibly important. The world is full of wasted motion. Most things in both our lives and the world are inefficient and sub-optimal, and it often takes agency to find better approaches. Just following defaults can massively hold you back from achieving what you could achieve with better strategies.

Yet agency is very rare! Thinking past defaults and your conception of ‘normal’ is hard, and takes meaningful effort. But this leaves a lot of value on the table. Personally, I’ve found cultivating agency to be one of the most useful ways I’ve grown over the last few years. And, crucially, this was deliberate - agency is a hard but trainable skill.

Why Care?

I’ve been pretty abstract so far about the idea of agency. A reasonable reaction would be to be pretty skeptical that there’s any value here. That you already try to achieve your goals, that’s what it means to have goals! But I would argue that it’s easy to be missing out on opportunities and better strategies without realising it. An unofficial theme of this blog is that the world is full of low-hanging fruit, you just need to look for it. Further, many of the people I most admire, who’ve successfully changed the world for the better, had a lot of agency - changing the world is not the default path!

To make this more concrete, I want to focus on examples of how agency has been personally valuable to me. The times I managed to step off the default path, look past my constraints, and be more ambitious and creative about how I achieved my goals.

  • By far the most significant was realising that I could take agency with my career and life path. That I could step away from the default of continuing my undergrad to a fourth year and masters and then doing a maths PhD or working in finance. And instead, I’ve spent the past year doing 3 back-to-back AI Alignment research internships, trying to figure out if this might be a path for me to have a significant positive impact on the world.
    • This was an incredibly good decision that led to much more personal growth. I now feel much less risk-averse, am a better engineer and researcher, have a much clearer idea of what the AI space is like, and have a much more concrete view of why AI Alignment matters and what progress on it might look like.
    • Further, I now have a job I’m very excited about, and a much clearer picture of what I want to do over the next few years.
    • Agency is relative to your context, and your defaults. I expect some people would have found this decision easy, but I found this surprisingly hard. I’d intended to do a fourth year for ages, and most of my friends were doing it - this strongly felt like the default, and doing something else felt risky and scary.
  • Being ambitious about taking on personal projects, rather than the default of being risk-averse and shying away from fear of failure and putting in effort
  • Realising that I can improve my social life by taking initiative - breaking free of the default path of forming surface level friendships with the people I run into naturally, and never putting myself out there.
    • Intentionally making close friends - practicing and learning how to form emotional bonds with friends has made my life so much better, and hopefully helped improve theirs too!
    • Taking social initiative - By getting better at reaching out, I’ve gotten really valuable mentorship and advice, job opportunities, and friends I now cherish
    • Having awkward conversations about ways I thought I’d hurt someone or been hurt, apologising, and growing closer rather than letting things fester
  • Improving myself and my life. Breaking out of the default mode of helplessness and realising that problems are for fixing, that I have the capacity to make things better.
    • Understanding my feelings of guilt, obligation and not meeting my standards, and learning how to manage this better
    • Understanding my own motivation and learning how to become excited about what I’m doing, or how to find things I am excited about.
    • Smaller everyday things - noticing small things which annoy me and fixing them, or items that could improve my life and buying them.
    • More generally, the spirit of self-experimentation, seeking novelty and doing new things, and being willing to go outside my comfort zone to see what that’s like. This has led to things from trying out pole-dancing to going round a room and asking people for sincere criticism.

As those examples hopefully illustrate, agency has been extremely valuable for me, and my goals. But it is not my place to tell you whether agency is right for you. Agency can be hard, stressful and exhausting! Sometimes the defaults are good enough. Instead, my goal in this post is to present the mindset of agency, and make a case that it can be valuable to you, for achieving what you want.

Exercise: What do you want? What are your goals? What are your dreams, your ambitions? How do you want to change the world? What is missing in your life? Take a minute to reflect on your favourite prompt before moving on.

Exercise 2: What’s stopping you?

What is Agency

When stated as ‘doing what is needed to achieve your goals’, agency feels like a pretty simple concept. But implicitly, I’m gesturing at a messy bundle of different skills. In this section, I want to break down what agency is, and the most important mindsets and sub-skills. These are neither comprehensive nor compulsory for achieving agency, but I’ve found all of them valuable to cultivate:

  • Noticing and avoiding defaults: A core lens I view the world through is that our lives are full of defaults. Expectations that society places upon us, social norms that we follow, our own conception of our role and expected duties and path through life, our own conception of what is ‘normal’ vs ‘weird’. A key part of agency is noticing when these constrain you, and being willing to break them.
    • Part of this is avoiding groupthink - being able to think non-default thoughts, think for yourself, think things through from first principles, and deeply caring about having true beliefs.
      • Eg the kind of person who was concerned about COVID at the start of February, or someone who grew up in a deeply religious household and decided to de-convert.
    • Note that agency is not about knee-jerk nonconformity. You need to be willing to not conform - that’s what it means to avoid a default. But non-conformity fails to be agency when it’s about not following defaults because they’re defaults, rather than to achieve your goals - that’s still letting defaults control you. Instead, strive to notice your defaults, and check whether you want to avoid them.
      • For example, trying hard in exams is often the conformist choice. But if I care about getting a great job and my grades matter for that, then choosing to try hard is the agentic choice, and slacking off is not.
  • Finding opportunities: Noticing when there are opportunities to achieve my goals, chances to take agency. And being good at making my own opportunities, and actively seeking them out, rather than waiting for them to fall into my lap
    • Part of this is creativity - being able to see what other people don’t see, and generate a lot of ideas
    • Part of this is not thinking in defaults - being open to weird and unusual ideas, and taking them seriously
    • To me, this feels similar to the mindset of learning the rules to a game, and reflexively looking for ways to exploit, munchkin and break them.
  • Intentionality: Understanding why you’re taking actions, and keeping your goals clearly in mind. Being mindful of wasted motion, and checking whether you’re actually achieving what you want. Being deliberate.
    • Note that this is separate from identifying your goals in the first place. Spending time thinking about and eliciting your deep goals and values, the skill of prioritisation, is incredibly important. But it is different enough to be worth separating from agency, which is about building on those goals.
      • Though this is a fuzzy boundary - we often absorb default goals from our social context, eg making money or seeking status. And it takes agency to look past this and check for what you actually value.
  • Taking action: Ultimately agency is about doing things to achieve my goals. It is important to learn how to convert thoughts and vague intentions into actions and change.
    • But agency is not just about having willpower and putting in effort, being able to act rather than procrastinating or always following the easy path. Agency is also about being strategic and intentional. For example, a hard-working student who exerts a lot of effort to read and re-read their notes exhibits less agency than a student who learns it with half the effort by spaced repetition, or who realises they don’t care about this course and drops it.
  • Ambition: Thinking big, and not being constrained by small-mindedness. Often you can achieve things far more important than it feels at first. We are bad judges of what we are capable of, and it is a tragedy to let a lack of ambition limit what you can achieve.
    • This can look like many things - being ambitious in changing your life, believing that you can make progress on your biggest problems rather than being helpless; being ambitious as a researcher by identifying the most important problems in your field and working on them; being ambitious about having a major positive impact and making progress on the world’s biggest problems, eg believing you might be able to save hundreds of millions of lives.
      • An underlying insight here is the importance of fat tails and upside risk
    • Though note that agency is not just about ambition, it is also about being intentional. Agency looks like actually trying, not just doing actions that seem defensible under the pretext of some grand vision. Trying to understand the problem, finding the points of leverage, and forming a theory of change for how to achieve your ambition. Noticing when your strategies aren’t working, learning from this, and doing something differently.
  • Don’t be a bystander: A particularly poisonous mindset that can hold you back from agency is the bystander effect. Saying that something is “not my problem”, implicitly relying on someone else to fix it. Asking whether you have to do something about it, rather than whether you want to, or whether doing it could bring you closer to your goals or help others. Framing things in terms of blame and obligation, rather than asking if you could do something about it.
    • This can be local, personal problems or larger problems in the world. From realising you can take agency and fix the things in your life that make you unhappy, like poor sleep, to contributing to large global challenges like climate change or pandemic prevention
    • Part of this mindset is taking responsibility - realising that you can do things and influence the world, and that by taking it upon yourself to fix or improve something the world will be better than if you did nothing. That if you just rely on others, the world will be worse.
    • Part of this mindset is to pick your battles - taking responsibility can itself be poisonous if you make everything your problem, every single thing that is wrong in the world and your life, and feel guilty if you fail to solve them. Focus on the problems you most care about and have leverage on.
      • Further, sometimes you will never be able to “solve” a problem. Making progress on a big problem can still be really valuable. And flinching away from problems so large they feel unsolvable can also hold you back.

A final note after all this discussion of what agency is and isn’t. In practice, it is rarely sensible to ask “is this actually agenty enough?” and imagine having to justify your agency - that pushes me towards doing things that are clearly and legibly weird and original. Instead, agency is relative to your goals, and your defaults. A socially anxious introvert who decides to throw a party demonstrates far more agency than a confident extrovert who does it every weekend. Agency is what you make of it. The important question is whether it helps you achieve your goals, not whether it looks appropriately brave or non-conformist.

The agency to improve the world

An application of agency that is particularly important to me is taking agency in improving the world, in finding the most effective ways to have a positive impact. Agency is important here because you can do far more good than you do by following default ways to do good - I see this as one of the key insights I’ve gotten from the Effective Altruism movement. You can achieve far more if you look for missed opportunities, ways to leverage your limited resources, ways to be far more ambitious and aim for a chance of having a major impact.

The mindset of taking responsibility for contributing to solving the world’s problems, rather than treating them as ‘not my problem’, can be particularly important here, and represent the difference between doing something and nothing. Eg, realising that you can actually put meaningful effort into fighting climate change, rather than just recycling and being environmentally conscious. But I think this mindset can harm people, so I wanted to give my take on how I view this.

Often, this can be framed in terms of obligations. That the world is full of problems, it is my job to fix them, and I have to do this. I reject that mindset. You don’t have to do anything. If you don’t care about problems, that is your prerogative.

Instead, I think in terms of my values. Over the course of my life, I have the capacity to influence the world towards my values, and it is my responsibility to do something about this. But this is not some weighty obligation to resent and feel guilty towards - these are my values and I actually care about doing something about it. Personally, I care deeply about human flourishing. I have the capacity to influence the world to be a better and safer place, and it is important to me to do something about this, to take agency and be ambitious about it. I’m a big fan of Nate Soares’ thoughts here.

Cultivating Agency

What’s Holding You Back?

Agency can be pretty rare. And part of why it’s rare is that it’s hard! And in particular, lots of things make it harder to be an agent. And before diving into how to develop agency, it’s worth examining what’s holding you back, and seeing which things you can relax. Even if you can’t solve the things holding you back, often just identifying them can help!

There are two important categories here, defaults and constraints.

  • Defaults
    • Default roles and expectations
      • Eg, that the role of a good student is to do well in assignments, so you pour all your effort into your degree, without checking how much you care
      • This is particularly bad with expectations around careers and life paths. This is one of the most important decisions you’ll ever make!
        • Eg, the idea that a maths student will go into software, finance or academia, and you just need to find the best option
        • Eg, having Asian parents who insist that you need to become a doctor or a lawyer
    • Social norms - a deep sense of what is “normal” vs “weird”
      • Eg, feeling unable to skip small talk and talk about something interesting, even if you think both of you would enjoy it
      • Of course, often following social norms is the right call! But it’s valuable to see it as a trade-off, with benefits and consequences, rather than an iron-clad rule that induces anxiety to violate
    • Cultural narratives and groupthink - the ideas you’re taught when you’re young, and the things everyone around you believes
      • Politics is particularly bad for this - it’s very hard to hold right wing ideas if all your friends are left, and vice versa
      • It’s worth noting this even when you think you agree with the idea! Eg, I’d find it pretty hard to conclude that COVID vaccines aren’t worth taking given my social circles. I also genuinely think the vaccines are great, but it’s much harder to be truth-tracking here!
    • Default strategies - Instrumentally achieving your goals the way all your friends do. This is much worse with peer pressure, but even just a sense of what is “normal” can hold you back
      • Eg, learning by going to lectures, making notes, and revising by re-reading and doing past papers
      • Eg, letting keeping in touch with friends happen naturally and spontaneously, rather than being deliberate and intentional about it
    • The illusion of doing nothing - The feeling that doing nothing is safe, that you can’t be blamed, and that taking action is scary. And thus procrastinating on actually doing anything.
      • This is much worse when I’m considering doing something with risk!
      • And when dealing with option paralysis
  • Constraints - When you don’t have enough of some important resource, and feel constrained. Fundamentally, lacking Slack in your life.
    • Money
    • Energy levels - it’s much harder to be agentic when you’re tired all the time!
      • Note, if you commonly feel fatigued, I highly recommend getting tested for the most common causes of fatigue! Some, like iron deficiency or hypothyroidism, are easy to treat and easy to test for but often go undiagnosed.
      • There can be more mundane causes - poor sleep, poor diet, lack of exercise, etc
    • Time
      • Note - it’s worth distinguishing between not having enough time, vs not being able to use time well. I find that when I’m stressed, I always feel like I don’t have enough time
    • Commitments - overcommitting your time, having a packed schedule and a long list of obligations
      • On a deeper level, sometimes the problem is feeling unable to quit things, and not knowing how to say no when someone asks you to take on a new commitment.
      • I’ve recently gotten value from making a form I need to fill out when taking on any new commitment, estimating how much time and effort it’ll take, and checking whether it’s actually worth it
    • Mental health
    • Attention/focus - are you constantly distracted? Do you ever have long blocks of at least 2 hours where you’re confident you won’t be interrupted, and can think and reflect on things?
    • Physical health
    • It’s worth checking how much of the resource you actually need, and trying to quantify things. Or whether you can systematise taking care of it. Often the problem is less the constraint itself, and more the mental space tracking it and stressing over it takes up. Eg, it’s much better to make and keep to a budget than to constantly stress over how much money you spend.

Of course, just noticing a default or constraint is much easier than solving it. So what can you do? This is hard to give general advice on, but often noticing is the first step to doing something about it. Some personal examples:

  1. I noticed the constraint of not having enough mental space to try new things or take agency, due to a lack of Slack and too many commitments. In the short term, I carved out my Sunday afternoons to relax and work towards non-urgent stuff, or whatever I was excited about, and carved out time for weekly reviews, to reflect on my longer-term goals and on what opportunities I was missing when too in the moment. In the longer-term, I’ve set myself a much higher bar for future commitments, quit the lowest priority ones, and am slowly reducing my load.
  2. I noticed that I cared a lot about what felt normal and safe, vs weird and unusual, and found it took a lot of willpower to deviate from this. In the short term, I found particularly useful kinds of weird things to do, and practiced doing them. In the longer-term, I’ve tried to surround myself with friends who are ambitious, altruistic and agentic, and I am sufficiently socially influenced that this has helped me get better at overcoming defaults in general.

Exercise: What’s stopping you? If you suddenly became significantly more ambitious, what would you want to do? And what’s holding you back from doing that now?

Feeling Agency

The main path to cultivating agency, as I see it, is to practice! To initially do agentic things occasionally, and with effort. To (hopefully) have them go well, and get positive reinforcement. And slowly practice and develop the mindset of agency, and have the mental moves feel smoother and more reflexive, until this is something you do naturally.

To make this more concrete, I find it helpful to reflect on what agency feels like, and how to make each part of the process smoother and more natural.

  1. First, I have the spark of an idea, something to do. Noticing some opportunity, inefficiency, or a desire to do something interesting.
  2. Noticing and nurturing that spark of an idea. Resisting the urge to instinctively flinch away from it, and actually thinking about it. Checking whether I actually want to do it, but actually checking, not just flinching away from something weird and risky. Exploring the idea, understanding it, and figuring out what I could actually do
  3. Taking action - finally, actually doing something about it! Being concrete, avoiding overthinking, procrastination and option paralysis, and actually doing something.

So, how to make each of these smoother? Some immediate thoughts:

  • Ideas
    • The main thing is to be creative, and to open myself up to new ideas
      • Making time to reflect, think and brainstorm. I really like 5 minute timers for this
      • Read a wide range of interesting things, and try to step outside your bubbles
      • Talking to other people
        • Note - it’s important to distinguish between “I am doing an idea because someone told me to”, without checking whether you want it, and “I genuinely want to do this idea someone else suggested”
    • Be open to weird ideas - notice the ideas you immediately flinch away from. Notice if there are things you see someone else do that you’d never have thought of. Notice your default patterns of thought, and what these close you off from
    • Ambition - Some of my best ideas come from being unafraid to think big.
      • Ask yourself, “What would I do if I was a way more ambitious person?”
      • If trying to solve a difficult and intractable problem, ask “If I managed to completely solve my problem, what happened?”
  • Noticing and nurturing
    • Filters: Notice the filters in your head that cause you to flinch away from an idea. And check whether you actually want to follow these constraints, and whether they are connected to your goals, or just reflexes stored in your head.
      • A big one is risk-aversion - I often flinch away from ideas because it could go wrong, and this feels scary
        • But, please do actually check for risk! Many people are not agenty enough, but some are too agenty, and if you have a lot of agency and don’t check for risk, you can really hurt yourself
      • Fear of judgement, and a desire for social conformity
      • A personal sense of identity - it’s hard to do something that doesn’t feel “me”. For example, on some level I identify as nerdy and averse to exercise, and this makes it way harder to entertain ideas involving exercise
      • Often this is particularly hard because the flinch happens on auto-pilot, but noticing it takes self-awareness. For this problem, I find the technique of noticing emotions particularly helpful
    • Redefine failure: Often I flinch away from an idea because I’m afraid of failure. In this case, I find it useful to redefine what I’m aiming for, what success/failure actually mean. If you can pull this off, and orient towards something meaningful, failure is literally no longer an option!
      • A big one for me is making taking agency my goal. Deciding that this is a skill I want, and that any time I take agency I’m making progress.
        • Building it into my identity, and trying to become a person who actually does things
        • In the long-term, you don’t just want agency without good judgement. But I find it is much easier to first become able to do things, and then filter for the things most worth doing, rather than trying to do both at once.
      • Seeking novelty. It’s easy to systematically under-explore, and not try new things enough, because the costs of not doing the standard option are concrete and visceral, while the benefits of discovering a new and better option are abstract. To counteract this, I try to build novelty seeking into my identity
      • When I’ve redefined failure, I also get past the endless agonising over whether this is the “best” thing to be doing. I’m no longer analysing the object level action, I’m making progress towards the kind of person I most want to be.
    • Be concrete: It’s easy to flinch away from an unknown idea without properly exploring it. Assume you do explore the idea, and try to make concrete what you would actually do. Suppose it goes wrong, and flesh out exactly how bad this would be, and what you could do about it.
      • It’s much easier to entertain an idea than to actually take action, and this helps reduce the flinch
  • Taking action
    • Be spontaneous and fast!
      • Create tight loops between taking actions and getting results.
      • Put a big premium on doing something now rather than later. Don’t leave enough time for motivation to fade
      • Have easy ways to rapidly commit to something. Eg, messaging the intention to a friend, putting it in your to-do list, scheduling a time to do it properly in your calendar, etc. Put in effort beforehand, to enable yourself to be spontaneous in the moment
    • Avoid overthinking and overplanning. Identify a rough plan, and a concrete first step, and then take it - you don’t need a perfect plan to start doing something.
    • Set a 5 minute timer, and try to do as much as you can before the timer goes off - often this is enough to get enough activation energy
    • Try to have a clear goal/intention when acting, don’t just go through the motions
    • I collect some related strategies in Taking the First Step
  • Reward - Finally, you want to feel good about taking agency after the fact! The main value of practicing agency is getting better at the skill, and for that you need positive reinforcement.
    • Ideally this happens automatically, if you take agency towards good things!
      • And if you take agency towards actively bad things, it’s worth checking whether agency is a skill you actually want to cultivate
    • Redefining failure really helps here - if you can get excited about seeking novelty or taking agency, it’s much easier to get strong and rapid positive reinforcement
    • Seek tight feedback loops - practice agency on small things where you’ll quickly find out whether it was a good idea
Concrete Advice

The following is a grab bag of more concrete ways to cultivate agency and put this into practice. Some of these are contradictory, and aimed at different people - look for the ones that resonate, and might be of value to you!

  • As outlined above, practice! Look for opportunities for agency in everyday life, and take them for the sake of practicing the skill.
    • This can be incredibly minor things, eg being the one who gets up from the table to refill an empty water jug.
  • Make time to regularly reflect. I am a big advocate for weekly reviews
    • Prompts like “what would I do if I was being really ambitious?”, “what opportunities came up recently?”, “what am I missing?” or even “how could I be more agentic?” can be really helpful
  • Try to take ideas seriously. Notice when you flinch away from something because it feels weird or effortful, and actually think through the pros and cons. Give yourself permission to be weird and ambitious and to actually try
  • Notice the defaults in your life, and make efforts to step past them. Try new things! Expose yourself to new ideas, and new ways of thinking. Make friends very different from yourself.
  • Take care of yourself! Notice if you’re spreading yourself too thin. Make sure you have energy, good health, and take care of anything causing stress or taking up mental space. Build good systems. If these things are going wrong, aggressively prioritise dealing with it.
  • Take an action orientation - try to default to saying why not, rather than no. Be willing to experiment and see what happens
  • Seek mentors and role models who have agency, and who you can learn it from
    • Note - there are different kinds of mentorship, and most will not give you this
    • Personally, I find that by far the best way to teach agency to other people is via 1-1 conversations. Understanding what the other person wants and their problems and constraints and defaults. And making suggestions for ways to do something differently and take agency. And helping them check whether this is actually what they want, and then making the intention concrete and putting it into practice

Exercise: Did any of these resonate? What is something concrete you could implement in your life? Set a 5 minute timer, and do something about it right now.


Most of this post has been cheerleading for agency, and treating it as a virtue. But it’s worth reflecting on the drawbacks, and ways too much agency can hurt you. Some particularly notable drawbacks:

  • The attractor of non-conformity. Feeling uncomfortable doing anything normal, and defaulting to ignoring standard wisdom unless it’s obviously true. Sometimes things are done for a reason!
  • There are social consequences to weirdness, especially in certain cultures and social circles. This makes me sad, and I strongly advocate looking for friends who will help nurture your agency, but it is a real cost and consequence
  • Agency adds a lot of variance to things. The default path is normally fairly safe, while agency opens up a lot of new avenues. The prototypical example is someone who decides to try a lot of drugs, doesn’t know how to do it safely, and ends up badly hurting themselves.
  • Agency is hard. Making your own path can be exhausting and stressful, while following the default path can be pleasant and fine. Optimising small things may not be worth the effort
    • Again, I recommend picking your battles! It’s easy to fall into the attractor of thinking you must be agentic in everything, and feeling guilty when you’re not.

Overall, agency is one of the most useful skills I’ve ever gained (and I still have a lot of room to grow!). And hopefully in this post I’ve helped to flesh out what, exactly, I mean by agency, reasons to value it, and concrete ways to cultivate this skill.

So, if this post resonated and you want to gain agency, my final challenge to you is this. What are you going to do about it?

Thanks to Duncan Sabien and the LessWrong Feedback Service for valuable feedback!


Covid 10/21: Rogan vs. Gupta

LessWrong.com News - October 21, 2021 - 18:10
Published on October 21, 2021 3:10 PM GMT

I finally got my booster shot yesterday. I intended to get it three weeks ago, but there was so much going on continuously that I ended up waiting until I could afford to be knocked out for a day in case that happened, and because it’s always easy to give excuses for not interacting with such systems. When I finally decided to do it I got an appointment literally on my block an hour later for that and my flu shot, and I’d like to be able to report there were no ill effects beyond slightly sore arms, but I’m still kind of out of it, so I’ll be fine but if I made some mistakes this week that’s likely the reason. I also had to wait the fifteen minutes. I would have simply walked out the moment they weren’t looking, but they held my vaccine card hostage until time was up. 

We now have full approval for every iteration of booster shots, including mix and match, for those sufficiently vulnerable. If you’re insufficiently vulnerable but would still rather be less vulnerable, there’s a box you’ll need to check. 

I got a chance to listen to the Rogan podcast with Gupta, and have an extensive write-up on that. It was still a case of ‘I listen so you hopefully do not have to’ but it was overall a pleasant surprise, and better than most of what passes for discourse these days.

Executive Summary
  1. Conditions continue to improve.
  2. Booster sequences including mix-and-match have been approved.
  3. Rogan did a podcast and I listened to it so you don’t have to.

Let’s run the numbers.

The Numbers

Predictions

Prediction from last week: 481k cases (-12%) and 9,835 deaths (-11%).

Results (from data source unadjusted): 472k cases (-15%) and 11,605 deaths (+1%).

Results (adjusted for Oklahoma which will be baseline for next week): 472k cases (-15%) and 10,705 deaths (-3%). 

Prediction for next week: 410k cases (-13%) and 9,600 deaths (-10%).

Wikipedia reported over 1,100 deaths in Oklahoma this week. That’s not plausible, so I presume it was a dump of older deaths or an error of some kind, and removed 900 of them from the total. 
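As a quick check of the arithmetic above, here is a minimal sketch. The helper names are made up for illustration, and the inputs are only the figures quoted in this section; it confirms that removing 900 deaths gives the adjusted baseline, and that next week's prediction matches the stated percentage drops relative to this week's adjusted numbers.

```python
def adjust_deaths(reported: int, backlog_removed: int) -> int:
    """Subtract a presumed backlog dump from the reported weekly total."""
    return reported - backlog_removed


def pct_change(new: float, old: float) -> float:
    """Week-over-week change, expressed in percent."""
    return (new - old) / old * 100


# Figures quoted above; the 900 is the judgment-call Oklahoma correction.
adjusted_deaths = adjust_deaths(11_605, 900)

# Next week's prediction relative to this week's (adjusted) results.
cases_change = pct_change(410_000, 472_000)
deaths_change = pct_change(9_600, adjusted_deaths)

print(adjusted_deaths)        # 10705
print(round(cases_change))    # -13
print(round(deaths_change))   # -10
```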

There’s no hard and fast rule for when I look for such errors or how I do the fixes, so you can decide if what I’m doing is appropriate. Basically if an entire region gives a surprising answer I’ll look at the individual states for a large discrepancy, which is at least slightly unfair since sometimes it makes the number look ‘normal’ when it shouldn’t, but time is limited. 

This is still more deaths than I expected, but given cases continue to drop I expect deaths to keep dropping. It’s possible there was another past death backlog I didn’t spot because it wasn’t big enough to be obvious.


Chart and graph are adjusted (permanently) by -900 deaths this week in Oklahoma.

Death counts seemed higher than plausible in general even after the fix, but it’s a small mistake. Next week will tell us whether or not it is a blip.


The South’s situation continues to improve rapidly, and it now has fewer cases than multiple other regions, but we see improvement everywhere. Solid improvement in the more northern states is especially promising in terms of worries about a possible winter wave. Can’t rule it out, but it seems somewhat less likely. 

We are now down more than 50% in cases from the recent peak, and over the last five weeks, although regionally that is only true in the South. But we’ve clearly peaked everywhere. 


Nothing ever changes. Which at this point is good. Steady progress is more meaningful each week as more of the population is already vaccinated.

Vaccine Effectiveness and Approvals

In the words of Weekend Editor: Today the FDA formally authorized Moderna boosters, J&J boosters, and all the mix-and-match combination boosters. This is very aggressive for them! 

Indeed, and congratulations to the FDA for doing the right thing, at least on this particular question. When someone does the right thing is the time to thank them, no matter how long overdue it might be. 

As per procedure now the CDC gets to have all the same discussions, because if there’s one thing we need enough of it’s veto points. We’ll know the outcome on that next week.

Vaccine Mandates

Support for older vaccine mandates is declining. This could end up being quite bad.

There continue to be claims that there will be massive waves of people quitting over vaccine mandates, this time in New York City. No, we are not going to lose half our cops, no matter how excited that prospect makes some people these days. We’ll find out soon enough:

On the one hand $500 is in the ‘let’s actually get it done’ range and worth it if it works, and should smooth over any general grumbling, on the other hand it’s enough that I’m pissed off that they’re getting that many extra tax dollars for what they should have done anyway. 

Here’s a Zeynep thread on the psychology involved. Lot of good food for thought.

A vaccine mandate carries with it the requirement to verify that it is being followed. This in turn means verifying people’s vaccine status and ID at various points. When does this end? Some people who based on their previous writings really should know better are seemingly fine with ‘never’ and I notice I am most definitely not okay with that for everyday activities. There will be a point these mandates for everyday actions turn negative, and it’s not that far off, and then we’ll have to figure out how to unwind them. Would be increasingly happy to start now. 

This post looks at vaccine persuasion in Kentucky, notes that $25 Walmart gift cards were a big draw, doesn’t seem to offer much hope that persuasion via argument would work. But have we tried bigger gift cards?

California state workers are somehow vaccinated at a rate much lower than the state average. Ignoring for the moment that they don’t seem to be doing much to fix this problem, one can draw various conclusions about how the state government operates and hires based on this. And one can wonder why, if the state is willing to impose so many other restrictions, they can’t or won’t take care of business in this way. However, the article also notes that this is comparing the number who provided proof of their vaccination status as employees, versus the number who actually got vaccinated as adults. It seems that some employees may have simply decided not to provide proof, either as a f*** you to the demand for proof of vaccination, or because it seems that if you don’t vaccinate they ‘make’ you get tested a lot but that’s free and some people like the idea of getting tested frequently, so whoops. 

Washington State’s football coach is out: 

So is an NHL player, and even if you oppose mandates I hope most of us think that faking a vaccine card is not a permissible response.

This is a remarkably high tolerance of importantly fraudulent behavior. Very much does not seem like a sufficient response.

Not endorsed, but noting the perspective that the unvaccinated are holding us hostage, because the threat of potentially running out of health care capacity is the reason we still take major preventative measures, and if everyone got vaccinated we would go back to normal. I find the hostage situation metaphor apt because hostage situations are mostly because we choose to care about them. Every so often, someone on a show will grab a hostage, and the response will quite correctly be ‘I’m not going to reward threats to destroy value by giving you what you want’ and I wrote a contest essay back in grade school arguing this should be standard procedure. Instead, we’re more like a DC hero who thinks that if you point a weapon at any random citizen they are forced to hand over the world-destroying superweapon codes. I will leave it to you to draw the appropriate metaphor to our current situation on other fronts.

NPIs Including Mask and Testing Mandates 

From New Zealand and Offsetting Behavior comes the story of Rako. Rako offered to scale up their Covid-19 tests, the government said they weren’t interested, then when it turned out the tests were good they reversed course and decided to take what tests and capacity did exist without paying much for them, among other disasters going on there, and hope that somehow anyone will be interested in helping with such matters next time around. It doesn’t look good. Neither does the Australian decision not to securely keep the police away from the contact tracing records.

How much should you update on a Covid-19 test? We’ve got a new concrete reasonable attempt to answer that, although it still uses the PCR test results as its ‘gold standard’ and thus is underestimating the practical usefulness of other testing methods. 

The Bayes factors for positive tests are pretty high. The ones for negative results are less exciting, but if you’re focusing on infectiousness, I believe you end up doing a lot better than this. Those who do math on this stuff a lot are encouraged to look into the details.
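For those who want to do the math, the update itself is mechanical: convert the prior probability to odds, multiply by the Bayes factor, and convert back. A minimal sketch follows. The prior and both Bayes factors below are hypothetical placeholders chosen only to show the mechanics; they are not the values from the linked analysis.

```python
def update_probability(prior: float, bayes_factor: float) -> float:
    """Apply a Bayes factor to a prior probability and return the posterior.

    Works in odds form: posterior odds = prior odds * Bayes factor.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Hypothetical inputs (assumptions for illustration only).
prior = 0.01          # assumed 1% chance of infection before testing
positive_bf = 100.0   # assumed Bayes factor for a positive result
negative_bf = 0.3     # assumed Bayes factor for a negative result

after_positive = update_probability(prior, positive_bf)
after_negative = update_probability(prior, negative_bf)

print(f"after positive test: {after_positive:.4f}")
print(f"after negative test: {after_negative:.4f}")
```

With these placeholder numbers a positive result moves a 1% prior to roughly even odds, while a negative result only trims it to about a third of a percent, which is the asymmetry described above.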

It turns out Rapid Antigen Testing works rather well at telling who is infectious. Here’s a thread explaining why they’re much more accurate than people thought, which is that when the PCR tests came back with what were effectively (but arguably not technically) false positives, not matching those was a failure. The fallback general official response has for a long time been something like this.

Where ‘have Covid’ is defined as ‘have or recently have had any trace of Covid’ although that’s rarely the thing that has high value of information. It is very reasonable for someone to want to know if they have (or others have) Covid, and it is also very reasonable for someone to want to know if they are (or others are) infectious. Different purposes, different tools, and it turns out both tools are highly useful. The mistake we made for over a year was saying that because people might also want to know if they have Covid, the test that is very good at detecting infectiousness and less good at detecting Covid was illegal, so we should instead not test at all or use a test that was more expensive, slower and less useful in context. It is good that things seem to be coming around a bit.

Restaurant that isn’t as good as Shake Shack lets people in, is now told they are out. 

This was only a temporary shutdown of one store, since they only have one store in San Francisco. Depending on exactly where their stores are, this could be a very smart move, as it wins them massive points with the outgroup.  

Permanent Midnight

Some of us think the ultimate goal is to become complacent about Covid-19 once it’s no longer a major threat, and return to our lives. Our official authorities say, madness.

This is an explicit call for vaccinated children to be forced to mask permanently. This is utterly insane. If not them, then who? If not now, then when?

I sincerely hope the kids neither forgive nor forget that this happened to them.

This also brings up another of my humble proposals of ‘maybe we should teach children that skipping a meal every so often is fine, so they have that valuable skill in life that’s done me worlds of good’ but mostly it’s that they are literally forcing children to go outside in the rain to eat.


Here’s Sam Bankman-Fried going over why calls for large permanent interventions are nowhere near ever passing cost-benefit tests, and giving an attempt at a calculation and thus an opportunity to nitpick and refine.

Semi-constant mask wearing costs a lot more than 0.4% of GDP. I don’t know exactly what you’d have to pay people to get them to wear masks indefinitely (with no other benefits) but I’d be stunned if it’s under 1% of consumption. Even if we knew it would work on its own with no help I have no idea why you’d even consider this.

Staying home if you have a sick housemate, to me, seems mostly like a good idea even before Covid-19. You can call this a cost of one day per year, but you have to make a bunch of assumptions to get there. Days of work can’t be fungible, so taking a random day off means your productivity is lost and can’t be made up later, and there aren’t substantial benefits from taking that extra day off on the margin. But that’s kind of weird, since if there was such a big net loss from losing a random day of work it strongly implies you’re not working enough at the baseline. And it seems likely to me that you save the office collectively a full day of productive work (since being low-level sick makes work less effective on top of less fun) by avoiding additional infections. 

The exception here would be if work on that day can’t be done from home, and isn’t fungible with either other times or other people, so you lose something close to a full day’s productive value. I think that is rarely the case, and that Sam’s history of being stupidly productive at all times makes this a blind spot. For most people, my model says that either (A) you can mostly get others to cover for you without too much loss and (B) most of the work where this isn’t true can be done remotely for a day or two. 

Zoom meetings are a mixed bag, but this week I had my first in-person work meeting in over a year and it was incredibly more productive than a similar Zoom meeting would have been. There are big advantages the other way, so this won’t always be the case, but I strongly agree that giving up on seeing people in person is a productivity (and living life) nightmare that costs way more than we could plausibly give up. But on the margin, more Zoom meetings than in 2019 is good, actually.

That leaves the vaccines, which he estimates at a day of cost, and I don’t understand this number at all. Sometimes the vaccine will knock one out for a day, but this does not need to be the case and I wrote most of this the day after getting my booster shot. Over time, we’ll figure out the right dosing and regimens and the side effect impact will decline, and you can plan ahead so you choose a day when it’s not that expensive to be somewhat out of it. 

If we end up passing a ‘everyone must miss a work day after the shot so everyone feels permission to get the shot’ law then it could end up costing a day, I guess, but also giving people some paid time off at the time of their choice that they plan for seems like it isn’t even obviously a bad idea?

Rogan Versus Gupta

I got the chance to listen to Joe Rogan’s podcast with Dr. Gupta. It’s a fascinating combination of things, some of which are great and some of which are frustrating and infuriating, from both of them.

The opening is a discussion of why the two of them were willing to sit down together. Gupta sat down with Rogan to try and understand Rogan’s thinking process and because Rogan can reach a huge audience that is otherwise exceedingly difficult to reach, and to convince Rogan on vaccines. Rogan sat down with Gupta because Gupta’s public changing of his mind on marijuana (which they talk then about a bit) revealed to Rogan that Gupta is willing to look at the data, change his mind and admit when he’s wrong. 

In this part, both of them acquitted themselves well. The central point here was well taken. On its surface it was about the potential of marijuana and why we should not only legalize but embrace it and research what it can do for us, and while I don’t have any desire to use it myself I am totally here for that. 

The real point was that one needs to think for oneself, look at the data with your own eyes and an open mind, be curious and come to conclusions based on where that takes you, and that doing this is how you earn many people’s respect. Gupta was here with the ability to engage in (admittedly imperfect, but by today’s standards pretty darn good) discourse because he’d shown himself in the past to be an honest broker and truth seeker acting in good faith. 

They then started getting down to it and discussing the situation in earnest. Compared to my expectations, I was impressed. Joe Rogan came in curious and seeking truth. He had many good points, including some where he’s more right than and where he was wrong, he was at least wrong, making substantive claims for reasons and open to error correction and additional data and argument. He was continuously checking to see if Gupta’s story added up and whether it lined up with Rogan’s model of the world in general, but was quite open to learning more.

Like any discourse or debate, there were many ways all participants could have done better.

Several people have noted that Joe Rogan is drawing a distinction between vaccines, where the burden of proof of safety is being put on the vaccines, and various other things like Ivermectin, where he largely puts the burden on others to show they are not safe, and holds them to a very different standard. In general, it seems like Rogan is hunting for an angle whereby the vaccines will look risky. Not full soldier mindset, but definitely some of that going on. 

It’s worth noting that Rogan explicitly states in minute 59 that the risks from the vaccines are very, very small. This is despite Rogan listing off people he claims to know who had what he thinks are deadly serious adverse reactions, so it’s not clear to me that he in his position should even believe these risks are all that small. 

Rogan’s point that Gupta is at far greater risk as a vaccinated healthy older adult, than a child would be unvaccinated, is completely correct and a kill shot when not tackled head on. None of our actions around children and this pandemic make any sense because we refuse to reckon with this. Gupta has no answer. The response ‘I think you have to draw a distinction between those that have immunity and those that don’t’ is not a meaningful answer here – saying the word ‘immunity’ and treating that as overwriting age-based effects is Obvious Nonsense and Gupta is smart enough to know that. As are his attempts to move back and forth between risk to self and risk to others when dealing with kids. If he wants to make the case that vaccinating kids is mostly about protecting others, that’s a very reasonable case, but you then have to say that part out loud.

Which is why Rogan keeps coming back to this until Gupta admits it. Gupta was trying to have it both ways, saying he’s unconcerned with a breakthrough infection at 51 years old, and that young children need to be concerned about getting infected, and you really can’t have this one both ways. Eventually Gupta does bite the bullet that child vaccinations are about protecting others, not protecting the child (although he doesn’t then point out the absurdity of the precautions we force them to take), and frames the question in terms of the overall pandemic. 

The question of protecting others was a frustrating place, and the one where I’m most disappointed in Rogan. Rogan pointed out that vaccinated people could still spread Covid-19 (which they can) and then said he didn’t see the point of doing it to protect others, whereas he’s usually smarter than that. Gupta pointed out that the chances of that happening were far lower, although he could have made a stronger and better case.

Gupta was very strong in terms of acknowledging there was a lot we didn’t know, and that he had a lot of uncertainty, and that data was constantly coming in, and in engaging the data presented with curiosity and not flinching, if anything taking Rogan’s anecdata a little too seriously but in context that was likely wise. 

The key moment where Rogan turns into the Man of One Study seems to start in minute 62. In response to Gupta referring to the study, Rogan has it brought up. The study’s surface claim is that for some group of young men, the risk of the vaccine causing myocarditis is 4.5x the chance of being hospitalized for Covid-19. Gupta had previously pointed out that the risk of myocarditis from Covid-19 is higher than that risk from the vaccine, and tries to point out that the study here is not an apples-to-apples comparison, as it’s comparing hospitalization risk to diagnosis risk. Rogan grabs onto this and won’t let go. It takes a few minutes and Gupta stumbles in places, but around the end of minute 65 Gupta gets through to Rogan that he’s claiming myocarditis risk from the disease is higher than from the vaccine. Rogan responds that this is inconsistent with the data from the study, which seems right. Then Gupta gives the details of his finding, but his finding is based on all Covid-19 patients in general, which is consistent with this particular risk being higher for young boys from the vaccine than from Covid-19, and potentially with the results of the study. 

At another point, Gupta threw the Biden administration under the bus on the issue of boosters, blaming them for daring to attempt to have an opinion or make something happen without waiting for word to first come from the Proper Regulatory Authorities, and claiming this was terrible and caused two people to resign and treating their decision to resign as reasonable (Rogan was asking about the resignations repeatedly). He equated ‘data driven’ with following formal procedure and only accepting Officially Recognized Formats of Data. I wasn’t happy about this, but the alternative would be to start speaking truth about the FDA.

My model is that Rogan’s take on vaccines differing from the standard line comes mainly from Rogan placing an emphasis on overall health and the strength of a person’s immune system, and from taking these questions seriously and spotting others not taking the questions seriously. 

Rogan’s entire model of health and medicine, not only his model of Covid-19, consistently gives a central role to maintaining overall good health. People should devote a lot of time and effort to staying in good health. They should eat right, exercise and stay active, maintain a healthy weight, take various supplements and so on. This is especially important for Covid-19, whose severity seems highly responsive to how healthy someone is, with large risk factors for many comorbidities, although not as large as age. 

From Rogan’s perspective, one option against Covid-19 is vaccination, but another option is to get or stay healthy. As Gupta points out multiple times, this is a clear ‘why not both’ situation, except that there’s complete silence around helping people get healthy, even though it’s a free action. It’s worth getting and staying healthy anyway, why not use Covid-19 as an additional reason to get people started on good habits? And if you’re unwilling to help people get healthy, why should we listen to you about this vaccine? Which is a fair point, you mostly shouldn’t listen to these people in the sense that their claims are not in general especially strong evidence. It’s that in this case, it’s very clear for multiple distinct reasons that they are right.

Minute 88 is when they get into Ivermectin. Joe Rogan is not happy that he was described as taking ‘horse dewormer.’ As he points out, this is very much a human medicine, regardless of how some people are choosing to acquire it, and those people are not him: “Why would they lie and call it horse dewormer? I can afford people medicine, motherf***er, this is ridiculous. It’s just a lie. Isn’t a lie like that dangerous? When you know that they know they’re lying?” 

So then he played the clip, and the CNN statement wasn’t lying, exactly. It was technically correct, which as we all know is the best kind of correct – it said that he said he had taken several drugs including Ivermectin. Then it said that it was used to treat livestock, and that the FDA had warned against using it to treat Covid. Now all of those statements are technically correct – the FDA definitely warned about it and doesn’t want you doing that, and among other things Ivermectin is used to treat livestock, although it is also often used for humans and Rogan had a doctor’s prescription. 

Now, in context, does that give a distinctly false impression to viewers? Yes. Are they doing that totally on purpose in order to cause that false impression? Absolutely. Is it lying? Well, it’s a corner case, and technically I guess I’m going with no? Gupta’s response is that they shouldn’t have done it, but he’s not willing to call it a ‘lie’ and is denying that there was glee involved. (Morgan Freeman narrator’s voice: Oh, there was glee involved.)

Rogan asks, if they’re lying about this, what do we think about what they’re saying about Russia, or any other news story? And my answer would be that this is the right question, and that it’s the same thing. They’re (at least mostly) going to strive to be technically correct or at least not technically wrong, and they’re going to frame a narrative based on what they want the viewer to think, and as a viewer you should know that and act accordingly.

Later on comes the part that should be getting more attention. In minute 125, Rogan explains that he almost got vaccinated, but didn’t, and what happened.

  1. The UFC got some doses and offered one to Rogan. He accepted.
  2. Logistical issues. Rogan had to go to a secondary location to get it, his schedule didn’t allow it, had to be somewhere else, planned to take care of it in two weeks.
  3. During the two week period, Johnson & Johnson got pulled.
  4. Also, his friend had a stroke and Rogan connected this to the vaccination, whether or not this actually happened.
  5. Rogan goes “holy ****” and gets concerned.
  6. Another of Rogan’s friends has what looks like a reaction to the vaccine, gets bedridden for 11 days. And another guy from ju-jitsu that he knows had what looked like another issue, having a heart attack and two strokes.
  7. A bunch of these reactions don’t get submitted to the official side effects register.
  8. Rogan concludes that side effects are likely to be underreported.
  9. Rogan goes down a rabbit hole of research, finds opinions on shape of Earth differ.
  10. Rogan doesn’t get vaccinated, thinking he’s healthy and he’ll be fine.
  11. Rogan gets Covid-19, his family presumably gets it from him (Minute 135), it isn’t fun, but he gets over it and he’s fine, and they get over it and they’re fine.
  12. Rogan tells these stories to millions of people, teaching the controversy, but still advocating vaccination for the vulnerable and for most adults, but is highly skeptical about vaccinating kids and thinks people should be free to choose.
  13. Gupta tries to get Rogan to get vaccinated despite having been infected, while admitting Rogan has strong immunity already, which goes nowhere.
  14. Rogan says repeatedly that he’s not a professional, that you shouldn’t take his advice, to listen to professionals, that he is just some guy with no filter. But this includes naming The Man We Don’t Talk About as an expert.
  15. But of course, he knows that saying ‘my advice is not to take my advice’ mostly never works.

The first thing he mentions in his story, the start of this reversal, is when they pulled Johnson & Johnson to ‘maintain credibility.’ This is a concrete example of the harm done by that action. It contributed directly to Rogan not being vaccinated. That speaks to how many other people had similar reactions, and also Rogan then shared his thoughts with millions of people, some of whom doubtless therefore did not get vaccinated. 

The bulk of his points were about side effects in particular people that Rogan knew. From his perspective, the side effects looked very much like they were being severely underreported, especially since these particular side effect cases weren’t reported. How could he not think this? From his epistemic position, he’d be crazy not to think this. He has quite a lot of friends and people who would count as part of the reference class that he’d observe here, and the timing of some of what looked like side effects could easily have been a coincidence rather than causal, but still, he saw what looked like three of these serious cases in rapid succession, in people who seemed otherwise healthy. Meanwhile, similar risks are being used as a reason to pull one of the vaccines. 

He responded to all this, which from his position was quite strong Bayesian evidence, combined with his good health and his model that Covid-19 was unlikely to be that bad for him, by doing a bunch of research that under these circumstances put him in contact with a bunch of Covid-19 vaccine skeptics, and then declining the vaccine. 

I strongly feel he made the wrong decision, took an unnecessary risk and would have been much better off getting vaccinated. But mostly the heuristics and logic used here seem better than blindly trusting a bunch of experts. Sometimes that gets you the wrong answer, but so does trusting the experts. 

Given he continues to mostly advocate for vaccination of adults, and seems to have come around to believing the generally accepted vaccine safety profile, that both speaks highly of the epistemic process he has used since he was exposed to a bunch of his good friends who were peddling other conclusions rather forcefully, and also makes me think I know where he did make his mistake.

My guess (and I could be wrong, he didn’t make this explicit) is that the decision ultimately came down in large part to blameworthiness in Rogan’s mind. In the frame most of us have, vaccines are safe and effective, so if you get Covid-19 without being vaccinated that’s on you, and if you have one of the exceedingly rare serious side effects (many or more likely most of which are a coincidence anyway) then that’s not on you. The incidents with his friends reversed this for him, combined with thinking that outcomes from Covid-19 are tied to health. In his mind, if Covid-19 got him, that was his fault for being unhealthy. If the vaccine got him, that would be on him for seeing these things happening to his friends, and taking it anyway. So he did what most people do most of the time, especially when he saw the decision as otherwise only a small mistake, and avoided what he thought of as blame, and did what he could feel good about doing. And of course, the decision was in many ways on brand. But the undercurrent I sense is that yeah, he knew it was objectively a mistake in pure health terms, but not a huge one, so he just did it anyway. 

One thing that reinforces this is that Rogan comes back repeatedly to individual examples of people, especially young healthy people, who had problems that happened after getting vaccinated, and says that it was overwhelmingly likely that that particular person would have been fine had they gotten Covid-19. Which is true, but it was also far more overwhelmingly likely that they would not have had the problem they had if they got vaccinated. If you trade one risk for another smaller risk, sometimes the smaller risk happens to you. That’s what a risk is. But if you instinctively use forms of Asymmetric Justice, what matters is that this particular person is now worse off, even if on net people who took such actions are better off, therefore blame.
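The arithmetic behind trading a larger risk for a smaller one can be made concrete with a short sketch. Every number below is invented purely for illustration and is not an estimate of any real vaccine or Covid-19 risk; the names are hypothetical too.

```python
def expected_cases(population: int, risk: float) -> float:
    """Expected number of people harmed if each faces the given independent risk."""
    return population * risk


# Invented numbers, for illustration only.
POPULATION = 1_000_000
LARGER_RISK = 1e-3    # assumed chance of serious harm under the riskier option
SMALLER_RISK = 1e-5   # assumed chance of serious harm under the safer option

harmed_larger = expected_cases(POPULATION, LARGER_RISK)
harmed_smaller = expected_cases(POPULATION, SMALLER_RISK)
net_avoided = harmed_larger - harmed_smaller

print(f"harmed under the riskier option: about {harmed_larger:.0f}")
print(f"harmed under the safer option:   about {harmed_smaller:.0f}")
print(f"net harms avoided by switching:  about {net_avoided:.0f}")
```

Under these assumptions the safer option still harms a handful of identifiable people, each of whom would very likely have been fine under the riskier option. Focusing only on those individuals, as Asymmetric Justice does, ignores the much larger number of harms that never happened.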

That of course is an aspect of vaccines being held to a different burden of proof. In his mind and many others, they’re unsafe until proven safe, and that includes long term data, and the prior on ‘artificial thing we made to do this’ in some sense is stronger than any of our ‘this is how this mechanically works or when we’d see the effects show up’ style arguments could hope to be. Whereas he puts his assortment of other stuff into a different bucket, with a different burden and a radically different prior. Which isn’t a crazy thing to do, from his perspective, although I don’t see it as mapping to the territory.

They finish up with a discussion about the lab leak hypothesis, and they certainly don’t make me less suspicious about what happened on that front. 

That’s a giant amount written about a three hour podcast I listened to (mostly at 1.5x speed) so you didn’t have to. It was less infuriating than I expected, and contained better thinking, and is to be hailed for its overall good faith. We need to be in a place where such actions and exploration are a positive thing, even when they make mistakes and even when they end up causing people to be pushed towards worse decisions in many cases. 

In Other News

Bioethicists have profoundly inverted ethics.

No, seriously, imagine speaking this sentence out loud. Say, to whoever is listening, “We don’t ask people to sacrifice themselves for the good of society.” 

Then realize that bioethicists are far more insane than that, because what they’re actually saying is, “We don’t allow people to sacrifice of themselves, or take risks, for the good of society.” 

Over half of respondents to this survey report being lonely, with only a small effect from identifying as autistic. We had a crisis of loneliness before and Covid-19 had to have made it much worse, and at this point I worry about such effects far more than Covid-19.

Not Covid, but a good politician never wastes a crisis, so here’s a look into the child care portion of the Build Back Better bill. I solved for the equilibrium, and I doubt anyone’s going to like it.

As the weeks continue to blend into one another, it seems like it’s getting to be time to formally write up my lessons from the pandemic. I don’t know when I’ll have the bandwidth, but I’m authorizing people to periodically ask why I haven’t finished that yet.


Rationality Vienna goes hiking

LessWrong.com News - October 21, 2021 - 14:34
Published on October 21, 2021 11:34 AM GMT

We’ll meet at 2pm at the terminus of tram D (Nußdorf, Beethovengang) to hike along Stadtwanderweg 1.

I expect around a dozen participants to show up. (90%: 6 .. 18)

If the weather forecast is not appealing, we might switch to a café on short notice; any change would be posted here. Assume the default plan if there’s no announcement.


NATO: Cognitive Warfare Project

LessWrong.com News - October 21, 2021 - 12:57
Published on October 21, 2021 9:57 AM GMT

NATO seems to have a project on cognitive warfare and a few public reports online:

Interim Report

Based on the Understanding Phase findings, NATO has identified the following priorities:
- Develop a Critical Thinking online course
- Develop improvements to the decision making processes
- Leverage technologies, including VR and AI to develop tools in support of better cognition and better decision making

1 Jun 21 Cognition Workshop Report

Cognition includes three interrelated aspects that are reflected in the structure of the workshop: information, decision-making and neuroscience.

Cognitive Warfare

As global conflicts take on increasingly asymmetric and "grey" forms, the ability to manipulate the human mind employing neurocognitive science techniques and tools is constantly and quickly increasing. This complements the more traditional techniques of manipulation through information technology and information warfare, making the human increasingly targeted in the cognitive warfare. 


Successful Mentoring on Parenting, Arranged Through LessWrong

LessWrong.com News - October 21, 2021 - 11:27
Published on October 21, 2021 8:27 AM GMT


In June 2021, Zvi posted The Apprentice Thread, soliciting people to offer, or request, mentoring or apprenticeship in virtually any area. Gunnar_Zarncke offered advice on parenting, as the parent of four boys (incidentally, true of my grandmother as well) between the ages of 9 and 17, with the usual suite of observational skills and knowledge that comes with being a regular on this site. I responded with interest as my first child is due in November.

Gunnar and I are sharing our experience as an example of what a successful mentoring process looks like, and because his key points on parenting may be interesting to current and future parents in this community. I had several breakthrough-feeling insights which helped me to connect my LessWrong/rationalist schema to my parenting schema.

Gunnar and I began by exchanging messages about the parameters of what we were getting into. I was interested in his insight based on these messages and other comments and posts he had made on this site about parenting. We arranged a Google Meet video call, which confirmed that our personalities and philosophies were compatible for what we were undertaking.

We did not have a structured reading list, although I investigated resources as Gunnar suggested.  As we went along, Gunnar translated into English samples of notes taken by his children’s mother throughout their childhood and shared them with me. She had also systematically described the daily and weekly tasks a parent could expect in various development phases of the child’s life. I was an only child and have not parented before, so I found this extremely educational.

We had several video calls over the next few months and discussed a wide range of parenting-related topics. Gunnar also suggested this post, to report on our experience. I drafted the post, and Gunnar provided comments, which I merged, and after he reviewed the final version, we published it as a joint post.

By call number two, I was realizing that parenting was never going to be the sort of thing where I could read the “correct” book for the upcoming developmental stage, buy the “correct” tools, and thereby maximize outcomes. Instead, it would be a constant process of modeling the child’s mind, providing new inputs, observing behaviors, updating the model as needed, researching helpful tools, and iterating more or less until the kid is in its 20s. At first, this was intimidating, but I’ve come around to understanding that this just is the parenting process. This synthesis eventually gave me additional motivation and optimism. 

These calls gave me great comfort against anxiety about parenting, confidence, and a sense of human connection, all beyond what I expected. 

First call

Our first call was within a week of Zvi’s post. We described our backgrounds as people who were parented. Gunnar came from a large family; I came from a small one. We discussed how our parents nurtured positive traits in us and also touched on what our parents did that didn’t work. 

For example, my parents would frequently observe when other people were acting in ways consistent with the values they were trying to teach me, in addition to praising or otherwise rewarding me for acting that way myself. 

Gunnar's mother was mostly trusting of her children and "went with the flow," following her intuitions. His father was very fostering and offered a lot of practical education. He consciously created a safe environment. He said he learned this approach from his parents, who came from different backgrounds. Gunnar's grandmother came from a liberal Scandinavian family, and his grandfather came from a disciplined Prussian family. His grandfather embraced his grandmother's liberal norms, which seems to have created a reliable high-trust environment for his father--despite difficult times during and after World War II. 

Gunnar segued into discussing general strategies for supporting children’s development. Highlights:

  • “Salami tactics”: Allow them to learn new behaviors and situations incrementally rather than all at once.
  • Developmental diary: Once a week, or more often, write down notes on what happened with each child during that period, what was effective parenting-wise, and what wasn’t. This was something that Gunnar came back to consistently. However, he is not confident that it is right for everyone, just that it was for him. I plan to do this as well.
    • Consistent reflection
    • Lessons to carry from one child to the next
    • Incorporate photographs
  • The saying goes, "Small kids, small problems; big kids, big problems." But the pattern goes like this:
    • With small kids you have a lot of very small tasks and problems: How to diaper. Why is the baby crying right now? Let's try this 5-minute game. Let's go to this 30-minute baby swimming class. We have to rock the crying baby for an hour until it finally sleeps. Oh, the baby is interested in this thing--oh, it's already gone. Why does X no longer work? Oh, Y works now.
    • As they grow older this switches to: Will they find friends at the new school? Taking the kid to soccer games every weekend--and staying there for cheering, photos, and small talk. Practicing math for hours before the exam. Working again and again on some fight between siblings. Helping to renovate the room.  Talking for hours about some conflict or problem.
  • Be alert to opportunities for teaching based on the child’s interests.
    • Model the behavior of conceiving and running experiments
    • Organize activities around projects (for example, in the garden)

Gunnar recommended several texts during this call and in a follow-up email, including:

We covered many topics in later calls, organized below by subject rather than chronology.

Child Cognition in General

We discussed more cognitive elements of parenting--the extent to which developing brains “need” new inputs and partially “know” what inputs they need but, if overloaded, will retreat to the familiar, and especially to you, and then consolidate.  Gunnar mentioned the Big Five as a good shorthand for observing kids’ personalities.  He shared the first of the translated parenting documents I mentioned above.

This discussion reminded me of Clark, Surfing Uncertainty, which I cannot recommend strongly enough.  After reading that book, I understood intellectually that brains seemed to be prediction/testing machines that thrive on stimulation, but I didn’t see that model as a frame to place over my parenting thoughts until Gunnar spoke about similar concepts in his own perception of parenting. This was a eureka moment for me.

Your kids spend even more cognitive energy on you than you do on them, because their survival depends on it (see also here). 

  • They will notice if you are stressed or worried.
  • They understand words you’re using before they can use those words themselves.

We discussed various ways to teach children before they are in school, and to augment what they learn in school.

  • Use homeschooling materials to assist them with their homework. (In Gunnar’s country, homeschooling is very rare; in mine, the USA, it’s a constitutionally protected right, consistent with Gunnar’s claim that the best homeschooling materials are in English.) I might never have considered these otherwise, because homeschooling in the USA is correlated with weird beliefs, and I was subconsciously assuming that homeschooling materials generated by weird-belief-holders would be somehow infected by the weird beliefs.  (Gunnar adds: They likely are infected by weird beliefs, but you can just keep the good parts.)
  • Avoid rote memorization, except where necessary--multiplication tables, for example.
  • Parents’ and teachers’ incentives are often misaligned (ideal methods for an entire room versus ideal methods for your own child).
  • Encourage kids to make testable predictions and bets.
  • At all verbal ages, you can talk to them in a more complex way than they are able to communicate, yet they will still understand some parts of it and absorb context and parts of meaning.

Conditioning works, but only on things you are consistent about. Corollary: if you’re not willing to be consistent on something, leave it out. (My parents used this on me when I learned how to whine. They agreed not to acknowledge anything I said in a whiny tone, and told me this would be their policy. According to them, it worked quickly.)

When the desired behavior is rare on its own, you can “cheat” by simulating the behavior (for example, in pretend play).

Rather than “No,” use “Yes, but,” “Yes, and,” or “Yes, as soon as.” These are opportunities to show the child that you are also a person with needs, and to emphasize mutual responsibility.

  • Trust your instincts, yourself, your spouse, and offer the child a lot of trust.
  • The parent should behave such that the child unconditionally trusts that its needs will be met reliably.
  • Don’t lie to your kids.
  • Challenge them, but not so far that they feel physically unsafe.

Parenting is intense and challenging, not least when you are sleep deprived because of a baby’s sleep schedule. Observations:

  • Cultivate a support network of friends and other parents.
    • There is probably no substitute for in-person connection and support.
    • Talking helps.
  • Have a safe place to temporarily retreat to.
  • Consistently set aside time (perhaps a certain interval each day) for unstructured whole-family time.
  • The change in the marriage relationship requires focus and time to navigate.
  • Communicate feelings and stress with your spouse and provide physical support as needed.

Avenue for Neuroscience Research

Gunnar has an interesting, and possibly testable, hypothesis: One effect of puberty is to partially reset the values a child assigns to normative judgments, but not to procedural knowledge about reality. (Corollary: Whatever values you’ve taught your child will be more likely to survive if you’ve given them the information necessary to conclude that the value is correct.)  The cascade of puberty hormones could conceivably affect the chemicals in the brain responsible for adjusting weights of priors.  I don’t have enough neuroscience to develop this any further, but it’s “common knowledge” that many teenagers think their parents are idiots.  A biological explanation would explain how widespread the behaviors leading to this folk belief are.


I am grateful to Gunnar for his time, attention, and “gameness.” I am glad that this entire process happened, starting with Zvi’s initial post and ending with this post. I feel far more prepared than I did at the beginning, and I doubt that a person outside this community would have been able to get me there. I plan to implement his weekly development diary as a way to track trends, organize my own thoughts about parenting, and force myself to really think about what's going on.  Maybe most importantly, I have a role model for thinking hard about what's going on even with very young children. My only model for that before was cognitive scientists and their informative but ultimately clinical experiments.

I’ll give Gunnar the last word:

I enjoyed the mentoring tremendously. It is very rare to find someone so interested in parenting and taking the preparation so seriously. I felt myself and my advice highly valued. A good feeling that I hope many mentors share. Talking about my parenting experiences and insights also sharpened them and gave me more clarity about some of my thoughts on parenting. I highly appreciate all the note-taking that was done by Supposedlyfun.

One thing that I realized is how crowded the parent education market is and how difficult it is to find unbiased evidence-based material. I have been thinking quite a lot about this and hope to post about it sometime.

We have paused the mentoring for the time being and I am looking forward to how the advice works out in practice. We agreed on a call sometime after the family has adjusted to the new human being.


Experimenting with Android Digital Wellbeing

LessWrong.com News - October 21, 2021 - 08:43
Published on October 21, 2021 5:43 AM GMT

inspired by this post

Introduction: Small Deaths

I'm a morning person.

I usually wake up at about 6 AM. I read on my phone in bed until the toddler wakes up at 6:30, at which point I look after him till I take him to daycare at about 7:15. I then have till 9 AM free, during which time I get a lot of stuff done - both chores and personal projects.

I finish work at 6 PM. Either we have dinner with the kid, or I feed him dinner, and then we have dinner once he's in bed at about 7:30.

So by 8:30 PM I've eaten dinner, jobs are all done, and kid's in bed. I might put on the dishwasher, but other than that my evening's free till I start getting ready for bed at about 10:30 PM.

So what do I do in those 2 hours?

Sometimes I read a book. Sometimes I go for a walk.

But most of the time I stare at my phone. I catch up on LessWrong, Gitter, emails, Twitter, Discord, and then once I've done all that I go back again and refresh them all in case something new has turned up.

Once I'm convinced that nothing exciting is going to pop up on my regular haunts I start to think about all the sites that might have fresh content. Maybe Scott Aaronson has posted something on Shtetl Optimized? And didn't I once read a blog by X which I vaguely enjoyed? He's probably posted something in the last year...

Eventually I probably find a sequence or short story I haven't read yet, or some random Wikipedia article - I wonder where the third lowest lake on earth is? What about the highest roads and villages, or northernmost islands?

I couldn't say I enjoy this experience. I'm vaguely bored and unhappy throughout, and I'd probably be happier just going straight to bed. The moment I stop and do something I immediately feel better and invigorated. But stopping is just so difficult!

I view this time as a small death. Time I'm just trying to kill. It doesn't relax me. I don't enjoy it. It has no benefits - it might as well not exist. I might as well be dead for those 2 hours.

How do I break out of it?

Experiment: Digital Wellbeing

I don't want to not be able to use my phone at all after 7 - I'm not brave enough for that.

Ideally I would be able to just turn off Chrome after 7, since that removes anything open-ended - I can only check apps I have installed on my phone.

Google provides Digital Wellbeing controls on modern Android phones. Unfortunately they don't have the ability to turn off specific apps at specific times. Neither does Parental Controls.

It does allow you to set a limit on the total amount of time you can spend on an app each day. I decided that might be good enough, so for now I've set a 1 hour limit on Chrome every day.


I hope that this will make me more likely to do any of these things in the evening:

  1. Read
  2. Walk
  3. Work on my OSS projects
  4. Write LessWrong posts
  5. Other productive/social activities

This experiment can fail in a number of ways:

  1. I increase the time limit till it is ineffective.
  2. I find ways to work around the time limit. I've already had to set a half hour timer on the "Google" app (the one that allows you to search for things directly from your home screen) because you can access arbitrary websites from there.
  3. I end up wasting my time in different ways - e.g. watching Netflix all night, going on different apps on my phone (YouTube, Gitter, etc.).
  4. I don't enjoy the alternative activities I do in the evening as much as I thought.

I would say at the moment I spend more than an hour and a half on my phone about 4 evenings a week. I estimate that I currently spend a total of about 24 hours a week on my phone. I hope to reduce both these measures significantly.

I'm going to assess this in 1 month, and report how things are going. My predictions are:

  1. This works as well as I hope (only spend more than 1.5 hours on phone in an evening when organizing something, less than 12 hours of weekly screen time): 20%
  2. This does something, but not as much as I would like (3 or fewer evenings a week spending more than an hour and a half on my phone, and less than 20 hours a week total screen time): 50%
  3. This has no significant effect after a month: 30%
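
One way to score these predictions at the one-month check-in is a Brier score. The following is only a minimal sketch: the outcome labels and the `brier_score` helper are illustrative names, not something from the post, and it assumes the three outcomes are mutually exclusive and exhaustive.

```python
# Probabilities as stated in the list above; the labels are shorthand (an assumption).
predictions = {
    "works as hoped": 0.20,
    "partial effect": 0.50,
    "no significant effect": 0.30,
}

def brier_score(probs, outcome):
    """Multi-outcome Brier score: squared error against the one-hot outcome.

    0.0 is a perfect forecast and 2.0 is the worst possible one.
    """
    if outcome not in probs:
        raise KeyError(outcome)
    return sum(
        (p - (1.0 if name == outcome else 0.0)) ** 2
        for name, p in probs.items()
    )

# The outcomes are treated as exhaustive, so the probabilities should sum to 1.
assert abs(sum(predictions.values()) - 1.0) < 1e-9

for name in predictions:
    print(f"{name}: {brier_score(predictions, name):.2f}")
```

A lower score is better, so under these numbers the forecast is rewarded most if the "partial effect" outcome occurs.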

I'm also going to report subjectively how I feel about this process.


Emergent modularity and safety

LessWrong.com News - October 21, 2021 - 04:54
Published on October 21, 2021 1:54 AM GMT

Our default expectation about large neural networks should be that we will understand them in roughly the same ways that we understand biological brains, except where we have specific reasons to think otherwise. How do we understand human brains? One crucial way we currently do so is via their modularity: the fact that different parts of the brain carry out different functions. Neuroscientists have mapped many different skills (such as language use, memory consolidation, and emotional responses) to specific brain regions. Note that this doesn’t always give us much direct insight into how the skills themselves work - but it does make follow-up research into those skills much easier. I’ll argue that, for the purposes of AGI safety, this type of understanding may also directly enable important safety techniques.

What might it look like to identify modules in a machine learning system? Some machine learning systems are composed of multiple networks trained on different objective functions - which I’ll call architectural modularity. But what I’m more interested in is emergent modularity, where a single network develops modularity after training. Emergent modularity requires that the weights of a network give rise to a modular structure, and that those modules correspond to particular functions. We can think about this both in terms of high-level structure (e.g. a large section of a neural network carrying out a broad role, analogous to the visual system in humans) or lower-level structure, involving a smaller module carrying out more specific functions. (Note that this is a weaker definition than the one defended in philosophy by Fodor and others - for instance, the sets of neurons don’t need to contain encapsulated information.)

In theory, the neurons which make up a module might be distributed in a complex way across the entire network with only tenuous links between them. But in practice, we should probably expect that if these modules exist, we will be able to identify them by looking at the structure of connections between artificial neurons, similar to how it’s done for biological neurons. The first criterion, modular structure, is captured in a definition proposed by Filan et al. (2021): a network is modular to the extent that it can be partitioned into sets of neurons where each set is strongly internally connected, but only weakly connected to other sets. They measure this by pruning the networks, then using graph-clustering algorithms, and provide empirical evidence that multi-layer perceptrons are surprisingly modular.
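
To make "strongly internally connected, but only weakly connected to other sets" concrete, here is a minimal sketch that scores a given partition of neurons with a normalized-cut style measure. It is not a reproduction of Filan et al.'s pipeline (there is no pruning step and no clustering algorithm); the toy network, the edge weights, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict

def normalized_cut(edges, partition):
    """Score a partition of an undirected weighted graph; lower means more modular.

    edges: dict mapping (u, v) pairs to non-negative weights,
           e.g. the absolute value of the weight connecting two neurons.
    partition: dict mapping each node to a cluster label.
    """
    cut = defaultdict(float)     # weight on edges leaving each cluster
    volume = defaultdict(float)  # total weighted degree of each cluster
    for (u, v), w in edges.items():
        cu, cv = partition[u], partition[v]
        volume[cu] += w
        volume[cv] += w
        if cu != cv:
            cut[cu] += w
            cut[cv] += w
    return sum(cut[c] / volume[c] for c in volume if volume[c] > 0)

# Toy network: two strong pathways (a1-h1-o1 and a2-h2-o2) with weak cross-links.
edges = {
    ("a1", "h1"): 1.0, ("a1", "h2"): 0.1,
    ("a2", "h1"): 0.1, ("a2", "h2"): 1.0,
    ("h1", "o1"): 1.0, ("h1", "o2"): 0.1,
    ("h2", "o1"): 0.1, ("h2", "o2"): 1.0,
}
by_pathway = {"a1": 0, "h1": 0, "o1": 0, "a2": 1, "h2": 1, "o2": 1}
mixed = {"a1": 0, "h2": 0, "o1": 0, "a2": 1, "h1": 1, "o2": 1}

print(normalized_cut(edges, by_pathway))
print(normalized_cut(edges, mixed))
```

The partition that follows the two pathways scores much lower than the mixed one, which is the sense in which the toy network counts as modular along that split.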

The next question is whether those modules correspond to internal functions. Although it’s an appealing and intuitive hypothesis, the evidence for this is currently mixed. On one hand, Olah et al.’s (2020) investigations find circuits which implement human-comprehensible functions. And insofar as we expect artificial neural networks to be similar to biological neural networks, the evidence from brain lesions in humans and other animals is compelling. On the other hand, Olah et al. also find evidence for polysemanticity in artificial neural networks: some neurons fire for multiple reasons, rather than having a single well-defined role.

If it does turn out to be the case that structural modules implement functional modules, though, that has important implications for safety research: if we know what types of cognition we’d like our agents to avoid, then we might be able to identify and remove the regions responsible for them. In particular, we could try to find modules responsible for goal-directed agency, or perhaps even ones which are used for deception. This seems like a much more achievable goal for interpretability research than the goal of “reading off” specific thoughts that the network is having. Indeed, as in humans, very crude techniques for monitoring neural activations may be sufficient to identify many modules. But doing so may be just as useful for safety as precise interpretability, or more so, because it allows us to remove underlying traits that we’re worried about merely by setting the weights in the relevant modules to zero - a technique which I’ll call module pruning.
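
As a rough illustration of module pruning, the sketch below zeroes the incoming and outgoing weights of a chosen set of hidden units in a tiny two-layer ReLU network. The network, its weights, and the helper names `forward` and `prune_module` are hypothetical; in a real system, identifying which units form the module is the hard part and is not shown here.

```python
def forward(x, w_in, w_out):
    """Tiny two-layer ReLU network: input -> hidden -> output.

    w_in has one row per hidden unit; w_out has one row per output unit.
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def prune_module(w_in, w_out, module):
    """Return weight copies with every hidden unit in `module` zeroed out."""
    pruned_in = [
        [0.0] * len(row) if i in module else list(row)
        for i, row in enumerate(w_in)
    ]
    pruned_out = [
        [0.0 if j in module else w for j, w in enumerate(row)]
        for row in w_out
    ]
    return pruned_in, pruned_out

# Three hidden units; suppose unit 2 is the "module" we want to remove.
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
x = [2.0, 3.0]

print(forward(x, w_in, w_out))                             # original behaviour
print(forward(x, *prune_module(w_in, w_out, module={2})))  # after pruning
```

The original weights are left untouched, so the pruned copy can be compared against them or used as the starting point for further training.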

Of course, removing significant chunks of a neural network will affect its performance on the tasks we do want it to achieve. But it’s possible that retraining it from that point will allow it to regain the functionality we’re interested in without fully recreating the modules we’re worried about. This would be particularly valuable in cases where extensive pre-training is doing a lot of work in developing our agents’ capabilities, because that pre-training tends to be hard to control. For instance, it’s difficult to remove all offensive content from a large corpus of internet data, and so language models trained on such a corpus usually learn to reiterate that offensive content. Hypothetically, though, if we were able to observe small clusters of neurons which were most responsible for encoding this content, and zeroed out the corresponding parameters, then we could subsequently continue training on smaller corpora with more trustworthy content. While this particular example is quite speculative, I expect the general principle to be more straightforwardly applicable for agents that are pre-trained in multi-agent environments, in which they may acquire a range of dangerous traits like aggression or deception.

Module pruning also raises a counterintuitive possibility: that it may be beneficial to train agents to misbehave in limited ways, so that they develop specific modules responsible for those types of misbehaviour, which we can then remove. Of course, this suggestion is highly speculative. And, more generally, we should be very uncertain about whether advanced AIs will have modules that correspond directly to the types of skills we care about. But thinking about the safety of big neural networks in terms of emergent modules does seem like an interesting direction - both because the example of humans suggests that it’ll be useful, and also because it will push us towards lower-level and more precise descriptions of the types of cognition which our AIs carry out, and the types which we’d like to prevent.


Work on Robin Hanson compilation

LessWrong.com News - October 21, 2021 - 01:03
Published on October 20, 2021 10:03 PM GMT

I think Robin Hanson's ideas are not read nearly as widely as they should be, in part because it's difficult to navigate his many, many blog posts (I estimate he's written 2000 of them). So I'd like to pay someone to read through all his writings and compile the best ones into a more accessible format. The default option would be an ebook like Rationality: from AI to Zombies, containing several thematically-linked sequences of posts; possible extensions of this include adding summaries or publishing physical copies (although let me know if you have any other suggestions).

I expect this to take 1-2 months of work, and will pay 5-10k USD (depending on how extensive the project ends up being). The Lightcone team has kindly offered to help with the logistics of bookmaking. My gmail address is richardcngo; email me with the subject line "Hanson compilation", plus any relevant information about yourself, if you're interested in doing this.


AGI Safety Fundamentals curriculum and application

LessWrong.com News - October 21, 2021 - 00:44
Published on October 20, 2021 9:44 PM GMT

Over the last year EA Cambridge has been designing and running an online program aimed at effectively introducing the field of AGI safety; the most recent cohort included around 150 participants and 25 facilitators from around the world. Dewi Erwan runs the program; I designed the curriculum, the latest version of which appears in the linked document. We expect the program to be most useful to people with technical backgrounds (e.g. maths, CS, or ML), although the curriculum is intended to be accessible for those who aren't familiar with machine learning, and participants will be put in groups with others from similar backgrounds. If you're interested in joining the next version of the course (taking place January - March 2022) apply here to be a participant or here to be a facilitator. Applications are open to anyone and close 15 December. (We expect to be able to pay facilitators, but are still waiting to confirm the details.)

This post contains an overview of the course and an abbreviated version of the curriculum; the full version (which also contains optional readings, exercises, notes, discussion prompts, and project ideas) can be found here. Comments and feedback are very welcome, either on this post or in the full curriculum document; suggestions of new exercises, prompts or readings would be particularly helpful. I'll continue to make updates until shortly before the next cohort starts.

Course overview

The course consists of 8 weeks of readings, plus a final project. Participants are divided into groups of 4-6 people, matched based on their prior knowledge about ML and safety. Each week (apart from week 0) each group and their discussion facilitator will meet for 1.5 hours to discuss the readings and exercises. Broadly speaking, the first half of the course explores the motivations and arguments underpinning the field of AGI safety, while the second half focuses on proposals for technical solutions. After week 7, participants will have several weeks to work on projects of their choice, to present at the final session.

Each week's curriculum contains:

  • Key ideas for that week
  • Core readings
  • Optional readings
  • Two exercises (participants should pick one to do each week)
  • Further notes on the readings
  • Discussion prompts for the weekly session

Week 0 replaces the small group discussions with a lecture plus live group exercises, since it's aimed at getting people with little ML knowledge up to speed quickly.

The topics for each week are:

  • Week 0 (optional): introduction to machine learning
  • Week 1: Artificial general intelligence
  • Week 2: Goals and misalignment
  • Week 3: Threat models and types of solutions
  • Week 4: Learning from humans
  • Week 5: Decomposing tasks for outer alignment
  • Week 6: Other paradigms for safety work
  • Week 7: AI governance
  • Week 8 (several weeks later): Projects

Abbreviated curriculum (only key ideas and core readings)

Week 0 (optional): introduction to machine learning

This week mainly involves learning about foundational concepts in machine learning, for those who are less familiar with them, or want to revise the basics. If you’re not already familiar with basic concepts in statistics (like regressions), it will take a bit longer than most weeks; and instead of the group discussions from most weeks, there will be a lecture and group exercises. If you’d like to learn ML in more detail, see the further resources section at the end of this curriculum.

Otherwise, start with Ngo (2021), which provides a framework for thinking about machine learning, and in particular the two key components of deep learning: neural networks and optimisation. For more details and intuitions about neural networks, watch 3Blue1Brown (2017a); for more details and intuitions about optimisation, watch 3Blue1Brown (2017b). Lastly, see Simonini (2020) for an introduction to how deep learning can be used to solve reinforcement learning tasks.

Core readings:

  1. If you’re not familiar with the basics of statistics, like linear regression and classification:
    1. Introduction: linear regression (10 mins)
    2. Ordinary least squares regression (10 mins)
  2. A short introduction to machine learning (Ngo, 2021) (20 mins)
  3. But what is a neural network? (3Blue1Brown, 2017a) (20 mins)
  4. Gradient descent, how neural networks learn (3Blue1Brown, 2017b) (20 mins)
  5. An introduction to deep reinforcement learning (Simonini, 2020) (30 mins)
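
For readers who want a concrete picture of optimisation before starting these readings, here is a minimal sketch of gradient descent fitting a line by least squares. It is not taken from any of the readings; the function name, learning rate, step count, and data are arbitrary choices for illustration.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w * x + b - y) ** 2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step downhill, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated from y = 2x + 1, so the fit should recover w close to 2 and b close to 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

Neural network training uses the same loop in spirit, with many more parameters and gradients computed by backpropagation.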

Week 1: Artificial general intelligence

The first two readings this week offer several different perspectives on how we should think about artificial general intelligence. This is the key concept underpinning the course, so it’s important to deeply explore what we mean by it, and the limitations of our current understanding.

The third reading is about how we should expect advances in AI to occur. AI pioneer Rich Sutton explains the main lesson he draws from the history of the field: that “general methods that leverage computation are ultimately the most effective”. Compared with earlier approaches, these methods rely much less on human design, and therefore raise the possibility that we build AGIs whose cognition we know very little about.

Focusing on compute also provides a way to forecast when we should expect AGI to occur. The most comprehensive report on the topic (summarised by Karnofsky (2021)) estimates the amount of compute required to train neural networks as large as human brains to do highly impactful tasks, and concludes that this will probably be feasible within the next four decades - although the estimate is highly uncertain.

Core readings:

  1. Four background claims (Soares, 2015) (15 mins)
  2. AGI safety from first principles (Ngo, 2020) (only sections 1, 2 and 2.1) (20 mins)
  3. The Bitter Lesson (Sutton, 2019) (15 mins)
  4. Forecasting transformative AI: the “biological anchors” method in a nutshell (Karnofsky, 2021) (30 mins)

Week 2: Goals and misalignment

This week we’ll focus on how and why AGIs might develop goals that are misaligned with those of humans, in particular when they’ve been trained using machine learning. We cover three core ideas. Firstly, it’s difficult to create reward functions which specify the desired outcomes for complex tasks (known as the problem of outer alignment). Krakovna et al. (2020) helps build intuitions about the difficulty of outer alignment, by showcasing examples of misbehaviour on toy problems.

Secondly, however, it’s important to distinguish between the reward function which is used to train a reinforcement learning agent, versus the goals which that agent learns to pursue. Hubinger et al. (2019a) argue that even an agent trained on the “right” reward function might acquire undesirable goals - the problem of inner alignment.

Thirdly, Bostrom (2014) argues that almost all goals which an AGI might have would incentivise it to misbehave in highly undesirable ways (e.g. pursuing survival and resource acquisition), due to the phenomenon of instrumental convergence.

While we can describe fairly easily what badly misaligned AIs might look like, it’s a little more difficult to pin down what qualifies as an aligned AI. Christiano’s (2018) definition allows us to mostly gloss over the difficult ethical questions.

Core readings:

  1. Specification gaming: the flip side of AI ingenuity (Krakovna et al., 2020) (15 mins)
  2. Introduction to Risks from Learned Optimisation (Hubinger et al., 2019a) (30 mins)
  3. Superintelligence, Chapter 7: The superintelligent will (Bostrom, 2014) (45 mins)
  4. Clarifying “AI alignment” (Christiano, 2018) (10 mins)

Week 3: Threat models and types of solutions

How might misaligned AGIs cause catastrophes, and how might we stop them? Two threat models are outlined in Christiano (2019) - the first focusing on outer misalignment, the second on inner misalignment. Muehlhauser and Salamon (2012) outline a core intuition for why we might be unable to prevent these risks: that progress in AI will at some point speed up dramatically. A third key intuition - that misaligned agents will try to deceive humans - is explored by Hubinger et al. (2019).

How might we prevent these scenarios? Christiano (2020) gives a broad overview of the landscape of different contributions to making AIs aligned, with a particular focus on some of the techniques we’ll be covering in later weeks.

Core readings:

  1. What failure looks like (Christiano, 2019) (20 mins)
  2. Intelligence explosion: evidence and import (Muehlhauser and Salamon, 2012) (only pages 10-15) (15 mins)
  3. AI alignment landscape (Christiano, 2020) (30 mins)
  4. Risks from Learned Optimisation: Deceptive alignment (Hubinger et al., 2019) (45 mins)

Week 4: Learning from humans

This week, we look at four techniques for training AIs on human data (all falling under “learn from teacher” in Christiano’s AI alignment landscape from last week). From a safety perspective, each of them improves on standard reinforcement learning techniques in some ways, but also has weaknesses which prevent it from solving the whole alignment problem. Next week, we’ll look at some ways to make these techniques more powerful and scalable; this week focuses on understanding each of them.

The first technique, behavioural cloning, is essentially an extension of supervised learning to settings where an AI must take actions over time - as discussed by Levine (2021). The second, reward modelling, allows humans to give feedback on the behaviour of reinforcement learning agents, which is then used to determine the rewards they receive; this is used by Christiano et al. (2017) and Stiennon et al. (2020). The third, inverse reinforcement learning (IRL for short), attempts to identify what goals a human is pursuing based on their behaviour.
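
To give a feel for how human feedback can be turned into a training signal for a reward model, here is a minimal sketch of a Bradley-Terry style preference model over two behaviour segments, in the spirit of the preference-comparison setup used in this line of work. The function names and the example numbers are illustrative assumptions, and a real system would fit a neural reward model by minimising this loss over many human comparisons.

```python
import math

def preference_probability(rewards_a, rewards_b):
    """Modelled probability that a human prefers segment A over segment B.

    Each argument is the list of per-step rewards the reward model
    predicts for that segment; only the summed rewards matter.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))

def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy loss of the reward model on one human comparison."""
    p_a = preference_probability(rewards_a, rewards_b)
    return -math.log(p_a if human_prefers_a else 1.0 - p_a)

# The reward model rates segment A higher than segment B.
segment_a = [1.0, 1.0]
segment_b = [0.0, 0.0]

print(preference_probability(segment_a, segment_b))
print(preference_loss(segment_a, segment_b, human_prefers_a=True))
print(preference_loss(segment_a, segment_b, human_prefers_a=False))
```

The loss is small when the human agrees with the reward model's ranking and large when they disagree, which is what pushes the predicted rewards towards human judgments.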

A notable variant of IRL is cooperative IRL (CIRL for short), introduced by Hadfield-Menell et al. (2016). CIRL focuses on cases where the human and AI interact in a shared environment, and therefore the best strategy for the human is often to help the AI learn what goal the human is pursuing.

Core readings:

  1. Imitation learning lecture: part 1 (Levine, 2021a) (20 mins)
  2. Deep RL from human preferences blog post (Christiano et al., 2017) (15 mins)
  3. Learning to summarise with human feedback blog post (Stiennon et al., 2020) (25 mins)
  4. Inverse reinforcement learning
    1. For those who don’t already understand IRL:
    2. For those who already understand IRL:

Week 5: Decomposing tasks for outer alignment

The most prominent research directions in technical AGI safety involve training AIs to do complex tasks by decomposing those tasks into simpler ones where humans can more easily evaluate AI behaviour. This week we’ll cover three closely-related algorithms (all falling under “build a better teacher” in Christiano’s AI alignment landscape).

Wu et al. (2021) applies reward modelling recursively in order to solve more difficult tasks. Recursive reward modelling can be considered one example of a more general type of technique called iterated amplification (also known as iterated distillation and amplification), which is described in Ought (2019). A more technical description of iterated amplification is given by Christiano et al. (2018), along with some small-scale experiments.

The third technique we’ll discuss this week is Debate, as proposed by Irving and Amodei (2018). Unlike the other two techniques, Debate focuses on evaluating claims made by language models, rather than supervising AI behaviour over time.

Core readings:

  1. Recursively summarising books with human feedback (Wu et al., 2021) (ending after section 4.1.2: Findings) (45 mins)
  2. Factored cognition (Ought, 2019) (introduction and scalability section) (20 mins)
  3. AI safety via debate blog post (Irving and Amodei, 2018) (15 mins)
  4. Supervising strong learners by amplifying weak experts (Christiano et al., 2018) (40 mins)

Week 6: Other paradigms for safety work

A lot of safety work focuses on “shifting the paradigm” of AI research. This week we’ll cover two ways in which safety researchers have attempted to do so. The first is via research on interpretability, which attempts to understand in detail how neural networks work. Olah et al. (2020) showcases some prominent research in the area; and Chris Olah’s perspective is summarised by Hubinger (2019).

The second is the research agenda of the Machine Intelligence Research Institute (MIRI), which aims to create rigorous mathematical frameworks to describe the relationships between AIs and their real-world environments. Soares (2015) gives a high-level explanation of their approach, while Demski and Garrabrant (2018) identify a range of open problems and links between them.

Core readings:

  1. Zoom In: an introduction to circuits (Olah et al., 2020) (35 mins)
  2. Chris Olah’s views on AGI safety (Hubinger, 2019) (25 mins)
  3. MIRI’s approach (Soares, 2015) (30 mins)
  4. Embedded agents (Demski and Garrabrant, 2018) (25 mins)

Week 7: AI governance

In the last week of curriculum content, we’ll look at the field of AI governance. Start with Dafoe (2020), which gives a thorough overview of AI governance and ways in which it might be important, particularly focusing on the framing of AI governance as field-building. An alternative framing - of AI governance as an attempt to prevent cooperation failures - is explored by Clifton (2019). Although the field of AI governance is still young, Muehlhauser (2020) identifies some useful work so far. Finally, Bostrom (2019) provides a background framing for thinking about technological risks: the process of randomly sampling new technologies, some of which might prove catastrophic.

Core readings:

  1. AI Governance: Opportunity and Theory of Impact (Dafoe, 2020) (25 mins)
  2. Cooperation, conflict and transformative AI: sections 1 & 2 (Clifton, 2019) (25 mins)
  3. Our AI governance grantmaking so far (Muehlhauser, 2020) (15 mins)
  4. The vulnerable world hypothesis (Bostrom, 2019) (ending at the start of the section on ‘Preventive policing’) (60 mins)

Week 8 (four weeks later): Projects

The final part of the AGI safety fundamentals course will be projects where you get to dig into something related to the course. The project is a chance for you to explore your interests, so try to find something you’re excited about! The goal of this project is to help you practice taking an intellectually productive stance towards AGI safety - to go beyond just reading and discussing existing ideas, and take a tangible step towards contributing to the field yourself. This is particularly valuable because it’s such a new field, with lots of room to explore.

Click here for the full version of the curriculum, which contains additional readings, exercises, notes, discussion prompts, and project ideas.


[AN #167]: Concrete ML safety problems and their relevance to x-risk

LessWrong.com News - October 20, 2021 - 20:10
Published on October 20, 2021 5:10 PM GMT

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.
Audio version here (may not be up yet).
Please note that, while I work at DeepMind, this newsletter represents my personal views and not those of my employer.

HIGHLIGHTS

Unsolved Problems in ML Safety (Dan Hendrycks, Nicholas Carlini, John Schulman, and Jacob Steinhardt) (summarized by Dan Hendrycks): To make the case for safety to the broader machine learning research community, this paper provides a revised and expanded collection of concrete technical safety research problems, namely:

1. Robustness: Create models that are resilient to adversaries, unusual situations, and Black Swan events.

2. Monitoring: Detect malicious use, monitor predictions, and discover unexpected model functionality.

3. Alignment: Build models that represent and safely optimize hard-to-specify human values.

4. External Safety: Use ML to address risks to how ML systems are handled, including cyberwarfare and global turbulence.

Throughout, the paper attempts to clarify the problems’ motivation and provide concrete project ideas.

Dan Hendrycks' opinion: My coauthors and I wrote this paper with the ML research community as our target audience. Here are some thoughts on this topic:

1. The document includes numerous problems that, if left unsolved, would imply that ML systems are unsafe. We need the effort of thousands of researchers to address all of them. This means that the main safety discussions cannot stay within the confines of the relatively small EA community. I think we should aim to have over one third of the ML research community work on safety problems. We need the broader community to treat AI safety at least as seriously as safety for nuclear power plants.

2. To grow the ML safety research community, we need to suggest problems that can progressively build the community and organically grow support for elevating safety standards within the existing research ecosystem. Research agendas that pertain to AGI exclusively will not scale sufficiently, and such research will simply not get enough market share in time. If we do not get the machine learning community on board with proactively mitigating risks that already exist, we will have a harder time getting them to mitigate less familiar and unprecedented risks. Rather than try to win over the community with alignment philosophy arguments, I'll try winning them over with interesting problems and try to make work towards safer systems rewarded with prestige.

3. The benefits of a larger ML safety community are numerous. They can decrease the cost of safety methods and increase the propensity to adopt them. Moreover, to ensure that ML systems have desirable properties, it is necessary to rapidly accumulate incremental improvements, but this requires substantial growth since such gains cannot be produced by just a few card-carrying x-risk researchers with the purest intentions.

4. The community will fail to grow if we ignore near-term concerns or actively exclude or sneer at people who work on problems that are useful for both near- and long-term safety (such as adversaries). The alignment community will need to stop engaging in textbook territorialism and welcome serious hypercompetent researchers who do not post on internet forums or who happen not to subscribe to effective altruism. (We include a community strategy in the Appendix.)

5. We focus not only on reinforcement learning but also on deep learning. Most of the machine learning research community studies deep learning (e.g., text processing, vision) and does not use, say, Bellman equations or PPO. While existentially catastrophic failures will likely require competent sequential decision-making agents, the relevant problems and solutions can often be better studied outside of gridworlds and MuJoCo. There is much useful safety research to be done that does not need to be cast as a reinforcement learning problem.

6. To prevent alienating readers, we did not use phrases such as "AGI." AGI-exclusive research will not scale; for most academics and many industry researchers, it's a nonstarter. Likewise, to prevent needless dismissiveness, we kept x-risks implicit, only hinted at them, or used the phrase "permanent catastrophe."

I would have personally enjoyed discussing at length how anomaly detection is an indispensable tool for reducing x-risks from Black Balls, engineered microorganisms, and deceptive ML systems.

Here is how the problems relate to x-risk:

Adversarial Robustness: This is needed to address proxy gaming. ML systems encoding proxies must become more robust to optimizers, which is to say they must become more adversarially robust. We make this connection explicit at the bottom of page 9.

Black Swans and Tail Risks: It's hard to be safe without high reliability. It's not obvious we'll achieve high reliability even by the time we have systems that are superhuman in important respects. Even though MNIST is solved for typical inputs, we still do not even have an MNIST classifier for atypical inputs that is reliable! Moreover, if optimizing agents become unreliable in the face of novel or extreme events, they could start heavily optimizing the wrong thing. Models accidentally going off the rails poses an x-risk if they are sufficiently powerful (this is related to "competent errors" and "treacherous turns"). If this problem is not solved, optimizers can use these weaknesses; this is a simpler problem on the way to adversarial robustness.

Anomaly and Malicious Use Detection: This is an indispensable tool for detecting proxy gaming, Black Balls, engineered microorganisms that present bio x-risks, malicious users who may misalign a model, deceptive ML systems, and rogue ML systems.

Representative Outputs: Making models honest is a way to avoid many treacherous turns.

Hidden Model Functionality: This also helps avoid treacherous turns. Backdoor detection is a potentially useful related problem, as it is about detecting latent but potentially sharp changes in behavior.

Value Learning: Understanding utilities is difficult even for humans. Powerful optimizers will need to achieve a certain, as-of-yet unclear level of superhuman performance at learning our values.

Translating Values to Action: Successfully prodding models to optimize our values is necessary for safe outcomes.

Proxy Gaming: Obvious.

Value Clarification: This is the philosophy bot section. We will need to decide what values to pursue. If we decide poorly, we may lock in or destroy what is of value. It is also possible that there is an ongoing moral catastrophe, which we would not want to replicate across the cosmos.

Unintended Consequences: This should help models not accidentally work against our values.

ML for Cybersecurity: If you believe that AI governance is valuable and that global turbulence risks can increase risks of terrible outcomes, this section is also relevant. Even if some of the components of ML systems are safe, they can become unsafe when traditional software vulnerabilities enable others to control their behavior. Moreover, traditional software vulnerabilities may lead to the proliferation of powerful advanced models, and this may be worse than proliferating nuclear weapons.

Informed Decision Making: We want to avoid decision making based on unreliable gut reactions during a time of crisis. This reduces risks of poor governance of advanced systems.

Here are some other notes:

1. We use systems theory to motivate inner optimization as we expect this motivation will be more convincing to others.

2. Rather than having a broad call for "interpretability," we focus on specific transparency-related problems that are more tractable and neglected. (See the Appendix for a table assessing importance, tractability, and neglectedness.) For example, we include sections on making models honest and detecting emergent functionality.

3. The "External Safety" section can also be thought of as technical research for reducing "Governance" risks. For readers mostly concerned about AI risks from global turbulence, there still is technical research that can be done.

Here are some observations while writing the document:

1. Some approaches that were previously very popular are currently neglected, such as inverse reinforcement learning. This may be due to currently low tractability.

2. Five years ago, I started explicitly brainstorming the content for this document. I think it took the whole time for this document to take shape. Moreover, if this were written last fall, the document would be far more confused, since it took around a year after GPT-3 to become reoriented; writing these types of documents shortly after a paradigm shift may be too hasty.

3. When collecting feedback, it was not uncommon for "in-the-know" researchers to make opposite suggestions. Some people thought some of the problems in the Alignment section were unimportant, while others thought they were the most critical. We attempted to include most research directions.

[MLSN #1]: ICLR Safety Paper Roundup (Dan Hendrycks) (summarized by Rohin): This is the first issue of the ML Safety Newsletter, which is "a monthly safety newsletter which is designed to cover empirical safety research and be palatable to the broader machine learning research community".

Rohin's opinion: I'm very excited to see this newsletter: this is a category of papers that I want to know about and that are relevant to safety, but I don't have the time to read all of these papers given all the other alignment work I read, especially since I don't personally work in these areas and so often find it hard to summarize them or place them in the appropriate context. Dan on the other hand has written many such papers himself and generally knows the area, and so will likely do a much better job than I would. I recommend you subscribe, especially since I'm not going to send a link to each MLSN in this newsletter.


Selection Theorems: A Program For Understanding Agents (John Wentworth) (summarized by Rohin): This post proposes a research area for understanding agents: selection theorems. A selection theorem is a theorem that tells us something about agents that will be selected for in a broad class of environments. Selection theorems are helpful because (1) they can provide additional assumptions that can help with learning human values, and (2) they can tell us likely properties of the agents we build by accident (think inner alignment concerns).

As an example, coherence arguments demonstrate that when an environment presents an agent with “bets” or “lotteries”, where the agent cares only about the outcomes of the bets, then any “good” agent can be represented as maximizing expected utility. (What does it mean to be “good”? This can vary, but one example would be that the agent is not subject to Dutch books, i.e. situations in which it is guaranteed to lose resources.) This can then be turned into a selection argument by combining it with something that selects for “good” agents. For example, evolution will select for agents that don’t lose resources for no gain, so humans are likely to be represented as maximizing expected utility. Unfortunately, many coherence arguments implicitly assume that the agent has no internal state, which is not true for humans, so this argument does not clearly work. As another example, our ML training procedures will likely also select for agents that don’t waste resources, which could allow us to conclude that the resulting agents can be represented as maximizing expected utility, if the agents don't have internal states.
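One standard illustration of a Dutch book is a money pump: an agent whose preferences form a cycle will pay a small fee for each "upgrade" and end up holding its original item with less money. The sketch below is only a toy under stated assumptions (three items, a fixed trade fee, and hypothetical helper names that do not come from the post); it contrasts a cyclic agent with a transitive one.

```python
def run_money_pump(prefers, start_item, offers, fee, wealth):
    """Offer items in order; the agent pays `fee` each time it accepts a swap.

    `prefers(x, y)` returns True when the agent strictly prefers x to y.
    Returns the final (item, wealth) pair.
    """
    item = start_item
    for offered in offers:
        if prefers(offered, item):
            item = offered
            wealth -= fee
    return item, wealth


# Hypothetical cyclic (intransitive) preferences: A over B, B over C, C over A.
CYCLIC = {("A", "B"), ("B", "C"), ("C", "A")}


def cyclic_prefers(x, y):
    return (x, y) in CYCLIC


# Hypothetical transitive preferences: A over B over C.
RANK = {"A": 3, "B": 2, "C": 1}


def transitive_prefers(x, y):
    return RANK[x] > RANK[y]


# The same sequence of offers, repeated three times.
offers = ["C", "B", "A"] * 3

pumped = run_money_pump(cyclic_prefers, "A", offers, fee=1, wealth=100)
stable = run_money_pump(transitive_prefers, "A", offers, fee=1, wealth=100)
print("cyclic agent:", pumped)
print("transitive agent:", stable)
```

The cyclic agent accepts every offer and pays the fee nine times while ending on the item it started with; the transitive agent already holds its top-ranked item and declines every trade. This toy agent has no internal state, which is exactly the assumption the summary flags as questionable for humans.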

Coherence arguments aren’t the only kind of selection theorem. The good(er) regulator theorem (AN #138) provides a set of scenarios under which agents learn an internal “world model”. The Kelly criterion tells us about scenarios in which the best (most selected) agents will make bets as though they are maximizing expected log money. These and other examples are described in this followup post.
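As a rough illustration of the Kelly criterion mentioned above, consider a repeated binary bet that is won with probability p and pays net odds b. The numbers and function names below are assumptions chosen for illustration, not taken from the post. For this simple bet, the closed-form Kelly fraction p - (1 - p)/b should coincide with the stake that maximizes expected log wealth, which the sketch checks with a coarse grid search.

```python
import math


def expected_log_growth(f, p, b):
    """Expected log-wealth growth per round when staking fraction f of wealth."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


def kelly_fraction(p, b):
    """Closed-form Kelly stake for a binary bet with win probability p and net odds b."""
    return p - (1 - p) / b


# Illustrative bet: 60% chance to win at even odds.
p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)

# Grid search over stakes in [0, 0.999]; f = 1 is excluded because log(0) is undefined.
grid = [i / 1000 for i in range(1000)]
best = max(grid, key=lambda f: expected_log_growth(f, p, b))

print("closed-form Kelly fraction:", round(f_star, 3))
print("grid-search maximizer:", best)
```

Staking more than the Kelly fraction lowers long-run log growth even though it raises the expected payoff of a single round, which is why agents selected on long-run wealth would be expected to bet as if maximizing expected log money.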

The rest of this post elaborates on the various parts of a selection theorem and provides advice on how to make original research contributions in the area of selection theorems. Another followup post describes some useful properties for which the author expects there are useful selection theorems to prove.

Rohin's opinion: People sometimes expect me to be against this sort of work, because I wrote Coherence arguments do not imply goal-directed behavior (AN #35). This is not true. My point in that post is that coherence arguments alone are not enough; you need to combine them with some other assumption (for example, that there exists some “resource” over which the agent has no terminal preferences). I do think it is plausible that this research agenda gives us a better picture of agency that tells us something about how AI systems will behave, or something about how to better infer human values. While I am personally more excited about studying particular development paths to AGI rather than more abstract agent models, I do think this research would be more useful than other types of alignment research I have seen proposed.


State of AI Report 2021 (Nathan Benaich and Ian Hogarth) (summarized by Rohin): As with past (AN #15) reports (AN #120), I’m not going to summarize the entire thing; instead you get the high-level themes that the authors identified:

1. AI is stepping up in more concrete ways, including in mission critical infrastructure.

2. AI-first approaches have taken biology by storm (and we aren’t just talking about AlphaFold).

3. Transformers have emerged as a general purpose architecture for machine learning in many domains, not just NLP.

4. Investors have taken notice, with record funding this year into AI startups, and two first ever IPOs for AI-first drug discovery companies, as well as blockbuster IPOs for data infrastructure and cybersecurity companies that help enterprises retool for the AI-first era.

5. The under-resourced AI-alignment efforts from key organisations who are advancing the overall field of AI, as well as concerns about datasets used to train AI models and bias in model evaluation benchmarks, raise important questions about how best to chart the progress of AI systems with rapidly advancing capabilities.

6. AI is now an actual arms race rather than a figurative one, with reports of recent use of autonomous weapons by various militaries.

7. Within the US-China rivalry, China's ascension in research quality and talent training is notable, with Chinese institutions now beating the most prominent Western ones.

8. There is an emergence and nationalisation of large language models.

Rohin's opinion: In last year’s report (AN #120), I said that their 8 predictions seemed to be going out on a limb, and that even 67% accuracy would be pretty impressive. This year, they scored their predictions as 5 “Yes”, 1 “Sort of”, and 2 “No”. That being said, they graded “The first 10 trillion parameter dense model” as “Yes”, I believe on the basis that Microsoft had run a couple of steps of training on a 32 trillion parameter dense model. I definitely interpreted the prediction as saying that a 10 trillion parameter model would be trained to completion, which I do not think happened publicly, so I’m inclined to give it a “No”. Still, this does seem like a decent track record for what seemed to me to be non-trivial predictions. This year's predictions seem about as "out on a limb" as last year's.

This year’s report included one-slide summaries of many papers I’ve summarized before. I only found one major issue -- the slide on TruthfulQA (AN #165) implies that larger language models are less honest in general, rather than being more likely to imitate human falsehoods. This is actually a pretty good track record, given the number of things they summarized where I would have noticed if there were major issues.
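The grading arithmetic in the opinion above can be made explicit. The sketch below is only an illustration: the opinion does not say how a “Sort of” should be counted, so the half-credit option is an assumption, as is the helper name.

```python
def accuracy(yes, sort_of, no, partial_credit=0.0):
    """Fraction of predictions counted as correct.

    `partial_credit` is the weight given to each "Sort of" grade
    (an assumption; the report's grading does not specify one).
    """
    total = yes + sort_of + no
    return (yes + partial_credit * sort_of) / total


# The authors' own grading: 5 "Yes", 1 "Sort of", 2 "No".
authors_strict = accuracy(5, 1, 2)
authors_half = accuracy(5, 1, 2, partial_credit=0.5)

# Regrading the 10 trillion parameter prediction from "Yes" to "No".
regraded_strict = accuracy(4, 1, 3)
regraded_half = accuracy(4, 1, 3, partial_credit=0.5)

for label, value in [
    ("authors, strict", authors_strict),
    ("authors, half credit", authors_half),
    ("regraded, strict", regraded_strict),
    ("regraded, half credit", regraded_half),
]:
    print(f"{label}: {value:.1%}")
```

Under these assumptions the authors' grading sits close to the 67% bar mentioned in the opinion (62.5% strict, 68.75% with half credit), while the regraded version falls to 50% or 56.25%.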


CHAI Internships 2022 (summarized by Rohin): CHAI internships are open once again! Typically, an intern will execute on an AI safety research project proposed by their mentor, resulting in a first-author publication at a workshop. The early deadline is November 23rd and the regular deadline is December 13th.

FEEDBACK

I'm always happy to hear feedback; you can send it to me, Rohin Shah, by replying to this email.

PODCAST

An audio podcast version of the Alignment Newsletter is available. This podcast is an audio version of the newsletter, recorded by Robert Miles.

Copyright © 2021 Alignment Newsletter, All rights reserved.



