
LessWrong.com News

A community blog devoted to refining the art of rationality

Covid 11/4: The After Times

November 4, 2021 - 18:20
Published on November 4, 2021 3:20 PM GMT

The hope is that the pandemic will be fully and truly over because case counts will be sufficiently low, and vaccination rates sufficiently high, that we can all agree to move on and resume our lives.

The fear is that this will never happen. Either cases will climb back up and be sufficiently high to justify a continued emergency state, or they won’t but people will react in a nonsensical and disproportionate way to a tiny risk, forever damaging or even destroying much of our way of life. 

At this point, that potential reaction is the true risk factor. Children as young as five can be vaccinated, and anyone who wants one can effectively get a booster shot. There’s no risk left in the room that is different from many other background risks we all take every day. 

Meanwhile, case counts stopped declining this week outside of the South, so the strategy of ‘wait until cases are much lower’ is looking like a less promising strategy than it did before.

For you, in your life, outside of official meddling, the pandemic is over if and when you decide it is over. 

If you want them to, your After Times can start today.

As far as my personal life is concerned, the After Times started last week. Pandemic over. I’ll still have to flash my vaccination card and toggle my mask on and off as required, but that’s all for show. It’s over. If my building keeps requiring masks and keeps refusing to let delivery people go upstairs, that’s annoying, but so are the days that sadly require pants.

This past week I dove a bunch into the logistical situation at the Port of Long Beach, and the Tweetstorm that helped change the container stacking rule in the city. That first post was my most widely read post ever by a wide margin. I wrote a follow-up, and I notice that the logistics issues seem urgent in a way that Covid issues increasingly do not. Perhaps I can continue transitioning away from a Covid focus towards things that now matter more, on a variety of fronts. The more of these posts I can keep short, the better. 

Executive Summary
  1. Child vaccinations for ages 5-11 good to go.
  2. Case counts may no longer be declining.
  3. Vaccine mandate compliance very high when mandates are actually enforced.

Let’s run the numbers.

The Numbers

Predictions

Prediction from last week: 400k cases (-9%) and 8,600 deaths (-10%).

Results: 442,620 cases (-1%) and 8,439 deaths (-11%).

Prediction for next week: 442k cases (no change) and 7500 deaths (-11%).
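The percentages above are week-over-week changes. As a quick check of that arithmetic, here is a minimal Python sketch using only the figures quoted above; the helper name `pct_change` is hypothetical.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


# Next week's death prediction relative to this week's reported deaths.
deaths_reported = 8_439
deaths_predicted_next = 7_500
print(f"{pct_change(deaths_reported, deaths_predicted_next):.1f}%")  # -11.1%

# How far this week's case count landed from last week's prediction.
cases_predicted = 400_000
cases_reported = 442_620
print(f"{pct_change(cases_predicted, cases_reported):.1f}%")  # 10.7%
```

So the 7,500 figure is indeed roughly an 11% drop, and the 400k case prediction came in about 11% below what actually happened.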

The death numbers are predictable. The case number is a major inflection point and will tell us a lot, given that cases stopped declining this week. It’s a very different story to see an increase versus no change versus a resumed decline, and all three are possible. For now I’m predicting no change, but it’s more likely that it meaningfully changes than that it stays essentially the same.


Deaths continue to follow cases with several weeks of delay.

There was little uncertainty here, which may change in a few weeks if case counts have stopped declining, but it’s rare now that the death numbers tell us much that we didn’t already know.


Oh no.

Numbers in the South continue to decline, but the increases in the Midwest and Northeast are exactly what we’d expect to see if we’re headed for a winter wave. Positive test percentages tell the same story, so this is probably not a data artifact. I don’t want to draw too many conclusions from one week of data, and child vaccinations are about to start which could help a lot, but the chances of things fading away into the background that easily seem a lot lower than they did last week.

Vaccinations

Vaccine Effectiveness

Vaccines are safe and effective, and boosters make them far more effective. You know what might be less effective than the vaccine? Homeopathic treatment, also known as nothing. So here we are, Green Bay Packers.

What’s great about this story is both Rodgers petitioning to have his homeopathic treatments ‘count as vaccination’ and the NFL pretending it didn’t know what the situation was while Rodgers went around ignoring the rules for unvaccinated players. Whoops. 

Vaccine Mandates

San Francisco wastes zero time, moves to impose vaccine requirements on five-year-old children after an eight-week period to ensure that children have enough time to get vaccinated. How very generous of them. 

That which is not forbidden is mandatory. That which is not mandatory is forbidden. If you’re voting to make something not forbidden in such a context, you really do need to consider that this is definitely going to happen. 

As I’ve repeatedly noted, there’s a big difference between not getting vaccinated under normal circumstances, and not getting vaccinated even though it will get you fired. That’s a big difference.

It makes sense that the vast majority of the unvaccinated, when push comes to shove and their job is on the line, choose vaccination.

And that’s exactly what happens.

From 66% to 98% means that 94% of all unvaccinated employees agreed to get vaccinated. 
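The 94% figure comes from looking only at the employees who started out unvaccinated: 34% of staff before, 2% after. A minimal Python sketch of that step, assuming the workforce stayed the same size between the two measurements (the helper name is hypothetical):

```python
def share_of_unvaccinated_converted(before_pct: float, after_pct: float) -> float:
    """Fraction of initially unvaccinated employees who got vaccinated.

    Assumes headcount did not change between the two measurements, so the
    whole drop in the unvaccinated share is due to vaccination.
    """
    unvaccinated_before = 100 - before_pct  # 34 points at 66% vaccinated
    unvaccinated_after = 100 - after_pct    # 2 points at 98% vaccinated
    return (unvaccinated_before - unvaccinated_after) / unvaccinated_before


print(f"{share_of_unvaccinated_converted(66, 98):.0%}")  # 94%
```

If some unvaccinated employees quit or were fired before the second count, the true conversion rate would be somewhat lower than this.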

New York had a similar experience.

What’s funny about the NYPD situation is that the same people who would have cheered and laughed at the cops for quitting are also cheering and laughing at the cops for not quitting. 

Air Canada suspends more than 800 employees without pay for not being fully vaccinated. That’s about 3% of their workforce, right in line with other numbers.

Here’s an ethics professor who got fired for not getting vaccinated, willing to take a strong stand and pay the price. Yes, such people exist, but they’re rare. It’s unfortunate that the second half of her statement repeats a lot of what I believe to be misinformation, rather than sticking to her principled position that the danger level of Covid-19 simply doesn’t rise to the level that justifies violating bodily autonomy. 

This is in contrast to places where the alternative is weekly testing, especially when that weekly testing doesn’t actually happen. Those tactics are less effective. 

The Federal mandate isn’t going into effect until January 4, if it happens at all. Companies might find ways to not enforce it, but I expect similarly high compliance rates at any companies that do enforce it, and among federal employees, if and when it does go into effect. Republicans are trying to kill the mandate.

Does this mean that work requirements are effectively mostly involuntary? 

It could mean that, but it could also mean that switching jobs is annoying and expensive (and that for those where it wasn’t and they cared a lot about not being vaccinated, they simply already left), and it turns out that defiance of the vaccine mandate is shallow. In the face of actual costs, most fold like cheap tents. I’d expect the same if there was an (actually enforced) reasonably sized fine involved.

It also makes sense that many of the few who don’t do that are taking a stand for actual reasons, like allergic reactions.

There are supposed to be medical exemptions available, but inevitably some people aren’t being given exemptions they need: if you make sure to give out every exemption people need, lots of other people will get fake ones, so it’s very hard to keep this from being ruined for everyone. 

There’s also this approach. It’s not a great look, but the logic behind it makes sense. You don’t get a death benefit in case of a suicide either, and also I’m not sure why those who want it shouldn’t be buying their own life insurance, which charges different prices depending on a wide variety of risk factors.

These days, workers who refuse to get vaccinated against covid-19 may face financial repercussions, from higher health insurance premiums to loss of their jobs. Now, the financial fallout might follow workers beyond the grave. If they die of covid and weren’t vaccinated, their families may not get death benefits they would otherwise have received.

New York’s Metropolitan Transportation Authority no longer pays a $500,000 death benefit to the families of subway, bus and commuter rail workers who die of covid if the workers were unvaccinated at the time of death.

“It strikes me as needlessly cruel,” said Mark DeBofsky, a lawyer at DeBofsky Sherman Casciari Reynolds in Chicago who represents workers in benefit disputes.

Other employers have similar concerns about providing death or other benefits to employees who refuse to be vaccinated.

NPIs Including Mask and Testing Mandates 

This week’s investigation into our lack of reasonably priced or widely available rapid tests finds the same thing as every other investigation: The FDA dragged their feet sufficiently to drive most providers out of the market, so the few that are approved charge a lot.

Think of the Children

The referenced article makes the obvious point that vaccinated five-year-old children have almost exactly as much reason to wear masks in 2021 as they did in 2019. The fully vaccinated colleges are in deeply similar situations. For whatever reasons, the lives of young people are expendable and their experiences are not important, but their safety even at probabilities very hard to distinguish from zero has been declared paramount.

Young children being vaccinated opens the latest front in the war. Make no mistake, there is a war. 

In Other News

Britain approves molnupiravir, Merck’s treatment pill for Covid-19 (WaPo). 

I’m shocked, shocked to find politics going on in this establishment. 

The things policy isn’t being driven by here seem right. The thing it is being driven by does not seem like the whole story, as those in Public Health, the Very Serious People, and those who offer Elite Consensus clearly can compete with and sometimes overrule public opinion, often not for the better. But yeah, science? How many divisions does it have?

It’s easy to confuse cause and effect. Often the people approve of whatever restrictions such folks get put into place, and blindly follow ‘official guidelines,’ rather than the other way around. Politicians in many places think they’re following polls, and maybe they even are, but that doesn’t mean that regular people are meaningfully driving events. It does offer hope and a (difficult to implement) model of action, if one could convince the public directly. 

In honor of polls driving policy, here are some recent poll numbers from Wisconsin, to give some context of where the public is on these matters. 

Mask requirements for schools split parents down the middle, and are strongly supported by those without children. That’s before child vaccinations, which presumably will move the needle on that at least a little. The same people who support mask requirements are very concerned about children falling behind or having mental health problems (and also, in an unrelated note, inflation). 

New Fluvoxamine results are in, and they look good. This now seems clearly like it should be part of the standard of care in appropriate cases.

I fully endorse that this is too good to check, so not checking.

Occasionally, in this world, there is justice.


What is the link between altruism and intelligence?

November 4, 2021 - 06:50
Published on November 3, 2021 11:59 PM GMT

When talking about the singularity, BMIs (brain-machine interfaces), etc., I haven't really seen thoughts about what our moral decisions will be like. Kurzweil, for example, seems to assume that extreme intelligence will make us all completely rational and want the best for everyone and ourselves, that psychopaths will stop being psychopaths and sadists will stop being sadists. I'm not saying there isn't any ground for this, but I still worry about it, because I think there will be a window of unstable society in which some people have extremely enhanced cognitive capabilities but there are no laws to prevent any of them from going rogue, misusing their power, etc. Will all humans being super-Einsteins prevent a sadist from making a simulation where beings are tortured? Or from trying to take over the world and torture us all for eternity? Has there been any good research on this? It seems like the most crucial factor in whether the singularity goes well or not. I still think society should advance slowly in the early stages of BMIs and bioengineering of brains, so that no one gets too far ahead and the good-hearted smart people have time to adapt the laws, add more control, and find methods to prevent bad scenarios from happening.


If I have a strong preference for remote work, should I focus my career on AI or on blockchain?

November 4, 2021 - 06:08
Published on November 3, 2021 11:03 PM GMT

Hello LessWrong crew,

I am familiar with the ideas of Effective Altruism as I have read the 80,000 Hours career guide. I think it is a great guide and it definitely put a new perspective on the way I view my career.

A bit of my background: 

I have a master's degree in computer science. I am currently working remotely as a machine learning engineer.

Here is a list of the things that I am looking for in my career, ordered from most important to least important:

  1. Remote work
  2. High salary
  3. Impact

Maybe I'm not the paragon of Effective Altruism values, but if I'm being honest, I value remote work and high salary more than impact. Impact has the 3rd place, but it is still a factor.

Now onto my question:

A few years ago I read Superintelligence and got scared that AGI might make humanity go extinct. I then started focusing on machine learning and after graduating I ended up as a machine learning engineer, where I'm working currently.

Recently, however, I began questioning whether what I was doing is the right thing to do impact-wise. I believe blockchain to be a great technology as well (even though we are in a bubble right now). Fundamentally, I think blockchain is going to bring "power to the people" and I think that's great. It's got its weaknesses now, sure, but over time I think they'll get ironed out.

Here are my top three reasons why I think I should switch to blockchain:

  1. Given my strong remote work preferences, I don't think I will make any impact in anything AI safety related. I think that the main discoveries are being made in companies such as OpenAI and DeepMind and they all require going to the office. Since I don't want to go to the office (my remote work preference is higher than my impact preference), I don't think I will be a part of a team that reaches a fundamental breakthrough. With blockchain, on the other hand, most jobs are remote and I could therefore contribute more.
  2. I am not 100% convinced that AI safety is an existential risk. There are some indications toward this (such as this one), but I think that it may very well be that worrying about AGI safety (as in it's an existential risk for all humans) is the same as worrying that aliens will come and destroy Earth or something similar. I am not denying the problems with current AI systems, but what I am saying is that I don't see a clear path to AGI and I think there's a lot of hand waving that goes on when talking about AGI safety at this point in time.
  3. One could make the argument that I should do machine learning engineering jobs and wait for AI safety related jobs to become remote. I would then be working on making some AI system safe. Here's the problem with this perspective: I'm not sure when we will come to a point where there are remote AI safety jobs available. What if there's no fundamental breakthrough in AI for another 30-40 years and I keep working on some non-AI safety related remote machine learning jobs to "keep my skills sharp in case they're needed", only to find myself never using them on actual AI safety problems.

Fundamentally, the only reason I'm interested in AI is because of AGI safety. And right now I'm not sure that AGI safety is a real existential threat and even if it is, given my remote work preferences I will probably have low to no impact on AGI safety. Blockchain, on the other hand, is already changing and will most likely continue to change the way we use the internet and is much more remote friendly.

What are your 2 cents? I'd like to bounce my reasoning off others to see if I'm missing anything in my train of thought.

P.S. I cross-posted this on Effective Altruism forums to get multiple perspectives.


Baby Sister Numbers

November 4, 2021 - 04:10
Published on November 4, 2021 1:10 AM GMT

A few days ago, Lily (7y) told me about some Nora-inspired numbers:

  • The largest number is Noranoo.

  • If you try and make any larger number, you still get Noranoo. For example, Noranoo + 1 = Noranoo, and Noranoo * 2 = Noranoo.

  • Otherwise, it behaves normally. You can have Noranoo - 1, dubbed "Norklet". This means Noranoo - 1 + 1 = Noranoo, while Noranoo + 1 - 1 = Norklet. This didn't bother her.

  • Noranoo * -1 is Norahats. It is the smallest number, and like Noranoo any attempt to go lower keeps you at Norahats.

  • These are very large numbers: much bigger than a googol.

This is a kind of saturation arithmetic, more of a computersy approach than a mathy one, since you give up associativity, distributivity, the successor function being an injection, and all that.

On the other hand, it's slightly more elegant than a typical computational implementation of saturation, because it is symmetric around zero. Normally, you are using some number of bits, which gives you 2^N distinct values, and so an even number of integers. Typically we set the minimum integer to be one larger, in absolute value, than the maximum one. In this case, though, there are an odd number of integers. I asked whether perhaps Norahats * -1 * -1 * -1 could be Norklet and not Noranoo, but Lily insisted that Noranoo and Norahats were equal in magnitude.
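Lily's rules amount to clamping the result after every operation. The sketch below is one possible Python encoding, not anything from the post: Noranoo is stood in for by a hypothetical value of 10**101 (the post only says it is much bigger than a googol), and the helper names are made up. The assertions replay her examples and the properties given up.

```python
# Hypothetical stand-in: the post only says Noranoo is "much bigger than a googol".
NORANOO = 10**101
NORKLET = NORANOO - 1
NORAHATS = -NORANOO  # Noranoo * -1: same magnitude, so the range is symmetric


def clamp(x: int, lo: int = NORAHATS, hi: int = NORANOO) -> int:
    """Saturate: anything past either end sticks at that end."""
    return max(lo, min(hi, x))


def add(a: int, b: int, lo: int = NORAHATS, hi: int = NORANOO) -> int:
    return clamp(a + b, lo, hi)


def mul(a: int, b: int, lo: int = NORAHATS, hi: int = NORANOO) -> int:
    return clamp(a * b, lo, hi)


# Lily's examples.
assert add(NORANOO, 1) == NORANOO
assert mul(NORANOO, 2) == NORANOO
assert add(add(NORANOO, -1), 1) == NORANOO  # Noranoo - 1 + 1
assert add(add(NORANOO, 1), -1) == NORKLET  # Noranoo + 1 - 1: associativity is gone
assert mul(NORANOO, -1) == NORAHATS
assert add(NORAHATS, -1) == NORAHATS

# Also lost: distributivity, and "+ 1" being an injection.
assert mul(2, add(NORANOO, -1)) == NORANOO
assert add(mul(2, NORANOO), mul(2, -1)) == NORANOO - 2
assert add(NORKLET, 1) == add(NORANOO, 1)

# Symmetric range: Norahats * -1 * -1 * -1 lands exactly on Noranoo.
assert mul(mul(mul(NORAHATS, -1), -1), -1) == NORANOO

# Contrast: an 8-bit saturating range is [-128, 127], so negating the minimum clips.
I8_MIN, I8_MAX = -128, 127
assert mul(I8_MIN, -1, I8_MIN, I8_MAX) == I8_MAX
```

The last line shows the asymmetry described above: with an even number of representable values, the minimum is one larger in magnitude than the maximum, so negation is not exact at the bottom of the range. Lily's version avoids that by having an odd number of values.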



Depositions and Rationality

November 4, 2021 - 03:10
Published on November 4, 2021 12:10 AM GMT

Reading this guide on deposition strategy I was impressed by how many of the techniques lawyers use on uncooperative witnesses seem like good rationality techniques (a deposition is a sworn oral testimony taken out of court without a supervising judge).

Searching LessWrong, I found that nobody seems to have noticed the parallel. Searching the CFAR participant handbook, I found that a number of these techniques are both fitting and novel.


These techniques are designed to pull out answers from uncooperative witnesses that are legally compelled to answer truthfully. Uncooperative witnesses are often coached by a defense team in avoidance techniques.

Avoiding difficult or unpleasant questions is a common self-defeating impulse. This is true both individually, and in social interactions. Hesitation is always easy, rarely useful. Legal repercussions are normally absent, so social and group truth-seeking are not fully identical to a deposition, but the formal commitment to finding a truthful answer may be a sufficient common factor.

Summary of techniques
  0. Technique zero is to recognize that it is likely that you will face an evasive witness, and commit explicitly to generating a written plan to deal with them.
  1. "Marshal the facts and standards that you can use to build toward your ultimate question". Before requesting a conclusion ("did the motor carrier you represent violate Federal Motor Carrier Safety Regulations?") request individual facts that build up to your conclusion ("does FMCSR apply to your company?", "does FMCSR require that your company ensure compliance by drivers?", "do you provide training and testing for your drivers on the contents of the FMCSR?", "does FMCSR require that drivers keep accurate logs of their trips?", etc).
  2. Exhaustion. Make the witness commit to their answer by asking "what else?", "is that all?", "is there any material you need to fully answer the question?".
  3. Avoiding rabbit trails. The correct response is to let go of or write down the tempting line of questioning and proceed with exhaustion.
    • Example: "Q. Give me all the reasons you believe Dr. Smith conformed to the standard of care. A. He identified the median nerve at the time of the surgery which was abnormally small and congenitally demyelinated, he protected the median nerve with [...]."
      • Incorrect response: "Q. What do you mean abnormally small?"
      • Correct response: "Q. Is there any other reason you believe Dr. Smith conformed to the standard of care?"
  4. Restate and summarize. "Exhaustion of a particular topic may require many questions and many pages in a transcript. If left in its raw form, the testimony may be unmanageable and unusable with a jury or court."
  5. Boxing in, by bracketing[1]. People who claim to have no idea about a quantity will often give surprisingly tight ranges when explicitly interrogated.
    • Example: "Q. How far apart were your truck and Mrs. Agan's car? A. I don't know. Q. Could it have been at least 5 feet, say the distance between you and me? A. No, that is too close. Q. Could it have been 15 feet, say the distance between yourself and that wall? A. No, that is too far. Q. So it is fair to say that Mrs. Agan's car was between 5 and 15 feet from your truck? A. Correct."
  6. Boxing in, facts-witnesses-documents. "A technique that forces the witness to commit to testimony and/or describe any and all possible circumstances that might allow their future testimony to change. Witnesses explain their change in testimony by using one or more of three broad categories of information that a witness didn't have or consider during their deposition. These three broad categories are: facts they did not know or recollect, witnesses/individuals they had not spoken to at the time of the deposition, and/or documents they had not seen, recollected or considered. If none of those things exist, there is no basis for the witness' testimony changing."
  7. Creating commitments from witnesses. Overlaps significantly with 1 ("marshal the facts"), focused on establishing that the witness is/should be able to answer the questions. This is done by establishing that they are the person with the most knowledge about the matter of the questions, that they are bound to answer and aware of this, and that they do not need further preparation to provide answers. This line of questioning may reveal that the witness is not able to answer your questions to satisfaction, and that you should be questioning someone else.
  8. Dealing with "I don't know". Two broad strategies are outlined:
    1. "Nail down the fact that this witness doesn’t know something of importance" using the facts-witnesses-documents box-in.
    2. "Convince the witness that while he/she 'may not know' or they are 'not sure', there is a plausible explanation/definition/standard that they will (inevitably) accept as true."
  9. Setting a timeout to think of answers ("Having thought about it for over a minute can you think of any other safety reasons for the 'No Left Turn Rule' for truck drivers?"). Arguably part of exhaustion.
  10. Dealing with witnesses that want you to define your terms. Three broad strategies are outlined:
    1. "Ask for and Adopt the definition used by the witness"
    2. "Use the person’s life experiences to create a reasonable, fair definition"
    3. Use a regulation/dictionary/thesaurus to establish common use
  11. Dealing with witnesses that want to avoid responsibility by quibbling with word choice ("That’s just a guideline, not a rule"). The response is to establish that the norm is expected of them, that they expect it of others, and that it exists for good reason.
  12. Dealing with hedging answers ("for the most part", "mostly", "not necessarily"). Call out that the answer leaves room for doubt and interpretation, and ask the witness to make exceptions explicit until exhaustion ("that is all").
  13. Demolishing rationalizations. This is not explored in detail. An example is given (see after "When a witness has obviously manufactured some bogus rationale").
  14. Interrupt witness rants.
  15. Do not allow witnesses to interrupt your questions.
Points of coincidence with the CFAR Handbook
  • "Marshalling the facts" is similar in spirit to identifying cruxes, in that any disagreement is supposed to be reduced to an explicit minimal argument.
  • "Convincing a witness to accept a standard" uses the framework of policy level decisionmaking: making a witness accept that a norm applies to them is easier by having them endorse it first; it then becomes blatantly hypocritical for them to reject it.
  • Setting apart some time for a witness to actually try recalling further information is almost exactly the same as resolve cycles.
Points of disagreement with the CFAR Handbook
  • "False beliefs feel the same as true ones" means that witnesses are probably too willing to nail down their testimony and end up accidentally perjuring themselves ("Q. If you were provided facts about the position and description of the other cars, would that make a difference? A. No Q. If you had a conversation with one of your passengers about what they remember [...]? A. No").
  • "Boxing in by bracketing", or "calibrated estimation", is missing from the handbook.

[1] This technique appears almost verbatim under "calibrated estimation" in Douglas Hubbard's How to Measure Anything.


The Opt-Out Clause

November 4, 2021 - 00:59
Published on November 3, 2021 9:59 PM GMT

(cross-posted from my blog)

Let me propose a thought experiment with three conditions.

First, you're in a simulation, and a really good one at that. Before you went in, the simulators extracted and stored all of your memories, and they went to great lengths to make sure that the simulation is completely faultless.

Second, you can leave any time you like. All you have to do to end the simulation, regain your memories, and return to reality is recite the opt-out passphrase: "I no longer consent to being in a simulation". Unfortunately, you don't know about the opt-out condition: that would kind of ruin the immersion.

Third, although you're never told directly about the opt-out condition, you do get told about it indirectly, phrased as a kind of hypothetical thought experiment. Maybe someone poses it to you at a party, maybe you read it on Twitter, maybe it's a blog post on some niche internet forum. You're guaranteed to hear about it at least once, though, to give you a fair chance of leaving. But it's vague and indirect enough that you can dismiss it if you want, and probably forget about it within a week.

It's not enough to think the opt-out phrase, you have to actually say it or write it. So the question is, hypothetically, would you?


Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22]

November 3, 2021 - 21:23
Published on November 3, 2021 6:22 PM GMT

We (Redwood Research and Lightcone Infrastructure) are organizing a bootcamp to bring people interested in AI Alignment up-to-speed with the state of modern ML engineering. We expect to invite about 20 technically talented effective altruists for three weeks of intense learning to Berkeley, taught by engineers working at AI Alignment organizations. We will cover all expenses. 

We aim to have a mixture of students, young professionals, and people who already have a professional track record in AI Alignment or EA, but want to brush up on their Machine Learning skills.

Dates are Jan 3, 2022 - Jan 22, 2022. Application deadline is November 15th. We will make application decisions on a rolling basis, but will aim to get back to everyone by November 22nd.

Apply here

AI-Generated image (VQGAN+CLIP) for prompt: "Machine Learning Engineering by Alex Hillkurtz", "aquarelle", "Tools", "Graphic Cards", "trending on artstation", "green on white color palette"

The curriculum is still in flux, but this list might give you a sense of the kinds of things we expect to cover (it’s fine if you don’t know all these terms):

  • Week 1: PyTorch — learn the primitives of one of the most popular ML frameworks, use them to reimplement common neural net architecture primitives, optimization algorithms, and data parallelism
  • Week 2: Implementing transformers  — reconstruct GPT2, BERT from scratch, play around with the sub-components and associated algorithms (eg nucleus sampling) to better understand them
  • Week 3: Training transformers — set up a scalable training environment for running experiments, train transformers on various downstream tasks, implement diagnostics, analyze your experiments
  • (Optional) Week 4: Capstone projects
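For readers who haven't met nucleus sampling (also called top-p sampling): at each step it keeps the smallest set of most-likely tokens whose probabilities add up to at least p, and samples only from that set. A minimal, framework-free Python sketch for intuition, not the bootcamp's materials; the function names and example probabilities are hypothetical.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: list[float], p: float) -> list[int]:
    """Indices of the smallest most-likely set whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return chosen


def nucleus_sample(logits: list[float], p: float = 0.9, rng=random) -> int:
    """Sample a token index from the nucleus; random.choices renormalizes the weights."""
    probs = softmax(logits)
    keep = nucleus(probs, p)
    return rng.choices(keep, weights=[probs[i] for i in keep], k=1)[0]


print(nucleus([0.5, 0.3, 0.15, 0.05], p=0.7))  # [0, 1]
print(nucleus([0.5, 0.3, 0.15, 0.05], p=0.9))  # [0, 1, 2]
```

Lower p keeps fewer tokens and gives more conservative text; p = 1 reduces to ordinary sampling from the full distribution.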

We’re aware that people start school/other commitments at various points in January, and so are flexible about you attending whatever prefix of the bootcamp works for you. 


The bootcamp takes place at Constellation, a shared office space in Berkeley for people working on long-termist projects. People from the following organizations often work from the space: MIRI, Redwood Research, Open Philanthropy, OpenAI, Lightcone Infrastructure, Paul Christiano’s Alignment Research Center and more.

As a participant, you’d attend communal lunches and events at Constellation and have a great opportunity to make friends and connections.

If you join the bootcamp, we’ll provide: 

  • Free travel to Berkeley, for both US and international applicants
  • Free housing
  • Food
  • Plug-and-play, pre-configured desktop computer with an ML environment for use throughout the bootcamp

You can find a full FAQ and more details in this Google Doc.

Apply here


Rockville Dinner Meetup

November 3, 2021 - 05:15
Published on November 3, 2021 2:15 AM GMT

We'll be meeting for dinner on Tuesday, November 9, at Rockville Town Center. There's parking nearby at the Rockville Town Square garage (https://goo.gl/maps/poMwc75efjHNjUuDA). The Rockville Metro station is closed until December, so please make alternate plans if taking transit.

We usually grab food from Lebanese Taverna; you can do the same, or pick one of the other restaurants nearby.

If the weather is good enough, we'll sit outside at a picnic table. If it's not good, we'll meet inside Lebanese Taverna.

COVID info: Please only come if you are vaccinated and have not had any COVID symptoms or exposure recently. Montgomery County still has an indoor mask mandate, so bring a mask to wear while ordering food inside.


Why isn't there more rationalist punk rock?

November 3, 2021 - 04:06
Published on November 3, 2021 1:06 AM GMT

Could call it like rat punk or something? 


Paths Forward: Scaling the Sharing of Information and Solutions

November 3, 2021 - 02:50
Published on November 2, 2021 11:50 PM GMT

Follow-up to: An Unexpected Victory: Container Stacking in the Port of Long Beach

My chronicle of how Ryan managed to change the container stacking rule in the Port of Long Beach quickly became the most viewed post I’ve ever had. A lot of people were excited to spread the word about this story. This makes one think about the stretch goal, where we get to do things like this more often, potentially even creating this virtuous cycle:

  1. Identify a concrete problem.
  2. Explain the gears behind the problem.
  3. Present a solution to the problem, and explain how it would work.
  4. People notice this, and amplify the signal.
  5. Signal causes people with ability to implement solutions to notice.
  6. Solution is implemented.
  7. Victory.
  8. Except, you know, with planning and strategy involved.
  9. Every time this happens, it becomes more normal to do it again, and it becomes easier to get good solutions amplified and implemented.
  10. We get to thinking we can solve problems, and change things for the better…
    1. Including when the solutions are less and less trivial
    2. Including when they require non-zero downsides or non-zero amounts of time
    3. It becomes standard to solve problems and blameworthy to not solve them.

Set aside for now how much the rule change mattered. For this to be exciting, it must be an asymmetric weapon. Does this work for positive-sum solutions to problems without working, or while working less well, for zero-sum resource grabs? 

My instinct was clearly yes: this only worked because it was obviously correct. If it had not been obviously correct, the effort would have failed, for overdetermined reasons. This isn’t obvious to others, and much of this post explores the gears involved.

Asymmetric Attention Weapons

I was highly frustrated to see this reaction, which I instinctively (if unfairly) translated as ‘actually, attempting to communicate information that might cause people to do things is bad, because the reference class is zero-sum games that compete for attention’:

It’s important to intuition pump why this is the opposite of most of the attention economy, and why such strategies are asymmetric. The information and solution both being true and important was central to the Tweetstorm’s success.

From my previous post’s ‘I see what you did there’ list:

  1. Starts with a relatable physical story of a boat ride, and a friendly tone.
  2. Tells a (mostly manufactured) story that implies (without saying anything false) how the ride led him to figure these things out, which gives rhetorical cover to everyone else for not knowing about or talking about the problem. We can all decide to pretend this was discovered today.
  3. Then he invokes social consensus by saying that ‘everyone agrees’ that the bottleneck is yard space. Which is true; as far as I can tell, everyone did agree on that. Which of course implies that everyone also knows there is a bottleneck, and that the port is backed up, and why this is happening. The hidden question of why no one is doing much about this is deflected by starting off pretending (to pretend?) that the boat ride uncovered the problem.
  4. Describes a clear physical problem that everyone can understand, in simple terms that everyone can understand but that doesn’t talk down to anyone. He makes this look easy. It is not easy, it is hard.
  5. Makes clear that the problem will only get worse on its own, not better, for reasons that are easy to understand.
  6. Makes clear the scope of the problem. Port of Long Beach effectively shuts down, we can’t ship stuff, potential global economic collapse. Not clear that it would be anything like that bad, but it could be.
  7. Gives a decision principle that’s simple, a good slogan and again can be understood by everyone, and that doesn’t have any obvious objections: Overwhelm the bottleneck.
  8. Gives a shovel-ready solution on how to begin to overwhelm the bottleneck, at zero cost, by allowing containers to stack more.
  9. Gives more shovel-ready solutions on top of that, so that (A) someone might go and do some of those as well, (B) someone can do the first easy thing and look like it’s some sort of compromise because they didn’t do the other things, (C) encourage others to come up with more ideas and have a conversation and actually physically think about the problem and (D) make it clear the focus is on finding solutions and solving problems, and not on which monkey gets the credit banana.
  10. Makes it clear solutions are non-rivalrous. We can do all of them, and should, but also do any one of them now.
  11. Gives a sense of urgency, and also a promise of things getting better right away. Not only can you act today, Sir, you are blameworthy tomorrow if you do not act, and you will see results and rewards tomorrow if you do act. Not only reactions to the announcements, but physical results on the ground. That’s powerful stuff.
  12. Ends by noting that leadership is what is missing. You could be leadership and demonstrate you’re a good leader, or you can not do that and demonstrate the opposite. Whoever solves this is the leader.

It’s also worth flagging now that the people who amplified the Tweetstorm and made it succeed were not random Twitter users, but rather mostly belonged to the set of people who often care about the gears of physical models, who are more likely to respond in stronger fashion to many of these asymmetries. More on that in the section The Multilevel Signal Amplifier.

The Boat Ride

At the time, I didn’t fully appreciate the boat ride. It was more than a relatable and friendly story. The boat ride showed that Ryan was thinking about the real problem and looking for a real solution. That he was problem solving. 

This was an asymmetric costly signal. Ryan had to spend time and money renting a boat and taking a three hour tour. It’s asymmetric because those seeking truth gain knowledge, and others… get to say they’re on a boat.

Imitators will get this wrong. 

If you’re trying to explain a physical problem and how you know about it, and chart a physical solution, you can think like a person thinking about physical problems and realize that seeing the problem yourself via this particular boat ride is a good idea. 

If you’re drumming up attention for other reasons, you won’t think like that. Your attempt to come up with ‘something like Ryan’s boat ride’ will be incongruent, because it won’t come from searching for a physical solution. It won’t resonate. 

Was the boat ride designed to sell the Tweetstorm? Probably. But it wouldn’t have worked if the problem and solution were fake.

This cycles back into the ‘people don’t do things’ principle. Especially real physical things. People don’t take boat rides. You can say ‘well then if boat rides work everyone will start taking boat rides’ but no, mostly they won’t do that. 

Think about Duncan’s story above about renting a billboard to get a job…

Back to the Tweetstorm

In the third step, Ryan invokes social consensus. This worked, and it was important that this worked. If threads had been full of people disputing this, the whole thing would have gone down much differently, whether it was true or not. 

The fourth step is crucial. The question is whether it is asymmetric in a good way. I think the answer is mostly a clear yes. If you are forced to have and present a gears-level physical model (e.g. the containers are sitting on the trucks because there’s nowhere for them to go) then that’s going to be pretty good at differentiating accurate versus inaccurate models. It’s also going to force you to communicate information about the physical world and how you understand it. To the extent that Ryan’s explanation was inaccurate or simplified, there have been comments that have pointed that out, allowing all involved to build a better physical model. Whereas there’s no other surface area worth attacking. Describing things simply and understandably is a good test for whether you understand them, and are trying to present an accurate model.

Points five and six asymmetrically favor problems worthy of attention over those less worthy.

Point seven invokes a strong general principle. This isn’t foolproof and other things are also happening, but promoting strategies that use strong general principles of physical problem solving over those that don’t seems good.

Point eight is a concrete shovel-ready solution. There are already vast forces favoring easy, shovel-ready solutions over hard solutions and those that aren’t shovel-ready. If the response to everything else is ‘you don’t have a shovel-ready solution, so I’m not going to pay you any mind,’ that’s a huge obstacle. It isn’t always possible to hill-climb your way to victory, and there’s the worry that this will favor ‘do this symbolic gesture’ style calls to action. Yet this vector has advantages, especially when combined with the requirement of a concrete physical model, which acts against purely symbolic actions. 

Open for Business

A good contrast point is the initiative to ‘keep the ports running 24/7.’ There is a clear and simple bottleneck, time. There are twenty-four hours in a day. Why not operate the port and trucks around the clock? 

The obvious first answer is some combination of ‘lack of qualified workers,’ ‘unwillingness to pay what it would take to hire those workers’ and ‘dock workers unions in California, haha, good luck.’ Clock time is unlikely to be the current bottleneck. 

The LA Times post discussed below notes that many daytime pickup slots are going unused because of other bottlenecks, and nighttime slots where available mostly go unused. So while many terminals aren’t operating at night, there’s little reason to expect 24/7 operations would help much short term. 

Something that naively looks like a shovel-ready simple physical proposal, to keep the port open longer, turns out to be missing gears.

White House officials said “port operators” will be responsible for paying the longshoremen and actually keeping the ports open longer hours.

And also this:

Among them, a pandemic-related surge in demand for durable goods in the United States, an outdated domestic freight and rail system, factory shutdowns in places like China and Vietnam, and a shortage of skilled longshoremen on the West Coast.

There are ways to unlock additional skilled labor, including at odd hours. They involve paying a lot more money than anyone’s yet willing to pay.

Keeping ports open longer so at least some operations happen at night seems reasonable. With time other bottlenecks can respond. Stakes are high, so throw everything you have at the problem. 

My sense is that ‘keep the ports open 24/7’ passes some of the Ryan Tests but fails others. It emerged from an importantly different process, one where someone must Do Something, and various people looked for Bold Actions and found an unusually plausibly useful one. That can be a bug rather than a feature, since it might mean you want to solve physical problems and thus have questionable motivations and loyalties. Here, it might hit the sweet spot of getting credit for plausibility while being implemented too incompletely to work. A winner on all fronts, except solving the problem.

Points nine and ten are great. The more non-rivalrous actions are possible, the more of them you raise, and the more useful information you share, the better. I don’t know that they made much difference here, but to the extent they helped this is usefully asymmetric. It’s a good test. If you’re on top of a complex physical situation, you should generally be able to suggest multiple worthwhile actions, because there’s so much low hanging fruit.

Giving a sense of urgency is on the scarier side of the list on this front. It reinforces worries about favoring things that show results in the very short term (e.g. tomorrow, or at most the two-week blame time horizon). The false negatives, where things are too slow or complex to count, are highly unfortunate, but it’s an excellent bullshit filter to avoid false positives. There’s a concrete hypothesis: that this will reduce the number of ships waiting to use the port. Either that happens or it doesn’t. 

A Question of Leadership and Reference Classes

The leadership trick is mostly symmetrical for different actions, but favors action over inaction. Is that good?

That depends on the reference class, so it’s complicated. 

If the reference class is ‘person does thing’, then it’s not only good, it’s great. People don’t do things. They should do more things. Getting a person to do a useful physical thing is by default amazingly great. 

If the reference class is ‘lifting a restriction on person doing thing’ then yeah, that might be even better. There’s Chesterton’s Fence to worry about, as the restriction was put there by someone and there might have been a good reason, so you should understand how that happened, but there’s a lot of strong positive selection here. Lifting a random restriction that exists in life is probably not a good idea in isolation, although it wouldn’t shock me if I was wrong about that. Lifting a restriction that doesn’t seem to be physically accomplishing anything and is plausibly a bottleneck to important action, that you actually manage to and choose to change, is another matter. Given all the biases against actions, including the action of lifting the restriction, I’m happy to say that it was probably a good idea on the margin, especially in cases where the market test for costs versus benefits is ‘yes, do that, not remotely close.’ 

Again, this is complicated, and many people have opposite intuitions. And if we introduced a way for people to get any restriction lifted that they don’t like, that can go bad in any number of obvious ways. 

If the reference class is ‘things that look like bold leadership’ then that’s a little less obvious. A lot of very negative things look like bold leadership, but so do most very positive things. One wants to be able to count on system design or various feedback systems to rule out or discourage enough of the sufficiently negative things here.

If the reference class is ‘impose a new restrictive rule’ then that’s not a great reference class and I’d be skeptical of encouraging such things, especially if the rule discourages or prevents action, or tells those taking action they can only do actions that include features we like in ways that make doing the thing less attractive. That tends to go badly. 

My model counts on various systems to be asymmetrical. It thinks things that have big downsides or costs, or wouldn’t work, or don’t come with accurate physical models attached, face more objections and resistance when using these channels. I both believe this to be true now, and I believe it is something we can and need to work to keep true and to make more true going forward. 

The Multilevel Signal Amplifier

It’s worth looking at who amplified the signal. Mainstream sources didn’t notice anything, but my Twitter feed was full of references to the thread, and I came across it at least four or five times. It came from a lot of people in the kinds of circles that I, and people who read this type of post, tend to follow. In particular, it was exactly the people who respond, at least somewhat, to physical models and arguments. I did not amplify it myself, because my bar for amplifying anything on Twitter is super high and I didn’t expect this to work, which is something I hope to fix in the future. 

I am confident that many of these particular people were reading the thread critically and thinking for themselves about the problem. These particular amplifiers were most certainly asymmetric, and wouldn’t have cooperated if this hadn’t had its key characteristics, including that the story and solution made sense. 

Earlier this year, I proposed a model of sense-making, information flow and action, which I’ll fully restate here.

  1. There are Level-1 people who are sufficiently above the sanity waterline that they can synthesize information and come up with new models and new solutions and write them up.
  2. Then the other Level-1 people and also the Level-2 people evaluate it and decide whether it’s worth passing along – Level-2 people aren’t good enough to generate wholly new stuff, but are good enough to sort through the proposals and often add important details and refinements, point out mistakes and so on.
  3. Then the Level-3 people look at the results of that Level-2 process, and send the message out more broadly, including to the public.

Number of levels here is arbitrary, but the point remains. The hope is that this could provide an alternative sense-making operation to existing story-generation operations that operate on different principles that don’t primarily aim to track the physical world. 

Within that system, we can model Ryan as a Level-1 player who also presented strategically, coordinating with other high-level players in this alternate informal structure to craft and amplify the signal to maximize the chances of persuasion and action. This still requires using the model-first informal structures of this cluster, and passing its tests requiring things to make sense. The big difference was that it was coordinated and planned, and that it included the people who ‘have day jobs’ and spend most of their time doing stuff rather than spreading signals. Getting them to help is not easy, and very much depends on the contents of the signal. And the fact that this worked and caused change is a data point that these systems can in the right circumstances wield power.

It makes sense both to understand what the required circumstances are and how to make them happen, and then to actually use them when the opportunity arises. 

Long Beach Logistics Dive

This seems to finally be a good write-up of what happened, from the LA Times. It makes it clear that yes, the Twitter thread and the forwarding thereof was causal. The question is how much change was caused.

In Long Beach itself, the message came through loud and clear. Friends had tagged Mayor Robert Garcia on the thread, and he forwarded it to his staff. By the end of the day, the city had lifted the restriction on stacking the 9-foot-tall containers only two high, and now capped it at four.

“Within a few hours we had all decided that we had to make this change immediately,” Garcia said. The relaxed rules are in effect for 90 days, and only apply to businesses that were already zoned for storing containers.

“If you are in a neighborhood, you don’t want to look out your front yard and see five- or six-stacked-high cargo containers. It is a blight and environmental justice issue, no question there,” Garcia said. But with the holiday season and piles of dockside containers looming large, he thought it was worth a shot.

The NIMBYs should not exactly be running scared of what might come next. Also:

The real issue, Schrap says, is just that the shipping companies won’t take their empty containers back: “The ocean carriers need to come sweep out the empties.” Adding insult to injury, the ocean carriers bill truckers a late fee every day for unreturned empty containers, Schrap said, even if they won’t accept them at the port. “It makes you want to pick up your laptop and Frisbee it out into the backyard,” Schrap said. “That’s the frustration running through our veins.”

The impact of this shift, however, is difficult to measure. The change only applies to yards within the city of Long Beach that weren’t already zoned for higher stacking levels, which had long been in effect in the industrial zone closest to the port. More than 240,000 containers are currently waiting on the docks, with an additional 500,000-plus sitting on the ships offshore.

The chief executive of the Harbor Trucking Assn., which represents the trucking companies dealing with these issues every day, says that any little bit helps, but the measure doesn’t change all that much. For one thing, allowing the stacks to climb higher doesn’t guarantee they’ll do so.

Lisa Wan, director of operations at the trucking firm RoadEx America, said that the firm ordered a top loader specifically to be able to move and stack containers months ago — but delivery delays meant it won’t show up until next week.

“We’re lucky we’re getting it at all, so we don’t complain,” Wan said. Once that arrives, the firm will be able to stack 20 to 30 empty containers in its yard, freeing up chassis for more pickups at the ports.

Details in logistics matter. Other changes plausibly matter more than the stacking rule. One thing mentioned is that drop-off windows for empty containers are being widened, which allows trucks to schedule the drop-off first then pick up a full container, without which neither can happen. 
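To make the stacking rule concrete, here is a minimal sketch assuming a hypothetical yard with a fixed number of ground slots. The helper names and every number are illustrative inventions, not figures from the LA Times piece. It also assumes the firm already has equipment able to stack (the kind of top loader RoadEx was still waiting on); without that, a higher cap does nothing.

```python
def yard_capacity(ground_slots: int, max_stack_height: int) -> int:
    """Containers a yard can hold: every ground slot stacked to the height cap."""
    return ground_slots * max_stack_height


def chassis_freed(empties_on_chassis: int, ground_slots: int,
                  max_stack_height: int, already_stored: int) -> int:
    """Empties that can come off chassis (freeing those chassis for new pickups),
    limited by the yard space left under the stacking cap. Hypothetical model."""
    free_space = max(0, yard_capacity(ground_slots, max_stack_height) - already_stored)
    return min(empties_on_chassis, free_space)


# Hypothetical yard: 10 ground slots, 20 empties already stored, 25 more stuck on chassis.
print(chassis_freed(25, 10, 2, 20))  # 0  (two-high cap: yard already full)
print(chassis_freed(25, 10, 4, 20))  # 20 (four-high cap: room for 20 more)
```

Under these made-up numbers the two-high cap frees no chassis and the four-high cap frees twenty. Every empty that comes off a chassis is a chassis available for another pickup, which is the whole mechanism in miniature.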

Meanwhile, the Port of Los Angeles is continuing to say no to extra stacking ‘out of concern for residents’ and it seems you can’t make them, or at least no one has succeeded and no one seems to be trying. So in terms of follow-up, we can’t even get this particular free rule changed in one of the two ports in question. Global supply chains are no match for ‘this might not look pretty.’ 

The good news is that we’re seeing other alternative efforts.

But other measures — not mentioned in Petersen’s thread — are going into effect. The Port of Los Angeles and Buscaino’s office are working to open 12 lots in and around the terminals that could store up to 30,000 containers, and the Port of Long Beach has already expanded its temporary storage yards to store 12,000 containers.

‘Working to open’ is not the same as opening, so we’ll see how far that goes and how fast, but it shows momentum behind finding more storage capacity. My model says that focusing attention on the problem of storage capacity, in any form, opens up the possibility of other physical solutions. 

Petersen, for his part, said he was gratified by the response to his tweet thread, even if policymakers are finding different solutions. 

That’s exactly the thing. Petersen directed attention towards a physical problem, and its physical solutions. Despite this, it’s still non-trivial to figure out whether the situation at the ports is improving or not.

Other times, the solution is as simple as ‘highlight an obvious and easy to fix problem and the solution will appear’ and this presumably is a clear example of Using One’s Power For Good, whether or not it was the most efficient option available:

This, while a trivial example, seems like an excellent system for getting at least some $@*t fixed. A lot of the time all it takes is the right person’s attention in the right place, combined with the incentive of others paying attention to whether the job gets done. Not sure how well it scales, but it’s a start, and I predict it would cause people to follow and read Ryan more often and thus enhance the ability to do the big things when it matters, rather than detract, if curation was good. 

No New Taxes

The other mentioned solution was a fee imposed on containers, paid for by shipping companies, that escalates daily until the problem is solved. There are scenarios where this is an excellent idea, and others where it’s a deeply stupid one, so which will it be? Ryan definitely has some thoughts on that, and on some other issues as well. Let’s see what else he’s been up to, starting with his thoughts on the fee. I’ll put his correction at the top for clarity, which changes the timeline but not the core situation.

Ryan’s model of the fee is that it works like this.

  1. Containers are stuck.
  2. Everyone involved would love to unstick the containers.
  3. But they can’t.
  4. City government imposes a fee for not unsticking the containers.
  5. Containers don’t move any faster, because they can’t.
  6. Fee is paid by shipping companies and charged to businesses.
  7. Businesses pass the fee on to customers, raising prices.
  8. Effectively, the cities have imposed a tariff on imports, stealing money from the rest of the country in ways that wreak havoc.
  9. This may be the straw that breaks the camel’s back for some companies, forcing them under.

This is certainly part of the story, but it isn’t obviously the whole story. The core assumption is that companies trying to ship don’t have any options besides giving up or continuing to wait, at least over any time frame that matters, preventing them from responding to the price signal. It also implies that some companies giving up on getting their stuff unloaded would be bad in this situation.

I’d question both of those assumptions, starting with the second one. Suppose we have the ability to unload X ships’ worth of containers each day, but there are X+Y ships’ worth of demand for imported goods, so the backlog gets longer each day. You’d want the least valuable goods to give up their slots for the more valuable goods until this balances. By charging a fee on delays, we presumably cause some number of ships with less profitable cargo to give up or reroute.
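The arithmetic behind that backlog can be sketched in a few lines. This is a minimal model under loudly hypothetical assumptions: a fixed unloading capacity of `capacity` ships per day (the X above), a constant surplus of `excess` ships per day (the Y above), and made-up values for both.

```python
def backlog_history(days: int, capacity: int, excess: int) -> list[int]:
    """Ships still waiting at the end of each day when capacity + excess arrive
    daily but only `capacity` can be unloaded (hypothetical fixed rates)."""
    backlog = 0
    history = []
    for _ in range(days):
        arrivals = capacity + excess
        # The port works through the queue plus new arrivals, up to its capacity.
        backlog = max(0, backlog + arrivals - capacity)
        history.append(backlog)
    return history


# Hypothetical: X = 10 ships/day unloaded, Y = 2 extra ships/day of demand.
print(backlog_history(days=5, capacity=10, excess=2))  # [2, 4, 6, 8, 10]
```

With any positive excess the queue grows linearly and never clears by itself; only getting daily arrivals back down to capacity, by giving up or rerouting, stops it.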

The second option is to reroute. I don’t know the extent to which this is short-term feasible, and there are certainly large costs involved in redoing all the logistics, but I do see enough people pointing out capacity elsewhere that it seems likely a sufficiently large price incentive would cause some ships to use other ports. We very much want them to use other ports.

The tax scales rapidly with the length of delays, which seems good. If there are long enough delays, it forces the issue, making it more valuable to abandon cargo than to use the port. But there’s basically no scenario where this reduces throughput of the port, because the moment the port is below capacity the tax goes away. 
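A minimal sketch of a fee that escalates with delay and vanishes when the port isn’t backed up. The schedule here (a free period, then a daily charge that grows by a fixed step each additional day) is a hypothetical stand-in chosen for illustration; the function name and parameter values are assumptions, not the fee the ports actually announced.

```python
def cumulative_dwell_fee(days_waiting: int, free_days: int, daily_step: float,
                         port_backed_up: bool) -> float:
    """Total fee owed by one container under a hypothetical escalating schedule.

    Assumptions (illustrative only):
    - nothing is charged during the first `free_days`;
    - the n-th chargeable day costs n * daily_step, so the running total is
      daily_step * (1 + 2 + ... + n) = daily_step * n * (n + 1) / 2;
    - no fee at all applies while the port is not backed up.
    """
    if not port_backed_up or days_waiting <= free_days:
        return 0.0
    n = days_waiting - free_days
    return daily_step * n * (n + 1) / 2


# Hypothetical schedule: 9 free days, then +100 per day.
print([cumulative_dwell_fee(d, 9, 100.0, True) for d in (9, 10, 12, 15)])
# [0.0, 100.0, 600.0, 2100.0]
```

Because the total grows quadratically in days of delay, a long enough wait makes abandoning or rerouting cargo cheaper than waiting, while the `port_backed_up` check captures the point that the fee disappears once the port is below capacity.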

In general, taxing deadweight losses, and other things one wants to see less of, is a good idea. So I see it potentially working more like this.

  1. Lots of people want to use the port.
  2. But they can’t, not all at once, and things get backed up.
  3. Thus they all sit around.
  4. If you raise the price to use the port when it’s backed up, some people will stop trying to use the port.
  5. Thus making the port easier to use. Efficient allocation of resources.
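The price-allocation story can be sketched as a market-clearing calculation, assuming (hypothetically) that each ship has a known profit margin per trip and uses the port only if that margin strictly exceeds the fee. The function name and the margins are illustrative.

```python
def clearing_fee(margins: list[float], capacity: int) -> float:
    """Smallest per-ship fee at which no more than `capacity` ships still want the port.

    Each ship is assumed to come only if its margin exceeds the fee. Ranking margins
    from high to low, charging the margin of the first ship that doesn't fit leaves
    at most `capacity` ships willing to pay (fewer if margins tie at that point).
    """
    if len(margins) <= capacity:
        return 0.0  # no congestion, so no fee is needed
    ranked = sorted(margins, reverse=True)
    return ranked[capacity]


# Hypothetical margins for six ships competing for four daily slots.
margins = [50.0, 5.0, 30.0, 12.0, 8.0, 40.0]
fee = clearing_fee(margins, capacity=4)
print(fee)                              # 8.0
print([m for m in margins if m > fee])  # [50.0, 30.0, 12.0, 40.0]
```

The lowest-margin cargo is what gets priced out, which is the efficient-allocation point. The sketch says nothing about who collects the fee.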

Which of these two stories dominates is something I don’t know. However, there’s definitely an additional problem, which is that the fees are going to the city. That’s a transfer from everyone else to the city, as the port ‘holds up’ the country to capture more of the value of shipping, and this encourages charging fees that are too high, whether or not the correct fee is higher than zero. With 40% of the shipping into America coming through LA and Long Beach, they have a lot of market power here, and once they start using it to make money there is a great temptation not to stop. Thus, it seems imperative to ensure that any fees charged make it to the federal coffers, or at least go towards expanding capacity and functionality at the ports (perhaps in the short term by hiring more small boats and more work crews to make things move faster), rather than staying with the city as profits. 

I’ve also seen the claim that this doesn’t include empty containers due to a technicality, which would be quite the mistake if true. 

I note that I do not see much amplification of the ‘don’t tax us for something out of our control’ proposal. It didn’t pass the tests. I’m amplifying it here because it’s worth thinking about on various levels, but I’m not endorsing the request because I’m not convinced he is right. 

Root Causes

It’s also worth noticing Ryan’s pinned tweet.

It goes on a lot longer than that, but you get the idea. It’s all very traditional at this point.

Thus, it is traditional to go back and forth on this question, with various people pointing out in sequence:

  1. How awful is the pursuit of short term metrics that strip all the slack and resilience out of everything.
  2. That there are plenty of long term bets, such as Amazon and Tesla, with plenty of unprofitable companies having super high valuations on the basis of their potentials, and that if anything Wall Street is too long term if you look at it the right way and so is venture.
  3. That this too is short term metrics. That what’s going on is the response to metrics that indicate growth, rather than response to what drives long term success. Which in turn means that long term success is determined by showing those growth metrics, because this allows for vastly better fundraising and thus better long term success, which is a much, much better vector than actually preparing for the future and being resilient and making sure to match physical reality.
  4. The cycle continues.

Mostly I take Ryan’s side in all that. My experience trying to raise funds, and looking at what determines valuations, and in balancing maximizing valuation versus maximizing what’s otherwise good for the business, leads me strongly towards the primary effects being short term thinking and hardcore Goodhart’s Law action to maximize the metrics you expect people to care about and the questions that will come up when people do highly shallow investigations or hold pitch meetings. Yes, sometimes the story in question is one of long term potential and growth, but that doesn’t actually change the mechanisms, it simply changes what type of ‘returns’ on what type of investments are being measured and judged. 

I don’t want to get into comment discussions of that, however, or make an attempt to prove the point at this time, because it’s a super deep rabbit hole with no end, so I’ll endorse the view but not claim to have properly ‘made the case.’ Note of course that this is the opposite of a shovel-ready physical solution to a physical problem. Ryan has no solution, because this is a different kind of problem with no easy solutions, even if he’s right. The best solution I know about is ‘keep founders in control and ensure they don’t need to care about valuations or raising money’ but that only raises further questions, and so on. 

There’s also a need for balance. We should have been more ready than we were, and we should be dealing better with the situation we do have. But being completely ready for everything is overkill. Should we be able to keep up with a hundred-year surge in demand?

I’d argue that the answer is pretty much no, we shouldn’t. 

What’s happening now, at a fundamental level, is that we want more stuff and we want it now, and we mostly can make the additional stuff but we can’t ship all the available stuff at once because we’re out of capacity. The good news is that the stuff we want isn’t a life-or-death situation and it wouldn’t be some great tragedy if our amount of imported stuff was the same as it was a few years ago. We want more stuff and we want it now because we have more spending power with which to buy stuff and fewer alternative ways to spend money and time, so we’d like to have more stuff, but it’s not as if we need this stuff. We want it. Thus, allocation by price is appropriate, and it’s not fun to deal with that, but it’s a cost like any other. 

If you are considering whether to be ready for such a surge in demand, you need to multiply the gains to being prepared by the probability that it happens and compare it to the cost. I’m not at all confident that the companies involved here made any mistakes. The real bottlenecks here are the ports and the trucks. You can hold some extra inventory or what not, but that only buys you so much time and so much profit, and from what I know of trucking (mostly from the Odd Lots podcast) getting more capacity there for the long run doesn’t even make much sense as a concept. So it’s mostly on the ports and maybe the railroads, which is much closer to a public choice problem than a market capital allocation problem.
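The preparedness test is a one-line expected-value comparison. A minimal sketch, assuming a hypothetical once-a-century surge (annual probability 0.01) and arbitrary, made-up units for the gain and for the yearly cost of staying ready:

```python
def worth_preparing(annual_probability: float, gain_if_prepared: float,
                    annual_cost: float) -> bool:
    """Prepare only if the expected yearly gain exceeds the yearly cost of readiness."""
    expected_gain = annual_probability * gain_if_prepared
    return expected_gain > annual_cost


# Hypothetical numbers in arbitrary units.
print(worth_preparing(0.01, 100.0, 3.0))  # False: expected gain 1.0 < cost 3.0
print(worth_preparing(0.01, 500.0, 3.0))  # True: expected gain 5.0 > cost 3.0
```

This ignores discounting and the possibility that a surge lasts more than a year; it only shows the shape of the comparison, where a rare event needs a very large payoff to justify paying for readiness every year.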

Bottleneck Tennis

It’s also worth noticing Ryan’s previous Tweetstorm from October 20, when he got the dock workers a Taco Truck. This one seems like it was likely a template for the boat ride, and seems more likely to have been a real source of information. It was also public relations on multiple fronts, but in a good way. 

The biggest insight in that previous storm that didn’t end up in the later storm is about the ports being open late but no one coming in at night, and there not being any way to quickly scale up skilled labor, and there being tons of no-shows by trucks so appointments are ‘full’ but still not many trucks are involved. 

This likely interacts with the problem of empty containers, where trucks make appointments and then can’t keep the appointment because they’re stuck with empty containers they can’t unload. 

This also points to another bottleneck, which is that if you believe the account here, the appointment system seems clearly to be broken. If there are lots of no-shows for appointments with no way to have someone else pick up the slack when they don’t show, then the system requires a redesign, or the incentives involved need to change. I don’t have the expertise to know the right answers, but it does seem like ‘charge a fee when appointments are missed and pay those fees to those who keep their appointments’ would be a good first proposal, and you could go from there. Chances are the important bottlenecks mostly lie elsewhere.
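As a rough illustration of that first proposal, here is a minimal settlement rule, assuming (hypothetically) that each trucker either keeps or misses an appointment and that the fee is flat. The function name, the $90 figure, and the trucker names are invented; this is not any port’s actual appointment policy.

```python
def settle_appointments(showed_up: dict[str, bool], no_show_fee: float) -> dict[str, float]:
    """Net transfer per trucker under a hypothetical no-show rule: each no-show pays
    `no_show_fee`, and the pot is split evenly among truckers who kept their slot."""
    no_shows = [name for name, came in showed_up.items() if not came]
    kept = [name for name, came in showed_up.items() if came]
    pot = no_show_fee * len(no_shows)
    # If nobody showed up, there is no one to pay out to; the pot is left unallocated.
    share = pot / len(kept) if kept else 0.0
    net = {name: -no_show_fee for name in no_shows}
    net.update({name: share for name in kept})
    return net


# Hypothetical day: four appointments, one no-show, $90 fee.
print(settle_appointments({"A": True, "B": False, "C": True, "D": True}, 90.0))
# {'B': -90.0, 'A': 30.0, 'C': 30.0, 'D': 30.0}
```

Note that a trucker who misses a slot because they are stuck with an unreturnable empty (the problem above) would still be penalized by this rule, which is one reason the incentive design would need care.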

This account from a trucker seems like another great example of concrete physical information about physical problems. It emphasizes the bottleneck nature of the situation, where solving one issue often won’t help much. There are several places listed where capacity will have to increase, and where that would mean much higher labor costs. 

The bottleneck that stuck out to me was getting trucks in and out of the port. The trucks, by this account, need to wait in three lines in order to pick up a container, because there are multiple places with limited throughput, resulting in hours lost on each trip. I’m not Ryan and thus don’t have a concrete path to solving the issue, but it seems like it should be possible to speed up the act of entering or leaving the port by expanding the number of lines available for trucks to go through? And in such situations, once you cross a threshold, there stops being a line, which in turn would dramatically increase the number of trips each truck could make, and allow us to move to the next bottleneck? 
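To see both the bottleneck effect and the threshold effect, here is a toy model of trucks passing through three checkpoints in a row. It is a minimal sketch that assumes deterministic hourly arrivals and fixed per-stage capacities. The function name and all numbers are hypothetical, not figures from the trucker’s account.

```python
def simulate_truck_gates(arrivals_per_hour, capacities, hours):
    """Deterministic hour-by-hour model of trucks moving through a chain of
    checkpoints (e.g. in-gate, container pickup, out-gate). Stage i can serve
    at most capacities[i] trucks per hour; anyone not served waits in that
    stage's line. Returns (trucks that made it through, final line lengths).
    """
    queues = [0] * len(capacities)
    completed = 0
    for _ in range(hours):
        incoming = arrivals_per_hour
        for i, cap in enumerate(capacities):
            queues[i] += incoming
            served = min(queues[i], cap)
            queues[i] -= served
            incoming = served          # trucks served here move on to the next stage
        completed += incoming          # whatever clears the last stage is done
    return completed, queues


# 30 trucks/hour arrive; the middle checkpoint handles only 25/hour.
print(simulate_truck_gates(30, [40, 25, 40], hours=8))   # (200, [0, 40, 0])
# Widening a non-bottleneck stage changes nothing:
print(simulate_truck_gates(30, [100, 25, 40], hours=8))  # (200, [0, 40, 0])
# Push the bottleneck past the arrival rate and the line disappears:
print(simulate_truck_gates(30, [40, 35, 40], hours=8))   # (240, [0, 0, 0])
```

Real arrivals are random, so in practice lines form even somewhat below capacity and grow sharply as utilization approaches 100%. The threshold is softer than this sketch suggests, but the direction is the same.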

Where To Next?

There are two fronts to consider. There’s the supply chain issues, and there’s the more general project of getting things to happen.

In terms of the supply chain problems, it seems clear that Ryan noticeably improved the situation, but the situation is far from solved. Solving it will be a long term process, and we’ll be playing bottleneck tennis as solving one problem highlights others and makes them worse. 

There’s still lots of low-hanging fruit on the logistics front, starting with Ryan’s change only being implemented in Long Beach and not Los Angeles. There are also signs of other solutions starting to come online, and those could be helped along by making them shovel-ready and finding the right physical solutions. 

The whole thing makes me want to take up logistics. It’s high stakes, fascinating stuff where there’s high returns for actually solving problems properly. No idea if that’s one of the right lessons, or not. There’s certainly a price that would make it happen.

Then there’s the question of using this method again in the future. I hope I have laid out here a strong case that the methods of amplification used by Ryan can be deliberately invoked again in the future, and that they do a good job of asymmetrically selecting what things to amplify, with useful and accurate messages favored over less useful and less accurate ones. That’s never going to be the whole story, but overall I am optimistic on these fronts. Although, by default, people don’t do things, so there won’t be that many more attempts to use the system, and what attempts there are won’t be optimized as much as Ryan’s was. 

Participating in the amplification system in a way that helps maximize its levels of asymmetry, and that improves the quality of the things you move forward, is available to essentially anyone reading this. You might not be a Level-1 player capable of generating new models and solutions, but if you got this far you are definitely at least a potential Level-3 player who can evaluate the signals being amplified, provide error correction and decide whether to amplify them further, and reward those who are doing higher level model and solution generation and information sharing.

The best thing to do here, of course, is to go out and look for physical problems, find and share information about what’s wrong, and look for possible solutions. Especially good are solutions that pass the Ryan Tests, that are shovel ready, can be implemented by a single authority or other agent, and that would show clear results that can be measured and tested, especially without any serious downsides or losers, but not all of that is required to become worthwhile. 

Then, if there is something that checks enough of the boxes, it’s time to be deliberate, and focus on winning and being persuasive, via the amplification channels and telling a convincing story when the payloads reach their destinations. Which, in turn, builds additional momentum, and the cycle can continue.

The other best thing is to help refine and improve the models and information presented here, but it feels like the default failure mode to focus too much on that and not enough on continuing the work. When in doubt, do more of the work, generate more data and more examples, and learn by doing. 


Austin LW/SSC Meetup: Roots of Progress Crossover

November 3, 2021 - 02:12
Published on November 2, 2021 11:12 PM GMT

This time we'll be having a crossover meetup with Roots of Progress, whose posts you may have seen on this site before. Light snacks provided, cash bar.

Time: 1:30 pm, Saturday November 6th. (That's when our reserved area opens but it's just to gather before the brief 2pm talk from Jason Crawford.)

RSVP: Please help with planning by RSVPing at this link.

Background Info: Roots of Progress was started by Jason Crawford to build up a model of the causes of human progress. The ideas have significant overlap with what you might be familiar with from LessWrong and SlateStarCodex. Links:

Jason will kick us off with some brief remarks at about 2pm, and we'll have time to chat with him and other followers of the site/organization.


To reiterate: The meetup is at The Front Page Pub, 1023 Springdale Rd (far East Austin, near 183 and Cesar Chavez), NOT our usual location. Please follow posted notices about mask policy, and stay home if you are sick.


What's the difference between newer Atari-playing AI and the older Deepmind one (from 2014)?

November 2, 2021 - 23:59
Published on November 2, 2021 8:59 PM GMT

My impression was that the thing that put DeepMind on the map was an AI that could play multiple Atari games. Lately there have been new Atari-playing AIs (both from DeepMind and other companies) making the news. Are they doing basically the same thing 2014 DeepMind was doing, but better? Are they doing a fundamentally different thing? Can someone explain the diff like I'm five?


Transcript: "You Should Read HPMOR"

November 2, 2021 - 21:20
Published on November 2, 2021 6:20 PM GMT

The following is the script of a talk I gave for some current computer science students at my alma mater, Grinnell College. This talk answers "What do I wish I had known while at Grinnell?".

Hi, I'm Alex Turner. I’m honored to be here under Sam’s invitation. I'm in the class of 2016. I miss Grinnell, but I miss my friends more—enjoy the time you have left together. 

I’m going to give you the advice I would have given Alex2012. For some of you, this advice won’t resonate, and I think that’s OK. People are complicated, and I don’t even know most of you. I don’t pretend to have a magic tip that will benefit everyone here. But if I can make a big difference for one or two of you, I’ll be happy. 

I’m going to state my advice now. It’s going to sound silly. 

You should read a Harry Potter fanfiction called Harry Potter and the Methods of Rationality (HPMOR). 

I’m serious. The intended benefits can be gained in other ways, but HPMOR is the best way I know of. Let me explain.

When I was younger, I was operating under some kind of haze, a veil, distancing me from what I really would care about.

I responded to social customs and pressure, instead of figuring out what is good and right by my own lights, how to make that happen, and then executing. Usually it’s fine to just follow social expectations. But there are key moments in life where it’s important to reason on your own. 

At Grinnell, I exemplified a lot of values I now look down on. I was extremely motivated to do foolish or irrelevant things. I fought bravely for worthless side pursuits. I don’t even like driving, but I thought I wanted a fancy car. I was trapped in my own delusions because I wasn’t thinking properly. 

Why did this happen, and what do I think has changed? 

On Caring

First, I was disconnected from what I would have really cared about upon honest, unflinching reflection. I thought I wanted Impressive Material Things. I thought I wanted a Respectable Life. I didn’t care about the Bible, but I brought it with me to my dorm anyways so that I’d be more “wholesome” according to my cultural background. I was chasing something someone had convinced me I wanted, but which I didn’t actually care about. 

I became motivated to unironically reflect on what is good, how I want the universe to look by the time I’m done with it—to reason about what matters without asking for permission. Not so that you can show how caring you are on social media. But because some things are fucking important. Peace, learning, freedom, health, justice. Human flourishing. Happiness. 

When I inhabit my old ways of thinking about altruism, they evoke guilt and concern: “The world will burn. I have to do my part.” If, however, I’ve discharged my duties by donating and recycling and such, then I no longer feel guilty. But the cruel fact is that no matter what I do, millions of people will die of starvation this year. Due to a coincidence of space and time, none of these people happen to be my brother or sister, my mother or father. None are starving two feet away from me. But who cares if someone starves two feet away, or 42 million feet away—they’re still starving! 

What I’m saying here is subtler than “care a lot.” I’m gesturing at a particular kind of caring. The kind from the assigned essay. Since you all read it, I probably don’t need to explain further, but I will anyways. Some extreme altruists give almost everything they have to charity. It’s natural to assume they have stronger “caring” feelings than you do, but that may not be true. 

The truth is that I am biologically incapable of caring as much as 9 million x (how much I would care if my brother starved). My internal “caring system” doesn’t go up that many decibels, it just silently throws an emotion overflow error. Does that mean I can’t, or don't want to, dedicate my life to altruism? No. It means I ignore my uncalibrated emotions, that I do some math and science to estimate how I can make the biggest difference, and then do that. 

What does this have to do with Harry Potter? HPMOR made me realize I should care in this way. HPMOR let me experience the point of view of someone intelligently optimizing the world to be a better, more moral place. HPMOR let me look through the eyes of someone who deeply cares about the world and who tries to do the most good that they can. The experience counts.

You’ll notice that CS-151 doesn’t start off with a category-theoretic motivation of functional programming in Scheme, with armchair theorizing about loop invariants, parametric polymorphism, and time complexity. There are labs. You experience it yourself. That’s how the beauty of computer science sticks to you.

HPMOR is the closest thing I know to a lived experience of gut-level caring about hammering the world into better shape.

On Foolishness

Second, in 2016, I was enthusiastic, optimistic, and hard-working. I was willing to swim against social convention. I was also foolish.

By “foolish”, I don’t quite mean “I did pointless things.” I mean: “My cognitive algorithm was not very good, and so I did pointless things.” By analogy, suppose it’s 2026, and you’re doing research with the assistance of a machine learning model. Given a hypothesis, the model goes off and finds evidence. But suppose that the model anchors on the first evidence it finds: Some study supports the idea, and then the model selectively looks for more evidence for its existing beliefs! Wouldn’t this just be so annoying and stupid?

In some parts of your life, you are like this. Yes, you. Our brains regularly make embarrassing, biased mistakes. For example, I stayed in a relationship for a year too long because I was not honest with myself about how I felt. 

In 2014, I scrolled past a News Feed article in which Elon Musk worried about extinction from AI. I rolled my eyes—“Elon, AI is great, you have no idea what you’re talking about.” And so I kept scrolling. (If someone made a biopic about me, this is where the canned laugh track would play.) 

The mistake was that I had a strong, knee-jerk opinion about something I’d never even thought about. In 2018, I reconsidered the topic. I ignored the news articles and sought out the best arguments from each side of the debate. I concluded that my first impression was totally, confidently wrong. What an easy way to waste four years. I’m now finishing my dissertation on reducing extinction risk from AI, publishing papers in top AI conferences. 

My cognitive algorithm was not that great, and so I made many costly mistakes. Now I make fewer. 

What, pray tell, does this have to do with Harry Potter? HPMOR channels someone who tries to improve their thinking with the power and insight granted by behavioral economics and cognitive psychology, all in pursuit of worthy goals. The book gave me a sense that more is possible, in a way that seems hard to pick up from a textbook. (I took cog-psych classes at Grinnell. They were evidently insufficient for this purpose: I didn’t even realize that I should try to do better!)

HPMOR demonstrates altruistic viciousness: How can I make the future as bright as possible? How can I make the best out of my current situation? What kinds of thinking help me arrive at the truth as quickly as possible? What do I think I know, and why do I think I know it? What would reality look like if my most cherished beliefs were wrong?

In the real world, we may stand at the relative beginning of a bright and long human history. But to ensure that humanity has a future, to make things go right—that may require finding the truth as quickly as possible. That may require clever schemes for doing the most good we can, whatever we can. That may require altruistic viciousness. (See: the Effective Altruism movement.)

Taken together, caring deeply about maximizing human fulfillment and improving my cognitive algorithms changed my life. I don’t know if this particular book will have this particular effect on you. For example, you might not be primarily altruistically motivated on reflection. That’s fine. I think you may still selfishly benefit from this viewpoint and skillset. 

HPMOR isn’t the only way to win these benefits. But I think it’s quite good for some people, which should make it worth your time to try 5–10 chapters. I hope you benefit as much as I did. 

You can find the book at www.hpmor.com (I recommend the PDF version). You can find the unofficial, very good podcast reading on Spotify. You can find me at turneale@oregonstate.edu


Why we need prosocial agents

November 2, 2021 - 19:22
Published on November 2, 2021 3:19 PM GMT

Context: I recently submitted an application to the Open Philanthropy AI Fellowship, and ended up thinking a lot about why studying Multi-Agent Interactions is important. 

For machine-learning systems (agents) to be useful in the real world, they need to be able to cooperate with each other and with humans, not only at deployment but throughout their lifetime. I predicate my research on this and the following beliefs:

Multi-agent deployment: Transformative AI is unlikely to appear in isolation, but will instead be developed by multiple competing actors, giving rise to a large population of capable agents [1, 2].

Multi-agent training: Transformative AI is likely to be produced by an automatic curriculum, and one promising approach for this is multi-agent curricula [3].

Lifetime learning: Agents are not only trained and then deployed in the real world, but are also continually trained afterwards. This training will likely be decentralised and take place within a population of competing agents; as such, the boundary between training and deployment will become less clear.

This creates the following concerns:

It is difficult to deploy cooperative agents: A priori, agents that are cooperative or altruistic are vulnerable to being exploited by more selfish agents. Thus, agents that succeed in competitive markets are likely to be selfishly motivated. Large populations of selfish agents can easily lead to catastrophic events, for example large market failures [4] or resource exhaustion [5].

It is difficult (post-deployment) to train cooperative agents: Agents that are trained in a population of selfish agents will be unable to develop cooperative strategies [6]. This restricts our ability to create cooperative agents over time, thus making selfish (misaligned) AI more likely.

It is difficult to de-risk multi-agent systems: Transformative AI systems trained through multi-agent interactions are shaped not only by their reward functions but also by their interactions with other agents in their population [7]. System failure cannot be attributed to a single agent, so work on (single-agent) interpretability or reward modelling is likely insufficient to de-risk these interactions [8].

To address the risk from these concerns, we need to:

Develop prosocial agents: Altruistic agents do not survive in competitive markets, whilst the behaviour of selfish agents leads to cooperation failure [9]. Thus we require prosocial agents: those which actively seek (cooperative) optimal policies whilst being intolerant of exploitation. This behaviour mitigates the aforementioned concerns whilst sustaining the mechanisms of a competitive market.
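The altruist-versus-selfish argument is the classic iterated prisoner's dilemma setting studied by Axelrod and Hamilton [6]. Below is a minimal sketch. It assumes the standard payoffs (temptation 5, reward 3, punishment 1, sucker 0) and uses tit-for-tat as a stand-in for a "prosocial" policy, meaning one that cooperates by default but retaliates against defection. Treat this as an illustration, not as the author's definition of prosociality or their training setup.

```python
# (my move, their move) -> (my payoff, their payoff); "C" = cooperate, "D" = defect
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_cooperate(opponent_history):
    return "C"                      # unconditional altruist


def always_defect(opponent_history):
    return "D"                      # purely selfish


def tit_for_tat(opponent_history):
    # Cooperate first, then copy whatever the opponent did last round.
    return opponent_history[-1] if opponent_history else "C"


def play(strategy_a, strategy_b, rounds=100):
    """Play an iterated game; each strategy sees only the opponent's past moves."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(always_cooperate, always_defect))  # (0, 500)
print(play(tit_for_tat, always_defect))       # (99, 104)
print(play(tit_for_tat, tit_for_tat))         # (300, 300)
```

The unconditional altruist is fully exploited. Tit-for-tat loses only the first round to a defector and still reaches full mutual cooperation with another cooperative agent, which is the combination the paragraph above asks for.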

Ensure multi-agent curricula incentivise prosociality: We need to guarantee that weak AI retains its prosociality when trained post-deployment. Thus to ensure training still produces aligned AI, we need to build our understanding of these multi-agent systems, and build methods that continually incentivise prosociality.

[1] https://www.alignmentforum.org/posts/dSAJdi99XmqftqXXq/eight-claims-about-multi-agent-agi-safety
[2] Critch, Andrew, and David Krueger. “AI Research Considerations for Human Existential Safety (ARCHES).” arXiv preprint arXiv:2006.04948 (2020).
[3] Leibo, Joel Z., et al. “Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research.” arXiv preprint arXiv:1903.00742 (2019).
[4] https://en.wikipedia.org/wiki/2010_flash_crash
[5] https://en.wikipedia.org/wiki/Resource_depletion
[6] Axelrod, Robert, and William Donald Hamilton. “The evolution of cooperation.” Science 211.4489 (1981): 1390-1396.
[7] https://www.alignmentforum.org/posts/BXMCgpktdiawT3K5v/multi-agent-safety
[8] Yudkowsky, Eliezer. Inadequate equilibria: Where and how civilizations get stuck. Machine Intelligence Research Institute, 2017.
[9] https://longtermrisk.org/research-agenda


Forecasting Newsletter: October 2021.

November 2, 2021 - 17:07
Published on November 2, 2021 2:07 PM GMT

  • Prediction Markets & Forecasting Platforms
  • Job Board
  • In the News
  • Blog Posts
  • Long Content

You can sign up for this newsletter on substack, or browse past newsletters here. If you have a content suggestion or want to reach out, you can leave a comment or find me on Twitter.

Prediction Markets & Forecasting Platforms

Polymarket

Polymarket is being investigated by the US Commodity Futures Trading Commission (a). Back in 2008, several Nobel Prize winners and other academics called on US agencies to clarify and establish regulations on prediction markets (a). So at this point, I feel the blame lies with the US Securities and Exchange Commission and the Commodity Futures Trading Commission for not having regulated this sooner, not with Polymarket for having acted somewhat unilaterally.

Otherwise, Polymarket has continued to make more incremental usability improvements which make it more convenient for users to trade, add and withdraw funds from the site.


Metaculus

"nobody freak out but now you can asymmetrically adjust the tails of your logistic curves on continuous input questions on Metaculus" — @casens

Since March 2020, Metaculus has provided forecasting and modeling resources to public health professionals as they've made crucial decisions in tracking and combating COVID-19. These efforts include the ongoing Keep Virginia Safe and the recently concluded Virginia Lightning Round forecasting tournaments, which were developed in partnership with the Virginia Department of Health and the University of Virginia Biocomplexity Institute, and were designed to enhance COVID-19 modeling efforts while contributing to the ongoing public health policy conversation in Virginia.

I feel that there is a disconnect between the "crucial decisions" and the very small prize pool of $1,000 given to forecasters (a). The moment that Metaculus' questions influence public health policy in Virginia at all, their Department of Health's willingness to pay should go through the roof.

In collaboration with Rethink Priorities (and with Michael Aird in particular), Metaculus has started short (a) and long-term (a) nuclear risk tournaments.

Good Judgment

The best comments on Good Judgment Open for October, as curated by myself, were:

  • Kogo (a) mentions that neither Europe nor China has the incentive to push for their Comprehensive Agreement on Investment in the current political standoff.
  • borisn outlines why the Democrats will most likely lose the House of Representatives (a) but retain control of the Senate (a), based on historical frequencies.
  • dada (a) wonders why the Good Judgment Open crowd assigns such a low probability to Iran and the US rejoining the JCPOA by the end of this year.
  • TerrySmith (a) mentions that Good Judgment Open seems to have beaten PredictIt on the 2020 election forecasts, so looking at betting odds might not be that informative for the Good Judgment Open crowd when discussing the upcoming election in Virginia.
  • belikewater (a) mentions a declassified intelligence report on COVID (a) which turned out to be inconclusive.

Good Judgment Open begins the In The News (a) and Dubai Future Experts (a) challenges.

Odds and ends

Futuur (a) is a prediction markets platform that recently came onto my radar. They allow Americans to participate with a play-money currency, and the rest of the world to trade in dollars and in a variety of cryptocurrencies. They seem legitimate, and have been running a version of their site since 2017. But I would advise some caution: because all of their cryptocurrencies go into the same pool, Futuur could be on the hook if the price of any one cryptocurrency moves too much too fast.

Reddit expands their prediction functionality (a) (original sources: 1 (a), 2 (a)). I've also become aware of a prediction community on Reddit at r/Predictor (a). With 14.4k members on that community alone, Reddit might just have become one of the biggest prediction platforms around, almost without even trying. h/t @marshallk (a).

Augur continues to focus on sports betting on Polygon with Augur Turbo, and saw upwards of $1M in trading volume, most likely because of the influx of subsidized liquidity (a).

CSET-Foretell continues to make progress (a) on their campaign around the future of the relationship between the US Department of Defense and US tech companies (a). They will have an event discussing their preliminary findings on the 10th of November (a).

Hedgehog Markets concluded their first two competitions, and will launch new ones, still focused on crypto and sports. Kalshi has also made a few improvements on their desktop webpage, and added some range markets, such as this one (a) on Chicago temperatures.

Forecasting Job Board.

The Perry World House (PWH) team at the University of Pennsylvania is looking for a Program Manager for the Future of the Global Order (a) because the current holder of the position is joining Founders Pledge. PWH has been "doing lots of work on implementing probabilistic forecasting methods in the U.S. government, and the person taking this job would likely continue work on those issues". One particularly high-quality piece of work by PWH previously mentioned in this newsletter was Keeping Score: A New Approach to Geopolitical Forecasting (a).

The Global Priorities Institute is looking for a Research Assistant (a) to aid its investigation into making forecasting a core research area. They are offering £17.48 ($23.85, 20.62€) per hour.

Metaculus is searching for "analytical storytellers" (a) on a rolling basis, paying around $0.3 per word ("essays are compensated at $300 each and at $25 per forecast question, with additional compensation awarded for especially high-quality essays attracting a significant readership").

North Dakota is looking for economic forecasting consultants (a). Although the offer seems to be aimed at individual consultants, I feel that it would also be interesting for forecasting platforms/prediction markets to apply.

Blog Posts

Charles Dillon of Rethink Priorities and SimonM look at How does forecast quantity impact forecast quality on Metaculus? (a). More forecasters increase forecast quality, but the effect is small beyond 10 or so forecasters.

Forecasting performance as a function of the number of predictors, by SimonM using Metaculus data.

One possible driver of this effect could be Metaculus allowing up to 10 forecasters to meaningfully coordinate in the public comments section, but not much beyond that.
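One toy way to see why the gains flatten out is to split each forecaster's error into two parts. One part is shared across the crowd (common information or common mistakes, including anything picked up from the same comment threads). The other part is independent and personal. Averaging cancels only the independent part, so the aggregate's expected squared error is σ_shared² + σ_ind²/N. This decomposition is an assumption of the sketch below, not the method or data of the linked analysis, and the variances are made up.

```python
import random
import statistics


def expected_sq_error(n, shared_var, individual_var):
    """Closed form: the shared error never averages out; the individual part shrinks as 1/n."""
    return shared_var + individual_var / n


def simulated_sq_error(n, shared_var, individual_var, trials=5000, seed=0):
    """Monte Carlo check of the same model. Errors are measured relative to the
    truth (e.g. on a log-odds scale), so the truth itself is 0."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_var ** 0.5)
        errors = [shared + rng.gauss(0.0, individual_var ** 0.5) for _ in range(n)]
        sq_errors.append(statistics.fmean(errors) ** 2)  # error of the simple average
    return statistics.fmean(sq_errors)


for n in (1, 3, 10, 30, 100):
    print(n, round(expected_sq_error(n, 0.01, 0.09), 4),
          round(simulated_sq_error(n, 0.01, 0.09), 4))
```

With these made-up numbers, going from 1 to 10 forecasters cuts the expected squared error from 0.100 to 0.019, while going from 10 to 100 only brings it to about 0.011. The more correlated the forecasters are, the earlier that plateau arrives.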

David Friedman looks at whether past IPCC temperature projections/predictions have been accurate (a).

The predictions look better now than they did in 2014, high three times out of four, low once, and only once has actual warming been below the predicted range. They are still running a little high but the results look consistent with random error. That makes it at least possible that the IPCC researchers are now modeling the climate system well enough to produce reasonable estimates of its future behavior.

Jaime Sevilla writes about his current best guess on how to aggregate forecasts (a):

In Learning from our (the USA's) defeat (a) Tanner Greer of The Scholar's Stage looks at the leadership team of the second Bush administration. It seems very much worth reading in terms of improving one's models of the world.

A Salesforce blogpost (a) advertises the wonders of cloud-based enterprise resource planning solutions (such as Salesforce itself). Nonetheless, the authors still know what they are talking about.

In the News

Quartz covers the shutdown of Facebook's Forecast (a) in more depth.

FiveThirtyEight (a) writes a data-driven analysis of Biden's approval ratings.

US car sales are expected to plummet (a) due to the chip shortage. I keep seeing this term "chip shortage", but it seems to me that this is more of a "supply chain mismanagement" issue, because chips alone can't really make up that high a relative proportion of a vehicle's price. Note also that Tesla doesn't seem to be affected by this shortage.

Warmer-than-normal temperatures could help save Americans on home heating costs, which could be elevated this year due to high energy prices (a), reports CNN. I find this curious because I'd expect CNN to avoid mentioning anything that could suggest that climate change is not unalloyedly negative. Still, the article doesn't mention the impact of Biden on energy prices, nor climate change directly.

Sephora, a beauty products brand, integrates AI more into its forecasting and replenishment software (a). To be clear, this is just business as normal, but it still feels like the kind of thing which is more likely in a world with short AI timelines (a).

Long Content

Issues with Futarchy (a) compiles possible failure modes with a governance model proposed by Robin Hanson where decisions would be made based on prediction markets.

Note: Due to EA Global (a), I now have a backlog of forecasting effort posts (a) posted during October. They will be incorporated into the next edition of this newsletter.

Note to the future: All links are added automatically to the Internet Archive, using this tool (a). "(a)" for archived links was inspired by Milan Griffes (a), Andrew Zuckerman (a), and Alexey Guzey (a).

Are you Alex Lawsen? The Alex Lawsen?

— Anonymous, EA Global 2021.


Vaccine Requirements, Age, and Fairness

November 2, 2021 - 15:10
Published on November 2, 2021 12:10 PM GMT

In talking about potentially resuming contra dancing, with a vaccination requirement, one reaction I've received from several people is that it's not fair to start until everyone can get vaccinated. The idea is that since some people are younger than the current limit, or have children who are younger, it would not be appropriately inclusive to resume.

I do really like that contra dancing is so open. I started dancing at NEFFA when I was very little, and have brought my kids with me to all sorts of dances. They've danced at family dances and in a carrier at regular dances, and come with me when organizing local dances, playing dance weekends, and on tour. Including children in our community is very important to me.

On the other hand, if this means we can't hold dances until all ages can get vaccinated, we're going to be waiting a long time. While we are now frustratingly close to allowing vaccination for kids 5-11 (FDA approved last week, CDC is meeting today), the 2-4 cohort is not expected until 2022 at the earliest, with younger children even later. Which means for the next few months at minimum, if we are going to require vaccination (no one has been pushing for allowing unvaccinated children), the choice is between a dance for adults and older children, or no dance.

The feedback from both the spontaneous Porchfest dance and the outdoor contra was really positive: this is something a lot of people have really been missing. While I would much prefer it if all ages could safely be included, given a choice between (a) dances for ~98% of regular participants now and everyone later and (b) nothing now, everyone later, I think we should go with (a).

(Asking other parents (n=5), I haven't found anyone who thinks dances need to hold off for age fairness reasons; I've only heard this from non-parents (n=4). I then asked Lily (7y) and Anna (5y) what they thought, and they both said that until everyone could get vaccinated it wasn't fair to resume, even if that takes a very long time. I asked what they would think if they could get vaccinated but Nora (4m) still couldn't, and Anna said yes while Lily said she couldn't decide.)

Comment via: facebook


Models Modeling Models

November 2, 2021 - 10:08
Published on November 2, 2021 7:08 AM GMT

I - Meanings

Now that we have some more concrete thinking under our belt, it's time to circle back on Goodhart's law for value learners. What sorts of bad behavior are we imagining from future value-learning AI? What makes those behaviors plausible, and what makes them bad?

Let's start with that last point first. Judgments of goodness or badness get contextualized by models, so our framing of Goodhart's law depends on what models of humans we tolerate. When I say "I like dancing," this is a different use of the word 'like,' backed by a different model of myself, than when I say "I like tasting sugar." The model that comes to mind for dancing treats it as one of the chunks of my day, like "playing computer games" or "taking the bus." I can know what state I'm in (the inference function of the model) based on seeing and hearing short scenes. Meanwhile, my model that has the taste of sugar in it has states like "feeling sandpaper" or "stretching my back." States are more like short-term sensations, and the described world is tightly focused on my body and the things touching it.

Other models work too! That's fine, there's plenty to go around.

The meta-model that talks about me having preferences in both of these models is the framing of competent preferences. If someone or something is observing humans, it looks for human preferences by seeing what the preferences are in "agent-shaped" models that are powerful for their size. (At least, up to some finite amount of shuffling that's like a choice of prior or universal Turing machine. Also note that the details of the definition of "agent-shaped" matter. I choose a definition where it's okay to have models that have limited domains of validity, which means you can get conflicts between preferences with partially overlapping domains of validity.)

So when we call certain behavior "bad," the usage of that word might carry with it the implication of what way of thinking about the world that judgment is situated in, like how "I like dancing" makes sense when situated in a model of chunks of my day. There's not one True Model in which the True Meaning of the word "bad" is expressed, though there can still be regularities among the different notions of badness.

II - Mergers

What were the patterns that have stood out from our previous discussions of what humans think of as bad behavior in value learning?

The most common type of failure, especially in modern day AI, is when humans are actively wrong about what's going to happen. They have something specific in mind when designing an AI, like training a boat to win the race, but then they run it and don't get what they wanted. The boat crashes and is on fire. We could make the boat racing game more of a value learning problem by training on human demonstrations rather than the score, and crashing and being on fire would still be bad behavior.

For simple systems where humans are good at understanding the state space and picturing what they want, this is all you need, but for more complicated systems (e.g. our galaxy) humans can only understand small parts or simple properties of the whole system, and we apply our preferences to those parts we can understand. From the inside, it can be hard to feel the distinction! We can want things about tic-tac-toe or about the galaxy with the same set of emotions. What makes deciding what to do with the galaxy different is that we have these scattered preferences about different parts and patterns, and the different parts don't stay neatly separate from each other. They can interact or overlap in ways that bring our preferences into conflict.

This is a key point. Inter-preference conflicts never come up if you think of humans as having a utility function, but they're almost unavoidable if you think of humans as physical systems with different possible models. The nail in the coffin is that we humans can't fit the whole galaxy into our heads, nor could evolution fit it into our genes, so out of necessity we have to use simple heuristics that work well pragmatically but can come into conflict. If humans don't resolve their preference conflicts ideally, this can lead to bad behavior like thinking the grass is always greener on the other side of the decision tree.

Bad preference aggregation can also lead to new-ish bad behavior on the part of a value learner. This bad behavior can look like encountering a situation where humans are conflicted or inconsistent, and then resolving that conflict using a method that humans don't agree with. An AI that resolves every deep and thorny moral dilemma by picking whichever answer leads to the most paperclips seems bad, even if it's hard to point out what goes wrong on the object level.

That's an extreme example, though. A value learner can fail at resolving preference conflicts even in cases where the right choice seems obvious to humans. If I like dancing, and I like tasting sugar, it might seem obvious to me that what I shouldn't do is never go dancing so that I can stay at home and continually eat sugar. The line between different sorts of bad behavior is blurry here. The obviousness that I shouldn't become a sugar-hermit can be thought of either as me doing preference aggregation between preferences for tasting sugar and dancing, or as an object-level preference in a slightly more complicated model of my states and actions. We want both perspectives to give similar results.

What this illuminates is that humans have meta-preferences: preferences about how we should be modeled.  These preferences are inferred from humans' words and actions, just like other preferences. On one hand, like other preferences, they're necessarily simple and can come into conflict, making our lives harder. On the other hand, like other preferences, their limited scope allows us some wiggle room in terms of satisfying them, making our lives easier.

Unfortunately, we can't dive too deep into how preference aggregation should be done here. It's very hard, I don't know the solution, and it's also somewhat outside the scope of this post. Just to give a taste, problems arise when we want to compare preferences in different ontologies. As with the dancing vs. sugar example, we could do this comparison by cashing out both models into one more fine-grained model. But it's not okay to just treat the more fine-grained model on its own terms and use it to fit human preferences from their behavior. It comes back to meta-preferences; I don't want to be modeled in the most fine-grained way. That would lead to unpalatable positions like "whatever the human did, that's what they wanted" or "the human wants to follow the laws of physics." This reflects that resolving conflicts across ontologies can't be done by looking for which one is "correct"; we have to face the problem of translation head-on, and resolve conflicts using meta-preferential principles like "fairness."

One further complication of trying to incorporate meta-preferences into modeling humans is that if how you balance preferences depends on your preferences, where you end up is going to depend on where you started. This can lead to certain problems (Stuart), and we might want to better understand this process and make sure it leads somewhere sensible (me).  However, some amount of this dynamic is essential - for starters, picking out humans as the things whose values we want to learn (rather than e.g. evolution) and insisting that human actions are at least a little bit correlated with our preferences have exactly the type signature of meta-preference. Learning human meta-preferences can push you around in meta-preference-space, but you've still got to start somewhere.

How does all this connect back to Goodhart? I propose that a lot of the feeling of unease when considering trusting value learning schemes reliant on human modeling, a lot of this feeling that small perturbations might lead to bad things happening, is because we don't think they're satisfying our meta-preferences. Without satisfactory application of meta-preferences, it seems like getting what we want out of a value learner would be a fragile shot in the dark, where deviations that seem "small" in units of bits of random noise might have a large impact on how good the future is. If you squint: Goodhart's law.

III - Motives

If human preferences live in simplified models of the world, this raises an obvious question: should we only trust these preferences within the domains of validity of those models? Does this mean that the really good futures lie within the domain of validity of our preferences?

Long story short? Yes.

The rest of this section is the long story long.

What's a domain of validity, anyhow? One way it can be is that the domain of validity comes bundled with the model of the world. This is like Newtonian mechanics coming with a disclaimer on it saying "not valid above 0.1 c." This way keeps things nice and simple for our limited brains. But there's another way that's even better to reason about (but impractical for human use), which is that we could have a plethora of different models of the world, and where they broadly agree we call it a "domain of validity," and as they agree less, we trust them less. When I talk about individual preferences having a domain of validity, we can translate this to there being many similar models that use variations on this preference, and there's some domain where they more or less agree, but as you leave that domain they start disagreeing more and more.
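To make the "many models that broadly agree" picture concrete, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration, not something from the post: the family of models is a set of toy utilities u(x) = x - c·x² with slightly different curvatures c, disagreement is measured as the population standard deviation of their outputs, and the tolerance of 1.0 is arbitrary. The models agree closely for everyday inputs near zero and drift apart as x grows, so the region where their spread stays under the tolerance plays the role of the domain of validity.

```python
import statistics

# Hypothetical family of preference models: each is a slight variation on the
# same underlying preference. They agree for everyday inputs (x near 0), but
# their small differences in curvature blow up as x gets large.
CURVATURES = [0.00, 0.01, 0.02, 0.03, 0.04]


def make_model(c):
    """Return a toy utility function u(x) = x - c * x**2 (illustrative only)."""
    return lambda x: x - c * x * x


MODELS = [make_model(c) for c in CURVATURES]


def spread(x):
    """Population standard deviation of the models' evaluations at x."""
    return statistics.pstdev([m(x) for m in MODELS])


def in_domain_of_validity(x, tolerance=1.0):
    """Treat x as inside the domain of validity if the models roughly agree there."""
    return spread(x) <= tolerance


for x in [0, 2, 5, 10, 20]:
    print(x, round(spread(x), 3), in_domain_of_validity(x))
```

With these made-up numbers the spread grows like roughly 0.014·x², so the models count as agreeing up to about x ≈ 8.4 and disagree more and more beyond that.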

One more wrinkle is that our models in this case have two outputs: they make predictions about the world, and they also contain inferences about human values. Sometimes they can agree about predictions but disagree about values, or vice versa. Which domain of validity do we care about - predictions or preferences?

Turns out it's basically always preferences. Imagine I get dumped out the airlock of a spaceship into hard vacuum. Very quickly, modeling me as a person is going to stop making useful predictions (e.g. about my future motion), and it will be more pragmatic to model me as a bag of wet meat - vacuum is outside the predictive domain of validity of many person-level models of me. But my preferences about getting dumped out the airlock have no such problem - the models that predict me in day-to-day life all tend to agree that it's bad.

This is a strong intuition pump for using the preferential domain of validity when aggregating, and not worrying too hard about predictive accuracy. This requires our magical cross-model preference translator, but we've already assumed that into existence anyhow. In the reverse case, where there are models that are equally good at predicting our actions, and equally satisfy meta-preferences, but are put in a situation where they disagree about which of our internal psychological states are "preferences," it also seems reasonable that we care about the preferential domain of validity.

What would ever incentivize a person or AI to leave the domain of validity of our preferences? Imagine you're trying to predict the optimal meal, and you make 10 different models of your preferences about food. If nine of these models think a meal would be a 2/10, and the last model thinks a meal would be a 1,000/10, you'd probably be pretty tempted to try that meal anyway, right? Even if these models agree on all everyday cases, and even if they make identical predictions about your experiences during and after the meal. Even if the meal is Hades' pomegranate and all your other models are telling you you'll regret it.

This becomes a question about how you're aggregating models. Avoiding going outside the domain of validity looks like using an aggregation function that puts more weight on the pessimistic answers than the optimistic ones, even if the optimistic ones have been specially selected for being really optimistic. We've circled back to meta-preferences again; I don't want one of my preferences or one way of modeling me to be super-duper-satisfied at the expense of all others. This is in (non-fatal) tension with what we invoked meta-preferences for in part II, which is that there are some preferences and some ways of modeling me that I prefer to others.
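The post doesn't commit to a particular aggregation function. As one illustrative choice (an assumption, not the author's proposal), the sketch below compares a plain average with a soft-min, which puts the most weight on the most pessimistic models, on the ten-model meal example. The scores for the pomegranate come from the example above; the "ordinary meal" scores and the function names are hypothetical.

```python
import math


def mean_aggregate(scores):
    """Plain average: lets one wildly optimistic model dominate."""
    return sum(scores) / len(scores)


def soft_min_aggregate(scores, temperature=1.0):
    """Soft-min, a pessimistic aggregator: -T * log(mean(exp(-s / T))).

    Shifted by the minimum score for numerical stability. As temperature -> 0
    this approaches min(scores); as temperature -> infinity it approaches the mean.
    """
    lo = min(scores)
    total = sum(math.exp(-(s - lo) / temperature) for s in scores)
    return lo - temperature * math.log(total / len(scores))


# Ten hypothetical models of my food preferences, scoring two meals out of 10.
pomegranate = [2] * 9 + [1000]  # nine models say 2/10, one says 1,000/10
ordinary_meal = [6] * 10        # every model agrees on a decent 6/10

for name, scores in [("pomegranate", pomegranate), ("ordinary meal", ordinary_meal)]:
    print(name, mean_aggregate(scores), round(soft_min_aggregate(scores), 3))
```

With the plain mean, the single 1,000/10 model drags the pomegranate up to 101.8 and it wins; with the soft-min at temperature 1 it scores about 2.1 and the ordinary 6/10 meal wins. The temperature controls how pessimistic the aggregator is: near zero it behaves like taking the minimum, and as it grows it approaches the plain mean.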

This ties back to the fact that there is no One True way of modeling ourselves that we just haven't identified yet. If there were, that epistemic position on models of human preferences would necessitate certain rules for preference aggregation. (For example, if you treat extreme values in some domain as evidence against a model being the True model, then this evidence would equally affect whether we trust that model in other situations.) Because we aren't just trying to figure out which model of us is the One True model, it's okay to violate such simple rules in our preference aggregation.

IV - Methods

As a final section, let's circle back to some of the arguments from Goodhart Taxonomy and see how they're holding up in a framing where we have to compare models to other models, rather than to the True utility function.

The different types of Goodhart in that post are different reasons why a small perturbation of the proxy is likely to lead to large divergences of score according to the True Values. We can make a fairly illuminating transformation of these arguments by replacing "proxy" with "one model," and "True utility function" with "other plausible models." In this view, Goodhart processes drive apparently-similar models into disagreement with each other.

  • Extremal Goodhart: 

    • Old style: When optimizing for some proxy for value, worlds in which that proxy takes an extreme value are probably very different (drawn from a different distribution) than the everyday world in which the relationship between the proxy and true value was inferred, and this big change can magnify any discrepancies between the proxy and the true values.
    • New style: When optimizing for one model of human preferences, worlds in which that model takes an extreme value are probably very different than the everyday world from which that model was inferred, and this big change can magnify any discrepancies between similar models that used to agree with each other. Lots of model disagreement often signals to us that the validity of the preferences is breaking down, and we have a meta-preference to avoid this.
    • This transformation works very neatly for Extremal Goodhart, so I took the liberty of ordering it first in the list.
  • Regressional Goodhart:

    • Old style: If you select for high value of a proxy, you select not just for signal but also for noise. You'll predictably get a worse outcome than the naive estimate, and if there are some parts of the domain that have more noise without totally tanking the signal, the maximum value of the proxy is more likely to be there.
    • New style: If you select for high value according to one model of humans, you select not just for the component that agrees with the average model, but also the component that disagrees. Other models will predictably value your choice less than the model you're optimizing, and if there are some parts of the domain that tend to drive this model's estimates apart from the others' without totally tanking the average value, the maximum value is more likely to be there.

      Also, if you average all your models together and select for high average value, you can still treat model disagreement like noise when it lacks obvious correlations. If there's some region where your models disagree with each other a lot, in uncorrelated ways, the maximum average value will more likely be in that region. As with Extremal, we would rather not go to the part of phase space where the models of us all disagree with each other.
    • My addition of the variance-seeking pressure under the umbrella of Regressional Goodhart really highlights the similarities between it and Extremal Goodhart. Both are simplifications of the same overarching math, it's just that in the Regressional case we're doing even more simplification (requiring the specific case where there's a noise term with nice properties), allowing for a more specific picture of the optimization process.
  • Causal Goodhart:

    • Old style: If we pick a proxy to optimize that's correlated with True Value but not sufficient to cause it, then there might be appealing ways to intervene on the proxy that don't intervene on what we truly want.
    • New style: If we have modeled preferences that are correlated but one is actually the causal descendant of the other, then there might be appealing ways to intervene on the descendant preference that don't intervene on the ancestor preference.

      There's a related potential issue when we have modeled preferences that are coarse-grainings or fine-grainings of each other. There can be ways to intervene on the fine-grained model that don't intervene on the coarse-grained model.
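Here is a small simulation of the new-style Regressional Goodhart story, under assumptions of my own choosing rather than anything from the post: each option has a shared "consensus" value, every model of the human adds independent Gaussian error, and half of the options sit in a region where that error is much larger, i.e. where the models disagree a lot. The function names and all parameter values are hypothetical.

```python
import random


def average(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def simulate(rng, n_options=200, n_models=5, calm_noise=0.1, wild_noise=2.0):
    """Run one trial of a toy regressional-Goodhart setup.

    Returns (one_pick_is_wild, gap, avg_pick_is_wild), where `gap` is how much
    more the optimized model likes its pick than the other models do on average.
    """
    options = []
    for i in range(n_options):
        wild = i >= n_options // 2  # second half: models disagree a lot
        noise = wild_noise if wild else calm_noise
        consensus = rng.gauss(0.0, 1.0)  # the part all models share
        scores = [consensus + rng.gauss(0.0, noise) for _ in range(n_models)]
        options.append((wild, scores))

    # Optimize a single model of the human (model 0).
    one_wild, one_scores = max(options, key=lambda opt: opt[1][0])
    gap = one_scores[0] - average(one_scores[1:])

    # Optimize the average over all models instead.
    avg_wild, _ = max(options, key=lambda opt: average(opt[1]))
    return one_wild, gap, avg_wild


rng = random.Random(0)
trials = [simulate(rng) for _ in range(200)]
frac_one_wild = sum(t[0] for t in trials) / len(trials)
mean_gap = average([t[1] for t in trials])
frac_avg_wild = sum(t[2] for t in trials) / len(trials)
print(f"single-model argmax in high-disagreement region: {frac_one_wild:.0%}")
print(f"optimized model minus other models at its pick: {mean_gap:.2f}")
print(f"average-of-models argmax in high-disagreement region: {frac_avg_wild:.0%}")
```

In runs like this, the option that maximizes a single model should nearly always come from the high-disagreement half, the other models should rate it lower than the optimized model does, and even maximizing the average of all five models should still land in the high-disagreement region more often than not. That last effect is the variance-seeking pressure described above.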

These translated Goodhart arguments all make the same change, which is to replace failures according to particular True Values with unstable or undefined behavior. As Stuart Armstrong would put it, Goodhart's law is model splintering for values.

Although this change may seem boring or otiose, I think it's actually a huge opportunity. In the first post I complained that the naive framing of Goodhart's law didn't admit of solutions - now, this new-style framing changes something crucial. When comparing a model to the True Values, we didn't know the True Values. But when comparing models to other models, nothing there is unknowable!

In the next and final post, the plan is to tidy this claim up a bit, see how it applies to various proposals for beating Goodhart's law for value learning, and zoom out to talk about the bigger picture for at least a whole paragraph.


Can homo-sapiens sustain an economy parallel to AI's?

November 2, 2021 - 10:03
Published on November 2, 2021 7:03 AM GMT

In a world where taxis and trucks drive themselves, code writes itself, etc., there are several ways humans could fit into the picture.
Perhaps they'll get rich by owning shares, perhaps UBI will be introduced, perhaps they'll move to jobs higher up the abstraction ladder, using AI as tools, or perhaps we'll just serve as testers/aligners/goal-setters.
But there's also a possibility that we'll have nothing to offer such an "ascended economy," or that a big part of society will not be needed while some other small part benefits (Elysium-style).

(I'm assuming the "progressive" part of the world is not hostile, it just doesn't need the "traditional" part; in particular, it respects the traditional part's property rights.)

Assuming this latter scenario: is it possible that "traditional people" form some kind of closed system in which they still trade with each other, ignoring the outside progress?
In particular, would it require strong coordination between them to refrain from buying from and selling to the outside, or would trading locally be the most natural and selfish thing for them to do?

Has it ever happened in history that a tribe successfully walled itself off (without active help from the outside to protect its customs)?


[Book Review] "The Bell Curve" by Charles Murray

November 2, 2021 - 08:49
Published on November 2, 2021 5:49 AM GMT

Factor analysis is a statistical method for inferring a small number of underlying factors that explain the correlations between observations. It's the foundation of the Big Five personality traits. It's also behind how we define intelligence.

A person's ability to perform one cognitive task is positively correlated with basically every other cognitive task. If you collect a variety of cognitive measures you can use linear algebra to extract a single measure which we call g.
An intelligence quotient (IQ) test is specifically designed to measure g. IQ isn't a perfect measure of g, but it's convenient and robust.

Here are six conclusions regarding tests of cognitive ability, drawn from the classical tradition, that are now beyond significant technical dispute:

  1. There is such a thing as a general factor of cognitive ability on which human beings differ.

  2. All standardized tests of academic aptitude or achievement measure this general factor to some degree, but IQ tests expressly designed for that purpose measure it most accurately.

  3. IQ scores match, to a first degree, whatever it is that ordinary people mean when they use the word intelligent or smart in ordinary language.

  4. IQ scores are stable, although not perfectly so, over much of a person's life.

  5. Properly administered IQ tests are not demonstrably biased against social, economic, ethnic, or racial groups.

  6. Cognitive ability is substantially heritable, apparently no less than 40 percent and no more than 80 percent.

Charles Murray doesn't bother proving the above points. These facts are well established among scientists. Instead, The Bell Curve: Intelligence and Class Structure in American Life is about what g means to American society.

Stratification

Educational Stratification

Smarter people have always had an advantage. The people who go to college have always been smarter than average. The correlation between college and intelligence increased after WWII. Charles Murray argues that the competitive advantage of intelligence is magnified in a technological society. I agree that this has been the case so far and that the trend continued from 1994, when Murray published his book, to 2021, when I am writing this review.

SAT scores can be mapped to IQ. The entering class of Harvard in 1926 had a mean IQ of about 117. IQ is defined to have an average of 100 and a standard deviation of 15. Harvard in 1926 thus hovered around the 88th percentile of the nation's youths. Other colleges got similar scores. The average Pennsylvania college was lower with an IQ of 107 (68th percentile). Elite Pennsylvania colleges had students between the 75th and 90th percentiles.

By 1964, the average student of a Pennsylvania college had an IQ in the 89th percentile. Elite colleges' average freshmen were in the 99th percentile.
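The percentile figures above follow from treating IQ as normally distributed with mean 100 and standard deviation 15. Here is a minimal Python sketch of that conversion, assuming exactly that normal model (real score distributions only approximate it). The function names are illustrative, not taken from the book.

```python
from statistics import NormalDist

# Assumption: IQ scores follow a normal distribution with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_to_percentile(iq: float) -> float:
    """Percentage of the population scoring below `iq`."""
    return 100 * IQ_SCALE.cdf(iq)


def percentile_to_iq(percentile: float) -> float:
    """IQ score at the given percentile (must be strictly between 0 and 100)."""
    return IQ_SCALE.inv_cdf(percentile / 100)


for label, iq in [("Harvard, 1926", 117), ("Average Pennsylvania college, 1926", 107)]:
    print(f"{label}: IQ {iq} -> percentile {iq_to_percentile(iq):.1f}")

for label, pct in [("Average Pennsylvania college, 1964", 89), ("Elite college freshmen, 1964", 99)]:
    print(f"{label}: percentile {pct} -> IQ {percentile_to_iq(pct):.1f}")
```

Under this model an IQ of 117 sits at about the 87th percentile and 107 at about the 68th, in line with the figures above. The 1964 percentiles correspond to IQs of roughly 118 and 135.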

Charles Murray uses a measure called median overlap to quantify social stratification. Median overlap is the proportion of the lower-scoring group whose IQ scores match or exceed the median IQ score of the higher-scoring group. Two identical groups would have a median overlap of 50%.

| Groups Being Compared | Median Overlap |
|---|---|
| High school graduates with college graduates | 7% |
| High school graduates with Ph.D.s, M.D.s, or LL.B.s | 1% |
| College graduates with Ph.D.s, M.D.s, and LL.B.s | 21% |
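The book computes these overlaps from actual score data. As a sketch of what the statistic measures: the first function below applies the definition directly to two lists of scores. The other two assume, purely for illustration, that both groups are normally distributed with the same SD of 15, which lets an overlap be translated into a gap between group means. Those translated gaps are a thought experiment under that assumption, not figures from the book, and the function names are hypothetical.

```python
from statistics import NormalDist, median


def median_overlap(lower_group, higher_group):
    """Fraction of the lower-scoring group at or above the higher-scoring group's median."""
    cutoff = median(higher_group)
    return sum(score >= cutoff for score in lower_group) / len(lower_group)


def median_overlap_normal(mean_low, mean_high, sd=15.0):
    """Median overlap for two normal groups sharing one SD (an illustrative assumption)."""
    # For a normal distribution the median equals the mean.
    return 1 - NormalDist(mean_low, sd).cdf(mean_high)


def gap_for_overlap(overlap, sd=15.0):
    """Gap between means (in IQ points) that yields `overlap` under the same normal model."""
    return sd * NormalDist().inv_cdf(1 - overlap)


print(median_overlap_normal(100, 100))  # identical groups: 0.5
for overlap in (0.21, 0.07, 0.01):
    print(f"{overlap:.0%} overlap ~ {gap_for_overlap(overlap):.1f}-point gap between means")
```

The normal version makes the 50% baseline for identical groups fall out directly: the lower group's median sits exactly at the higher group's median.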

College graduates are not representative of the population. If most of your social circle are (or will be) college graduates, then your social circle is smarter than the population mean.

The national percentage of 18-year-olds with the ability to get a score of 700 or above on the SAT-Verbal test is in the vicinity of one in three hundred. Think about the consequences when about half of these students are going to universities in which 17 percent of their classmates also had SAT-Vs in the 700s and another 48 percent had scores in the 600s. It is difficult to exaggerate how different the elite college population is from the population at large—first in its level of intellectual talent, and correlatively in its outlook on society, politics, ethics, religion, and all the other domains in which intellectuals, especially intellectuals concentrated into communities, tend to develop their own conventional wisdoms.

Occupational Stratification

You can arrange jobs by their relative status. Job status tends to run in families. This could be because of social forces or it could be because of heritable g. We can test which hypothesis is true via an adoption study. A study in Denmark tracked several hundred men and women adopted before they were one year old. "In adulthood, they were compared with both their biological siblings and their adoptive siblings, the idea being to see whether common genes or common home life determined where they landed on the occupational ladder. The biologically related siblings resembled each other in job status, even though they grew up in different homes. And among them, the full siblings had more similar job status than the half siblings. Meanwhile, adoptive siblings were not significantly correlated with each other in job status."

High-status jobs have become much more cognitively demanding over the last hundred years. Charles Murray uses a bunch of data to prove this. I'll skip over his data because the claim is so obvious to anyone living in the Internet age. Even being a marketer is complicated these days.

Credentialism is a real thing. Could it be that IQ causes education which causes high status jobs but cognitive ability doesn't actually increase job performance? Or does sheer intellectual horsepower have market value? We have data to answer this question.

The most comprehensive modern surveys of the use of tests for hiring, promotion, and licensing, in civilian, military, private, and government occupations, repeatedly point to three conclusions about worker performance, as follows.

  1. Job training and job performance in many common occupations are well predicted by any broadly based test of intelligence, as compared to narrower tests more specifically targeted to the routines of the job. As a corollary: Narrower tests that predict well do so largely because they happen themselves to be correlated with tests of general cognitive ability.

  2. Mental tests predict job performance largely via their loading on g.

  3. The correlations between tested intelligence and job performance are higher than had been estimated prior to the 1980s. They are high enough to have economic consequences.

IQ tests frequently measure one's ability to solve abstract puzzles. Programming interview algorithm puzzles are tests of a person's abstract problem-solving ability. I wonder how much of the predictive power of Google's algorithm interviews comes from the g factor. Some of it must. The question is: How much? If the answer is "a lot," then these tests could be a de facto workaround for the 1971 Supreme Court case Griggs v. Duke Power Co., which found that IQ-based hiring constituted employment discrimination under disparate impact theory.

An applicant for a job as a mechanic should be judged on how well he does on a mechanical aptitude test while an applicant for a job as a clerk should be judged on tests measuring clerical skills, and so forth. So decreed the Supreme Court, and why not? In addition to the expert testimony before the Court favoring it, it seemed to make good common sense…. The problem is that common sense turned out to be wrong.

The best experiments compel lots of people to do things. The US military compels lots of people to do things. Thus, some of our best data on g's relationship to job performance comes from the military.

| Enlisted Military Skill Category | Percentage of Training Success Explained by g | Percentage of Training Success Explained by Everything Else |
|---|---|---|
| Nuclear weapons specialist | 77.3 | 0.8 |
| Air crew operations specialist | 69.7 | 1.8 |
| Weather specialist | 68.7 | 2.6 |
| Intelligence specialist | 66.7 | 7.0 |
| Fireman | 59.7 | 0.6 |
| Dental assistant | 55.2 | 1.0 |
| Security police | 53.6 | 1.4 |
| Vehicle maintenance | 49.3 | 7.7 |
| Maintenance | 28.4 | 2.7 |
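To get a feel for how lopsided the table is, you can divide g's share by everything else's share. A small Python sketch over the nine rows shown (the variable names are just for illustration):

```python
from statistics import median

# (percent of training success explained by g, percent explained by everything else),
# copied from the nine rows of the table above.
ROWS = {
    "Nuclear weapons specialist": (77.3, 0.8),
    "Air crew operations specialist": (69.7, 1.8),
    "Weather specialist": (68.7, 2.6),
    "Intelligence specialist": (66.7, 7.0),
    "Fireman": (59.7, 0.6),
    "Dental assistant": (55.2, 1.0),
    "Security police": (53.6, 1.4),
    "Vehicle maintenance": (49.3, 7.7),
    "Maintenance": (28.4, 2.7),
}

# Ratio of g's share to the share of everything else, job by job.
per_row_ratio = {job: g / other for job, (g, other) in ROWS.items()}

# Ratio of the summed shares across all nine rows.
pooled_ratio = sum(g for g, _ in ROWS.values()) / sum(other for _, other in ROWS.values())

for job, ratio in sorted(per_row_ratio.items(), key=lambda item: item[1], reverse=True):
    print(f"{job:32s} {ratio:6.1f}x")
print(f"median per-row ratio: {median(per_row_ratio.values()):.1f}x")
print(f"pooled ratio:         {pooled_ratio:.1f}x")
```

For these nine rows the pooled ratio comes out near 21 and the median per-row ratio near 38. The "almost thirty times" figure quoted next presumably summarizes the whole study rather than this excerpt, so the subset need not reproduce it exactly.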

"[T]he explanatory power of g was almost thirty times greater than of all other cognitive factors in ASVAB combined." In addition, the importance of g was stronger for more complicated tasks. Other military studies find similar results to this one.

There's no reason to believe civilian jobs are any less dependent on g than military jobs. For cognitively-demanding jobs like law, neurology and research in the hard sciences, we should expect the percentage of training success explained by g to be well over 70%. Similar results appear for civilian jobs.

If we measure civilian job performance instead of military training success we get a smaller (but still large) impact of g. Note that the measures below probably contain significant overlap. Part of college grades' predictive power comes from them being an imperfect measure of g.

| Predictor | Validity Predicting Job Performance Ratings |
|---|---|
| Cognitive test score | .53 |
| Biographical data | .37 |
| Reference checks | .26 |
| Education | .22 |
| Interview | .14 |
| College grades | .11 |
| Interest | .10 |
| Age | -.01 |
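Each validity above is a correlation coefficient r. Squaring it gives the share of variance in job-performance ratings that the predictor alone accounts for in a simple linear model, which makes the gaps between rows easier to feel. A small sketch using the table's numbers (the helper name is just for illustration):

```python
# Validity coefficients from the table above (correlations with job performance ratings).
VALIDITIES = {
    "Cognitive test score": 0.53,
    "Biographical data": 0.37,
    "Reference checks": 0.26,
    "Education": 0.22,
    "Interview": 0.14,
    "College grades": 0.11,
    "Interest": 0.10,
    "Age": -0.01,
}


def variance_explained(r: float) -> float:
    """Share of outcome variance that a single linear predictor with correlation r accounts for."""
    return r * r


for name, r in VALIDITIES.items():
    print(f"{name:22s} r = {r:+.2f}   r^2 = {variance_explained(r):.3f}")
```

A validity of .53 works out to about 28% of the variance in ratings, against about 2% for the interview's .14. Because the predictors overlap, these squared values cannot simply be added together.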

Charles Murray's data shows that a secretary or a dentist who is one standard deviation better than average is worth a 40% premium in salary. Such jobs understate the impact of variation among workers. Jobs with leverage have a disproportionate impact on society. Anyone who has worked in a highly technical field with leverage (like software development, science or business management) knows that someone one standard deviation above average is worth much more than 40% more.

As technology advances, the number of highly-technical jobs with leverage increases. This drives up the value of g which increases income inequality.

Social Partitioning

We in the cognitive elite usually partition ourselves off into specialized neighborhoods. For example, I live in Seattle. Seattle is one of the most software-heavy cities in the world. The Seattle area is home to the headquarters of Microsoft and Amazon. You can barely throw a router without hitting a programmer. You'd expect high schools to be full of technical volunteers. But that's only in the rich neighborhoods. I, weirdly, live in a poor, dangerous[1] neighborhood where I volunteer as a coach for the local high school's robotics club. If I weren't around, there would be no engineers teaching or coaching at the high school. None of my friends live here. They all live in the rich, safe neighborhoods.

Heritability of Intelligence

The most modern study of identical twins reared in separate homes suggests a heritability for general intelligence between .75 and .80, a value near the top of the range found in contemporary technical literature. Other direct estimates use data on ordinary siblings who were raised apart or on parents and their adopted-away children. Usually the heritability estimates from such data are lower but rarely below .4.

The heritability of intelligence combines with cognitive stratification to increase IQ variance. The average husband-wife IQ correlation is between .2 and .6. Whatever the number used to be, I expect it has increased in the 27 years since The Bell Curve was published. Technically speaking, elite graduates have always married each other. However, the concentration of cognitive ability among elites increases the genetic impact of this phenomenon.
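One way to see the variance claim: the average of two spouses' scores (the midparent value) spreads out more when spouses are correlated. If both spouses have SD σ and correlation r, the midparent variance is σ²(1 + r)/2. The Python sketch below checks that formula by simulation. It assumes equal SDs of 15 and plugs in the .2 to .6 range quoted above. It models only the parents' average, not heritability or the children's actual scores, and the function names are hypothetical.

```python
import math
import random
from statistics import fmean


def midparent_sd(sd: float, r: float) -> float:
    """SD of (X + Y) / 2 when X and Y both have SD `sd` and correlation r."""
    return sd * math.sqrt((1 + r) / 2)


def simulate_midparent_sd(sd: float, r: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity using correlated normal pairs."""
    rng = random.Random(seed)
    mids = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 has unit variance and correlation r with z1.
        z2 = r * z1 + math.sqrt(1 - r * r) * rng.gauss(0.0, 1.0)
        mids.append(100 + sd * (z1 + z2) / 2)
    mean = fmean(mids)
    return math.sqrt(fmean((m - mean) ** 2 for m in mids))


for r in (0.0, 0.2, 0.6):
    print(f"r = {r:.1f}: formula {midparent_sd(15, r):.2f}, simulated {simulate_midparent_sd(15, r):.2f}")
```

Going from random mating (r = 0) to r = .6 raises the midparent SD from about 10.6 to about 13.4 IQ points, so couples at the extremes become more common.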

Negative Effects of Low Intelligence

All the graphs in this section control for race by including only white people.


Is poverty caused by IQ or by one's parents' social class? What would you bet the answer is?

Parental socioeconomic status matters, but the impact is small compared to IQ.

The black lines intersect at an IQ of 130. I think that once you pass a high enough threshold of intelligence, school stops mattering because you can teach yourself things faster than schools can teach you. Credentials don't matter either because exceptional people are wasted in cookie-cutter roles.

High School Graduation

There was no IQ gap between high school dropouts and graduates in the first half of the 20th century, before graduating high school became the norm. After high school became the norm, dropouts became low IQ.

| IQ | Percentage of Whites Who Did Not Graduate or Pass a High School Equivalency Exam |
|---|---|
| >125 | 0 |
| 110-125 | 0 (actually 0.4) |
| 90-110 | 6 |
| 75-90 | 35 |
| <75 | 55 |

In this case, IQ is even more predictive than parental socioeconomic status. However, for temporary dropouts, socioeconomic status matters a lot. (In terms of life outcomes, youths with a GED look more like dropouts than high school graduates.)

The image I (and Charles Murray) get is of dumb rich kids who get therapists, private tutors, special schools, the works. High school is easier to hack than college (and work, as we'll get to later). The following graph is, once again, white youths only.

Labor Force Participation

Being smart causes you to work more. Being born rich causes you to work less.

Being dumb causes work-inhibiting disability.

| IQ | No. of White Males per 1,000 Who Reported Being Prevented from Working by Health Problems | No. of White Males per 1,000 Who Reported Limits in Amount or Kind of Work by Health Problems |
|---|---|---|
| >125 | 0 | 13 |
| 110-125 | 5 | 21 |
| 90-110 | 5 | 37 |
| 75-90 | 36 | 45 |
| <75 | 78 | 62 |

Lower-intelligence jobs tend to involve physical objects which can injure you. However, this fails to account for the whole situation. "[G]iven that both men have blue-collar jobs, the man with an IQ of 85 has double the probability of a work disability of a man with an IQ of 115…the finding seems to be robust." It could be that dumb people are more likely to injure themselves, that they misrepresent their reasons for not working, or both.

Technically, unemployment is different from being out of the labor force. The unemployment data also show that, in 1989, being smart was negatively correlated with being unemployed.

Parental socioeconomic status had no measurable effect on unemployment. All that money spent on buying a high school diploma does not transfer to increased employment status. The following graph is of white men.


Young white women with lower IQ are much more likely to give birth to an illegitimate baby in absolute terms and relative to legitimate births. How much more?

| IQ | Percentage of Young White Women Who Have Given Birth to an Illegitimate Baby | Percentage of Births that are Illegitimate |
|---|---|---|
| >125 | 2 | 7 |
| 110-125 | 4 | 7 |
| 90-110 | 8 | 13 |
| 75-90 | ? | 17 |
| <75 | 32 | 42 |

Not only are children of mothers in the top quartile of intelligence…more likely to be born within marriage, they are more likely to have been conceived within marriage (no shotgun wedding).

As usual, IQ outweighs parental socioeconomic status. The following graph is for white women.

Remember that IQ correlates with socioeconomic status. "High socioeconomic status offered weak protection against illegitimacy once IQ had been taken into account."

Welfare Dependency

Charles Murray gives a bunch of graphs and charts about how IQ affects welfare dependency. I bet you can guess what kind of a relationship they show.


A low IQ [of the mother] is a major risk factor [for a low birth weight baby], whereas the mother's socioeconomic background is irrelevant.

Surprisingly to me, the mother's age at the birth of the child did not affect her chances of giving birth to a low-birth-weight baby. Poverty didn't matter either. I suspect this is because America has a calorie surplus. I predict poverty was a very important factor in extremely poor pre-industrial societies.

A mother's socioeconomic background does have a large effect (independent of the mother's IQ) on her child's chances of spending the first years of its life in poverty. This isn't to say IQ doesn't matter. It's just the first result in our entire analysis where IQ doesn't dominate all other factors.

A mother's IQ does have a big impact on the quality of her children's home life.

| IQ | Percentage of Children Growing Up in Homes in the Bottom Decile of the HOME Index |
|---|---|
| >125 | 0 |
| 110-125 | 2 |
| 90-110 | 6 |
| 75-90 | 11 |
| <75 | 24 |

The children of mothers with low IQs have worse temperaments (more difficulty and less friendliness), worse motor and social development and more behavior problems. (There's a bump in some bad outcomes for the smartest mothers, but this might just be an artifact of the small sample size.) The mother's socioeconomic background has a large effect on her children's developmental problems, though not quite as large as the mother's IQ.

If you want smart kids then a smart mother is way more important than the mother's socioeconomic background. By now, this should come as no surprise.


High IQ correlates with not getting involved with the criminal justice system. Move along.

Ethnicity and Cognition

Different ethnic groups vary on cognitive ability.

Jews—specifically, Ashkenazi Jews of European origins—test higher than any other ethnic group…. These tests results [sic] are matched by analyses of occupational and scientific attainment by Jews, which consistently show their disproportionate level of success, usually by orders of magnitude, in various inventories of scientific and artistic achievement.

"Do Asians Have Higher IQs than Whites? Probably yes, if Asian refers to the Japanese and Chinese (and perhaps also Koreans), whom we will refer to here as East Asians." Definitely yes if "Asian" refers to Chinese-Americans. This can be entirely explained by US immigration policy. It is hard to get into the USA if you are East Asian. The United States has discriminated against Asian immigrants for most of its history and continues to do so. The United States is a desirable place to live. If you're Asian and you want to get into the US, then it helps to be smart. It would be weird if Asian-Americans weren't smarter than other immigrants. ("Other immigrants" includes all non-Asian, non-Native Americans.) Since intelligence is significantly heritable and people tend to marry within their own ethnic groups (often because the alternative was illegal[2]), a founder effect can be expected to persist for the handful of generations the United States has existed.

The Bell Curve is mostly about America. It's disconcerting to me when he suddenly compares American students to students from Japan and Hong Kong. When he says "black" he uses a sample of African-Americans (not Africa-Africans), but when he says "Japanese" he uses a sample of Japan-Japanese (not Japanese-Americans). When he says "Jews" he includes the whole global diaspora and not (I presume) Latino converts.

I think Charles Murray fails to realize that Asian-Americans are such a biased sample of Asians that the two must be separated when you're studying g. Fortunately, Asia-Asians are not a critical pillar of Murray's argument. Charles Murray tends to bucket Americans into black, white and sometimes Latino.

Black and White Americans

These differences are statistical. They apply to populations.

People frequently complain of IQ tests being biased. It is possible to determine whether a test is biased.

"If the SAT is biased against blacks, it will underpredict their college performance. If tests were biased in this way, blacks as a group would do better in college than the admissions office expected based on just their SATs." In either case "[a] test biased against blacks does not predict black performance in the real world in the same way that it predicts white performance in the real world. The evidence of bias is external in the sense that it shows up in differing validities for blacks and whites. External evidence of bias has been sought in hundreds of studies. It has been evaluated relative to performance in elementary school, in secondary school, in the university, in the armed forces, in unskilled and skilled jobs, in the professions. Overwhelmingly, the evidence is that the major standardized tests used to help make school and job decisions do not underpredict black performance, nor does the expert community find any other general or systematic difference in the predictive accuracy of tests for blacks and whites."
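The external-bias check described in the quote can be sketched as follows: fit one regression line predicting an outcome (say, college grades) from test scores for everyone, then look at each group's average residual. A group whose actual outcomes sit systematically above the common line is being underpredicted by the test. This is a minimal Python illustration with made-up numbers and hypothetical function names. Real validity studies use far larger samples and also compare the groups' separate regression slopes and intercepts.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual_by_group(groups):
    """groups maps a group name to (test_scores, outcomes).

    Fits a single line to everyone pooled, then returns each group's mean residual
    (actual minus predicted). A positive value means the test underpredicts that group.
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    slope, intercept = fit_line(all_x, all_y)
    return {
        name: fmean(y - (slope * x + intercept) for x, y in zip(xs, ys))
        for name, (xs, ys) in groups.items()
    }


# Hypothetical data: group B outperforms what its scores predict by a constant amount.
example = {
    "A": ([1, 2, 3, 4], [1, 2, 3, 4]),
    "B": ([1, 2, 3, 4], [2, 3, 4, 5]),
}
print(mean_residual_by_group(example))
```

On this toy data group B's mean residual is +0.5 and A's is -0.5, so the test underpredicts B. The book's claim is that, for the major standardized tests, real data do not show this pattern for black test-takers.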

IQ tests often involve language. A Russia-Russian genius who does not speak English would fail an IQ test given in English. "For groups that have special language considerations—Latinos and American Indians, for example—some internal evidence of bias has been found, unless English is their native language." Native language is not an issue for African-Americans because African-Americans are native English speakers.

What about cultural knowledge? "The [black-white] difference is wider on items that appear to be culturally neutral than on items that appear to be culturally loaded. We italicise this point because it is both so well established empirically yet comes as such a surprise to most people who are new to this topic."

What about test-taking ability and motivation? We can test whether testing itself is behind the black-white difference by comparing standard IQ tests to tests of memorizing digits. Reciting digits backward takes twice as much g as reciting them forward. This experiment controls for test-taking ability and motivation because the forward and backward recitations are given under identical conditions. The black-white difference is about twice as great for reciting digits backward as for reciting them forward.
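Forward and backward digit spans are scored on different raw scales, so "twice as great" only makes sense once each gap is expressed in standard deviation units. Here is a minimal Python sketch of that standardization, using Cohen's d with a pooled SD. The inputs in the usage line are placeholder numbers, not data from the book.

```python
import math
from statistics import fmean, stdev


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (fmean(group_a) - fmean(group_b)) / math.sqrt(pooled_var)


# Placeholder scores only: a 1-point raw gap on a scale where the SD is 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Computing d separately for the forward and backward tasks puts both gaps on the same footing, so their ratio can be compared directly.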

Reaction time correlates strongly with g, but movement time is less correlated. Whites consistently beat blacks on reaction-time tests even though black movement time is faster than white movement time.

Any explanation for the black-white IQ difference based on culture and society must explain the IQ difference, the digit-recitation difference, the reaction time difference, the movement time similarity and the difference in every cognitive measure of performance and achievement.

Lead in the water or epigenetic effects of slavery would constitute such an explanation. Such explanations would cast doubt on whether the difference is genetic but would still amount to biological determinism.

What about socioeconomic status? The black-white IQ gap shrinks when socioeconomic status is controlled for. However, socioeconomic status is at least partially a result of cognitive ability. "In terms of the numbers, a reasonable rule of thumb is that controlling for socioeconomic status reduces the overall B/W difference by about a third."

We can test whether socioeconomic status causes the IQ difference by comparing blacks and whites of equal socioeconomic status. If the black-white IQ difference were caused by socioeconomic status, then blacks and whites of equal socioeconomic status would have similar IQs. This is not what we observe.

It might be that the black-white difference comes from a mix of socioeconomic status plus systemic racism.


Charles Murray's analysis of Africa-Africans bothers me for the same reason his analysis of Asians bothers me. In this case, he assumes African-Americans are representative of Africa-Africans. For instance, he discusses how difficult it is "to assemble data on the average African black" even though African-Americans are mostly from West Africa. Given prehistoric human migration patterns, it is my understanding that West Africans are more genetically distant from East Africans than white people are from Asians. If I am right about Africa-African diversity, then Africa-Africans are too broad a reference class. He should be comparing African-Americans to West Africans[3].

Charles Murray believes scholars are reluctant to discuss Africa-African IQ scores because they are so low. I think he means to imply that Africa-African and African-American IQs are genetically connected. I think such a juxtaposition undersells the Flynn Effect. Industrialization increases the kind of abstract reasoning measured by IQ tests. Fluid and crystallized intelligence have both increased in the rich world in the decades following WWII. The increase happened too fast for it to be because of evolution. It might be due to better health or it could be because our environment is more conducive to abstract thought. I suspect the Flynn Effect comes from a mix of both. The United States and Africa are on opposite ends of the prosperity spectrum. Charles Murray is careful to write "ethnicity" instead of "race," but his classification system is closer to how I think about race than how I think about ethnicity. African-Americans and Africa-Africans are of the same race but different ethnicities.

African blacks are, on average, substantially below African-Americans in intelligence scores. Psychometrically, there is little reason to think that these results mean anything different about cognitive functioning than they mean in non-African populations. For our purposes, the main point is that the hypothesis about the special circumstances of American blacks depressing their test scores is not substantiated by the African data.

I disagree with Charles Murray's logic here. Suppose (in contradiction to first-order genetic prehistory) that Africa-Africans and the African diaspora were genetically homogeneous. A difference in IQ between African-Americans and Africa-Africans would imply that the society you live in substantially influences IQ. If America were segregated in a way that kept African-Americans living in awful conditions, then we would expect African-Americans' IQs to be depressed. Jim Crow laws were enforced until 1965. Martin Luther King Jr. was shot in 1968, a mere 26 years before the publication of The Bell Curve. Blacks and whites continue to be de facto racially segregated today in 2021. Even if racism had ended in 1965 (it didn't), 29 years is not enough time to completely erase the damage caused by centuries of slavery and Jim Crow.

Charles Murray does acknowledge the possible effect of systemic racism. "The legacy of historic racism may still be taking its toll on cognitive development, but we must allow the possibility that it has lessened, at least for new generations. This too might account for some narrowing of the black-white gap."

Black-White Trends

The black-white gap narrowed in the years leading up to the publication of The Bell Curve. This is exactly what we would expect to observe if IQ differences are caused by social conditions because racism has been decreasing over the decades.

Charles Murray acknowledges that rising standards of living increase the intelligence of the economically disadvantaged because improved nutrition, shelter and health care directly remove impediments to brain development. The biggest increase in black scores happened at the low end of the range. This is evidence that improved living conditions raised IQ, because the lowest-hanging fruit hangs from the bottom end of the socioeconomic ladder.

How much is genetic?

Just because something is heritable does not mean the observed differences between groups are genetic in origin. "This point is so basic, and so commonly misunderstood, that it deserves emphasis: That a trait is genetically transmitted in individuals does not mean that group differences in that trait are also genetic in origin." For example, being skinny can be caused by genetics or by liposuction. The fact that one population is fat and another population is skinny does not mean that the difference was caused by genetics. It could just be that one group has better access to liposuction.
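A toy simulation makes the quoted point concrete. Two hypothetical groups draw genetic values from the same distribution, and the trait is genes plus environment. Within each group most of the variation is genetic (heritability near 0.8 with the parameters chosen here), yet the gap between the groups comes entirely from an environmental shift imposed on one of them. Every number and name in this Python sketch is an arbitrary assumption picked for illustration. It says nothing about which situation holds for any real trait or group.

```python
import random
from statistics import fmean


def variance(xs):
    """Population variance."""
    m = fmean(xs)
    return fmean((x - m) ** 2 for x in xs)


def simulate_group(n, env_shift, rng, gene_sd=0.9, noise_sd=0.45):
    """Trait = genetic value + group-wide environmental shift + individual environmental noise."""
    genes = [rng.gauss(0.0, gene_sd) for _ in range(n)]  # same gene distribution in every group
    trait = [g + env_shift + rng.gauss(0.0, noise_sd) for g in genes]
    return genes, trait


def within_group_heritability(genes, trait):
    """Share of trait variance within one group that comes from genetic variance."""
    return variance(genes) / variance(trait)


rng = random.Random(42)
genes_a, trait_a = simulate_group(50_000, env_shift=0.0, rng=rng)
genes_b, trait_b = simulate_group(50_000, env_shift=-1.0, rng=rng)  # hypothetical worse environment

print(f"heritability within A: {within_group_heritability(genes_a, trait_a):.2f}")
print(f"heritability within B: {within_group_heritability(genes_b, trait_b):.2f}")
print(f"trait gap A - B:       {fmean(trait_a) - fmean(trait_b):.2f}")
print(f"genetic gap A - B:     {fmean(genes_a) - fmean(genes_b):.2f}")
```

Both groups show high within-group heritability while the genetic gap between them is essentially zero, which is exactly the distinction the quote draws.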

As demonstrated earlier, socioeconomic factors do not influence IQ much. For the black-white difference to be explained by social factors, those factors would have to exclude socioeconomic status.

Recall further that the B/W difference (in standardized units) is smallest at the lowest socioeconomic levels. Why, if the B/W difference is entirely environmental, should the advantage of the "white" environment compared to the "black" be greater among the better-off and better-educated blacks and whites? We have not been able to think of a plausible reason. An appeal to the effects of racism to explain ethnic differences also requires explaining why environments poisoned by discrimination and racism for some other groups—against the Chinese or the Jews in some regions of America, for example—have left them with higher scores than the national average.

One plausible reason is that Chinese-Americans and Jews value academic success more strongly than whites and blacks do. African-Americans' African culture was systematically destroyed by slavery. They never got the academic cultural package. We could test the cultural values hypothesis by examining what happens when Chinese or Jewish kids are raised by white families and vice versa. The Bell Curve doesn't have this particular data, but it does have white-black data. An examination of 100 adopted children of black, white and mixed racial ancestry found that "[t]he bottom line is that the gap between the adopted children with two black parents and the adopted children with two white parents was seventeen points, in line with the B/W difference customarily observed. Whatever the environmental impact may have been, it cannot have been large." This is evidence against the cultural transmission hypothesis, at least when comparing blacks and whites. Several other studies "tipped toward some sort of mixed gene-environment explanation of the B/W difference without saying how much of the difference is genetic and how much environmental…. It seems highly likely to us that both genes and the environment have something to do with racial differences. What might the mix be? We are resolutely agnostic on that issue; as far as we can determine, the evidence does not yet justify an estimate…. In any case, you are not going to learn tomorrow that all the cognitive differences between races are 100 percent genetic in origin, because the scientific state of knowledge, unfinished as it is, already gives ample evidence that environment is part of the story."

For Japanese living in Asia, a 1987 review of the literature demonstrated without much question that the verbal-visuospatial difference persists even in examinations that have been thoroughly adapted to the Japanese language and, indeed, in tests developed by the Japanese themselves. A study of a small sample of Korean infants adopted into white families in Belgium found the familiar elevated visuospatial scores.

The study of Korean infants seems like the right way to answer this question. The only issue is the small sample size.

What's especially interesting to me, personally, is that "East Asians living overseas score about the same or slightly lower than whites on verbal IQ and substantially higher on visuospatial IQ." This suggests to me that the stereotype of white managers supervising Asian engineers might reflect an actual difference in abilities. (If anyone has updated evidence which contradicts this, please put it in the comments.)

"This finding has an echo in the United States, where Asian-American students abound in engineering, in medical schools, and in graduate programs in the sciences, but are scarce in law schools and graduate programs in the humanities and the social sciences." I agree that unfamiliarity with English and American culture is not a plausible explanation for relatively subpar Asian-American linguistic performance. Asian-Americans born in the United States are fluent English speakers. However, I offer an alternative explanation. It could be that engineering, medicine and the sciences are simply more meritocratic than law, the humanities and the social sciences.

Interestingly, "American Indians and Inuit similarly score higher visuospatial than verbally; their ancestors migrated to the Americas from East Asia hundreds of centuries ago. The verbal-visuospatial discrepancy goes deeper than linguistic background." This surprised me since the Inuit are descended from the Aleut who migrated to America around 10,000 years ago—well before East Asian civilization. It's not obvious to me what environmental pressures would encourage higher visuospatial ability for Arctic Native Americans compared to Europeans.

Charles Murray dismisses the hypothesis that East Asian culture improves East Asians' visuospatial abilities.

Why do visuospatial abilities develop more than verbal abilities in people of East Asian ancestry in Japan, Hong Kong, Taiwan, mainland China, and other Asian countries and in the United States and elsewhere, despite the differences among the cultures and languages in all of those countries? Any simple socioeconomic, cultural, or linguistic explanation is out of the question, given the diversity of living conditions, native languages, educational resources, and cultural practices experienced by Hong Kong Chinese, Japanese in Japan or the United States, Koreans in Korea or Belgium, and Inuit or American Indians.

I don't know what's going on with the Native Americans or exactly what "other Asian countries" includes (I'm betting it doesn't include Turks), but people from East Asia and the East Asian diaspora have cultures that consistently value book learning. Japan, Korea, Hong Kong, Taiwan and mainland China eat similar foods, write in similar ways (except, perhaps, for Korea), and are all cultural descendants of the Tang Dynasty.

If Native Americans have high IQ and high IQ improves life outcomes, then why aren't Native Americans overrepresented in the tech sector? I was so suspicious of the Native American connection that I looked up their IQ test scores. According to this website, Native American IQ is below average for the US and Canada. Native Americans seem to me like the odd ones out of this group. Sure, they might have relatively high visuospatial abilities compared to linguistic abilities. But Native American IQs are below East Asians'. I think Charles Murray is once again using too big a bucket. East Asians and Native Americans should not be lumped together.

We are not so rash as to assert that the environment or the culture is wholly irrelevant to the development of verbal and visuospatial abilities, but the common genetic history of racial East Asians and their North American or European descendants on the one hand, and the racial Europeans and their North American descendants, on the other, cannot plausibly be dismissed as irrelevant.

I think the common history of East Asians and Native Americans can (in this context[4]) be totally dismissed as irrelevant. Just look at alcohol tolerance. Native Americans were decimated when Europeans introduced alcohol. Meanwhile, East Asians have been drinking alcohol long enough to evolve the Asian flush. These populations have been separate for so long that one of them adapted to civilization in a way the other one didn't. Charles Murray proved that high visuospatial abilities help people rise to the top of a technically-advanced civilization. It would not surprise me if one group that has competed against itself inside the world's most technologically-advanced civilization for hundreds of generations had higher visuospatial ability than another group that hasn't.

Race and Employment

Lots of (but not all) racial differences in life outcomes can be explained by controlling for IQ.

Projecting Demography

The higher the education, the fewer the babies.

Different immigrant populations have different IQs. Richard Lynn assigned "means of 105 to East Asians, 91 to Pacific populations, 84 to blacks, and 100 to whites. We assign 91 to Latinos. We know of no data for Middle East or South Asian populations that permit even a rough estimate." I like how the data here breaks Asians down into smaller groups. The average "works out to about 95," which seems like a bad omen, but immigrants tend to come from worse places than the United States. I expect the Flynn effect will bring their descendants' average up.

So what if the mean IQ is dropping by a point or two per generation? One reason to worry is that the drop may be enlarging ethnic differences in cognitive ability at a time when the nation badly needs narrowing differences. Another reason to worry is that when the mean shifts a little, the size of the tails of the distribution changes a lot.

While this makes sense on paper, we need to acknowledge a technical point about statistics. The Bell Curve is named after the Gaussian distribution, and IQ scores follow a Gaussian distribution. But that doesn't necessarily reflect a natural phenomenon: IQ tests are mapped to a Gaussian distribution by fiat. Charles Murray never proved that IQ is actually Gaussian distributed. Many real-world phenomena are long-tailed (though biological phenomena like height are often Gaussian). It is a perfectly reasonable prior that small changes to the mean could result in large effects at the tails. In Ashkenazi Jewish history, small changes to the mean do cause a massive impact on the tails. But I don't think the evidence presented in The Bell Curve is adequate to prove that g is Gaussian distributed.
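To make the tail argument concrete, here is a minimal sketch (my own illustration, not from The Bell Curve). It assumes scores with an SD of 15 and asks how much of an upper tail survives a 2-point drop in the mean, first under a Gaussian and then under a logistic distribution with the same mean and SD. The logistic is only an assumed stand-in for a somewhat heavier-tailed shape, and the cutoffs are arbitrary.

```python
import math

SD = 15.0  # assumed scale: scores normed to an SD of 15


def gaussian_upper_tail(threshold: float, mean: float, sd: float = SD) -> float:
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def logistic_upper_tail(threshold: float, mean: float, sd: float = SD) -> float:
    """P(X > threshold) for a logistic distribution with the same mean and SD.

    A logistic with scale s has SD s * pi / sqrt(3); its tails are somewhat
    heavier than a Gaussian's.
    """
    s = sd * math.sqrt(3.0) / math.pi
    return 1.0 / (1.0 + math.exp((threshold - mean) / s))


def surviving_tail(tail_fn, threshold: float, old_mean: float, new_mean: float) -> float:
    """Fraction of the original upper tail that remains after the mean shifts."""
    return tail_fn(threshold, new_mean) / tail_fn(threshold, old_mean)


if __name__ == "__main__":
    for cutoff in (115, 130, 145):
        g = surviving_tail(gaussian_upper_tail, cutoff, 100.0, 98.0)
        lg = surviving_tail(logistic_upper_tail, cutoff, 100.0, 98.0)
        print(f"above {cutoff}: Gaussian keeps {g:.0%}, logistic keeps {lg:.0%}")
```

Under these assumptions the Gaussian loses roughly a third of the tail above 145 while the logistic loses roughly a fifth, so how much "the tails change a lot" depends on the shape of the distribution, not just on the shift in the mean.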

Raising Cognitive Ability

Recent studies have uncovered other salient facts about the way IQ scores depend on genes. They have found, for example, that the more general the measure of intelligence—the closer it is to g—the higher is the heritability. Also, the evidence seems to say that the heritability of IQ rises as one ages, all the way from early childhood to late adulthood…. Most of the traditional estimates of heritability have been based on youngsters, which means that they are likely to underestimate the role of genes later in life.

If better measures of g have higher heritability, that's a sign that the worse measures of g are easier to hack. If the heritability of IQ goes up as one ages, that suggests youth interventions are just gaming the metrics—especially when youth interventions frequently produce only short-term increases in measured IQ.

Once a society has provided basic schooling and eliminated the obvious things like malnutrition and lead in the water, the best way to increase g will be eugenics. (I am bearish on AI parenting.) I am not advocating a return to the unscientific policies of the 20th century. Forcibly imposing eugenic policies is horrific and counterproductive. Rather, I predict that once good genetic editing technology is available, parents will voluntarily choose the best genes for their children. There will[5] come a day when not giving your kids the best genes will be seen by civilized people as backwards and reactionary. The shift in societal ethics will happen no later than two generations (forty years) after the genetic editing of human zygotes becomes safe and affordable.

Besides cherry-picking our descendants' genotypes, is there anything else we can do? Improved nutrition definitely increases cognitive ability, but there are diminishing returns. Once you have adequate nutrition, getting more than adequate nutrition doesn't do anything.

Having school (versus no school) does raise IQ. Thus, "some of the Flynn effect around the world is explained by the upward equalization of schooling, but a by-product is that schooling in and of itself no longer predicts adult intelligence as strongly…. The more uniform a country's schooling is, the more correlated the adult IQ is with childhood IQ." Increasing access to schooling increases the influence of natural differences on IQ because when you eliminate societally-imposed inequality, all that's left is natural variation.
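The schooling argument is the usual variance-share reasoning. A toy sketch, assuming genes and environment add up independently (no gene-environment interaction or correlation) and using made-up variances, shows the genetic share of variance rising as environmental variance shrinks while genetic variance stays fixed:

```python
def genetic_share(var_genetic: float, var_environment: float) -> float:
    """Share of total variance attributable to genes in a toy additive model.

    Assumes phenotype = genetic part + environmental part, independent of each
    other, so total variance = var_genetic + var_environment.
    """
    return var_genetic / (var_genetic + var_environment)


if __name__ == "__main__":
    VAR_GENETIC = 100.0  # hypothetical, held fixed
    # Hypothetical environmental variances, shrinking as schooling becomes uniform.
    for var_env in (100.0, 50.0, 10.0):
        share = genetic_share(VAR_GENETIC, var_env)
        print(f"environmental variance {var_env:>5}: genetic share {share:.2f}")
```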

A whole bunch of programs purport to increase IQ but none of them show a significant long-term effect after many years. It seems to me like they're just gaming the short-term metrics. "An inexpensive, reliable method of raising IQ is not available."

Affirmative Action

College Affirmative Action

I'm not going to dive deep into Charles Murray's thoughts on affirmative action because they're incontrovertible. Affirmative action in college admissions prioritizes affluent blacks over disadvantaged whites. It's also anti-Asian.

The edge given to minority applicants to college and graduate school is not a nod in their favor in the case of a close call but an extremely large advantage that puts black and Latino candidates in a separate admissions competition. On elite campuses, the average black freshman is in the region of the 10th to 15th percentile of the distribution of cognitive ability among white freshmen. Nationwide, the gap seems to be at least that large, perhaps larger. The gap does not diminish in graduate school. If anything, it may be larger.

In the world of college admissions, Asians are a conspicuously unprotected minority. At the elite schools, they suffer a modest penalty, with the average Asian freshman being at about the 60th percentile of the white cognitive ability distribution. Our data from state universities are too sparse to draw conclusions. In all of the available cases, the difference between white and Asian distributions is small (either plus or minus) compared to the large differences separating blacks and Latinos from whites.

The edge given to minority candidates could be more easily defended if the competition were between disadvantaged minority youths and privileged white youths. But nearly as large a cognitive difference separates disadvantaged black freshmen from disadvantaged white freshmen. Still more difficult to defend, blacks from affluent socioeconomic backgrounds are given a substantial edge over disadvantaged whites.

Racist admissions harm smart blacks and Latinos.

In society at large, a college degree does not have the same meaning for a minority graduate and a white one, with consequences that reverberate in the workplace and continue throughout life.

Workplace Affirmative Action

[A]fter controlling for IQ, it is hard to demonstrate that the United States still suffers from a major problem of racial discrimination in occupations and pay.


Charles Murray ends with a chapter on where we're going, which he later followed up with an entire book on class stratification among white Americans.

What worries us first about the emerging cognitive elite is its coalescence into a class that views American society increasingly through a lens of its own.

The problem is not simply that smart people rise to the top more efficiently these days. If the only quality that CEOs of major corporations and movie directors and the White House inner circle had in common were their raw intelligence, things would not be so much different now than they have always been, for to some degree the most successful have always been drawn disproportionally from the most intelligent. But the invisible migration of the twentieth century has done much more than let the most intellectually able succeed more easily. It has also segregated them and socialized them. The members of the cognitive elite are likely to have gone to the same kinds of schools, live in similar neighborhoods, go to the same kinds of theaters and restaurants, read the same magazines and newspapers, watch the same television programs, even drive the same makes of cars.

They also tend to be ignorant of the same things.

  1. I was robbed at gunpoint last weekend. ↩︎

  2. Interracial marriage was illegal in nearly every state before 1888. It remained illegal in 15 states all the way until 1967 when the laws were overturned by the Supreme Court ruling Loving v. Virginia. ↩︎

  3. Unless Charles Murray believes that forces outlined in Jared Diamond's Guns, Germs and Steel (which, ironically, was written in opposition to race-based heritable theories of achievement differences) caused Eurasians to evolve higher g than their African forbears. While writing this footnote, I realized that the hypothesis is worth considering. Rice-based peoples evolved alcohol intolerance. Indian, Iraqi, Chinese and Japanese men evolved small penises. Software advances faster than hardware. It would be weird if civilization didn't cause cognitive adaptations too. I want to predict that cognitive adaptations happen faster than physiological adaptations but I don't know how they can be compared. ↩︎

  4. In Jared Diamond's Guns, Germs, and Steel context, Native Americans' sister relationship to East Asians does matter. ↩︎

  5. As usual, this prediction is conditional on neither the singularity nor a civilizational collapse occurring. ↩︎


2020 PhilPapers Survey Results

November 2, 2021 - 08:00
Published on November 2, 2021 5:00 AM GMT

In 2009, David Bourget and David Chalmers ran the PhilPapers Survey (results, paper), sending questions to "all regular faculty members" at top "Ph.D.-granting [philosophy] departments in English-speaking countries" plus ten other philosophy departments deemed to have "strength in analytic philosophy comparable to the other 89 departments".

Bourget and Chalmers now have a new PhilPapers Survey out, run in 2020 (results, paper). I'll use this post to pick out some findings I found interesting, and say opinionated stuff about them. Keep in mind that I'm focusing on topics and results that dovetail with things I'm curious about (e.g., 'why do academic decision theorists and LW decision theorists disagree so much?'), not giving a neutral overview of the whole 100-question survey.

The new survey's target population consists of:

(1) in Australia, Canada, Ireland, New Zealand, the UK, and the US: all regular faculty members (tenure-track or permanent) in BA-granting philosophy departments with four or more members (according to the PhilPeople database); and (2) in all other countries: English-publishing philosophers in BA-granting philosophy departments with four or more English-publishing faculty members.

In order to make comparisons to the 2009 results, the 2020 survey also looked at a "2009-comparable departments" list selected using similar criteria to the 2009 survey:

It should be noted that the “2009-comparable department” group differs systematically from the broader target population in a number of respects. Demographically, it includes a higher proportion of UK-based philosophers and analytic-tradition philosophers than the target population. Philosophically, it includes a lower proportion of theists, along with many other differences evident in comparing 2020 results in table 1 (all departments) to table 9 (2009-comparable departments).

Based on this description, I expect the "2009-comparable departments" in the 2020 survey to be more elite, influential, and reasonable than the 2020 "target group", so I mostly focus on 2009-comparable departments below. In the tables below, if the row doesn't say "Target" (i.e., target group), the population is "2009-comparable departments".

Note that in the 2020 survey (unlike 2009), respondents could endorse multiple answers.


1. Decision theory

Newcomb's problem: The following groups (with n noting their size, and skipping people who skipped the question or said they weren't sufficiently familiar with it) endorsed the following options in the 2020 survey:

Group | n | one box | two boxes | diff
Philosophers (Target) | 1071 | 31% | 39% | 8%
Decision theorists (Target) | 48 | 21% | 73% | 58%
Philosophers | 470 | 28% | 43% | 15%
Decision theorists | 22 | 23% | 73% | 50%

5% of decision theorists said they "accept a combination of views", and 9% said they were "agnostic/undecided".

I think decision theorists are astonishingly wrong here, so I was curious to see if other philosophy fields did better.
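The "wrong" here is judged by how much each policy actually collects. The post doesn't spell out the payoffs, so the sketch below assumes the standard ones ($1,000,000 in the opaque box, $1,000 in the transparent one) and a predictor that is right with probability `accuracy` whichever choice is made. Two-boxers don't dispute these averages; they dispute whether conditioning on your own choice is the right way to decide.

```python
def newcomb_expected_payoffs(accuracy: float, big: float = 1_000_000.0,
                             small: float = 1_000.0) -> tuple[float, float]:
    """Average winnings of a one-boxer and a two-boxer against a predictor.

    Assumes the standard setup: the opaque box holds `big` iff the predictor
    foresaw one-boxing, the transparent box always holds `small`, and the
    predictor is right with probability `accuracy` for either choice.
    """
    one_box = accuracy * big                  # opaque box full iff the prediction was right
    two_box = small + (1.0 - accuracy) * big  # opaque box full only if the predictor erred
    return one_box, two_box


for p in (0.5, 0.6, 0.9, 0.99):
    one, two = newcomb_expected_payoffs(p)
    print(f"accuracy {p:.2f}: one-boxers average ${one:,.0f}, two-boxers ${two:,.0f}")
```

With these payoffs one-boxers come out ahead whenever the predictor is more than 50.05% accurate, since that is where accuracy * 1,000,000 first exceeds 1,000 + (1 - accuracy) * 1,000,000.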

I looked at every field where enough surveyed people gave their views on Newcomb's problem. Here they are in order of 'how much more likely are they to two-box than to one-box':

Group | n | one box | two boxes | diff
Philosophers of gender, race, and sexuality | 13 | 23% | 54% | 31%
20th-century-philosophy specialists | 23 | 13% | 43% | 30%
Social and political philosophers | 62 | 19% | 48% | 29%
Philosophers of law | 14 | 14% | 43% | 29%
Phil. of computing and information (Target) | 18 | 22% | 50% | 28%
Philosophers of social science | 24 | 33% | 58% | 25%
Philosophers of biology | 20 | 30% | 55% | 25%
General philosophers of science | 53 | 26% | 51% | 25%
Philosophers of language | 100 | 25% | 47% | 22%
Philosophers of mind | 96 | 25% | 45% | 20%
Philosophers of computing and information | 5 | 20% | 40% | 20%
19th-century-philosophy specialists | 10 | 10% | 30% | 20%
Philosophers of action | 32 | 28% | 47% | 19%
Logic and philosophy of logic | 58 | 26% | 43% | 17%
Philosophers of physical science | 25 | 28% | 44% | 16%
Epistemologists | 129 | 32% | 47% | 15%
Meta-ethicists | 72 | 29% | 43% | 14%
Metaphysicians | 123 | 32% | 44% | 12%
Normative ethicists | 102 | 32% | 42% | 10%
Philosophers of religion | 15 | 33% | 40% | 7%
Philosophers of mathematics | 19 | 32% | 37% | 5%
17th/18th-century-philosophy specialists | 39 | 31% | 36% | 5%
Metaphilosophers | 23 | 35% | 39% | 4%
Applied ethicists | 50 | 34% | 38% | 4%
Philosophers of cognitive science | 50 | 32% | 36% | 4%
Aestheticians | 14 | 29% | 21% | -8%
Greek and Roman philosophy specialists | 17 | 41% | 29% | -12%

(Note that many of these groups are small-n. Since philosophers of computing and information were an especially small and weird group, and I expect LWers to be extra interested in this group, I also looked at the target-group version for this field.)
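The ordering above is by two-box share minus one-box share, in percentage points. A minimal sketch of that ranking on a handful of rows from the table, with a rough standard error attached to show how noisy the small-n rows are. The error formula is my addition and assumes each respondent picks exactly one option (multinomial sampling), which the 2020 survey's multiple-answer format doesn't strictly satisfy.

```python
import math

# (group, n, one-box %, two-box %): a few rows from the Newcomb table
ROWS = [
    ("Philosophers of gender, race, and sexuality", 13, 23, 54),
    ("Philosophers of language", 100, 25, 47),
    ("Normative ethicists", 102, 32, 42),
    ("Aestheticians", 14, 29, 21),
    ("Greek and Roman philosophy specialists", 17, 41, 29),
]


def two_box_lean(one_box_pct: float, two_box_pct: float) -> float:
    """Percentage-point gap used to order the table: two-box minus one-box."""
    return two_box_pct - one_box_pct


def rough_se_pp(one_box_pct: float, two_box_pct: float, n: int) -> float:
    """Rough standard error of the gap, in percentage points.

    Assumes single-choice (multinomial) answers, where
    Var(b - a) = (a + b - (b - a)**2) / n for shares a and b.
    """
    a, b = one_box_pct / 100, two_box_pct / 100
    return 100 * math.sqrt((a + b - (b - a) ** 2) / n)


ranked = sorted(ROWS, key=lambda row: two_box_lean(row[2], row[3]), reverse=True)
for group, n, one, two in ranked:
    gap = two_box_lean(one, two)
    print(f"{gap:+4.0f} pp (+/- {rough_se_pp(one, two, n):.0f})  {group}, n={n}")
```

For the aestheticians, the 8-point lean toward one-boxing comes with a standard error of roughly 19 points under this assumption, which is a quantitative version of the small-n caveat.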

Every field did much better than decision theory (by the "getting more utility in Newcomb's problem" metric). However, the only fields that favored one-boxing over two-boxing were ancient Greek and Roman philosophy, and aesthetics.

After those two fields, the best fields were philosophy of cognitive science, applied ethics, metaphilosophy, philosophy of mathematics, and 17th/18th century philosophy (only 4-5% more likely to two-box than one-box), followed by philosophy of religion, normative ethics, and metaphysics.

My quick post-hoc, low-confidence guess about why these fields did relatively well is (hiding behind a spoiler tag so others can make their own unanchored guesses):

 My inclination is to model the aestheticians, historians of philosophy, philosophers of religion, and applied ethicists as 'in-between' analytic philosophers and the general public (who one-box more often than they two-box, unlike analytic philosophers). I think of specialists in those fields as relatively normal people, who have had less exposure to analytic-philosophy culture and ideas and whose views therefore tend to more closely resemble the views of some person on the street.

This would also explain why the "2009-comparable departments", who I expected to be more elite and analytic-philosophy-ish, did so much worse than the "target group" here.

I would have guessed, however, that philosophers of gender/race/sexuality would also have done relatively well on Newcomb's problem, if 'analytic-philosophy-ness' were the driving factor.

I'm pretty confused about this, though the small n for some of these populations means that a lot of this could be pretty random. (E.g., network effects: a single just-for-fun faculty email thread about Newcomb's problem could convince a bunch of philosophers of sexuality that two-boxing is great. Then this would show up in the survey because very few philosophers of sexuality have ever even heard of Newcomb's problem, and the ones who haven't heard of it aren't included.)

At the same time, my inclination is to treat philosophers of cognitive science, mathematics, normative ethics, metaphysics, and metaphilosophy as 'heavily embedded in analytic philosophy land, but smart enough (/ healthy enough as a field) to see through the bad arguments for two-boxing to some extent'.

There's also a question of why cognitive science would help philosophers do better on Newcomb's problem, when computer science doesn't. I wonder if the kinds of debates that are popular in computer science are the sort that attract people with bad epistemics? ('Wow, the Chinese room argument is amazing, I want to work in this field!') I really have no idea, and wouldn't have predicted this in advance.

Normative ethics also surprises me here. And both of my explanations for 'why did field X do well?' are post-hoc, and based on my prior sense that some of these fields are much smarter and more reasonable than others.

It's very plausible that there's some difference between the factors that make aestheticians one-box more, and the factors that make philosophers of cognitive science one-box more. To be confident in my particular explanations, however, we'd want to run various tests and look at various other comparisons between the groups.

The fields that did the worst after decision theory were philosophy of gender/race/sexuality, 20th-century philosophy, philosophy of language, philosophy of law, political philosophy, and philosophy of biology, of social science, and of science-in-general.

A separate question is whether academic decision theory has gotten better since the 2009 survey. Eyeballing the (small-n) numbers, the answer is that it seems to have gotten worse: two-boxing became even more popular (in 2009-comparable departments), and one-boxing even less popular:

n=31 for the 2009 side of the comparison, n=22 for the 2020 side. The numbers above are different from the ones I originally presented because Bourget and Chalmers include "skip" and "insufficiently familiar" answers, and exclude responses that chose multiple options, in order to make the methodology more closely match that of the 2009 survey.
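Those methodology choices can move the percentages noticeably even for identical raw answers. A sketch with hypothetical answers; the data is made up, and reading "exclude" as dropping multi-option responses from both numerator and denominator is my assumption, not the survey's published procedure.

```python
# Hypothetical raw answers: a set of endorsed options, or a non-answer string.
SKIP = "skip"
UNFAMILIAR = "insufficiently familiar"

RESPONSES = [
    {"one box"}, {"two boxes"}, {"two boxes"}, {"one box", "two boxes"},
    SKIP, UNFAMILIAR, {"two boxes"}, {"one box"},
]


def share_2020_style(responses, option):
    """Drop non-answers; a multi-option answer counts toward every option it names."""
    answered = [r for r in responses if isinstance(r, set)]
    return sum(option in r for r in answered) / len(answered)


def share_2009_style(responses, option):
    """Keep non-answers in the denominator; drop answers that name several options."""
    kept = [r for r in responses if not (isinstance(r, set) and len(r) > 1)]
    return sum(isinstance(r, set) and option in r for r in kept) / len(kept)


for opt in ("one box", "two boxes"):
    print(f"{opt}: {share_2020_style(RESPONSES, opt):.0%} vs "
          f"{share_2009_style(RESPONSES, opt):.0%}")
```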


2. (Non-animal) ethics

Regarding "Meta-ethics: moral realism or moral anti-realism?":

Group | n | moral realism | moral anti-realism
Philosophers (Target) | 1719 | 62% | 26%
Philosophers | 630 | 62% | 27%
Applied ethicists | 64 | 67% | 23%
Normative ethicists | 132 | 74% | 16%
Meta-ethicists | 94 | 68% | 22%

Regarding "Moral judgment: non-cognitivism or cognitivism?":

Group | n | cognitivism | non-cognitivism
Philosophers (Target) | 1636 | 69% | 21%
Philosophers | 594 | 70% | 20%
Applied ethicists | 62 | 76% | 23%
Normative ethicists | 132 | 82% | 14%
Meta-ethicists | 93 | 76% | 16%

Regarding "Morality: expressivism, naturalist realism, constructivism, error theory, or non-naturalism?":

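Group | n | non-naturalism | naturalist realism | constructivism | expressivism | error theory
Philosophers (Target) | 1024 | 27% | 32% | 21% | 11% | 5%
Philosophers | 386 | 25% | 33% | 19% | 10% | 5%
Applied ethicists | 40 | 20% | 35% | 38% | 5% | 0%
Normative ethicists | 90 | 34% | 36% | 21% | 8% | 1%
Meta-ethicists | 68 | 37% | 29% | 18% | 13% | 7%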

Regarding "Normative ethics: virtue ethics, consequentialism, or deontology?" (putting in parentheses the percentage that only chose the option in question):

Group | n | deontology | consequentialism | virtue ethics | combination
All (Target) | 1741 | 32% (20%) | 31% (21%) | 37% (25%) | 16%
All | 631 | 37% (23%) | 32% (22%) | 31% (19%) | 17%
Applied ethicists | 63 | 56% (33%) | 38% (22%) | 33% (8%) | 27%
Normative ethicists | 132 | 46% (27%) | 32% (20%) | 36% (17%) | 25%
Meta-ethicists | 94 | 44% (28%) | 32% (22%) | 24% (13%) | 19%
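The parenthesized "only" figures and the "combination" column both come from multi-select answers. A sketch on made-up respondents of how the three kinds of share relate: "any" shares can sum past 100%, while the "only" shares plus the combination share cannot (in the table, e.g., 33% + 22% + 8% + 27% = 90% for applied ethicists, with the remainder presumably going to other answers).

```python
OPTIONS = ("deontology", "consequentialism", "virtue ethics")

# Hypothetical respondents; each endorses one or more options.
RESPONDENTS = [
    {"deontology"},
    {"consequentialism"},
    {"virtue ethics"},
    {"deontology", "virtue ethics"},
    {"consequentialism", "virtue ethics"},
]


def summarize(respondents, options=OPTIONS):
    """Return {option: (any share, only share)} plus the share choosing several options."""
    n = len(respondents)
    shares = {}
    for opt in options:
        any_share = sum(opt in r for r in respondents) / n
        only_share = sum(r == {opt} for r in respondents) / n
        shares[opt] = (any_share, only_share)
    combination = sum(len(r) > 1 for r in respondents) / n
    return shares, combination


shares, combination = summarize(RESPONDENTS)
for opt, (any_share, only_share) in shares.items():
    print(f"{opt}: {any_share:.0%} ({only_share:.0%})")
print(f"combination: {combination:.0%}")
```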

Excluding responses that endorsed multiple options, we can see that normative ethicists have moved away from deontology and towards virtue ethics since 2009, though deontology is still the most popular:

30 normative-ethicist respondents also wrote in "pluralism" or "pluralist" in the 2020 survey.

Regarding "Trolley problem (five straight ahead, one on side track, turn requires switching, what ought one do?): don't switch or switch?":

Group | n | switch | don't switch
Philosophers (Target) | 1736 | 63% | 13%
Philosophers | 635 | 68% | 12%
Applied ethicists | 63 | 71% | 13%
Normative ethicists | 132 | 70% | 13%
Meta-ethicists | 94 | 70% | 13%

Regarding "Footbridge (pushing man off bridge will save five on track below, what ought one do?): push or don't push?":

Group | n | push | don't push
Philosophers (Target) | 1740 | 22% | 56%
Philosophers | 636 | 23% | 58%
Applied ethicists | 63 | 24% | 63%
Normative ethicists | 132 | 17% | 70%
Meta-ethicists | 94 | 18% | 66%

Regarding "Human genetic engineering: permissible or impermissible?":

Group | n | permissible | impermissible
Philosophers (Target) | 1059 | 62% | 19%
Philosophers | 379 | 68% | 13%
Applied ethicists | 39 | 85% | 8%
Normative ethicists | 83 | 69% | 12%
Meta-ethicists | 59 | 68% | 12%

Regarding "Well-being: hedonism/experientialism, desire satisfaction, or objective list?":

Group | n | hedonism | desire satisfaction | objective list
Philosophers (Target) | 967 | 10% | 19% | 53%
Philosophers | 348 | 9% | 19% | 54%
Applied ethicists | 43 | 16% | 26% | 56%
Normative ethicists | 90 | 11% | 21% | 63%
Meta-ethicists | 62 | 11% | 24% | 55%

Moral internalism "holds that a person cannot sincerely make a moral judgment without being motivated at least to some degree to abide by her judgment". Regarding "Moral motivation: externalism or internalism?":

Group | n | internalism | externalism
Philosophers (Target) | 1429 | 41% | 39%
Philosophers | 528 | 38% | 42%
Applied ethicists | 57 | 53% | 37%
Normative ethicists | 128 | 34% | 51%
Meta-ethicists | 92 | 35% | 47%

One of the largest changes in philosophers' views since the 2009 survey is that philosophers have somewhat shifted toward externalism. In 2009, internalism was 5% more popular than externalism; now externalism is 3% more popular than internalism.

(Again, the 2009-2020 comparisons give different numbers for 2020 in order to make the two surveys' methodologies more similar.)


3. Minds and animal ethics

Regarding "Hard problem of consciousness (is there one?): no or yes?":

Group | n | yes | no
Philosophers (Target) | 1468 | 62% | 30%
Philosophers of computing and information (Target) | 18 | 44% | 50%
Philosophers | 366 | 58% | 33%
Philosophers of cognitive science | 40 | 48% | 50%
Philosophers of mind | 91 | 62% | 34%

Regarding "Mind: non-physicalism or physicalism?":

Group | n | physicalism | non-physicalism
Philosophers (Target) | 1733 | 52% | 32%
Phil of computing and information (Target) | 25 | 64% | 24%
Philosophers | 630 | 59% | 27%
Philosophers of cognitive science | 73 | 78% | 12%
Philosophers of computing and information | 5 | 60% | 20%
Philosophers of mind | 135 | 60% | 26%

Regarding "Consciousness: identity theory, panpsychism, eliminativism, dualism, or functionalism?":

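Group | n | dualism | eliminativism | functionalism | identity theory | panpsychism
Philosophers (Target) | 1020 | 22% | 5% | 33% | 13% | 8%
Phil of computing (Target) | 16 | 25% | 19% | 31% | 0% | 0%
Philosophers | 362 | 22% | 4% | 33% | 14% | 7%
Phil of cognitive science | 43 | 14% | 12% | 40% | 19% | 2%
Philosophers of mind | 95 | 24% | 2% | 38% | 12% | 10%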

Regarding "Zombies: conceivable but not metaphysically possible, metaphysically possible, or inconceivable?" (also noting "agnostic/undecided" results):

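Group | n | inconceivable | conceivable but impossible | metaphysically possible | agnostic
Philosophers (Target) | 1610 | 16% | 37% | 24% | 11%
Phil of computing (Target) | 24 | 29% | 25% | 17% | 8%
Philosophers | 582 | 15% | 42% | 22% | 11%
Phil of cognitive science | 72 | 24% | 50% | 11% | 6%
Phil of computing | 5 | 20% | 40% | 20% | 0%
Philosophers of mind | 132 | 20% | 51% | 17% | 3%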

My understanding is that the "psychological view" of personal identity more or less says 'you're software', the "biological view" says 'you're hardware', and the "further-fact view" says 'you're a supernatural soul'. Regarding "Personal identity: further-fact view, psychological view, or biological view?":

Group | n | biological | psychological | further-fact
Philosophers (Target) | 1615 | 19% | 44% | 15%
Phil of computing and information (Target) | 23 | 22% | 70% | 4%
Philosophers | 598 | 20% | 44% | 13%
Philosophers of cognitive science | 69 | 26% | 55% | 6%
Philosophers of computing and information | 5 | 40% | 60% | 20%
Philosophers of mind | 130 | 26% | 47% | 12%

Comparing this to some other philosophy subfields, as a gauge of their health:

Group | n | biological | psychological | further-fact
Decision theorists | 18 | 17% | 67% | 17%
Epistemologists | 144 | 22% | 35% | 17%
General philosophers of science | 58 | 29% | 55% | 7%
Metaphysicians | 150 | 23% | 39% | 15%
Normative ethicists | 127 | 14% | 46% | 15%
Philosophers of language | 120 | 18% | 38% | 17%
Philosophers of mathematics | 18 | 17% | 61% | 6%
Philosophers of religion | 25 | 24% | 16% | 44%

Decision theorists come out of this looking pretty great (I claim). This is particularly interesting to me, because some people diagnose the 'academic decision theorist vs. LW decision theorist' disagreement as coming down to 'do you identify with your algorithm or with your physical body?'.

The above is some evidence that either this diagnosis is wrong, or academic decision theorists haven't fully followed their psychological view of personal identity to its logical conclusions.

Regarding "Mind uploading (brain replaced by digital emulation): survival or death?" (adding answers for "the question is too unclear to answer" and "there is no fact of the matter"):

Group | n | survival | death | Q too unclear | no fact
Philosophers (Target) | 1016 | 27% | 54% | 5% | 4%
Phil of computing and information (Target) | 19 | 47% | 42% | 11% | 0%
Philosophers | 369 | 28% | 54% | 5% | 4%
Decision theorists | 12 | 42% | 8% | 8% | 17%
Philosophers of cognitive science | 37 | 35% | 51% | 5% | 3%
Philosophers of mind | 91 | 34% | 52% | 4% | 2%

From my perspective, decision theorists do great on this question — very few endorse "death", and a lot endorse "there is no fact of the matter" (which, along with "survival", strike me as good indirect signs of clear thinking given that this is a kind-of-terminological question and, depending on terminology, "death" is at best a technically-true-but-misleading answer).

Also, a respectable 25% of decision theorists say "agnostic/undecided", which is almost always something I give philosophers points for — no one's an expert on everything, a lot of these questions are confusing, and recognizing the limits of your own understanding is a very positive sign.

Regarding "Chinese room: doesn't understand or understands?" (adding "the question is too unclear to answer" responses):

Group | n | understands | doesn't | Q too unclear
Philosophers (Target) | 1031 | 18% | 67% | 6%
Phil of computing and information (Target) | 18 | 22% | 56% | 17%
Philosophers | 381 | 18% | 66% | 6%
Philosophers of cognitive science | 44 | 34% | 50% | 7%
Philosophers of mind | 91 | 15% | 70% | 8%

Regarding "Other minds (for which groups are some members conscious?)" (looking only at the "2009-comparable departments", except for philosophy of computing and information because there aren't viewable results for that subgroup):

(Options: adult humans; cats; fish; flies; worms; plants; particles; newborn babies; current AI systems; future AI systems.)

(Respondent groups: philosophers; applied ethicists; decision theorists; meta-ethicists; metaphysicians; normative ethicists; philosophy of biology; philosophers of cognitive science; philosophers of computing and information; philosophers of mathematics; philosophers of mind.)

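Group | n | adult humans | cats | fish | flies | worms | plants | particles | newborn babies | current AI | future AI
Philosophers | 404 | 97% | 93% | 68% | 36% | 24% | 7% | 2% | 89% | 2% | 43%
Applied ethicists | 35 | 100% | 94% | 77% | 31% | 17% | 3% | 0% | 89% | 0% | 63%
Decision theorists | 14 | 86% | 86% | 71% | 36% | 21% | 0% | 0% | 64% | 0% | 57%
Meta-ethicists | 64 | 98% | 92% | 72% | 38% | 23% | 6% | 2% | 91% | 0% | 47%
Metaphysicians | 108 | 98% | 97% | 71% | 42% | 32% | 7% | 3% | 95% | 4% | 49%
Normative ethicists | 85 | 99% | 94% | 73% | 35% | 22% | 5% | 0% | 88% | 4% | 41%
Philosophy of biology | 13 | 100% | 85% | 62% | 38% | 15% | 8% | 0% | 69% | 8% | 54%
Philosophers of cognitive science | 44 | 98% | 98% | 73% | 39% | 25% | 11% | 2% | 95% | 2% | 50%
Philosophers of computing and information | 19 | 100% | 89% | 68% | 32% | 26% | 5% | 0% | 89% | 11% | 58%
Philosophers of mathematics | 14 | 100% | 93% | 86% | 57% | 43% | 7% | 0% | 93% | 7% | 50%
Philosophers of mind | 92 | 99% | 97% | 79% | 45% | 30% | 10% | 4% | 96% | 0% | 43%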

I am confused, delighted, and a little frightened that an equal (and not-super-large) number of decision theorists think adult humans and cats are conscious. (Though as always, small n.)

Also impressed that comparatively few of them (64%) said newborn humans are conscious: it seems hard to be confident about the answer to this, and being willing to entertain 'well, maybe not' seems like a strong sign of epistemic humility beating out motivated reasoning.

Also, 11% of philosophers of cognitive science think PLANTS are conscious??? Friendship with philosophers of cognitive science ended, decision theorists new best friend.

Regarding "Eating animals and animal products (is it permissible to eat animals and/or animal products in ordinary circumstances?): vegetarianism (no and yes), veganism (no and no), or omnivorism (yes and yes)?":

Group | n | omnivorism | vegetarianism | veganism
Philosophers (Target) | 1764 | 48% | 26% | 18%
Philosophers | 643 | 46% | 27% | 21%
Applied ethicists | 64 | 33% | 23% | 41%
Normative ethicists | 131 | 40% | 27% | 31%
Meta-ethicists | 94 | 41% | 23% | 24%
Philosophers of mind | 134 | 48% | 31% | 12%
Philosophers of cognitive science | 73 | 48% | 29% | 14%


4. Metaphysics, philosophy of physics, and anthropics

Regarding "Sleeping beauty (woken once if heads, woken twice if tails, credence in heads on waking?): one-half or one-third?" (including the answers "this question is too unclear to answer," "accept an alternative view," "there is no fact of the matter," and "agnostic/undecided"):

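| Group | n | One-third | One-half | Too unclear | Alternative view | No fact of the matter | Agnostic/undecided |
|---|---|---|---|---|---|---|---|
| Philosophers (Target) | 429 | 28% | 19% | 8% | 1% | 3% | 40% |
| Philosophers | 191 | 27% | 20% | 4% | 2% | 5% | 42% |
| Decision theorists | 13 | 54% | 8% | 0% | 15% | 0% | 23% |
| Epistemologists | 70 | 33% | 20% | 3% | 3% | 1% | 40% |
| Logicians and phil of logic | 28 | 36% | 14% | 4% | 7% | 4% | 36% |
| Phil of cognitive science | 18 | 28% | 22% | 6% | 0% | 0% | 44% |
| Philosophers of mathematics | 6 | 50% | 17% | 0% | 0% | 0% | 33% |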

Regarding "Cosmological fine-tuning (what explains it?): no fine-tuning, brute fact, design, or multiverse?":

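| Group | n | Design | Multiverse | Brute fact | No fine-tuning |
|---|---|---|---|---|---|
| Philosophers (Target) | 807 | 17% | 15% | 32% | 22% |
| Philosophers | 289 | 14% | 16% | 35% | 22% |
| Decision theorists | 13 | 8% | 23% | 23% | 38% |
| General phil of science | 33 | 9% | 21% | 48% | 24% |
| Metaphysicians | 93 | 28% | 18% | 32% | 13% |
| Phil of cognitive science... | 32 | 6% | 19% | 50% | 13% |
| Phil of physical science | 16 | 13% | 25% | 38% | 6% |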

Regarding "Quantum mechanics: epistemic, hidden-variables, many-worlds, or collapse?":

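| Group | n | Collapse | Hidden-variables | Many-worlds | Epistemic |
|---|---|---|---|---|---|
| Philosophers (Target) | 556 | 17% | 22% | 19% | 13% |
| Philosophers | 197 | 13% | 29% | 24% | 8% |
| Decision theorists | 8 | 38% | 13% | 63% | 13% |
| General phil of science | 26 | 31% | 23% | 27% | 4% |
| Metaphysicians | 78 | 12% | 35% | 31% | 5% |
| Phil of cognitive science... | 16 | 6% | 31% | 31% | 19% |
| Phil of physical science | 15 | 13% | 33% | 33% | 0% |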

From SEP:

What is the metaphysical basis for causal connection? That is, what is the difference between causally related and causally unrelated sequences?

The question of connection occupies the bulk of the vast literature on causation. [...] Fortunately, the details of these many and various accounts may be postponed here, as they tend to be variations on two basic themes. In practice, the nomological, statistical, counterfactual, and agential accounts tend to converge in the indeterministic case. All understand connection in terms of probability: causing is making more likely. The change, energy, process, and transference accounts converge in treating connection in terms of process: causing is physical producing. Thus a large part of the controversy over connection may, in practice, be reduced to the question of whether connection is a matter of probability or process (Section 2.1).

Regarding "Causation: process/production, primitive, counterfactual/difference-making, or nonexistent?": 

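| Group | n | Counterfactual/difference-making | Process/production | Primitive | Nonexistent |
|---|---|---|---|---|---|
| Philosophers (Target) | 892 | 37% | 23% | 21% | 4% |
| Philosophers | 342 | 39% | 22% | 21% | 3% |
| Decision theorists | 14 | 71% | 7% | 7% | 7% |
| Metaphysicians | 103 | 32% | 28% | 28% | 5% |
| Philosophers of cognitive science | 38 | 58% | 21% | 13% | 5% |
| Philosophers of physical science | 16 | 63% | 19% | 0% | 6% |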

Regarding "Foundations of mathematics: constructivism/intuitionism, structuralism, set-theoretic, logicism, or formalism?":

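| Group | n | Constructivism/intuitionism | Formalism | Logicism | Structuralism | Set-theoretic |
|---|---|---|---|---|---|---|
| Philosophers (Target) | 600 | 15% | 6% | 12% | 21% | 15% |
| Philosophers | 229 | 12% | 4% | 12% | 24% | 17% |
| Philosophers of mathematics | 15 | 13% | 7% | 7% | 40% | 33% |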


5. Superstition

Regarding "God: atheism or theism?" (with subfields ordered by percentage that answered "theism"):

| Group | n | Theism |
|---|---|---|
| Philosophy (Target) | 1770 | 19% |
| Philosophy | 645 | 13% |
| Philosophy of religion | 27 | 74% |
| Medieval and Renaissance philosophy | 10 | 60% |
| Philosophy of action | 42 | 21% |
| 17th/18th century philosophy | 64 | 20% |
| 20th century philosophy | 36 | 19% |
| Metaphysics | 153 | 16% |
| Normative ethics | 132 | 16% |
| 19th century philosophy | 22 | 14% |
| Asian philosophy | 7 | 14% |
| Decision theory | 22 | 14% |
| Philosophy of mind | 135 | 13% |
| Aesthetics | 30 | 13% |
| Ancient Greek and Roman philosophy | 31 | 13% |
| Applied ethics | 64 | 13% |
| Epistemology | 153 | 13% |
| Logic and philosophy of logic | 70 | 13% |
| Philosophy of language | 128 | 12% |
| Philosophy of social science | 27 | 11% |
| Meta-ethics | 94 | 10% |
| Philosophy of law | 21 | 10% |
| Philosophy of mathematics | 20 | 10% |
| Social and political philosophy | 100 | 10% |
| Philosophy of gender, race, and sexuality | 23 | 9% |
| Philosophy of biology | 24 | 8% |
| Philosophy of physical science | 26 | 8% |
| General philosophy of science | 65 | 6% |
| Philosophy of cognitive science | 73 | 5% |
| Philosophy of computing and information | 5 | 0% |
| Philosophy of the Americas | 5 | 0% |
| Continental philosophy | 15 | 0% |
| Philosophy of computing and information (Target) | 25 | 0% |
| Metaphilosophy | 29 | 0% |

This question is 'philosophy in easy mode', so seems like a decent proxy for field health / competence (though the anti-religiosity of Marxism is a confounding factor in my eyes, for fields where Marx is influential).

The "A-theory of time" says that there is a unique objectively real "present", corresponding to "which time seems to me to be right now", that is universal and observer-independent, contrary to special relativity. The "B-theory of time" says that there is no such objective, universal "present".

This provides another good "reasonableness / basic science literacy" litmus test, so I'll order the subfields (where enough people in the field answered at all) by how much more they endorse B-theory over A-theory. Regarding "Time: B-theory or A-theory?":

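| Group | n | A-theory | B-theory | Diff (B minus A) |
|---|---|---|---|---|
| Philosophy (Target) | 1123 | 27% | 38% | 11% |
| Philosophy | 449 | 22% | 44% | 22% |
| 19th century philosophy | 13 | 31% | 8% | -23% |
| Philosophy of religion | 22 | 45% | 27% | -18% |
| Medieval and Renaissance philosophy | 9 | 33% | 22% | -11% |
| Philosophy of law | 10 | 30% | 20% | -10% |
| Aesthetics | 18 | 22% | 17% | -5% |
| Philosophy of social science | 17 | 24% | 24% | 0% |
| Social and political philosophy | 42 | 29% | 33% | 4% |
| Philosophy of gender, race, and sexuality | 12 | 25% | 33% | 8% |
| Ancient Greek and Roman philosophy | 21 | 24% | 33% | 9% |
| Philosophy of action | 31 | 29% | 39% | 10% |
| 20th century philosophy | 22 | 23% | 36% | 13% |
| Normative ethics | 78 | 31% | 44% | 13% |
| Philosophy of cognitive science | 49 | 18% | 31% | 13% |
| Meta-ethics | 60 | 27% | 43% | 16% |
| Philosophy of mathematics | 17 | 12% | 29% | 17% |
| Asian philosophy | 5 | 20% | 40% | 20% |
| Applied ethics | 29 | 31% | 52% | 21% |
| Epistemology | 113 | 21% | 42% | 21% |
| Philosophy of mind | 111 | 20% | 41% | 21% |
| 17th/18th century philosophy | 44 | 14% | 36% | 22% |
| Metaphilosophy | 23 | 17% | 39% | 22% |
| Metaphysics | 144 | 28% | 51% | 23% |
| Logic and philosophy of logic | 61 | 28% | 52% | 24% |
| Phil of computing and information (Target) | 20 | 25% | 50% | 25% |
| Philosophy of language | 107 | 22% | 54% | 32% |
| General philosophy of science | 48 | 19% | 56% | 37% |
| Philosophy of biology | 16 | 19% | 63% | 44% |
| Decision theory | 16 | 13% | 63% | 50% |
| Philosophy of physical science | 26 | 12% | 62% | 50% |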
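The ordering of the time table is a simple computation: take B-theory % minus A-theory % for each subfield, using the rounded percentages shown, then sort from most A-leaning to most B-leaning. A minimal sketch using a handful of rows copied from the table; the post doesn't state the exact cutoff for "enough people in the field answered", so `MIN_N = 5` is a hypothetical placeholder.

```python
# Sketch of the "Time" table ordering: diff = B-theory % - A-theory %,
# sorted ascending (most A-leaning first). Only a few rows are included.

rows = [
    # (subfield, n, A-theory %, B-theory %)
    ("Philosophy of religion", 22, 45, 27),
    ("19th century philosophy", 13, 31, 8),
    ("Metaphysics", 144, 28, 51),
    ("Decision theory", 16, 13, 63),
    ("Philosophy of physical science", 26, 12, 62),
    ("Philosophy of social science", 17, 24, 24),
]

MIN_N = 5  # hypothetical cutoff for "enough people answered"

def b_minus_a(row):
    """Percentage-point lead of B-theory over A-theory for one row."""
    _, _, a, b = row
    return b - a

# Drop tiny subgroups, then sort by diff; sorted() is stable, so ties
# (e.g. the two +50 rows) keep their input order.
ranked = sorted((r for r in rows if r[1] >= MIN_N), key=b_minus_a)

for name, n, a, b in ranked:
    print(f"{name:32} n={n:<4} diff={b - a:+d}")
```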

Decision theorists doing especially well here is surprising to me! Especially since they didn't excel on theism; if they'd hit both out of the park, from my perspective that would have been a straightforward update to "wow, decision theorists are really exceptionally reasonable as analytic philosophers go, even if they're getting Newcomb's problem in particular wrong".

As is, this still strikes me as a reason to be more optimistic that we might be able to converge with working decision theorists in the future. (Or perhaps more so, a reason to be relatively optimistic about persuading decision theorists vs. people working in most other philosophy areas.)

(Added: OK, after writing this I saw decision theorists do great on the 'personal identity' and 'mind uploading' questions, and am feeling much more confident that productive dialogue is possible. I've added those two questions earlier in this post.)

(Added added: OK, decision theorists are also unusually great on "which things are conscious?" and they apparently love MWI. How have we not converged more???)


6. Identity politics topics

Regarding "Race: social, unreal, or biological?":

| Group | n | Biological | Social | Unreal |
|---|---|---|---|---|
| Philosophy (Target) | 1649 | 19% | 63% | 15% |
| Philosophy | 591 | 21% | 67% | 12% |

(Note that many respondents said 'yes' to multiple options.)


7. Metaphilosophy

Regarding "Philosophical progress (is there any?): a little, a lot, or none?":

| Group | n | None | A little | A lot |
|---|---|---|---|---|
| Philosophers (Target) | 1775 | 4% | 47% | 42% |
| Philosophers | 645 | 3% | 45% | 46% |

Regarding "Philosophical knowledge (is there any?): a little, none, or a lot?":

| Group | n | None | A little | A lot |
|---|---|---|---|---|
| Philosophers (Target) | 1110 | 4% | 33% | 56% |
| Philosophers | 397 | 4% | 30% | 58% |

Another interesting result is "Philosophical methods (which methods are the most useful/important?)", which finds (looking at analogous-to-2009 departments):

  • 66% of philosophers think "conceptual analysis" is especially important, 14% disagree.
  • 60% say "empirical philosophy", 12% disagree.
  • 59% say "formal philosophy", 10% disagree.
  • 51% say "intuition-based philosophy", 27% disagree.
  • 44% say "linguistic philosophy", 23% disagree.
  • 39% say "conceptual engineering", 23% disagree.
  • 29% say "experimental philosophy", 39% disagree.


8. How have philosophers' views changed since 2009?

Bourget and Chalmers' paper has a table for the largest changes in philosophers' views since 2009:

As noted earlier in this post, one of the larger shifts in philosophers' views was a move away from moral internalism and toward externalism.

On 'which do you endorse, classical logic or non-classical?' (a strange question, but maybe this is something like 'what kind of logic is reality's source code written in?'), non-classical logic is roughly as unpopular as ever, but fewer now endorse classical logic, and more give answers like "insufficiently familiar with the issue" and "the question is too unclear to answer":


Epistemic contextualism says that the accuracy of your claim that someone "knows" something depends partly on contextual features — e.g., the standards for "knowledge" can rise "as the stakes rise or the skeptical doubts become more serious".

Here, it was the less popular view (invariantism) that lost favor; and again, it lost ground mainly through an increase in 'other' answers (especially "insufficiently familiar with the issue" and "agnostic/undecided") rather than through increased support for its rival view (contextualism):


Humeanism (a misnomer, since Hume himself wasn't a Humean, though his skeptical arguments helped inspire the Humeans) says that "laws of nature" aren't fundamentally different from other observed regularities; they're just patterns that humans have given a fancy, high-falutin name to. Anti-Humeans, by contrast, think there's something deeper about laws of nature: that they in some sense 'necessitate' things to go one way rather than another.

(Maybe Humeans = 'laws of nature are program outputs like any other', non-Humeans = 'laws of nature are part of reality's source code'?)

Once again, one view lost favor (the more popular view, non-Humeanism), but the other didn't gain favor; instead, more people endorsed "insufficiently familiar with the issue", and "agnostic/undecided", etc.:


Philosophers in 2020 are more likely to say that "yes", humans have a priori knowledge of some things (already very much the dominant view):


'Aesthetic value is objective' was favored over 'subjective' (by 3%) in 2009; now 'subjective' is favored over 'objective' (by 4%). "Agnostic/undecided" also gained ground.


Philosophers mostly endorsed "switch" in the trolley dilemma, and still do; but "don't switch" gained a bit of ground, and "insufficiently familiar with the issue" lost ground.


Moral realism also became a bit more popular (it was endorsed by 56% of philosophers, now 60%), as did compatibilism about free will (was 59% compatibilism, 14% libertarianism, 12% no free will; now 62%, 13%, and 10%).

The paper also looked at the individual respondents who answered the survey in both 2009 and 2020. Individuals tended to update away from switching in the trolley dilemma, away from consequentialism, and toward virtue ethics and non-cognitivism. They also updated toward Platonism about abstract objects, and away from 'no free will'.

These are all comparisons across 2009-target-population philosophers in general, however. In most (though not all) cases, I'm more interested in the views of subfields specialized in investigating and debating a topic, and how the subfield's view changes over time. Hence my earlier sections largely focused on particular fields of philosophy.