LessWrong.com news feed

A community blog devoted to refining the art of rationality

A conversation about cooking and creativity

Published on July 27, 2021 5:00 AM GMT

My girlfriend is a creative cook, and worked in the Portland food industry - renowned for its innovation by small kitchens - for ten years. I'm interested in creativity in scientific research. Where does it come from?

Furthermore, what is the relationship between scientific creativity and scholarship - the immense amount of background material that scientists-in-training are expected to absorb? Over the last year, I have spent a lot of time focused on learning how to do effective scholarship. I wanted to know how to learn the theory of differential equations, immunology, biochemistry, and computer science.

In our conversation about culinary creativity, we started off with a simple question. What is the difference between a competent chef and a creative one?

She answered with an example. A competent chef can follow a recipe. A creative chef can predict what the recipe will taste like, and knows in advance how to modify it to suit her own taste. She can tell from the recipe whether the soup will be thick or thin, or have a good flavor, and knows how to adjust the broth or the spices. Some of that happens while she cooks, by tasting and adjusting, but some of it happens in her imagination, during the planning phase.

Let's break this down into components, and see if we can draw an analogy between culinary and scientific creativity. We have:

  • A recipe
  • A cook, who has:
    • Her own preferences, needs and goals
    • Some imagination and technical knowledge for how to transform the recipe to achieve those goals
    • The resources to turn her cooking plans into reality

What's the analogy with science?

  • The recipe is like the prior research in the lab or subfield
  • The cook is the scientist, who has:
    • A research problem, with its own requirements and goals
    • Some imagination and technical knowledge for how to transform the prior research to move closer to achieving those goals
    • The resources to turn her scientific plan into an experiment

This helps illuminate my intuition that scholarship and creativity feel like related, but separate tasks, and that scholarship doesn't directly lead to creativity. If scholarship is akin to the technical knowledge that helps a chef modify a recipe to her taste, then it's what helps a scientist build on prior research to advance closer to the goals of the research program. You need to have "scientific taste," a sense of where you're trying to go, and a vision for how to get there. The technical knowledge helps you figure out how to accomplish that, but you don't really derive your vision and "taste" from the mechanical details of your subject.

In my experience, classwork is almost entirely about developing that technical knowledge, and involves very little cultivation of scientific taste, or practice in building on prior experiments and imagining new ones. So this feels like a great area for educational outsiders to create exercises and prompts, which I welcome in the comments.

But it also invites curiosity about what's required to carry out a creative process in scientific research. To see why, let's turn back to our cooking/science analogy. The cook has some mysterious faculties allowing her to have a sense of "scientific taste:" a tongue, a nose, and a set of associations with memory and food. These somehow allow her to generate a sense of what she finds delicious, or disgusting, or just bland and disappointing. The tongue and nose are marvelously sensitive (in many people) to subtleties of flavor and intensity. And we know pretty clearly what effect we will have if we add more salt, or water, or potatoes into the soup.

In scientific research, we can't directly "taste the soup," and we can't as easily know what effect our additions had. Techniques for determining causality, measuring effect sizes, and operationalizing concepts are the researcher's "scientific tongue," the basis of their sense of "taste." Learning how to use those techniques, get information out of them, and have a reaction that tells you this is how we need to "modify the recipe," seems like a good analogy for how to become a creative scientist.

And note that this is different from just being able to pull out and explain that information from a scientific paper. I can look up an effect size in a paper, no problem. But in the culinary world, you don't get nearly the same kind of practice being a creative chef by accepting another cook's opinion on the soup as you do by forming your own opinion on it. Can we do better? How?

That's not to say that there's no role for exchange of opinions, in the kitchen or the laboratory. Frequently, after cooking a meal for her family, my girlfriend will ask everyone for their critical opinions on the flavor and texture. They'll share their honest opinions. I only ever notice how delicious it was, so I'm pretty much useless! But she seems to benefit from this peer review process, often just by having her own opinion reinforced, or getting a better understanding of the taste of her family members.

Likewise, absorbing the author's stated opinion of their own work, or the feedback from a formal or informal peer review process, should be a chance to compare your own opinion with that of another expert. But in order for that to be most useful, you need to have a "scientific tongue" of your own, and form your own evaluation.

Here are a few skills that seem like the "nerve endings" of the scientific tongue:

  1. Understanding the pros and cons of various methods and measurements.
  2. Being able to understand the statistics and figures used to measure and report the data, and having a sense of how intensely good, bad, or "meh" the findings are.
  3. Knowing what the overall goal of the research is, and the intermediate goals that are its more immediate agenda - including the ones that didn't get mentioned in the paper. Also, being able to imagine new intermediate goals that still fit within the field (sort of like being able to do fusion cuisine that still has recognizable influences).
  4. Being able to compare it to other studies on a similar topic, and describe the value of the diversity in their approaches.
  5. Knowing, big picture, what sorts of things we'd like to be able to do in order to "taste" and "improve" the "recipe."


Neuroscience things that confuse me right now

Published on July 26, 2021 9:01 PM GMT

(Quick poorly-written update, probably only of interest to neuroscientists.)

It’s not that I’m totally stumped on these; mostly these are things I haven’t looked into much yet. Still, I’d be very happy and grateful for any pointers and ideas.

Most of these are motivated by one or more of my longer-term neuroscience research interests: (1) “What is the “API” of the telencephalon learning algorithm?” (relevant to AGI safety because maybe we’ll build similar learning algorithms and we’ll want to understand all our options for “steering” them towards trying to do the things we want them to try to do), (2) “How do social instincts work, in sufficient detail that we could write the code to implement them ourselves?” (relevant to AGI safety for a couple reasons discussed here), (3) I’d also like to eventually have good answers to the “meta-problem of consciousness” and “meta-problem of suffering”, but maybe that’s getting ahead of myself.

1. Layout of autonomic reactions / “assessments” in amygdala, mPFC, ACC, etc.

In Big Picture of Phasic Dopamine, I talked about these areas under “Dopamine category #3: supervised learning”. In the shorter follow-up A Model of Decision-making in the Brain I talked about them as “Step 2”, the step where possible plans are "assessed" in dozens of genetically-hardcoded categories like "If I do this plan, would it be a good idea to raise my cortisol levels?".

Anyway, at a zoomed-out level, I think I have a good story that explains a lot. At a zoomed-in level, I'm pretty unclear on exactly what's happening where.

What I have so far is:

  • I think the outputs in question are coming from (1) agranular prefrontal cortex, (2) agranular anterior cingulate cortex, (3) there’s a little piece of insular cortex which is also agranular; it’s right next to PFC (more specifically OFC) and for all intents and purposes we should lump it in with agranular PFC (see Wise 2017—"Although the traditional anatomical literature often treats the orbitofrontal and insular cortex as distinct entities, a detailed analysis of their architectonics, connections, and topology revealed that the agranular insular areas are integral parts of an “orbital prefrontal network”"), and (4) the amygdala, or at least part of it. My main interest / confusion is the division of labor among these things.
  • There’s one very important special “assessment calculation”, namely the “Reward Prediction”. (Again see here.) The O’Reilly PVLV model says that this signal comes from vmPFC somewhere, but I’m not sure exactly where.
  • I just started reading Bud Craig’s book and he says: Y’know how motor cortex & somatosensory cortex are the output and feedback input (respectively) for the normal (musculoskeletal) motor control system? Well by the same token, we should think of cingulate cortex & insular cortex as the output and feedback input (respectively) of the autonomic control system. Or something like that. That’s an interesting idea for how to think about ACC specifically.
  • I sometimes think that I should think of neocortical assessment areas as being more “sophisticated” than amygdala assessments—more dependent on abstract context, less dependent on low-level sensory input. I’m not sure if that’s correct though. The original reason I was thinking this was (at least partly) that the neocortex has six layers while the amygdala doesn’t; it’s a lot simpler. But as I noted here, that’s not necessarily right: it may actually be correct to think of the amygdala as a “neocortex layer 6b” that happened to have physically separated from the other layers. (What are the other layers? Answer: The lateral nucleus of the amygdala is layer 6b of “ventral temporal cortex”, and the basomedial and posterior nuclei of the amygdala are layer 6b of “amygdalar olfactory cortex”. Or so says Swanson 1998.)
  • I sometimes think that neocortical assessments project a bit farther into the future than amygdala assessments (e.g. maybe ACC says “it will be appropriate to raise cortisol levels one second from now”, while the amygdala says “it will be appropriate to raise cortisol levels 0.2 seconds from now”), but I’m not sure if that’s right. Well, I'm pretty confident that it's right for the vmPFC “reward prediction” assessment I mentioned above, but I'm not sure it generalizes to other assessments.
1.1 What’s with S.M.?

As I mentioned here, S.M., a person supposedly missing her whole amygdala and nothing else, seems to have more-or-less lost the ability to have (and to understand in others) negative emotions, but not positive emotions. This seems to suggest that the amygdala triggers negatively-valenced autonomic outputs, and not positively-valenced ones. But my impression from other lines of evidence is that the amygdala can do both. So I'm confused by that.

1.2 The lesions that cause pain asymbolia are in the wrong place

I was reading a book about pain asymbolia (the ability to be intellectually aware of pain inputs, without caring about them or reacting to them). On a quick skim, I got the impression that this condition is caused by lesions of the insular cortex. Unless I’m confusing myself, that’s backwards from what I would have expected: I would have thought a lesion of the insular cortex should make a person intellectually unaware of the pain input (since the “primary interoceptive cortex” in the insula should presumably be what feeds that information into higher-level awareness / GNW), but still motivated by and reacting to the pain input (since that comes from these assessment areas, like maybe ACC, working in loops with the hypothalamus / brainstem that ultimately feeds into dopamine-based motivation signals).

I can kinda come up with a story that hangs together, but it has a lot of implausible-to-me elements.

…But then I saw a later paper arguing that the early studies didn’t replicate, and that maybe pain asymbolia is not caused by insula lesions after all. But their evidence isn’t that great either. Also, their proposed alternative lesion sites wouldn't make it any easier for me to explain.

Well anyway, I guess I’m hoping that things will clear up when I read more about the insula, survey the literature better, etc. But for now I'm confused.

2. A few things about the superior colliculus

We have two sensory-processing systems, one in the cortex and one in the brainstem. I have a nice little story about how they relate:

I think the brainstem one needs to take incoming sensory data and use it to answer a finite list of genetically-hardcoded questions like “Is there something here that looks like a spider? Is there something here that sounds like a human voice? Am I at imminent risk of falling from a great height? Etc. etc.” And it needs to do that from the moment of birth, using I guess something like hardcoded image classifiers etc.

By contrast, the cortex one is a learning algorithm. It needs to take incoming sensory data and put it into an open-ended predictive model. Whatever patterns are in the data, it needs to memorize them, and then go look for patterns in the patterns in the patterns, etc. Like any freshly-initialized learning algorithm, this system is completely useless at birth, but gets more and more useful as it accumulates learned knowledge, and it’s critical for taking intelligent actions in novel environments.
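As a toy illustration of this contrast (entirely my own sketch, not from the post; the detector rule, class names, and numbers are all made up): a hardcoded detector works from "birth" but never improves, while a from-scratch learner starts useless and gets better with data.

```python
# Toy contrast between a hardcoded brainstem-style detector and a
# learn-from-scratch cortex-style predictor. All names are illustrative.

def hardcoded_spider_detector(n_legs: int, has_wings: bool) -> bool:
    """Fixed, genetically-hardcoded rule: usable from day one, never changes."""
    return n_legs == 8 and not has_wings

class FromScratchPredictor:
    """Learns a running average of its input stream; useless before any data."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self) -> float:
        return self.mean  # best guess so far (0.0 before any observations)

    def update(self, observation: float) -> None:
        self.n += 1
        self.mean += (observation - self.mean) / self.n
```

The detector answers immediately but can never do better; the predictor's error shrinks only as it accumulates observations.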

Well anyway, that’s a neat story, but there are other things going on with the superior colliculus too, and I’m hazy on the details of what they are and why.

2.1 Connections from neocortex sensory processing to superior colliculus

Let’s just talk about the case of vision, although I believe there are analogs for auditory cortex, somatosensory, etc.

As far as I understand, there are connections from primary visual cortex (V1) to the superior colliculus (SC), arranged topographically—i.e. the parts that analyze the same part of the visual field are wired together.

One theory is that these connections are cortical motor control (superior colliculus is involved in moving the eyes / saccades, in addition to sensory processing). I heard the "motor control" theory from Jeff Hawkins (he didn't really defend it in the thing I read, he just claimed it). I think Hawkins likes that theory because it fits in neatly with “cortical uniformity”—every part of the cortex is a sensorimotor processing system, he says. A new paper from S. Murray Sherman and W. Martin Usrey also says that these connections are motor commands. I don’t know who else thinks that, those are the only two places I’ve seen it.

I generally don’t like the “motor control” theory. For one thing, my understanding is that V1 is not set up with the cortico-basal ganglia-thalamo-cortical loops that the brain uses for RL, and I normally think you need RL to learn motor control. For another thing, aren’t the frontal eye fields in charge of saccades?? (At least, in charge at the cortical level.) For yet another thing, it seems to me that “V1 cortical column #832” is not in a good position to know whether saccading to the corresponding part of the visual field is a good or bad idea. The decision of where and when to saccade needs to incorporate things like “what am I trying to do”, “what’s going on in general”, “what has high value-of-information”, etc.—information that I don’t think a particular V1 column would have.

The closest thing to motor control theory that kinda makes sense to me is a “Confusing things are happening here” message. More specifically, each V1 column ought to “know” if it's the case that higher-level models keep issuing confident predictions about what’s gonna happen at that part of the visual field, and those predictions keep being falsified. So when that happens, it could send a "Confusing things are happening here" message to SC.

Those messages would not be exactly a motor command per se, but the SC could reasonably act on the information by saccading to the confusing area. So then the messages wind up being more-or-less a motor command in effect.
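A minimal sketch of this "confusing things are happening here" idea (my own toy model, not anything from the neuroscience literature; the window size, threshold, and message format are all invented): each column tracks how often recent confident predictions about its patch were falsified, and SC orients to the most confusing location.

```python
from collections import deque

class V1Column:
    """Toy model: tracks recent prediction failures for one patch of visual field."""
    def __init__(self, window: int = 10):
        self.recent = deque(maxlen=window)  # True = a confident prediction was falsified

    def observe(self, prediction_falsified: bool) -> None:
        self.recent.append(prediction_falsified)

    def confusion(self) -> float:
        """Fraction of recent predictions that failed (the 'confusion' message)."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

def sc_saccade_target(columns: dict, threshold: float = 0.5):
    """Toy SC policy: saccade to the most confusing location, if any exceeds threshold."""
    loc, col = max(columns.items(), key=lambda kv: kv[1].confusion())
    return loc if col.confusion() > threshold else None
```

Note the division of labor this sketch assumes: the column only reports confusion; the decision to saccade lives in SC.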

That's not bad, but I'm still not entirely happy about this theory. For one thing, it seems not to match which neocortical layer these messages are coming out of. Also, I think that "the saccade target that best resolves a confusion" is not necessarily "the saccade target where incorrect predictions keep happening", and my introspection tentatively says that I would tend to saccade to the former, not the latter, when they disagree.

So here's one more theory I was thinking about. There’s a thing where if there’s a sudden flashing light, we immediately saccade to it, and maybe do other orienting reactions like move our head and body (and maybe also release cortisol etc.). My impression is that it’s SC that decides that this reaction is appropriate, and that orchestrates it.

But if we expect the flashing light, we’ll be less likely to orient to it.

So maybe the V1 → SC axons are saying: “Hey SC, there’s about to be motion in this particular part of the visual field. So if you see something there, it’s fine, chill out, we don’t have to orient to it.”
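That last idea can be sketched as a one-line orienting policy (again my own illustration; the "expected motion" set is the hypothetical content of the V1 → SC message):

```python
def sc_orienting_reaction(event_location, expected_motion_locations) -> bool:
    """Toy SC rule: orient to a sudden event, unless cortex flagged it as expected.

    expected_motion_locations: set of visual-field locations that V1 has told SC
    to expect motion at. Returns True if SC should trigger an orienting reaction.
    """
    return event_location not in expected_motion_locations
```

Under this sketch, a flash at an unexpected spot triggers orienting, while a flash at a flagged spot is quietly ignored.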

I don’t know which of those ideas (or something else entirely) is the real explanation, and haven’t looked into it too much.

2.2 Connections from superior colliculus to neocortex sensory processing

I think these exist too. Why?

I guess I always have my go-to cop-out answer of "they provide 'context' that the neocortical learning algorithm can exploit to make better predictions". But maybe there's something else going on.

2.3 Learning in the superior colliculus

Contrary to my neat theorizing, there does seem to be some learning that happens in SC. I mean, I guess there kinda has to be, insofar as SC has some role in orchestrating motor commands, and the body keeps changing as it grows. I’m just generally hazy on what is being learned and where the ground truth comes from. I'll return to this in a later section below.

3. Why are there (a few) dopamine receptors in primary visual cortex?

Dopamine receptors are stereotypically used for RL, although I happen to think they're used for supervised-learning too. But (see here), V1 doesn't seem to me to have use for either of those things. Predictive learning (augmented by top-down attention and variable learning rates) seems like the right tool for the visual-processing job, and I don’t see what could be missing.

Yet there are in fact dopamine receptors in V1, apparently. Very few of them! But some! That makes it even weirder, right?

This paper found that mice with no D2 receptors (anywhere, not just V1) had close-to-normal vision. The differences were small, and I presume indirect; in fact the D2-knockout mice had slightly sharper vision!

…So anyway, I'm at a loss, this doesn’t make any sense to me. I’m tempted to just shrug and say “there’s some process tangentially related to vision processing, and the circuits doing that thing happen to be intermingled with the normal V1 visual-processing circuits, and that’s what the dopamine is there for.” I’m not happy about this. :-/

As with everything else here, I haven't looked into it much.

4. Learning-from-scratch-ism in motor cortex

(For definition of “learning-from-scratch-ism” see here.)

I’ll start by saying that I really like Michael Graziano’s grand unified theory of motor cortex. He argues (e.g. here and his book, and see here for someone arguing against) that the textbook division of motor-related cortex into “primary motor cortex”, “premotor cortex”, “supplementary motor area”, “frontal eye field”, “supplementary eye field”, etc. etc., is all kinda arbitrary and wrongheaded. Instead all those areas are basically doing the same kinds of thing in the same way, namely orchestrating different species-typical actions. If you think about mapping a discontinuous multi-dimensional space of species-typical actions onto a 2D piece of cortical sheet, you’re gonna get some sharp boundaries, and that’s where those textbook divisions come from.

Anyway, all that is kinda neat, but the part I’m confused about is how the motor cortex learns to do this. Like, what are the training signals, and how are those signals calculated?

One hint is that the midbrain can apparently also perform species-typical actions. I’m very unclear on what’s the difference between when the midbrain orchestrates a species-typical action versus the cortex orchestrating (nominally) the same action. I doubt they’re redundant; that would be a big waste of space, compared to having a much smaller area of cortex that merely “presses go” on the midbrain motor programs. Or does motor cortex do a better job somehow? How do these two regions talk to each other? Does the cortex teach the midbrain? Does the midbrain teach the cortex? Does the midbrain “initialize” the cortex and then the cortex improves itself by RL? Does the midbrain motor system learn, and if so, how does it get ground truth?

I don’t know, and I haven’t really looked into it, I’m just currently confused about what’s going on here.

And certainly I can’t feel good about advocating the truth of “learning-from-scratch-ism” if I’m not confident that the theory is compatible with everything we know about motor cortex.

5. Every brainstem-to-telencephalon neuromodulator signal besides dopamine and acetylcholine: what do they do?

I feel generally quite happy about my big-picture understanding of dopamine (see here) and acetylcholine (see here), even if I have a few confusions around the edges. But I haven’t gotten a chance to look at serotonin, norepinephrine, and so on, or at least not much. I’ve tried a little bit and nothing I read made any sense to me at all. So I remain confused.


Refactoring Alignment (attempt #2)

Published on July 26, 2021 8:12 PM GMT

I've been poking at Evan's Clarifying Inner Alignment Terminology. His post gives two separate pictures (the objective-focused approach, which he focuses on, and the robustness-focused approach, which he mentions at the end). We can consolidate those pictures into one and-or graph as follows:

And-or graphs make explicit which subgoals are jointly sufficient, by drawing an arc between those subgoal lines. So, for example, this claims that either intent alignment + objective robustness or outer alignment + robustness would be sufficient for impact alignment.
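An and-or graph like this is easy to make concrete. The sketch below encodes only the two arcs stated above (not Evan's full diagram); the goal names are just strings, and treating subgoals as a flat set of "established" facts is my simplification.

```python
# And-or graph: each goal maps to a list of alternative arcs ("or" over arcs,
# "and" within an arc's set of subgoals).
AND_OR = {
    "impact alignment": [
        {"intent alignment", "objective robustness"},  # objective-focused arc
        {"outer alignment", "robustness"},             # robustness-focused arc
    ],
}

def achievable(goal: str, established: set) -> bool:
    """A goal holds if it's established directly, or all subgoals of some arc hold."""
    if goal in established:
        return True
    return any(all(achievable(g, established) for g in arc)
               for arc in AND_OR.get(goal, []))
```

Either arc alone suffices, which is exactly the "jointly sufficient" reading of the arcs in the diagram.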

The red represents what belongs entirely to the robustness-focused path. The yellow represents what belongs entirely to the objective-focused path. The blue represents what's on both paths.

Note, in particular, that both paths seek outer alignment + objective robustness + capability robustness. According to the above picture, the disagreement between the two paths is only about how these sub-goals are best grouped together.

But this doesn't seem to actually be true. Objective Robustness and Inner Alignment Terminology points out that, really, the two approaches want to define some of the terminology differently.

Putting together the revisions from my previous post on the subject, clarifications from Objective Robustness and Inner Alignment Terminology, and some other thoughts, I suggest this revised joint graph:

The and-or graph here has been supplemented with double-headed arrows, which indicate a looser relationship of pseudo-equivalence (more on this later).

  • Behavioral Alignment: This is just another way to say "impact alignment" that's more consistent with the rest of the terminology. Behavioral alignment means alignment in terms of what the system actually does. I don't want to delve into the definition of the term "alignment" itself in this post, so, that's about all I can say.
  • Inner Robustness: This means that the mesa-objective is efficiently pursued under a wide range of circumstances (ie, including distributional shift). In other words: whatever the mesa-optimizer wants, it is broadly capable of achieving it.
  • On-Distribution Alignment: Objective Robustness and Inner Alignment Terminology pointed out that the robustness-focused path re-defines "outer alignment" as "alignment on the training distribution" (so that we can then think of the rest of the alignment problem as a problem of generalization). I take this to mean both that the base objective is aligned on the training distribution, and that the behavior of the trained system is aligned on the training distribution. (One implies the other, if training succeeds.)
  • Robustness: Performing well on the base objective in a wide range of circumstances.
  • Intent Alignment: A model is intent-aligned if it has a mesa-objective, and that mesa-objective is aligned with humans. (Again, I don't want to get into exactly what "alignment" means.)
  • Capability Robustness: As elsewhere, I define this as performing well on a behavioral objective even off-distribution. The system is highly capable at something, but we say nothing about what that thing is.
  • Objective Robustness: The behavioral objective of the system is aligned with the base objective, even under distributional shift.
  • Inner Alignment: A system is inner-aligned if it has a mesa-objective, and that mesa-objective is aligned with the base objective.
  • Outer Alignment: The base objective is aligned with humans.
Yellow Lines:

These lines represent the objective-centric approach. I think this rendering is more accurate than Evan's, primarily because my definition of intent alignment seems truer to Paul's original intention, and secondarily because inner alignment and outer alignment now form a nice pair.

  • Inner Alignment + Outer Alignment → Intent Alignment
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}  Intent Alignment: This is by transitivity of alignment. If the mesa-objective is aligned with the base objective, and the base objective is aligned with humans, then the mesa-objective will be aligned with humans.
  • Intent Alignment + Inner Robustness → Behavioral Alignment: If something is intent-aligned, and also achieves its intent reliably, then it must be behaviorally aligned.

This path apparently implies building goal-oriented systems: all of the subgoals require that there actually is a mesa-objective. However, I suspect researchers who identify with this path don't all think the end result would necessarily be goal-oriented. For example, my impression is that what people mean by "solving the inner alignment problem" includes building systems which robustly avoid having inner optimizers at all. This is not well-represented by the proposed graph.

We could re-define "inner alignment" to mean "the mesa-objective aligns with the base objective, or the system lacks any mesa-objective" -- but this includes a lot of dumb things under "inner aligned", which seems intuitively wrong. 

A closer term is acceptability, which could plausibly be defined as "not actively pursuing a misaligned goal". However, I was not sure how to put anything like this into the graph in a nice way.

Red Lines:

These lines represent the robustness-focused approach.

  • Capability Robustness + Objective Robustness → Robustness: We perform well on the behavioral objective in a wide range of circumstances; and, the behavioral objective is aligned with the base objective in a wide range of circumstances; therefore, we perform well on the base objective in a wide range of circumstances.
  • Robustness + On-Distribution Alignment → Behavioral Alignment: We perform well on the base objective in training, and we generalize well; therefore we perform well in general.
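The subgoal derivations above are just closure of an and-or graph: starting from a set of assumed subgoals, repeatedly apply implication rules until a fixed point. A minimal sketch of that bookkeeping (the node names are my own shorthand for the labels in the diagram, not anything from the original post):

```python
# And-or graph: each rule says "if all premises hold, the conclusion holds".
# Node names are illustrative shorthand for the subgoals in the diagram.
RULES = [
    ({"capability_robustness", "objective_robustness"}, "robustness"),
    ({"robustness", "on_distribution_alignment"}, "behavioral_alignment"),
    ({"outer_alignment", "inner_alignment"}, "intent_alignment"),
    ({"intent_alignment", "inner_robustness"}, "behavioral_alignment"),
]

def derivable(assumed):
    """Close the assumed set of subgoals under the implication rules."""
    known = set(assumed)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# The robustness-focused (red-line) path reaches behavioral alignment
# without ever assuming a mesa-objective exists:
red_path = derivable({"capability_robustness", "objective_robustness",
                      "on_distribution_alignment"})
assert "behavioral_alignment" in red_path
```

This also makes the structural point explicit: none of the red-path premises mention a mesa-objective, whereas the objective-focused path needs `inner_alignment` and `inner_robustness`, both of which presuppose one.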

This approach has some distinct advantages over the objective-focused approach. First, it does not assume the existence of inner optimizers at any point. It is possible that this approach could succeed without precisely defining "inner optimizer", identifying mesa-objectives and checking their alignment, or anything like that. Second, this approach can stand on the shoulders of existing statistical learning theory. If the whole problem boils down to generalization guarantees, then perhaps we just need to advance work on the same kinds of problems which machine learning has faced since its inception.

A subtlety here is that the base objective matters in two different ways. For "on-distribution alignment", we only care about how the base objective performs on the training data. This makes sense: that's the only way it affects training, so why would we care about correctly specifying outer alignment off-distribution? Instead, we rely on generalization to specify that part correctly. This seems like an advantage of the approach, because it greatly reduces the outer alignment problem.

However, objective robustness also depends on the base objective, and specifically depends on the off-distribution behavior of the base objective. This reflects the fact that to generalize correctly, the system does need to get information about the off-distribution base objective somehow. But how? In prosaic AI, only on-distribution behavior of the loss function can influence the end result.

I can see a few possible responses here.

  1. Double down on the "correct generalization" story: hope to somehow avoid the multiple plausible generalizations, perhaps by providing enough training data, or appropriate inductive biases in the system (probably both).
  2. Achieve objective robustness through other means. In particular, inner alignment is supposed to imply objective robustness. In this approach, inner-alignment technology provides the extra information to generalize the base objective appropriately.

Response #2 is consistent with how the robustness-focused path has been drawn by others; i.e., it includes inner alignment as a subgoal of objective robustness. However, including this fully in the robustness-focused path seems unfortunate to me, because it adds mesa-objectives as a necessary assumption (since inner alignment requires them). Perhaps dealing directly with mesa-objectives is unavoidable. However, I would prefer to be agnostic about that for the time being.

In a comment to my previous post on this topic, Rohin responded to a related concern:

How do we ensure the model generalizes acceptably out of distribution?

Part of the problem is to come up with a good definition of "acceptable", such that this is actually possible to achieve. (See e.g. the "Defining acceptable" section of this post, or the beginning of this post.)

So we see this idea of "acceptability" crop up again -- still not well-represented by my graph.

Blue Lines:

Inner Robustness and Capability Robustness

Inner robustness implies capability robustness, because we know there's a goal which the system performs well on in a broad variety of circumstances. (Inner robustness just tells us a bit more about what that goal is, while capability robustness doesn't care.) 

Capability robustness sort of implies inner robustness, if we assume a degree of agency: it would be pretty strange for the system to robustly pursue some other goal than its mesa-objective.

However, these implications require the presence of an inner optimizer. In particular, capability robustness obviously won't imply inner robustness in the absence of one.

Inner Alignment and Objective Robustness

Evan argued that inner alignment implies objective robustness. This argument requires that the agent is capable enough that its behavioral objective will match its mesa-objective, even under distributional shift.

We could also argue in the other direction: if something is behaviorally aligned with the base objective in a broad variety of circumstances, then (again assuming sufficient agency), surely it must not have a misaligned objective.

Again, these implications only make sense if there is a mesa-objective.

On-Distribution Alignment and Outer Alignment

Outer alignment implies on-distribution alignment trivially. On-distribution alignment doesn't imply outer alignment by any means; the pseudo-equivalence holds because outer alignment doesn't matter beyond the influence of the base objective on training, so, at least for prosaic AI, outer alignment shouldn't matter beyond on-distribution alignment.

Equating Pseudo-Equivalences

If we collapse all the pseudo-equivalent subgoals, we get an and-or graph which looks quite similar to the one we started out with:

Other remarks:

Definition of "Alignment"

I've used the term "aligned" in several definitions where Evan used more nuanced phrases. For example, inner alignment:

  • Evan: A mesa-optimizer is inner-aligned if the optimal policy for its mesa-objective is impact aligned with the base objective.
  • Me: A system is inner-aligned if it has a mesa-objective, and that mesa-objective is aligned with the base objective.

Evan's definition seems more nuanced and useful. It puts some gears on the concept of alignment. It averts the mistake "aligned means equal" (if humans want to drink coffee, that should not imply that aligned robots want to drink coffee). It captures the idea that goal alignment has to do with high levels of performance (we don't want to label something as misaligned just because it makes dumb mistakes).

However, I'm not confident that the details of Evan's locutions are quite right. For example, should alignment be tested only in terms of the very best policy? This seems like a necessary condition, but not sufficient. If behavior is severely misaligned even for some very high-performance (but technically sub-optimal) policies, then the alignment isn't good enough; we don't expect training to find the very best policy.

So, I think it better to remain somewhat ambiguous for this post, and just say "aligned" without going further.

Other Nuances

Generally, I felt like if I had chosen more things to be careful about, I could have made the graph three times as big. It's tempting to try and map out all possible important properties and all possible approaches. However, the value of a map like this rapidly diminishes as the map gets larger. Which things to make perfectly clear vs muddy is a highly subjective choice. I would appreciate feedback on the choices I made, as this will inform my write-up-to-come. (This post will resemble the first major section of that write-up.)

Also, I'm not very committed to the terms I chose here. EG, using "behavioral alignment" rather than "impact alignment". I welcome alternate naming schemes.


Wanted: Foom-scared alignment research partner

26 июля, 2021 - 22:23
Published on July 26, 2021 7:23 PM GMT

Hi, my name is Mack Gallagher, I'm 20 years old, I have ZERO familiarity with computer architecture or [math or computer science technical jargon] [although I'm OK at reasoning], I have never attended a prestigious university, and I currently live in a 15,000-person meatpacking town, where I pack meat. I am terrified of impending foom as the most likely cause of my early death, I think my brain needs regular active reinforcement from [the presence of] people who share my beliefs-and-affect-toward-them to fuel sustained positive action on my part, and I am tired of sitting around waiting to die.

Are you, like me, 

  • Consistently apprehensive of impending foom
  • Allergic to high-status AI risk discussion
  • Currently doing something completely unrelated to AI for a day job
  • Tired of sitting around waiting for death

and preferably:

  • 15-25 years old
  • Low-status [have never attended a prestigious university]


My email is magallagher00@gmail.com. Please email me or message me on LessWrong. Maybe we can be less useless together.


What does knowing the heritability of a trait tell me in practice?

26 июля, 2021 - 19:29
Published on July 26, 2021 4:29 PM GMT

The concept of heritability gets misunderstood a lot, so there are several articles discussing what it doesn't mean. But reading through all of them leaves me confused about what it does mean in practical terms, outside the technical definition.

For example, as I myself wrote in an old comment about common misunderstandings:

Caution: heritability, as in the statistical concept, is defined in a way that has some rather counter-intuitive implications. One might think that if happiness is 50% heritable, then happiness must be 50% "hardwired". This is incorrect, and in fact the concept of heritability is theoretically incapable of making such a claim.

The definition of heritability is straightforward enough: the amount of genetic variance in a trait, divided by the overall variance in the trait. Now, nearly all humans are born with two feet, so you might expect the trait of "having two feet" to have 100% heritability. In fact, it has close to 0% heritability! This is because the vast majority of people who have lost their feet have done so because of accidents or other environmental factors, not due to a gene for one-footedness. So nearly all of the variance in the number of feet in humans is caused by environmental factors, making the heritability close to zero.

Another example is that if we have a trait that is strongly affected by the environment, but we manage to make the environment more uniform, then the heritability of the trait goes up. For instance, both childhood nutrition and genetics have a strong effect on a person's height. In today's society, we have relatively good social safety nets helping give most kids at least a basic level of nutrition, a basic level which may not have been available for everyone in the past. So in the past there was more environmental variance involved in determining a person's height. Therefore the trait "height" may have been less heritable in the past than now.

The heritability of some trait is always defined in relation to some specific population in some specific environment. There's no such thing as an "overall" heritability, valid in any environment. The heritability of a trait does not tell us whether that trait can be affected by outside interventions.

Some articles that go deeper into the details and math of this include "Heritability is a ratio, not a measure of determinism" (dynomight.net) and "Heritability in the genomics era - concepts and misconceptions" (Nature Reviews Genetics).
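The ratio definition, and the way heritability rises when the environment is made more uniform, can be made concrete with a toy simulation (all the numbers here are made up for illustration; this is a sketch of the definition, not a model of any real trait):

```python
import random

def simulate_heritability(env_sd, n=100_000, genetic_sd=1.0, seed=0):
    """Toy additive model: phenotype = genetic value + environmental noise.
    Returns heritability = genetic variance / total phenotypic variance."""
    rng = random.Random(seed)
    genetic = [rng.gauss(0, genetic_sd) for _ in range(n)]
    phenotype = [g + rng.gauss(0, env_sd) for g in genetic]

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    return var(genetic) / var(phenotype)

# A noisy environment (e.g. uneven childhood nutrition) dilutes heritability:
h2_past = simulate_heritability(env_sd=2.0)   # roughly 1 / (1 + 4) = 0.2
# The same genes in a more uniform environment look "more heritable":
h2_now = simulate_heritability(env_sd=0.5)    # roughly 1 / (1 + 0.25) = 0.8
assert h2_now > h2_past
```

Nothing about the trait's biology changed between the two calls; only the environmental variance did, which is exactly why a heritability number is tied to a specific population in a specific environment.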

However, all of these examples of what heritability doesn't mean have left me very confused about what it does mean. I know that if a trait is 80% heritable, I cannot conclude that it is "80% genetically determined", but what can I conclude? That 80% of the observed variance in that trait is genetic, yes, but what's the practical thing of interest that having this information allows me to predict, that I couldn't predict before? In particular, what does knowing the heritability of traits such as IQ, subjective well-being, or Big5 scores tell me?

Looking at the Wikipedia article for heritability, I see very little that would help answer this question; the closest that I can find is the "controversies" section, which says that there are people who think the concept shouldn't be used at all:

Heritability estimates' prominent critics, such as Steven Rose,[27] Jay Joseph,[28] and Richard Bentall, focus largely on heritability estimates in behavioral sciences and social sciences. Bentall has claimed that such heritability scores are typically calculated counterintuitively to derive numerically high scores, that heritability is misinterpreted as genetic determination, and that this alleged bias distracts from other factors that researchers have found more causally important, such as childhood abuse causing later psychosis.[29][30] Heritability estimates are also inherently limited because they do not convey any information regarding whether genes or environment play a larger role in the development of the trait under study. For this reason, David Moore and David Shenk describe the term "heritability" in the context of behavior genetics as "...one of the most misleading in the history of science" and argue that it has no value except in very rare cases.[31] When studying complex human traits, it is impossible to use heritability analysis to determine the relative contributions of genes and environment, as such traits result from multiple causes interacting.[32] In particular, Feldman and Lewontin emphasize that heritability is itself a function of environmental variation.[33] However, some researchers argue that it is possible to disentangle the two.[34]

The controversy over heritability estimates is largely via their basis in twin studies. The scarce success of molecular-genetic studies to corroborate such population-genetic studies' conclusions is the missing heritability problem.[35] Eric Turkheimer has argued that newer molecular methods have vindicated the conventional interpretation of twin studies,[35] although it remains mostly unclear how to explain the relations between genes and behaviors.[36] According to Turkheimer, both genes and environment are heritable, genetic contribution varies by environment, and a focus on heritability distracts from other important factors.[37] Overall, however, heritability is a concept widely applicable.[9]

Out of those references, the one that sounded the most useful in telling me what heritability might actually mean was the one associated with the sentence "Overall, however, heritability is a concept widely applicable". This is the previously mentioned "Heritability in the genomics era - concepts and misconceptions" (Nature Reviews Genetics), which includes a section on "applications":

The parameter of heritability is so enduring and useful because it allows the meaningful comparison of traits within and across populations, it enables predictions about the response to both artificial and natural selection, it determines the efficiency of gene-mapping studies and it is a key parameter in determining the efficiency of prediction of the genetic risk of disease.

From reading this section, I gather that:

  • If I wanted to breed plants or animals that were high on a particular trait, having the heritability estimate for that trait could be useful
  • The heritability of a trait can be used to help infer how much statistical power gene-mapping studies targeting that trait need
  • If I was trying to predict the genetic risk of something like schizophrenia, then... I don't quite understand this part, but apparently having the heritability estimate would help me know how reliable my prediction was going to be
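For the breeding case, the standard tool is the breeder's equation, R = h²S: heritability converts the selection differential S (how far the selected parents' mean exceeds the population mean) into the expected response R in the offspring generation. A toy sketch with made-up numbers:

```python
def expected_response(h2, selected_parent_mean, population_mean):
    """Breeder's equation R = h^2 * S.
    S is the selection differential; R is the expected shift
    in the offspring mean relative to the old population mean."""
    s = selected_parent_mean - population_mean
    return h2 * s

# Made-up example: breed from plants averaging 110 cm, drawn from a
# population averaging 100 cm. With heritability 0.8, the offspring
# mean is expected ~8 cm above the old mean; with heritability 0.2,
# only ~2 cm -- most of the parents' height advantage was environmental.
high = expected_response(0.8, 110, 100)
low = expected_response(0.2, 110, 100)
assert high > low
```

This is the clearest case where the bare number is directly predictive, which is presumably why it is "hopefully an irrelevant consideration" for humans.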

Usefulness for breeding programs is hopefully an irrelevant consideration when we're talking about humans, which leaves me with the two others; and those also seem to suggest that knowing the heritability of a trait isn't useful on its own, and will only be something that helps me do or evaluate a gene-mapping or genetic risk prediction study better.

This seems to suggest that knowing the heritability of a trait such as IQ, subjective well-being or a Big5 score tells me essentially nothing by itself; is this correct?

(cross-posted to the Psychology & Neuroscience Stack Exchange)


Is there a theoretical upper limit on the R0 of Covid variants?

26 июля, 2021 - 09:53
Published on July 26, 2021 6:53 AM GMT

Looking at the Wikipedia article on basic reproduction number, it looks like the most contagious virus (as of right now) is measles, with an R0 (high estimate) of 18. I'm wondering if there is some asymptotic limit to how contagious viruses can get, and whether measles is close to it. Either way, do we have any idea what biological mechanisms are involved?

I'm asking because as new variants emerge, it seems wise to be aware of the worst-case scenario. Making a rough plot of the Covid rows in the basic reproduction number article, with a rough stab at when each variant emerged, the wild-type, Alpha, Delta trend looks like this.

Wild-type, Alpha and Delta R0s (upper limits) and approximate dates when they emerged

From Tomas Pueyo's excellent thread, it looks like as Covid gets more transmissible it's likely to get more deadly.

In 20 months Covid has gone from an R0 (again, high estimate) of 3.4 to 8. I'm not an epidemiologist, but that seems like a really big jump in context. To be fair, I'm using the high estimates, but the general trend is concerning, and I would sleep better knowing there was some biological reason it would plateau.

A scary (though I'm not sure how likely) scenario: another 20 months out (March 2023 or so), assuming there are still large enough sections of the world where Covid is spreading (because vaccines haven't gotten there, or new vaccines are required for breakthrough variants, or vaccine/infection immunity declines too quickly, etc.), is there any reason to think R0 wouldn't jump by another ~4.6, to ~12 or higher? Could it go above 18?
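To make the "rough plot" arithmetic explicit: the ~12 figure comes from naive linear extrapolation of the two high-end estimates (wild-type ≈3.4 at month 0, Delta ≈8 at month ~20). This is purely mechanical trend-following, not epidemiology; nothing says the trend must be linear, and it may saturate well before this:

```python
def linear_extrapolation(t0, r0, t1, r1, t_future):
    """Fit a line through two (time, R0) points and extrapolate."""
    slope = (r1 - r0) / (t1 - t0)
    return r0 + slope * (t_future - t0)

# High-end estimates: wild-type ~3.4 (month 0), Delta ~8.0 (month ~20).
# Extrapolating another 20 months gives 3.4 + 2 * 4.6 = 12.6 -- still
# below measles' ~18, but nothing in this arithmetic forces a plateau.
r0_march_2023 = linear_extrapolation(0, 3.4, 20, 8.0, 40)
```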

 I'm just surprised I'm not seeing more discussion about this. Maybe I've missed it?


The biological intelligence explosion

26 июля, 2021 - 06:46
Published on July 25, 2021 1:08 PM GMT

Summary:  Human genetic engineering could lead to intelligence enhancement that produces genetic engineers who are better at genetic engineering (and at research on pathways to improving intelligence), which leads to a continuing process of greater and greater intelligence. This iterative process would be a human intelligence explosion.

There’s a view that AI will hit a point where it begins an intelligence explosion: an AI system will be designed that is better at designing AI systems than its designers were.  As such, it will be able to modify its own design such that it, or a second-generation version of it, will be created that is superior to it.  And this next version will thus be sufficiently advanced that it can create a more advanced version still.  Etc.  You end up with an iterative process whose next-generation progress is based on its current state, and as such an exponential growth, at least until some limiting factor is reached.  Hence, intelligence explosion.

This seems like a possible outcome, though the absolute rate of change isn’t clear.

But aside from computer intelligence, there’s another pathway to intelligences improving their own design: humans.  With current genome reading technology we are identifying a myriad of genes related to intelligence.  While each individual gene gives only a small effect, the interplay of many (hundreds) of such genes can be shown to have very large effects on IQ.

While gene therapy is in its early stages, it is an emerging technology undergoing rapid progress.  It is currently difficult to modify even individual genes in adult organisms: there are off-target effects to worry about, it’s not possible to deliver the genes to every cell, the immune system will attack the viruses used for gene delivery, etc.  But there is already progress in solving all of these problems.  It’s not crazy to think that within a couple of decades we may be able to safely alter dozens or hundreds of genes in adult humans, and if not 100% of cells, a high enough percentage for effective therapy.

If we imagine such a world, we can see researchers making use of such treatments to improve their own intelligence.  This, then, can lead to clearer thinking, and more creativity, and both more and better ideas.  Which then could lead to the next wave of breakthroughs, perhaps in which genes to alter or in other avenues to improve intelligence.  And as those were developed and then implemented, the researchers reaping that benefit could then use their newly augmented intellect to iterate the next advances…

A biological intelligence explosion.

It would likely be much more limited than the AI intelligence explosion.  Human brains are constrained in various ways (size being an obvious one) that computers are not.  And AI could completely start from scratch and use a completely new sort of computing substrate in its next design, but that likely wouldn’t be an option for our human researchers, who are manipulating already existent, living (so you don’t want to do something that risks killing them) human brains.  Nevertheless, even within the constraints, there still seems to be a lot of room for improvement, and each improvement should make the next one more likely.

Of course, maybe this will simply be taboo and not be done.  Or maybe AI will come along first.

But then, maybe not.

Now, there’s a question of whether or not worries about the emergence of an artificial super-intelligence might be mirrored in analogous worries about a resulting biological super-intelligence.  I think that those worries are at least mitigated, though not resolved, in this case, for a few reasons:

  1. As stated above, biological intelligence faces some hard-to-overcome constraints, like brain size given the human body, and neurons being the specific substrate of computation.  These constraints seem unlikely to be overcome and thus impose hard limits on the maximum progress of a biological intelligence explosion.
  2. The Alignment problem is difficult in part because an AI system will be so alien to us.  Humans, on the other hand, are at least capable of understanding human values.  While this doesn’t mean that enhanced human intelligences will necessarily be aligned with unenhanced humans, it does mean that the problem may be more tractable.

However, that said, there still seem to be reasons for concern.  While there are hard limits on human intelligence, we don’t quite know where they are, and evolution certainly hasn’t reached them.  This is because the constraints faced in our ancestral environment have been severely loosened in a modern context.  Energy use, for instance, was a major constraint, but food today is very cheap, and a brain using even 10 times as much energy could easily be supplied with enough calories for its computational work.  If that energy use reached 100 times current usage it might require major changes to other organ systems, but that seems like a feasible late-stage development in our intelligence explosion.  Survivability in our ancestral environment was also constrained heavily by locomotion, but this is a much weaker constraint today.  So brain size, for instance, could get much larger before reaching a fundamental limit.  There are other things, like which tasks brains are specialized for, that could similarly be improved.  Mathematical aptitude, for instance, probably didn’t undergo very strong selection in the past but could be strongly favoured if it was seen as useful.  Etc.  All this suggests that while human intelligence would likely reach a limit far before AI did, that limit is quite far from the current level.

Similarly, while the alignment problem may be more tractable in humans, it’s certainly not solved.  We have elaborate political systems because we don’t simply trust that our neighbors share our goals, so there seems little reason to assume that the super-intelligent would share the goals of the rest of society in general.  Moreover, there is an actually harder problem with human super-intelligence than with machine super-intelligence: even at the beginning of the process we have no access to the source code.  There’s no chance to try to make sure the “machine” (i.e. people) is aligned with us from the beginning.  To some extent it may be possible to do this with regulatory oversight of the enhancement process, but this seems a cruder tool than actually designing the system from scratch.

For these reasons I think there are similar concerns with a human intelligence explosion as have been discussed regarding an AI intelligence explosion.


When writing triggers memory reconsolidation

26 июля, 2021 - 01:10
Published on July 25, 2021 10:10 PM GMT

(Cross posted on my personal blog.)

Last night I read the post Working With Monsters. My response? "Holy shit. This post is amazing and I needed to hear it."

The idea in the post is that you have to be able to work with people who you find morally abhorrent. You can't just start a war every time someone believes in a god you don't like. Oh wait... I guess you can.

But you shouldn't! And even before reading the post, I had that stance. If you asked me, I'd have said "of course". So why is it that I found Working With Monsters to be influential?

Well, even though I believed it before, I still felt something "snap into place" after reading the post.

Think of it like this. My mind is composed of subagents. And so is yours.

It's like there are different people living inside your head. In the above image from Wait But Why, the author identifies the Higher Mind and the Primitive Mind. In reality there are many more "people"/subagents than that living in your head, but let's go with this for now.

When I felt it "snap into place", I think what happened was that my Primitive Mind finally "got it". Higher Mind already understood it, but Primitive Mind did not.

What do I mean by "got it"? I have memory reconsolidation in mind. Here's a hand-wavy description of how that works.

  1. When you initially form a memory/belief, like "you're a terrible person if you work with monsters", it "solidifies". It becomes "locked in". It consolidates.
  2. If you want to update it, you have to first "unlock it". You do this by really zeroing in on it. Making it extremely salient. "Re-living it", if you will.
  3. Once the memory is unlocked, it is available to be updated. In order to update it, you have to basically convince the subagent, eg. Primitive Mind, that the initial memory is false and instead something else is true. Picture a lawyer making an argument to a judge.
  4. After the memory is updated, it becomes "locked in" again. It reconsolidates. Now if you wanted to update it a second time, you'd have to unlock it again.

Let's get back to the blog post Working With Monsters. I think that what happened to me last night is that the blog post successfully performed steps two and three on my Primitive Mind subagent. For step two, it painted a picture that made things salient enough for me. For step three, despite being fictional, I felt like it still implicitly was making an argument, and I found that argument to be highly convincing.

More generally, I suspect that a lot of good writing has to do the same thing: perform steps two and three successfully.

There are of course exceptions. Sometimes good writing triggers you to form a new belief instead of updating an existing one, for example. And even if you successfully trigger reconsolidation, you still want it to be about something important. If a blog post made me realize that cheerios taste more nutty than carb-y, that isn't an important enough thing. Robin Hanson has a post Write To Say Stuff Worth Knowing, and I don't think the point about cheerios meets that bar.

But don't let these caveats detract from the main point. I suspect that a lot of good writing is good in large part because it triggers memory reconsolidation. And if so, writers can take advantage of this.

How? Probably by spending more time on step two. I'm not sure what techniques/patterns would work well for step two though. Would it be enough to just say, "let's review the arguments for this belief that a lot of you probably believe right now"? Or maybe you need to take things further and come up with examples? Would they need to be compelling, real world examples? Or maybe examples aren't enough, and you'd need to tell stories in order to make things salient enough? I'm not sure. It probably depends a lot on the situation.

Here's another thought. Scott Alexander has a post called Non-Expert Explanation where he argues that it is helpful to have lots of different people take a stab at explaining something rather than everyone reading the same post by some expert. Because sometimes that post by the expert doesn't do it for you. You need to hear it from a different perspective in order for it to "click". I'm not sure, but I weakly suspect that the reason for this is related to the stuff I'm talking about here with memory reconsolidation. In particular, I expect that there's a decent amount more individual variation in step two than in step three. Activation vs updating. I think a convincing argument is going to be convincing to a wide variety of people, but for the part about unlocking the old memory, I expect that to depend more on the person.


Can I teach myself scientific creativity?

25 июля, 2021 - 23:15
Published on July 25, 2021 8:15 PM GMT

Textbooks and classes are more than just curated collections of established knowledge. They also suggest what your goal should be:

  • Read the textbook and go to lecture
  • Take notes, ask questions when you feel confused, and try to understand everything
  • Solve the assigned problems for which the material equips you
  • Do practical projects and labs for hands-on experience

These are all instrumental goals, pointed at the terminal goal of becoming familiar with the material.

In real-world scientific research, scholarship has a different terminal goal, which is to let you design and carry out useful experiments. Familiarity with the material is just an instrumental goal that is supposed to contribute to that end.

Since useful research is about carrying out novel experiments to chip away at unsolved problems, it's generally going to be harder to say which specific bits of knowledge are going to be most useful to help you become a better experimenter. Perhaps you'll hit on a great idea for an experiment, or have a novel theoretical insight, and be able to trace back which bits of prior knowledge were necessary and sufficient to achieve this illumination.

Unfortunately, now that you've had your eureka moment, the connection between that prior knowledge and your creative insight is no longer very useful. The causal connection between your prior knowledge and your eureka moment is a form of knowledge, but it's a form of knowledge that becomes obsolete as soon as it is created.

All you can do with it is bring others to the same eureka moment you just had, which is what a classroom education is for. You can't use the prior knowledge->eureka moment connection to teach other people how to crank out their own novel insights.

Yet I do believe that creativity can be taught.

I worked as a piano teacher for a decade before my turn into scientific research. Although I was classically trained, I looked for ways to teach improvisation and songwriting to interested students. Over time, I found two types of mechanical methods for developing their creativity:

  1. Processes for improvising that force students to make creative decisions, even when they don't have any clear ideas.
  2. Tools that let students generate clear ideas, even when they're not ready to make any creative decisions.

For example, when I taught songwriting, I created a songwriting process (verse/chorus structure, figuring out a chord progression first, then writing lyrics, then singing them while playing the chords). No particular chord progression is better than any other. You just have to choose some arbitrary chords. Any lyrics are fine, and any melody. It would be fine to generate every facet of the song via a random number generator. It's only important that there be some content.

I also discovered a set of tools for when a student just couldn't come up with any content for the songwriting process. I'd ask them for basic attributes of a character for their lyrics (gender, name, hair color, mood). Then I'd ask them to describe a problem that character had. Finally, we'd do the Five Whys exercise. Why does the character have that problem? Well, why are they in that situation? And why did that happen? Why? Why?

Then we'd just write a song that communicated that information in the lyrics. This worked great. Between the basic songwriting process and tools like this, we never had a day when my student couldn't write at least one song during her lesson.

Occasionally, we would engage in something akin to scholarship. This might involve listening to a song from the radio, reading through the lyrics, and discussing what they mean. Or we might learn a bit of music theory. Sometimes, she would learn songs from the radio by ear. I believe that these activities were also helpful and necessary on some level. But I also never tried very hard to directly connect these "scholarly" activities to her creative songwriting, in the sense of "now that you've learned X, you can create Y." When I tried, the results were almost always lackluster.

Hypothetically, it seems like we could approach scientific creativity in a similar fashion. We'd try to come up with a reliable process for project planning, where literally any experimental procedure is valid. To complement it, we'd want a set of tools to generate more interesting ideas for the project planning process.

Scholarship would need to happen alongside the creative side of the work. But analogously to my work with my piano students, I wouldn't try to force a connection between the book learning and the creative aspects of my work.

This seems like a reasonable strategy. I'll optimize for scholarly learning and experimental creativity separately. The former I'll approach using the suite of methods I've already developed for classroom learning. The latter will require developing some new processes and tools for scientific creativity. I won't worry too much about trying to force a connection between these two aspects of my relationship with science, except by tailoring my choice of scholarly topics to align with my creative work. Just as with my piano students, it'll be most important to keep this fun, creative, and sustainable over the long-term.


When Arguing Definitions is Arguing Decisions

25 июля, 2021 - 19:45
Published on July 25, 2021 4:45 PM GMT

(epistemic status: experimental new format! Optimized for memetic power. Fun and useful refactorings of classic ideas about language.)

(note: this post was originally made as a slide deck and lives as a pdf here. Color coding of ideas was inspired by abramdemski and TurnTrout. Since this is a bunch of images, the links don't work, and I've collected them all at the bottom of the post)



  1. "Love that energy Jaynes!"
  2. Necessary and Sufficient 
  3. Family Resemblance 
  4. Words as Hidden Inferences
  5. How an Algorithm Feels from the Inside
  6. A Human's Guide to Words
  7. View from nowhere
  8. "Al Capone has a point"
  9. Reality tunnels
  10. Blind Men and the elephant
  11. Formal Logic
  12. Implication
  13. Proof trees
  14. Rwandan Genocide
  15. Radio address given on April 30th, 1994
  16. Ghosts of Rwanda
  17. Machete Season
  18. "I didn’t succeed in tracking down the original docs, but this interview has a lot of context and quotes that lay out a pretty solid case."
  19. "From the Genocide Convention of 1948, there are several more articles specifying things like how international courts are supposed to work, and what “punishment” entails."
  20. General Romeo Dallaire, on the ground in Rwanda.
  21. "Just a year earlier the U.S had been badly burned with an attempted intervention in a Somalian civil war."
  22. Conflict Is Not Abuse


Come Build Affordable Housing!

25 июля, 2021 - 17:10
Published on July 25, 2021 2:10 PM GMT

With the new Affordable Housing Overlay District (ordinance 2020-27) here in Somerville, affordable housing construction can be very profitable. For example, below I estimate someone could spend ~$220/sqft and sell for ~$550/sqft, all while helping fix the housing crisis. So: come build!

The new affordable housing rules, which went into effect in December, allow higher density construction on the condition that the housing be permanently affordable. This is solid politics: affordable housing is very popular here. Below, I'm going to try to estimate how much you could make if you invested in creating additional housing as efficiently as possible.

The first question is how much you would be able to earn from creating additional units. The ordinance defines "affordable dwelling unit" (ADU) as a "dwelling unit sold, leased, or rented at a price affordable to a specific household income specified by this Ordinance or other Federal, State, or local affordable housing program." (2.1) For expediency I'm just going to look at the definition of "affordable to a specific household income" in Somerville zoning, and not also dig into Federal/State regulations. This means there's potentially an option for less restrictive pricing that I'm not getting into.

Section 12.1 sets caps on rents and purchase prices defined by three tiers of affordability:

There are normally requirements for how many of each tier you are required to build, if you're building ADUs as part of a market rate development, but 8.1.3.e says "development subject to this section is exempt from Section 12.1 Affordable Housing". If you're trying to maximize return, you build units entirely targeted at the highest tier.

The next step is to convert these multipliers into a maximum monthly payment. The Median Family Income for Boston-Cambridge-Quincy is $120,800 for 2021, so this gives us:

         studio   1br     2br     3br     4br     5br     6br     7br     8br
Rental   $1422    $1739   $2041   $2343   $2645   $2947   $3249   $3551   $3853
Sale     $1805    $2190   $2556   $2923   $3289   $3656   $4022   $4389   $4755

Since it looks like the sale numbers are a bit more favorable, let's assume the units will be sold. The maximum sale price is 103% of the size of the mortgage you could get with the maximum monthly payment. Since the current rate is 2.78%, per FRED, we have:

        studio   1br     2br     3br     4br     5br     6br       7br       8br
Price   $454k    $550k   $642k   $735k   $827k   $919k   $1,011k   $1,103k   $1,195k

Note that mortgage rates are historically low right now, and higher rates give lower sale prices. For example, at 5% a 2br would have to sell for $490k, down 24% from $642k.
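The price calculation above (maximum monthly payment converted to 103% of the affordable mortgage principal) can be sketched in a few lines. This uses the standard fixed-rate annuity formula; the post doesn't state the mortgage term, so the 30-year term here is an assumption, though it does reproduce the table's 2br figures.

```python
def max_sale_price(monthly_payment, annual_rate, years=30, price_factor=1.03):
    """Largest mortgage principal affordable at `monthly_payment`, scaled by
    the ordinance's 103% price factor. Standard fixed-rate annuity formula;
    the 30-year term is an assumption (the post doesn't state one)."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    principal = monthly_payment * (1 - (1 + r) ** -n) / r
    return principal * price_factor

# 2br max monthly payment is $2,556 (from the Sale row above):
print(round(max_sale_price(2556, 0.0278) / 1000))  # 642 (i.e. ~$642k)
print(round(max_sale_price(2556, 0.05) / 1000))    # 490 (the $490k at 5%)
```

Both printed values match the post's figures, which is some evidence the 30-year assumption is what the ordinance intends.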

These are still more expensive than I'd like, but they're much cheaper than anything on the market today, let alone anything in the excellent condition that new construction would be.

Working from the table you can see how much units of each size are allowed to sell for. Since this is the same all over the city, you make the most by building where land is cheapest, but as you'll see below land costs are actually a very small portion of the total.

The next question is, how many units can you build and how large are they? The big incentive in the affordable housing overlay is that you can build much taller. This gain is largest for Mid-Rise 3 (MR3), where you can build seven stories instead of three, as long as your property does not abut anything zoned Neighborhood Residential (NR), and Mid-Rise 4 (MR4) where you can build seven stories instead of four regardless. Abutting is "to physically touch or share a contiguous boundary or border, such as a common lot line, or to be separated only by an alley or shared driveway," which ends up excluding almost every MR3 lot in the city. So let's look at MR4.

You are allowed to build either an apartment building or a general building (with 40% residential and the rest from a list of approved uses). Since there are no restrictions on rents for the non-residential portion of a general building you very likely earn more if you build something that's 60% commercial (health care or daycare) and 40% residential, especially in a high-demand area. I'm going to ignore this, though, and assume you just build ADUs.

There are small setback restrictions, but I think in most cases the limiting factor will just be lot coverage. For Mid-Rise 3-5 this is 90%. If a lot is 5k sqft (say, 50x100) then seven stories at 90% coverage gives you 32k sqft. This is a lot!

If you follow the requirements for a Net Zero Ready Building you're allowed a minimum space per unit of 850 sqft. This is "has no on-site combustion for HVAC system operation and cooking equipment (all electric systems) [...] and is certifiable as Zero Carbon or higher from the International Living Future Institute, or PHIUS+ from the Passive House Institute US." Otherwise you need 1,125 sqft for larger lots and 1,500 for smaller ones.

Let's work two example lots, one small and one large, and see how this works out.

230 Pearl St is a small single-family that Zillow estimates at $1M:

It's on a 2380 sqft (34x70) lot zoned MR4:

There's a requirement for a 10ft rear setback, which on a 70ft lot has a larger effect than the 90% max coverage requirement. This won't be the case for most lots in the city, but I chose a tiny one here. A 34x60 building at the front of the lot would give 2040 sqft per floor. Lose 64 sqft from each floor for a pair of staircases and another 20 for a small elevator and you get seven floors of 1956 sqft each. Build two 2brs on each floor (978 sqft each) and the city allows you to sell them as ~$640k condos. Note that you do not need to provide any parking. With revenue of $9M and a purchase cost of $1M, the break-even cost for design, permitting, demolition, and construction is $557/sqft. Even if you sold them at Tier 1, $345k/unit, you'd still be profitable, with a break-even cost of $268/sqft.

Taking a larger example, 67 Broadway is a house that's been converted into a marijuana dispensary, with a parking lot:

It's on a 12,000 sqft lot zoned MR4:

No Zillow estimate, but it's assessed at $1.2M. Let's say $4M to be cautious. With lot coverage, stairs, and a 40sqft elevator this leaves 10,900 sqft for each floor. Seven floors with twelve 2br units on each floor (850 sqft each) and you have 84 units, with 700 sqft per floor for hallways. You can sell each unit for $640k, so $54M revenue, $4M purchase, leaving a break-even cost of $661/sqft.

Housing construction here is typically $150 - $200 per sqft and demolition is ~$10/sqft (on a much smaller structure). This leaves a lot of room for me being too optimistic and you still walk away with a substantial profit.
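The break-even figures in the two examples above are all instances of the same calculation: (sale revenue minus acquisition cost) divided by buildable floor area. A minimal sketch, with the caveat that the post's quoted $557 and $661 depend on exactly which floor area (gross vs. sellable) and unit-price rounding are used, so this recomputation lands close to but not exactly on them:

```python
def break_even_cost_per_sqft(units, price_per_unit, purchase_cost, floor_area_sqft):
    """All-in budget (design, permitting, demolition, construction) per
    buildable sqft at which the project exactly breaks even."""
    revenue = units * price_per_unit
    return (revenue - purchase_cost) / floor_area_sqft

# 230 Pearl St sketch: 14 2br units at $642k, $1M purchase,
# 7 floors of 2,040 gross sqft. Lands near the post's $557/sqft.
print(round(break_even_cost_per_sqft(14, 642_000, 1_000_000, 7 * 2040)))  # 559
```

Since typical construction plus demolition comes in well under this ceiling, the gap between break-even cost and actual cost is the profit margin the post describes.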

One way to find properties that might be a good fit is to look for parcels zoned MR4-MR5 with a large lot area for their valuation. While valuations aren't all that accurate, they're still reasonably correlated with what a sale would be. I made a table of these parcels, sorted by valuation per square foot of land: parcels.tsv. While there are definitely lots it identifies that wouldn't work, it also has many that would.


Academic Rationality Research

25 июля, 2021 - 16:59
Published on July 25, 2021 1:59 PM GMT

There are now at least two academic research groups on rationality, in the sense we use the word here, in Germany that seem to be little known in the US rationality community. The point of this post is telling you they exist if you didn't already know.

There's the Rationality Enhancement Group led by Falk Lieder at the Max Planck Institute in Tübingen, and there's a group on Adaptive Rationality at the Max Planck Institute in Berlin.

When Falk Lieder was at our European community weekend he repeatedly said that he's interested in collaborating with the wider rationality community. There's a list of publications from his group and also a YouTube channel that presents a few ideas.

The adaptive rationality group has decided to describe what we would likely call applied rationality techniques as "boosting decision-making", in contrast to the academic literature on nudging. I think it's worth exploring whether we should also adopt their term for the cluster of techniques like Double Crux.


Jack's Productivity Potpourri

25 июля, 2021 - 15:52
Published on July 25, 2021 12:52 PM GMT

[Read bolded words to skim]

“Things 3 is great [...] you should do a writeup for LessWrong and EAF" 

--Kuhan Jeyapragasan, Stanford EA President

Below are a bunch of tips, systems, and devices for improving productivity. I don't mean to claim I know a lot about productivity—I think there is likely a lot of useful advice I am missing. If you are new to thinking about productivity, a lot of this stuff might be useful, though it is probably better to instead aim for the mindset that can generate these tips and habits for yourself. For this, I recommend attending a Center for Applied Rationality workshop, reading the CFAR handbook, checking out Neel Nanda's blog, and/or reading Rationality: A to Z (podcast form here).

My recommendations are in very rough order of how much I recommend them, based on how excited I am/would expect someone else to be to know about the tip (my top recommendation being Things 3 + Apple Watch). Though I didn't try that hard to order things in this way.

Table of Contents
  • TODO List: Things 3 + Apple Watch
  • Computer Restrictions: Cold Turkey
  • Sleep: Tips
  • Exercise: Tips
  • Phone Restrictions: ScreenTime or Freedom
  • Reading list: Instapaper
  • Device: Computer Monitor
  • Internet speed: Ethernet cable and good internet service
  • Device: Mouse
  • Miscellaneous tips
  • Tracking device usage: RescueTime or Toggl or Forest
  • Useful programs/apps/sites
  • Useful browser extensions
  • Device: MacBook (borrow first)
  • Password Manager: Dashlane (free for Stanford) or NordPass (free?)
  • Other products


TODO List: Things 3 + Apple Watch
  • Things 3 (ETA: Mac required, making me recommend a Mac more than I would otherwise)
    • Can add todos, blog post ideas, projects ideas, any other useful notes from my apple watch with my voice very easily through my watch (takes two watch screen taps)
    • On the computer, can add stuff to your todo list with a keyboard shortcut from any app
      • E.g. when reading an email, I can press ctrl + space, and this pops up in the middle of my screen. There is even a hyperlink to the email in the description of the todo
    • Move TODOs to specific dates in the future such that they appear in your inbox on that date
      • I use this at least once per week, surprisingly useful
      • Useful for things like things you want to apply to but the application doesn't open for months, people you want to reach out to when traveling to a new place, bumping an important email after a week if there is no reply, etc
    • Have lists for things other than todos
      • Mistakes or bugs
      • Blog post ideas
      • Project ideas
      • People to talk to
      • Things to learn
      • "Someday" todos
  • Apple Watch (ETA: iPhone required, making me recommend an iPhone more than I would otherwise)
    • Be reminded of all calendar events with wrist buzz
    • Add to-dos, blog post ideas, project ideas, etc. easily with voice
    • See to-do list from watch
    • See the time
    • Easy access timer
      • Laundry timer
      • Time blocking tasks
    • Music/podcasts/books on the go, during exercise
Computer restrictions: Cold Turkey
  • You can create "Blocks" which are sets of websites and programs to block
  • For each block, you can create a schedule
  • For each block, you can create a system for turning off the block
    • Default — just turn it off
    • Password
    • Type in N random characters
      • Surprisingly effective, allows you to create barriers of variable lengths to accessing certain websites and programs
  • My computer blocks (ask me and I can email you an importable file)
    • Communications Block (5 random chars, usually unlocked)
      • Facebook Messenger app and site
      • Mail app
      • Slack app and site
      • iMessage app
    • Distractions (20 random chars, usually unlocked)
    • EA Forum and LessWrong (10 random chars, unlock ~once per week)
      • EA forum
      • Lesswrong
    • Facebook (unlocked 7-10 am, 10 random chars)
      • Facebook
    • Youtube and Games (250 random chars)
      • Youtube
      • Addicting Games
      • Sudoku
      • Chess
      • Skyrim
      • Minecraft
      • Other computer games
    • Non-FB Social Media (unlocked 7-10 am, 50 random chars)
      • Twitter
      • Instagram
      • LinkedIn
      • Reddit
      • Tiktok
      • Tumblr
      • GroupMe
    • Miscellaneous
      • Anything else that distracts you
Sleep: buy anything that can improve your sleep
  • Silicone earplugs
  • Eye mask
  • Fluffy Pillows
  • A duvet or nice comforter
  • Mattress topper
Exercise: Find a fulfilling form of exercise that you will do consistently
  • Try lots of different forms of exercise
    • Game with friends (basketball, soccer, frisbee)
    • Dancing to music alone
    • Beat saber
    • Rock climbing
    • Ultimate frisbee
  • Try only listening to your favorite podcast when you exercise
  • Try buying things to make exercise more likable
    • VR headset for Beat Saber or other exercise games (borrow first)
    • Running shoes that you think look really good (maybe somewhat works, not sure)
    • Very fast swimming suit
    • Basketball shoes and nice basketball
    • GameCube for Dance Dance Revolution
    • Hula hoops, weighted juggling set, cyr wheel, balance board, unicycle
Phone restrictions: Screentime (iOS) or Freedom
  • My system
    • I don’t know the ScreenTime passcode, and it is written in a notebook that I never look at
    • When I need to edit the ScreenTime settings, I have someone else type in the passcode and show them the notebook
    • App store blocked
    • Deleted all apps except utilities, reading list, and podcasts
      • Calendar
      • Todo list (Things 3)
      • Safari (heavily restricted)
      • Google Maps
      • Uber
      • Instapaper (reading list)
      • Podcast App
      • Uber Eats
      • Financial apps
      • Audible
      • Music
      • Other utility apps
    • Blocked all websites except necessities
      • Wikipedia
      • Google searches
      • Health-advice websites
      • Car repair websites
      • Restaurant review websites
      • Airline related websites
      • Financial websites
    • Advice: when traveling or doing anything other than very routine work-life, unblock everything
Reading list: Instapaper
  • Instapaper app, program, and extension
    • Easily add any URL to your reading list
    • Read from phone, iPad, computer
  • Read at meals
  • Instapaper is one of the only "fun" things on my phone
Device: Computer Monitor
  • Makes a number of tasks faster and easier
  • Less cluttered screen
  • According to a study (sponsored by a monitor company), people using a 24-inch screen were 52% more productive than those using an 18-inch screen (link).
Internet speed: Ethernet cable and good internet service
  • Ethernet cable for computer plugged into your router
  • Get better internet service — I've never actually bought service, so would be happy for someone to give a recommendation
Device: Mouse
  • Significantly faster than a trackpad (for me)
  • Even if it isn't at first, it should eventually be faster I think?
Miscellaneous tips
  • Using bookmarks and bookmark folders in google chrome
  • Move your phone to a hard to reach place to make it less distracting
  • For tasks you are avoiding, instead of doing the task, just think about doing it in detail
    • Eg "First I would find webpage A, contact information B, write the introduction sentence of an email, ..."
  • Accountability bets
    • Make commitments with your friends of the form "If I succeed/fail at X, then I have to/get to Y"
    • Try leveraging an irrational fear
      • Leveraging a rational fear means the losing condition is actually bad, whereas an irrational fear coming true is likely not "actually bad"
      • Eg I found that making the losing condition be "eat a bowl of cereal" was much more motivating than "pay $300 dollars" because I am irrationally afraid of cereal
  • Your environment shapes your actions a lot
    • Have a different place for different activities:
      • Sleeping
      • Working
      • Eating
      • Reading
      • Socializing
    • Wearing different clothes when you work is an example of another thing that might help you be more in the productive mindset
      • Your brain infers what behavior it should employ based on the environment
      • "I'm wearing nice clothes, so it must be time to work"
  • Vitamins
    • Iron, B12, Omega 3, creatine (might improve cognition, apparently. Feel free to source in comments)
    • Have a sugary multivitamin so you actually take all of your vitamins
    • Practice taking all of them at once
      • Faster and no time cost to adding a new pill
      • Though micromorts from the possibility of choking?
  • Shortcuts
  • Break reminders and mantras: Timeout
    • I display "rationality" mantras like
    • Display things like "have you stood up in the past 30 minutes?" "have you had water recently?"
Tracking device usage: RescueTime or Toggl or Forest
  • It can be really useful to see how your time is actually spent
  • For example, a year ago I realized I was spending 3-5 hours on messaging apps
    • Since then I have made a conscious effort to minimize time on those apps as much as possible
  • Or maybe you spend 45 minutes on social media per day
    • Imagine replacing that with conversations or a fun kind of exercise instead
Useful programs/apps/sites
  • Google Calendar
  • x.ai (instead of Calendly)
    • The free version of x.ai allows having multiple meeting types simultaneously
    • If you are willing to pay, Calendly or SavvyCal might be better, haven't looked into it much
    • Free version of SavvyCal also might be better than x.ai or Calendly, haven’t looked into it
  • Automator app (macOS)
    • For example, can make a keyboard shortcut to close/open all messaging apps
    • Or keyboard shortcut for opening a new google doc in your browser
  • RemNote (free version of Roam Research)
Useful browser extensions
Device: MacBook (borrow first)
  • I'm still unsure if it is worth it
  • Things seem faster and smoother
  • I would not have switched to a Mac if someone hadn't given me one for free
  • I am now more in favor of using a Mac than I used to be, though still not sure
Password Manager: Dashlane or NordPass (free?)
  • Security: allows for all passwords to be long and different
  • Time and energy saved: some saved, probably not that much though
Other products


What are some triggers that prompt you to do a Fermi estimate, or to pull up a spreadsheet and make a simple/rough quantitative model?

Published on July 25, 2021 6:47 AM GMT

I'm currently viscerally feeling the power of rough quantitative modeling, after trying it on a personal problem to get an order of magnitude estimate and finding that having a concrete estimate was surprisingly helpful. I'd like to make drawing up drop-dead simple quantitative models more of a habit, a tool that I reach for regularly. 
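The kind of model I mean can be a dozen lines. As a sketch (every number below is a made-up placeholder, and `fermi` is just a helper invented for illustration):

```python
# Drop-dead simple Fermi model: multiply named factor estimates into one
# number, then bracket it with pessimistic/optimistic bounds.

def fermi(factors):
    """Multiply a dict of named factor estimates into a single number."""
    result = 1.0
    for value in factors.values():
        result *= value
    return result

# Toy question: "how many hours per year do I lose to a distracting app?"
estimate = fermi({
    "minutes_per_day": 45,       # rough guess from a time tracker
    "days_per_year": 365,
    "hours_per_minute": 1 / 60,  # unit conversion
})
low = fermi({"minutes_per_day": 15, "days_per_year": 365, "hours_per_minute": 1 / 60})
high = fermi({"minutes_per_day": 120, "days_per_year": 365, "hours_per_minute": 1 / 60})

print(f"best guess: {estimate:.0f} h/yr (range {low:.0f}-{high:.0f})")
```

Even the range here is useful: it tells you whether you're arguing about tens of hours or hundreds.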

But...despite feeling how useful this can be, I don't yet have a good handle on in which moments, exactly, I should be reaching for that tool. I'm hoping that asking others will give me ideas for what TAPs to experiment with.

What triggers, either in your environment or your thought process, incline you to start jotting down numbers on paper or in a spreadsheet?

Or as an alternative prompt: When was the last time you made a new spreadsheet, and what was the proximal cause?


Failing safely is the anomaly

Published on July 25, 2021 4:56 AM GMT

It's hard to view the Flat Earth Society as a civilizational success. I can't understand how their minds can believe this in a world where telephones cross time zones. But to see my point, compare them to Boko Haram. Among Boko Haram's beliefs are that the world is flat and that water doesn't evaporate (Allah creates the rain each time). Their popularity and death counts also exceed those of ISIS.

This isn't just because they're Muslim. If a group of Sunni Muslims murders lots of other Sunni Muslims, "religion" isn't a sufficient explanation. There are over three million Muslims in the US (and have been since before 9/11), and they aren't constantly murdering people.

This kind of evil, self-destructive insanity happens for secular reasons too. The Hutu and Tutsi in Rwanda are so similar that people need to look at government-issued identification to find out which group someone belongs to, if they don't say themselves. This racial categorization was arbitrary from the beginning; Belgian colonizers measured nose shape and a couple of other physical features, and assigned groups based on that. The sane thing would have been to reject the colonizers' absurd tool for keeping them divided. Instead, the people of Rwanda internalized the division, eventually murdering each other until about a tenth of the population (about a million people) was killed. This was in 1994.

Sometimes the insane factions get political power. Lysenkoism is a theory of biology that in many points directly contradicts farmers' observations. It taught, for example, that seeds of the same type must be planted as close together as possible since members of the same class never compete, and that fertilizer is bad for plants. Enforcing Lysenko's methods wasn't the only reason that 30 million in Stalin's Soviet Union and another 30 million during China's Great Leap Forward died of starvation, but it was a major cause.

These were all as insane as the Flat Earth belief is today, and all happened within the past century. Before the Enlightenment this was more common; I don't have statistics on how many people were martyred or massacred because people thought that their perfect and powerful god needed humans to fix its mistakes for it, or something else, but it sure happened a lot.

Eliezer talks about how some pseudoscience has appeared where people think personality or diet is determined by blood types. This is wrong of course, but it makes as much sense as racism or India's caste system. Blood types would have produced those instead if they had been discovered a thousand years ago. "Observe how, whenever our blood mixes with that of each other, it remains peaceful, but when our blood mixes with that of our enemies, it fights us. Our enemies are evil down to the individual cell."

"See why the races must not mix?"

I bet this would have happened even if the original scientists urged against it.

Humans have a tendency of destructive insanity. I don't think this can be fixed without superintelligent brain modification. But the great civilizational accomplishment of liberalism (in the freedom and tolerance sense, not left-wing policy) was to mostly limit this to individual harm. People might inject themselves with harmful drugs, go on stupid diets, or spend their money on church buildings with golden crosses, but they don't start inquisitions or execute people for using fertilizer. The Flat Earth society looks stupid trying to explain how the sun is above the horizon in some places but below it in others, but they aren't Boko Haram.

Even when a liberal society has a brutal conflict, people are far safer than usual. The Troubles killed about 3,500 in a population of 1,500,000 over a three-decade period; it was still very bad of course, but compared to humans' normal behavior, a 0.2% death rate amid that much hatred is amazing. If enough people go crazy in the same way, a country with thousands of nukes might invade Iraq because they oppose weapons of mass destruction, but most Iraqis were left alive and not enslaved. Default human behavior is something more like: "they utterly destroyed all in the city, both men and women, young and old, oxen, sheep, and asses, with the edge of the sword" (Joshua 6:21).

The natural tendency of humans is to have insane beliefs and to murder others for not sharing them. When you look at how insane people often are now, keep in mind that this is still a major improvement.


A Contamination Theory of the Obesity Epidemic

Published on July 25, 2021 2:39 AM GMT

This is a summary of a paper that I found open in a browser tab; I don't recall where I came across it. I think it's a nice paper, but it's also 63 pages long and seemed worth a synopsis for those who wouldn't otherwise tackle it.

Scott concluded in For, then Against, High-saturated-fat diets that the obesity crisis seemed to imply one of three answers:

  1. That weight loss is really hard and people in previous centuries had really hard lives and that's why there was so little obesity back then.
  2. That it's “being caused by plastics or antibiotics affecting the microbiome or something like that”.
  3. That there is hysteresis—once you become overweight it's semi-permanent.

This paper argues for the second answer, and against the other two.

At the outset, there are reasons to be wary of this paper: neither author (the two share a family name) appears to have expertise in the applicable fields, and it appears to be set in Computer Modern Roman, hardly the style of a journal submission. So it's coming from outside of traditional expertise. (I don't have any expertise here either.)

With that in mind, the paper starts by posing a series of challenging facts about obesity. (References in the original:)

  1. It's new. One hundred years ago obesity was very rare (~1% of the population) but there were plenty of people who had enough to eat and, from our point of view, ate a lot of fattening foods.
  2. It's not just new, it seemed to suddenly kick off around 1980. “Today the rate of obesity in Italy, France, and Sweden is around 20%. In 1975, there was no country in the world that had an obesity rate higher than 15%”.
  3. It's still getting worse. It's less in the news but if anything it's accelerating in the US. This is despite Americans significantly cutting back on sugars and carbs since 2000.
  4. It's not just humans: lab animals and wild animals appear to be getting fatter over time too. (A surprise to me, but casual inspection seems to confirm that this is really a thing that reviewed papers are noting.)
  5. Junk food from a supermarket fattens rats far more than giving them more of any macro-nutrient does. Somehow junk food is more than the sum of its sugars, proteins, and fats.
  6. Across several countries, living at sea-level seems to increase obesity.
  7. Diets produce modest reductions in weight over the span of weeks or months, but the weight comes back over time. There's been a lot of searching for effective diets, but they're all about the same in large populations.

The next section answers some of the competing explanations for obesity:

“It's from overeating!”, they cry. But controlled overfeeding studies (from the 1970s—pre-explosion) struggle to make people gain weight, and subjects lose it quickly once the overfeeding stops. (Which is evidence against a hysteresis theory.)

“It's lack of exercise”, they yell. But making people exercise doesn't seem to produce significant weight loss, and obesity is still spreading despite lots of money and effort being put into exercise.

“It's from eating too much fat”, rings out. But Americans reduced their fat intake in response to messaging about the evils of fat some decades ago and it didn't help. Nor are low-fat diets very effective.

“It's too much sugar / carbs”, you hear. But Americans reduced their sugar and (more generally) carb intakes over recent years and that didn't help either. Gary Taubes's study was a bit of a damp squib.

In this section there's a hunter-gatherer tribe for everything. I'm a little suspicious of this line of evidence because these small human populations could plausibly have evolved to tolerate their specific environment but, if you want a group of humans with zero-percent obesity who eat 60%+ carbs, or 60%+ fat, this paper has one for you. They have plenty of food, they just live happily and remain thin.

Next the paper establishes that there is clearly some degree of homeostatic regulation of weight by the brain. You can damage a specific part of the brain and cause obesity. Or you can have a genetic flaw that results in fat cells not producing leptin, which results in an insatiable appetite. (But adding leptin to overweight people doesn't work.)

Now the paper presents its thesis: it's all caused by a subtle poison, one whose manufacture really took off slightly before 1980 and which is either increasing or bio-accumulative. Diets don't work because it's not a diet problem. Supermarket food fattens rats much more than any macro-nutrient chow because supermarket food contains more of the contamination. Wild animals are getting fatter because they're consuming it too. Living at sea-level means that your water supply has traveled much further and picked up more of it, which is why altitude is anti-correlated with obesity.

There are many drugs that cause weight gain and that appear to do so by acting on the brain, so these things can exist. This is hardly the first paper to suggest that certain chemicals contribute to the problem, but this paper is distinguishing itself by saying that it's the dominant factor.

Three specific families of chemicals are detailed for consideration: antibiotics; per- and poly-fluoroalkyl substances (PFAS); and lithium. All have ambiguous evidence.

Antibiotics certainly make animals fatter, but do they do the same to humans in the amounts consumed? If so, why aren't places that use a lot in livestock fatter than those which use less? Why aren't vegan diets magic for weight loss?

PFAS is a family of thousands of under-studied chemicals, but they don't clearly cause weight gain in humans at plausible levels. They are, however, certainly getting everywhere.

Lithium clearly does cause weight gain in humans, but are the amounts that people are exposed to increasing, and are they large enough to cause weight gain?

The paper also notes that things are even worse to think about because chemicals do complex things in the environment, and in animals. You don't just have to think about the chemicals that are made, you have to worry about everything they can become. The paper includes the example of a factory in Colorado that made “war materials” and released some chemicals into the ground around the factory. It took several years for the chemicals to travel through the ground water to farms several miles away. During that time they had reacted to form 2,4-D, a herbicide, which killed crops on those farms. (The unreacted chemicals were also pretty nasty.)

Switching to speculation that should be blamed entirely on me, not the paper: it seems that there might be a tendency for any chemical that affects the regulation of adiposity to do so in the direction of obesity. There are several drugs that target the brain and cause weight gain, but fewer safe drugs that cause significant down regulation of weight. (If there were several such drugs, lots of people would be taking them.) Rather, drugs that cause weight loss often cause energy to be wasted from the body rather than changing the regulation of weight: DNP causing heat-loss, or SGLT2 inhibitors causing glucose excretion. Thus we might be facing a situation where multiple minor factors affect adipose regulation, but the overall effect is towards obesity because any effect tends to be in that direction.

The obvious example of something that causes down-regulation of weight is smoking. (We wouldn't call it “safe” though.) I wonder whether the paper is overly focused on something that had a step change in prevalence shortly before 1980. It might have been building steadily in the prior decades but the 40%ish of American adults who smoked in the 60s/70s hid it for a while.

If we were to hypothesise that some environmental factor is causing a significant fraction of the obesity problem then how would we test it? It could well be the sum of multiple factors, some of which may be carried in water given the correlation with elevation. It seems that one would need groups of overweight people willing to consume exclusively provided water (from distillation) and a source of food that is somehow pristine. The half-life of PFAS, at least, is measured in years in humans, so the subjects would have to remain compliant with this proscribed diet for extended periods of time. In order to have control groups we would have to (double-blind) contaminate the pristine food with environmentally-plausible levels of candidate chemicals, I guess? Would that get past any IRB?
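To see why compliance would need to last so long, here's a quick back-of-the-envelope sketch assuming first-order elimination and an illustrative half-life of about three years (the exact figure varies by compound; this number is my assumption, not the paper's):

```python
# Fraction of an accumulated body burden remaining after a period of zero
# intake, assuming first-order (exponential) elimination.

def fraction_remaining(years, half_life_years=3.0):
    """Fraction of the original body burden left after `years` of zero intake."""
    return 0.5 ** (years / half_life_years)

for years in (1, 3, 6, 9):
    print(f"{years} yr on the pristine diet: {fraction_remaining(years):.0%} remaining")
```

Even after a full year on a pristine diet, most of the burden would still be present, which is why any such trial would need multi-year compliance.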

The environmental hypothesis is primarily one of exclusion, and this paper makes a good case. (Although one should have significant epistemological humility about any complex technical argument outside of one's expertise.) And I haven't even covered it all! The paper continues with arguments about paradoxical reactions and the occurrence of anorexia! There is much more within if this summary piqued an interest.


Social media: designed to be bad for you & for society

Published on July 24, 2021 6:59 PM GMT

After writing this, I wasn't happy enough with the result to post it. However, reading spkoc's comment on a recent COVID update made me think that I should share it because it seems as relevant as it did when I typed it out.

This post is my second attempt to explain and check the worry I feel about social media. The feedback from the first one told me that instead of clarifying my thoughts, I only managed to convey a vague sense of unease. Let's give this one more shot then.

Mainstream opinion of services like Facebook, Twitter, etc. is trending negative. It is said that, on a personal level, they're addictive, wasting people's time and harming their self-esteem. At the group level, they create echo chambers, fuel conspiracy theories, and even enable ethnic cleansing. But I think all of this misses a crucial point–that all these things are connected, parts of a single system, operating within normal parameters. And the negative effects I described? Merely expected by-products, like car exhaust or traffic jams.

This error leads us to underestimate just how much these platforms degrade global epistemic conditions. To illustrate my point, let me describe three mechanisms that I consider core to social media.

As Digital Nicotine

I used to smoke back in high school. It made me feel good–calm but awake–especially between classes when I worried about my grades and my future. Cigarettes also helped me connect with other people. Smoking signaled that I was alright, that I wasn't taking life too seriously, and that I could be trusted.

Social media feels similar. Seeing photos of friends partying or people liking your post activates some ancient circuits in the brain. The ones that help us work together, build relationships with others, and track our position in hierarchies. The same ones which allowed our ancestors to form stable hunter-gatherer bands, which turned out to be such a good survival strategy that we've taken over the world. That's why it feels so good to consume all that the social web has to offer: cute babies, pretty bodies, and outrageous news.

It would never have worked out without smartphones. They freed users from stationary, beige boxes that had to be shared with parents or siblings. Suddenly, everyone had a private device, a little portal into the social universe, so they could check in whenever and wherever they were. It was handy, instantaneous relief from boredom, kind of like having a quick smoke.

As Video Game

In games, players often create avatars, join groups, fight monsters, and collect experience points. On social media, they create profiles, join echo chambers, fight people from the outgroup, and collect followers. Both are fun and engaging and difficult to put down.

There's another genre of games that are hard to quit: slot machines. Casinos have greatly advanced the art of making them more addictive. For example, they play faster and with shorter breaks, increasing the number of games played per unit of time. That's clever, but still within the realm of acceptable practice. But how about this piece of psychological black magic: slot machines simulate the player almost winning, evoking the emotion of "almost getting it" and tricking them into playing another round. The only purpose of these techniques is to keep people playing, even if there's no end-game.

In a similar fashion, the most important metric for a social media platform is engagement–how much time users spend actively using their site or app. To increase that metric, platforms often introduce features like videos or payments. But sometimes, they go for something a little more clever, like algorithmic feeds. These track a user's interactions and use that to display content that's likely to generate more interactions, no matter if it makes users feel good or bad. What matters is if the user will come back and play some more. Luckily for the platforms, there is no end-game.

As Attention Harvester

In "The Attention Merchants", Tim Wu describes the story of modern advertising. It begins when one man, a printer by trade, noticed an opportunity in the daily paper market: if he could get people to pay him to print stories about their products, he could drop the price from 5-6 cents to just 1 cent. But to do that, he would have to ensure people were reading those stories. And he had just the idea to make that happen.

His paper printed scandals, crime reports, and sometimes complete fabrications. It was the first paper to hire a reporter dedicated solely to hanging around the courthouse and producing stories about all the wild and ugly cases that flowed through those walls. It worked so well, in fact, that the paper's circulation exceeded that of any other local competitor. As others copied his model and achieved success, the advertising industry was born.

Over time, the model was applied to other media like radio, TV, and finally the Web. And while it was refined and expanded every step of the way, the core remained the same: capture the attention of an audience and sell it. There are three parties to this transaction: someone who wants to sell a product or service; an audience seeking entertainment; and the middleman who captures the attention of the second group and sells it to the first.

(This is where the phrase "If you are not paying for it, you're not the customer; you're the product being sold" originates from.)

Social media companies are the attention merchants. They've refined the harvesting process by collecting data about their users and using it to display ads that are most likely to engage them. When one describes this in a positive light, they point out that these are ads about things people really care about. But attention merchants are locked into competition over a finite resource. So there's no natural limit to how far they're willing to "optimize" their harvesting. That is, at least in Wu's take, until the "product" revolts, like it has a few times before.

The Crux

This then is the crux of my worry: social media is an inherently exploitative medium. There's a lot of systematic effort–researcher/engineer hours–that goes into making it grab as much of people's attention as possible, at any cost. Because we ignore this point, we look at social media's side-effects as separate, contained externalities, which in turn suppresses any society-level response to the problem. Until we find an alternative, basically a new set of Schelling points for online presence and advertising, I expect these problems to keep getting worse.


Believing vs understanding

Published on July 24, 2021 3:39 AM GMT

(Cross posted on my personal blog.)

Every year before the NBA draft, I like to watch film on all of the prospects and predict how good everyone will be. And every year, there ends up being a few guys who I am way higher on than everyone else. Never fails. In 2020 it was Precious Achiuwa. In 2019 it was Bol Bol. In 2018 it was Lonnie Walker. In 2017 it was Dennis Smith. In 2016 it was Skal Labissiere. And now this year in 2021, it is Ziaire Williams.

I have Ziaire Williams as my 6th overall prospect, whereas the consensus opinion is more in the 15-25 range. If I'm being honest, I think I probably have him too high. There's probably something that I'm not seeing. Or maybe something I'm overvaluing. NBA draft analysis isn't a perfectly efficient market, but it is somewhat efficient, and I trust the wisdom of the crowd more than I trust my own perspective. So if I happened to be in charge of drafting for an NBA team (wouldn't that be fun...), I would basically adopt the beliefs of the crowd.

But at the same time, the beliefs of the crowd don't make sense to me.

Upside is really important for NBA players. Consider the chart of Championship Odds over Replacement Player (CORP) for Michael Jordan: role players and sub-all-stars have something like a 2-4% CORP, whereas All-NBA and MVP types have something like a 10-20% CORP. And I don't even think that fully captures the value that these unicorn-type players have. I believe it's easier to build a roster around a unicorn, as one example. So if you can hit on a unicorn, it's huge.
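A toy expected-value calculation makes the point concrete. The outcome probabilities below are invented purely for illustration; only the rough CORP ranges come from the chart:

```python
# Compare two hypothetical draft picks by expected CORP (championship odds
# over replacement player). Role players sit around 2-4% CORP; All-NBA/MVP
# types around 10-20%. The probabilities are made up for illustration.

def expected_corp(outcomes):
    """outcomes: list of (probability, CORP) pairs whose probabilities sum to 1."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * corp for p, corp in outcomes)

# "Safe" pick: very likely a role player, never a star.
safe_pick = expected_corp([(0.7, 0.03), (0.3, 0.0)])

# "Upside" pick: higher bust rate, but a real shot at an All-NBA player.
upside_pick = expected_corp([(0.15, 0.15), (0.35, 0.03), (0.5, 0.0)])

print(f"safe: {safe_pick:.3f}, upside: {upside_pick:.3f}")
```

With these made-up numbers, the upside pick has the higher expected value despite busting half the time, which is the intuition behind valuing unicorn potential.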

Ziaire Williams is one of the few guys in this draft who I think has that unicorn potential. He's 6'9 with a 6'11 wingspan: great size for a wing. He's got the shake as a ball handler to break his man down off the dribble and create. His shooting isn't there yet, but the fluidity, FT% and tough shot making ability all make me think he can develop into a really good shooter. He doesn't have the ability to get to the rim yet, but his size, explosiveness, fluidity, and shake as a ball handler all make me think that he can totally get there one day. Once he develops physically and becomes a little more savvy, I could see it happening. And then as a passer and defender I think he has the knack and can develop into a moderate plus in both of those areas. Overall, I think he's got All-NBA upside for his ceiling, but also isn't one of those complete boom or bust guys either. I could see him just developing into a solid contributor.

Anyway, there is this discrepancy between what I believe, and what my intuition is. If you put me in charge of drafting for an NBA team, I would adopt the belief of the crowd and punt on drafting him this highly. But at the same time, I wouldn't understand the why. It wouldn't make sense to me. My intuition would still be telling me that he's a prospect worthy of the 6th overall pick. I would just be trusting the judgement of others.

Let me provide a few more examples.

In programming, there is a difference between static typing and dynamic typing. My intuition says that with static typing, the benefit of catching type errors at compile time wouldn't actually be that large, nor the benefit of being able to hover over stuff in your IDE and see e.g. function signatures. On the other hand, my intuition says that the extra code you have to write for static typing is "crufty" and will make it take longer to read code. And with that, it seems like dynamic typing wins out. However, I hear differently from programmers that I trust. And I personally have very little experience with static typing, so it's hard to put much weight on my own intuition. So if I were in charge of choosing a tech stack for a company (where I expect the codebase to be sufficiently large), I would choose a statically typed language. But again, there is a conflict between what I believe and what I intuit. Or what I understand.
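For what it's worth, the flavor of the tradeoff can be shown even in Python, whose type hints are optional and checked by an external tool such as mypy. The example is mine, not an argument either way:

```python
# Type annotations cost a little extra syntax but document the interface and
# let a checker (e.g. mypy) flag misuse before the code ever runs.

def total_minutes(hours: int, minutes: int) -> int:
    """Convert an hours-and-minutes pair to total minutes."""
    return hours * 60 + minutes

# A static checker rejects this call before runtime:
#     total_minutes("2", 30)   # error: argument 1 has type "str", expected "int"
# Without the annotations, the same mistake only surfaces when executed:
#     "2" * 60 + 30            # TypeError at runtime

print(total_minutes(2, 30))
```

Whether the annotation "cruft" is worth that earlier error is exactly the judgment call the crowd and my intuition disagree on.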

I'm moving in 8 days. I have a decent amount of packing to do. It seems like it's pretty manageable, like I can get it done in a day or two. But that's just what my intuition outputs. As for what I believe, I believe that this intuition is a cognitive illusion. With a visual illusion, my eyes deceive me. With a cognitive illusion, my brain does. In this case, I expect that there is actually something like 2x more packing to do than what I'd intuitively think. Let's hope my willpower is strong enough to act on this belief.

I'm a programmer. At work we have these deadlines for our projects. I guess the sales and marketing people want to be able to talk to customers about new features that are coming up. I don't understand this though. We've been building the app we have for seven years. A new feature is maybe four weeks of work. So it's like an iceberg: the bulk of the work is what we've already done, the part that is underwater, whereas the new feature is the itsy bitsy part of the iceberg that is above water. Under what conditions is that itsy bitsy part above the water ever going to bring a customer from a "no" to a "yes"? It doesn't make sense to me. Then again, I'm not a salesperson. This practice seems common in the world of sales and marketing, so there's probably something that I'm missing.

I run into this situation a lot, where I believe something but don't understand it. And I feel like I always fail to communicate it properly. I end up coming across as if my belief matches my understanding. As if I actually believe that release dates are useless to salespeople.

One failure mode I run into is that if you have a long back and forth discussion where you keep arguing for why it seems that your understanding makes sense, people will think that your belief matches your understanding. I guess that is logical. I don't blame the other person for making that assumption. I suppose I need to be explicit when my belief doesn't match my understanding.

Another failure mode I run into is, I feel like I don't have the right language to express belief vs understanding, and so I don't attempt to explain it. It feels awkward to describe the difference.

In a perfect world, I'd talk about how beliefs are basically predictions about the world. They are about anticipated experiences. And then I'd talk about mental models. And gears-level models.

I'd also talk about the post A Sketch of Good Communication. In particular, the part about integrating the evidence someone shares with you, updating your model of the world, and then updating your beliefs.

Step 1: You each have a different model that predicts a different probability for a certain event.

Step 2: You talk until you have understood how they see the world.

Step 3: You do some cognitive work to integrate the evidences and ontologies of you and them, and this implies a new probability.

But all of this is impractical to do in most situations, hence the failure mode.

I'm not sure if there is a good solution to this. If there is, I don't think I'm going to figure it out in this blog post. In this blog post my goal is more to make some progress in understanding the problem, and also maybe to have something to link people to when I want to talk about belief vs understanding.

One last point I want to make in this post is about curiosity. When you are forced to make decisions, it often makes sense to adopt the beliefs of others and act on them without understanding them. And furthermore, sometimes it makes sense to move on without actually trying to understand the thing. For example, I trust that it makes sense to get vaccinated for covid, and I don't think it's particularly important for me to understand why the pros outweigh the cons. But I fear that this type of thinking can be a slippery slope.

It can be tempting to keep thinking to yourself:

I don't actually need to understand this.

I don't actually need to understand this.

I don't actually need to understand this.

And maybe that's true. I feel like we live in a world where adopting the beliefs of experts you trust and/or the crowd can get you quite far. But there are also benefits to, y'know, actually understanding stuff.

I'm not sure how to articulate why I think this. It just seems like there are a lot of more mundane situations where it would be impractical to try to adopt the beliefs of others. You kinda have to think about it yourself. And then there is also creativity and innovation. For those things, I believe that building up a good mental model of the world is quite useful.

Here is where the first virtue of rationality comes in: curiosity.

The first virtue is curiosity. A burning itch to know is higher than a solemn vow to pursue truth. To feel the burning itch of curiosity requires both that you be ignorant, and that you desire to relinquish your ignorance. If in your heart you believe you already know, or if in your heart you do not wish to know, then your questioning will be purposeless and your skills without direction. Curiosity seeks to annihilate itself; there is no curiosity that does not want an answer. The glory of glorious mystery is to be solved, after which it ceases to be mystery. Be wary of those who speak of being open-minded and modestly confess their ignorance. There is a time to confess your ignorance and a time to relinquish your ignorance.

I suspect that curiosity is necessary. That in order to develop strong mental models, it's not enough to just be motivated for more practical and instrumental reasons, as a means to some end. You have to enjoy the process of building those models. When your models are messy and tangled up, it has to bother you. Otherwise, I don't think you would spend enough time developing an understanding of how stuff works.

When I think of the word "curiosity", I picture a toddler crawling in some nursery school playing with blocks.

And when I think of serious, adult virtues, I picture an important business executive at their desk late at night, thinking hard about how to navigate some business partnership. Someone who is applying a lot of willpower and effort.

And I don't think I'm alone in having these stereotypes. I think that is a problem. I fear that these stereotypes make us prone to dismissing curiosity as more of a "nice to have" virtue.

How can we fix this? Well, maybe by developing a stronger gears-level understanding of why curiosity is so important. How's that for a chicken-egg problem?


Housing Without Street Parking: Implemented

July 24, 2021 - 05:40
Published on July 24, 2021 2:40 AM GMT

In 2017, I wrote:

What if, in places like Somerville where all parking is already by-permit-only, we added a new category of housing unit, one that didn't come with any rights to street parking?

It turns out this was included in the 2019 zoning overhaul (I missed this among all the other great changes):

11.2.7: On-Street Parking in Transit Areas
  1. Upon the adoption of an official policy limiting on-street residential parking permits in transit areas, the review boards shall require the following as a condition(s) of any discretionary or administrative permit:
    1. that the applicant complete and file formal acknowledgment that all dwelling unit(s) are ineligible to participate in the Somerville Residential Permit Parking program with the Middlesex South Registry of Deeds or Land Court prior to the issuance of a building permit;
    2. that all dwelling units are advertised as ineligible to participate in the Somerville Residential Permit Parking program; and
    3. that all buyers, grantees, lessees, renters, or tenants are informed that all dwelling unit(s) are ineligible to participate in the Somerville Residential Permit Parking program.
By transit area they mean everything within half a mile of a subway stop, which, with the Green Line, makes most of the city.

I'm very excited to see this! While housing without parking is not what everyone wants or needs, this change (a) makes it an option for people who do want it and (b) should help reduce opposition to construction.


AXRP Episode 10 - AI’s Future and Impacts with Katja Grace

July 24, 2021 - 01:10
Published on July 23, 2021 10:10 PM GMT

Google Podcasts link

This podcast is called AXRP, pronounced axe-urp and short for the AI X-risk Research Podcast. Here, I (Daniel Filan) have conversations with researchers about their papers. We discuss the paper and hopefully get a sense of why it’s been written and how it might reduce the risk of artificial intelligence causing an existential catastrophe: that is, permanently and drastically curtailing humanity’s future potential.

When going about trying to ensure that AI does not cause an existential catastrophe, it’s likely important to understand how AI will develop in the future, and why exactly it might or might not cause such a catastrophe. In this episode, I interview Katja Grace, researcher at AI Impacts, who’s done work surveying AI researchers about when they expect superhuman AI to be reached, collecting data about how rapidly AI tends to progress, and thinking about the weak points in arguments that AI could be catastrophic for humanity.

Topics we discuss:

Daniel Filan: Hello everybody. Today, I’ll be speaking with Katja Grace, who runs AI Impacts, a project that tries to document considerations and empirical evidence that bears on the long-term impacts of sophisticated artificial intelligence. We’ll be talking about her paper, “When Will AI Exceed Human Performance? Evidence From AI Experts”, coauthored with John Salvatier, Allan Dafoe, Baobao Zhang, and Owain Evans, as well as AI Impacts’ other work on what the development of human level AI might look like and what might happen afterwards. For links to what we’re discussing you can check the description of this episode, and you can read a transcript at axrp.net. Katja, welcome to AXRP.

Katja Grace: Hey, thank you.

AI Impacts and its research

Daniel Filan: Right. So I guess my first question is, can you tell us a bit more about what AI Impacts is and what it does?

Katja Grace: Yeah, I guess there are basically two things it does where one of them is do research on these questions that we’re interested in, and the other is try to organize what we know about them in an accessible way. So the questions we’re interested in are these high level ones about what will the future of, well humanity, but involving artificial intelligence in particular, look like? Will there be an intelligence explosion? Is there a huge risk of humanity going extinct? That sort of thing. But also those are questions we know about. It seems like there’s a vaguer question of, what does the details of this look like? Are there interesting details where if we knew exactly what it would look like we’d be like, “Oh, we should do this thing ahead of time.” Or something.

Katja Grace: We’re interested in those questions. They’re often pretty hard to attack head-on. So what we do is try to break them down into lower level sub-questions and then break those down into the lower level sub-questions again until we get to questions that we can actually attack, and then answer those questions. So often we end up answering questions like, for cotton gin technology were there fairly good cotton gins just before Eli Whitney’s cotton gin or something. So it’s very all over the place as far as subject matter goes, but all ultimately hoping to bear on these other questions about the future of AI.

Daniel Filan: And how do you see it fitting in to the AI existential risk research field? If I’m someone else in this research field, how should I interact with AI Impacts outputs?

Katja Grace: I think having better answers to the high-level questions I think should often influence what other people are doing about the risks. I think that there are a lot of efforts going into avoiding existential risk and broadly better understanding what we’re up against seems like it might change which projects are a good idea. It seems like we can see that for the questions that are most clear, like is there likely to be a fast intelligence explosion? Well, if there is, then maybe we need to prepare ahead of time for anything that we would want to happen after that. Whereas if it’s a more slow going thing that would be different.

Katja Grace: But I think there’s also just an intuition that if you’re heading into some kind of danger and you just don’t really know what it looks like at all, it’s often better to just be able to see it better. I think the lower level things that we answer, I think it’s harder for other people to make use of except to the extent that they are themselves trying to answer the higher level questions and putting enough time into that to be able to make use of some data about cotton gins.

Daniel Filan: I guess there’s this difference between the higher level and the lower level things with the higher level of things perhaps being more specific, or more relevant rather, but in terms of your output, if somebody could read a couple things on the website or just a few outputs, I’m wondering specifically what work do you think is most relevant for people trying to ensure that there’s no existential catastrophe caused by AI?

Katja Grace: I think the bits where we’ve got closest to saying something at a higher level, and so more relevant, is we have a page of arguments for thinking that there might be fast progress in AI at around the point of human level AI. Sort of distinct from an intelligence explosion, I think often there’s a thought that you might just get a big jump in progress at some point, or you might make a discovery and go from a world similar to the world we have now to one where fairly impressive AI is either starting an intelligence explosion at that point, or just doing really crazy things that we weren’t prepared for.

Katja Grace: So we tried to make a list of all the arguments we’ve heard around about that and the counter arguments we can think of. I think that’s one that other people have mentioned being particularly useful for them. I guess, relatedly, we have this big project on discontinuous progress in history, so that’s feeding into that, but I guess separate to the list of arguments we also have, just historically when have technologies seen big jumps? What kinds of situations cause that? How often they happen, that sort of thing.

Daniel Filan: Well we will be talking about that later, so I’m glad it’s relevant. So it seems like you have a pretty broad remit of things that are potentially items that AI Impacts could work on. How do you, or how does AI Impacts choose how to prioritize within that and which sub questions to answer?

Katja Grace: I think at the moment the way we do it is we basically just have a big spreadsheet of possible projects that we’ve ever thought of that seemed like they would bear on any of these things. And I guess the last time we went to a serious effort to choose projects from this list, what we did was first everyone put numbers on things for how much they liked them, intuitively taking into account how useful they thought they would be for informing anything else and how well equipped to carry them out they thought we were, or they were in particular. And then we had a sort of debate and we looked at ones where someone was excited and other people weren’t or something like that and discussed them, which was quite good I think.

Katja Grace: I would like to do more of that. I think it was pretty informative. And in the end we’re basically going with an intuitive combination of this is important, we think it’s tractable for someone currently at the org to do it well. And maybe some amount of either we have some obligation to someone else to do it, or we know that someone else would like us to do it. Sometimes people give us funding for something.

Daniel Filan: What do you think the main features are that predict your judgments of importance of one of these sub-questions?

Katja Grace: I think how much it might change our considered views on the higher level questions.

Daniel Filan: If you had to extract non-epistemic features. So an epistemic feature, it might be that this would change my opinions, and a non-epistemic feature, it might be that this involves wheat in some capacity.

Katja Grace: My sense is that projects are better when they’re adding something empirical to the community’s knowledge. I think you can do projects that are sort of more vague and what are the considerations about this? And I think that can be helpful. I think it’s harder for it to be helpful or especially if it’s, and what do I think the answer is to this given all the things? Because I think it’s harder for someone else to trust that they would agree with your opinions about it. Whereas if you’re like, “Look, this thing did happen with cotton gins and other people didn’t know that and it’s relevant.” Then however it is that they would more philosophically take things into account, they have this additional thing to take into account.

Daniel Filan: I guess that’s kind of interesting - so you said that the thing that is most relevant perhaps for outsiders is this big document about why we might or might not expect fast takeoff. And yeah, it’s mostly a list of arguments and counter arguments. I’m wondering what do you think about that relationship?

Katja Grace: I think considering that as a project I wouldn’t class it with the unpromising category of more vague, open-ended, non-empirical things. So I guess my thing is not empirical versus not. I think the way that these arguments seem similar to an empirical thing is they’re in some sense a new thing that the other people reading it might not have known and now they have it and can do what they want with it. It’s somehow like you’ve added a concrete thing that’s sort of modular and people can take away and do something with it rather than the output being somehow more nebulous, I guess.

Katja Grace: I guess the thing that I try to do with AI Impacts at least that I think is less common around in similar research is the pages are organized so that there’s always some sort of takeaway at the top, and then the rest of the page is just meant to be supporting that takeaway, but it should be that you can basically read the takeaway and do what you want with that, and then read the rest of the page if you’re curious about it.

How to forecast the future of AI

Daniel Filan: So when you’re trying to figure out what the future of AI holds, one thing you could do is you could think a lot about progress within AI or facts about the artificial intelligence research field. And another thing you could do is you could try to think about how economic growth has accelerated over time, or how efficient animals are at flying, or progress in cotton gin technology. I’m wondering, these feel like different types of considerations one might bring to bear, and I’m wondering how you think we should weigh these different types?

Katja Grace: I guess thinking about how to weigh them seems like a funny way to put it. I guess there’s some sort of structured reason that you’re looking at either of them. There was some argument for thinking that AI would be fast, it would be dangerous or something. And I guess in looking at that argument, if you’re like, “Oh, is this part of it correct?” Then like in order to say, “Is it correct?” There are different empirical things you could look at. So if a claim was, “AI is going to go very fast because it’s already going very fast.” Then it seems like the correct thing to do is to look at is it already going very fast? What is very fast? Or something. Whereas if the claim is it’s likely that this thing will see a huge jump, then it seems like a natural way to check that is to be like, “All right, do things see huge jumps? Is there some way that AI is different from the other things that would make it more jumpy?”

Katja Grace: So there, I think it would make sense to look at things in general and look at AI and the reason to look at things in general, as well as AI, I guess maybe there are multiple reasons. One of them would just be there’s a lot more of other things. Especially if you’re just getting started in AI, in the relevant kind of AI or something, it’s nice to look at broader things as well. But there I agree that looking at AI is pretty good.

Daniel Filan: I guess maybe what I want to get at is I think a lot of people when trying to think about the future of AI, a lot of people are particularly focused on details about AI technology, or their sense of what will and won’t work, or some theory about machine learning or something, whereas it seems to me that AI Impacts does a lot more of this latter kind of thing where you’re comparing to every other technology, or thinking about economic growth more broadly. Do you think that the people who are not doing that are making some kind of mistake? And what do you think the implicit, I guess disagreement is? Or what do you think they’re getting wrong?

Katja Grace: I think it’s not clear to me. It might be a combination of factors where one thing where I think maybe I’m more right is just, I think maybe people don’t have a tendency when they talk about abstract things to check them empirically. I guess a case where I feel pretty good about my position is, well maybe this isn’t AI versus looking at other things, it’s more like abstract arguments versus looking at a mixture of AI and other things. But it seems like people say things like probably AI progress goes for a while and then it reaches mediocre human level, and then it’s rapidly superhuman. And that also this is what we see in other narrow things. As far as I can tell, it’s not what we see in other narrow things. You can look at lots of narrow things and see that it often takes decades to get through the human range. And I haven’t systematically looked at all of them or something, but at a glance it looks like it’s at least quite common for it to take decades.

Katja Grace: Chess is a good example where as soon as we started making AI to play chess it was roughly at amateur level and then it took, I don’t know, 50 years or something to get to superhuman level, and it was sort of a gradual progress to there. And so as far as I can tell people are still saying that progress tends to jump quickly from subhuman to superhuman and I’m not quite sure why that is, but checking what actually happens seems better. I don’t know what the counter argument would be, but I think there’s another thing which is, I’m not an AI expert in the sense that my background is not in machine learning or something. I mean I know a bit about it, but my background is in human ecology and philosophy. So I’m less well equipped to do the very AI intensive type projects or to oversee other people doing them. So I think that’s one reason that we tend to do other more historical things too.

Results of surveying AI researchers

Daniel Filan: So now I’d like to talk a little bit about this paper that you were the first author on. It’s called, When Will AI Exceed Human Performance? Evidence From AI Experts. It came out in 2017 and I believe it was something like the 16th most discussed paper of that year according to some metric.

Katja Grace: Yeah. Altmetric.

Daniel Filan: So yeah, that’s kind of exciting. Could you tell us a bit about what’s up with this? Why this survey came into existence and what it is and how you did it? Let’s start with the what. What is this survey and paper?

Katja Grace: Well the survey is a survey of, I think we wrote to - I think it was everyone who presented at NIPS and ICML in 2015 we wrote to, so I believe it would have been all the authors, but I can’t actually remember, but I think our protocol is just to look at the papers and take the email addresses from them. So we wrote to them and asked them a bunch of questions about when we would see human level AI described in various different ways and using various framings to see if they would answer in different ways, like how good or bad they thought this would be, whether they thought there would be some sort of very fast progress or progress in other technologies, what they thought about safety, what they thought about how much progress was being driven by hardware versus other stuff. Probably some other questions that I’m forgetting.

Katja Grace: We sort of randomized it so they only got a few of the questions each so that it didn’t take forever. So the more important questions lots of people saw and the ones that were more like one of us thought it would be a cool question or something fewer people saw. And I guess we got maybe - I think it was a 20% response rate or something in the end? We tried to make it so they wouldn’t see what the thing was about before they started doing it, just in terms that it was safety and long-term future oriented to try and make it less biased.

Daniel Filan: And why do this?

Katja Grace: I think knowing what machine learning experts think about these questions seemed pretty good I think both for informing people, informing ourselves about what people actually working in this field think about it, but also I think creating common knowledge. If we publish this and they see that yeah, a lot of other machine learning researchers think that pretty wild things happening as a result of AI pretty soon is likely, that that might be helpful. I think another way that it’s been helpful is often people want to say something like this isn’t a crazy idea for instance. Lots of machine learning researchers believe it. And it’s helpful if you can point to a survey of lots of them instead of pointing to Scott Alexander’s blog post that several of them listed or something.

Daniel Filan: So there’s a paper about the survey and I believe AI Impacts also has a webpage about it. And one thing that struck me reading the webpage was first of all, there was a lot of disagreement about when human level AI might materialize. But it’s also the case that a lot of people thought that basically most people agreed with them, which is kind of interesting and suggests some utility of correcting that misconception.

Katja Grace: Yeah. I agree that was interesting. So I guess maybe the survey didn’t create so much common knowledge, or maybe it was more like common knowledge that we don’t agree or something. Yeah. And I guess I don’t know if anyone’s attempted to resolve that much beyond us publishing this and maybe the people who were in it seeing it.

Daniel Filan: The first question I think people want to know is, when are we going to get human level AI? So when did survey respondents think that we might get human level AI?

Katja Grace: Well, they thought very different things depending on how you asked them. I think asking them different ways, which seemed like they should be basically equivalent to me just get extremely different answers. Two of the ways that we asked them, we asked them about high-level machine intelligence, which is sort of the closest thing to the question that people have asked in previous surveys, which was something like when will unaided machines be able to accomplish every task better and more cheaply than human workers?

Katja Grace: But we asked them this in two ways, one of them was for such and such a year, what is the probability you would put in happening by that year? And the other one was for such and such a probability in what year would you expect it? So then we kind of combine those two for the headline answer on this question by sort of making up curves for each person and then combining all of those. So doing all of that and the answers that we got from combining those questions were a 50% chance of AI outperforming humans in all tasks in 45 years. But if you ask them instead about when all occupations will be automated, then it’s like 120 years.

Daniel Filan: I think these are years from 2016?

Katja Grace: That’s right. Yeah.

Daniel Filan: Okay. So subtract five years for listeners listening in 2021. So there’s this difference between the framing of whether you’re talking about tasks that people might do for money versus old tasks. You also mentioned there was a difference in whether you asked probability by a certain year versus year at which it’s that probability that the thing will be automated. What was the difference? What kind of different estimates did you get from the different framings?

Katja Grace: I think that one was interesting because we actually tried it out beforehand on MTurkers and also we did it for all of the different narrow task questions that we asked. So we asked about a lot of things, like when will AI be able to fit Lego blocks together in some way or something. And this was a pretty consistent effect across everything. And I think it was much smaller than the other one. I think it usually came out at something like a decade of difference.

Daniel Filan: Does anyone know why that happens?

Katja Grace: I don’t think so.

Daniel Filan: Which one got the sooner answers?

Katja Grace: If you say, “When will there be a 50% chance?” That gets the sooner answer than if you say, “What is the chance in 2030?” or something. My own guess about this is that people would like to just put low probabilities on things. So if you say, “What is the chance in any particular year?” They’ll give you a low probability. Whereas if you say, “When will there be a 90% chance?” They have to give you a year.

Daniel Filan: And they don’t feel an urge to give large numbers of years as answers.

Katja Grace: Also somehow it seems unintuitive or weird to say the year 3000 or something. That’s just speculation though. I’d be pretty curious to know whether this is just a tendency across lots of different kinds of questions aren’t AI related. It seems like probably since it was across a lot of different questions here.

Daniel Filan: Yeah. You have these big differences in framing. You also have differences in whether the respondents grew up in North America or Asia right?

Katja Grace: I can’t remember if it was where they grew up, but yeah. Roughly whether they’re…

Daniel Filan: It was where their undergraduate institution was I think.

Katja Grace: Yes. It was like a 44 year difference, where for those in Asia it was like 30 years, asking about HLMI again, and for [those in North America] I guess it was 74.

Daniel Filan: Here as well do you have any guesses as to why there are these very large differences between groups of people? It’s pretty common for people to have their undergraduate in one continent and go to a different continent to study. So it’s not as though word hasn’t reached various places. Do you have any idea what’s going on?

Katja Grace: I think I don’t. I would have some temptation to say, “Well, a lot of these views are more informed by cultural things than a lot of data about the thing that they’re trying to estimate.” I think to the extent that you don’t have very much contact with the thing you’re trying to estimate, maybe it’s more likely to be cultural. But yeah, I think as you say, there’s a fair amount of mixing.

Katja Grace: Yeah. I think another thing that makes me think that it’s cultural is just that opinions were so all over the place anyway, I think. I guess I can’t remember within the different groups, how all over the place they were, but I think I would be surprised if it wasn’t that both cultural groups contain people who think it’s happening very soon and people who think it’s happening quite a lot later. So it’s not like people are kind of following other people’s opinions a great deal.

Daniel Filan: So given that there are these large framing differences and these large differences based on the continent of people’s undergraduate institutions, should we pay any attention to these results?

Katja Grace: I mean, I do think that those things should undermine our confidence in them quite a lot. I guess things can be very noisy and still some good evidence if you kind of average them all together or something. If we think that we’ve asked with enough different framings sort of varying around some kind of mean, maybe that’s still helpful.

Daniel Filan: So perhaps we can be confident that high-level machine intelligence isn’t literally impossible.

Katja Grace: There are some things that everyone kind of agrees on like that, or maybe not everyone, but a lot of people. I think also there are more specific things about it. You might think that if it was five years away it would be surprising if almost everyone thought it was much further away. You might think that if it’s far away people probably don’t have much idea how many decades it is, but if it was really upon us they would probably figure it out, or more of them would notice it. I’m more inclined to listen to other answers in the survey. I guess I feel better about asking people things they’re actually experts in. So to the extent you can ask them about the things that they know about, and then turn that into an answer about something else that you want to know about. That sort of seems better.

Katja Grace: So I guess we did try to do that with human level AI in a third set of questions for asking about that, which were taken from Robin Hanson’s previous idea of asking how much progress you had seen in your field, or your subfield during however many years you’ve been in your subfield, then using that to extrapolate how many years it should take to get to 100% of the way to human level performance in the subfield. That’s the question. For that kind of extrapolation to work you need for there not to be acceleration or deceleration too much. And I think in Robin’s sort of informal survey a few years earlier there wasn’t so much, whereas in our survey a lot of people had seen acceleration perhaps just because the field had changed over those few years. So harder to make clear estimates from that.
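The extrapolation Katja describes has a simple linear form. This sketch is my own; it assumes a constant rate of progress, which is exactly the assumption that the reported acceleration undermines.

```python
def years_remaining(fraction_seen, years_in_field):
    """If fraction_seen of the distance to human-level performance was
    covered in years_in_field, extrapolate the remaining distance at
    the same constant rate."""
    rate = fraction_seen / years_in_field  # fraction of the distance per year
    return (1 - fraction_seen) / rate

# A researcher who has seen 5% of the distance covered in 20 years
# implies, linearly, 380 more years to go.
print(years_remaining(0.05, 20))  # 380.0
```

This also shows why junior respondents dominate the result: someone reporting 10% progress in just 5 years implies 45 years remaining, while a 20-year veteran reporting the same 10% implies 180.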

Daniel Filan: I’m wondering why that result didn’t make it into the main paper because it seems like relatively compelling evidence, right?

Katja Grace: I think maybe people vary in how compelling they find it. I think also the complication with acceleration maybe it makes it hard to say much about. It seems like you can still use it as a bound. If you did this extrapolation it led to numbers much closer than the ones Robin had got. I think part of the reason we included it, or part of the reason it seemed interesting to me at least was it seemed like it led to very different numbers. If you ask people, “When will it be HLMI?” They’re like, “40 years.” Or something. Whereas if you say, “How much progress have you seen in however many years?” And then extrapolate, then I think it was ending up hundreds of years in the future in Robin’s little survey. So I was like, “Huh, that’s interesting.” But I think here it sort of came out reasonably close to the answer for just asking about HLMI directly. Though if you ignored everyone who hadn’t been in the field for very long, then it still came out hundreds of years in the future if I recall.

Daniel Filan: Yeah. I’m wondering why you think - because it seemed like there was a bigger seniority bias, or a seniority difference in terms of responses in that one. And I’m wondering, what do you think was going on there?

Katja Grace: I don’t know. I guess-

Daniel Filan: Acceleration could explain it, right? If progress had been really fast recently then the progress per year for people who have not been there very long would be very high, but progress per year for people who’ve been there for ages would be low.

Katja Grace: In the abstract that does sound pretty plausible. I have to check the actual numbers to see whether that would explain the particular numbers we had. But yeah, I don’t have a better answer than that. I could also imagine more psychological kinds of things where the first year you spend in an exciting field it feels like you’re making a lot of progress, whereas after you’ve spent 20 years in an exciting seeming field, maybe you’re like, “Maybe this takes longer than I thought.”

Daniel Filan: So is it typically the case that people are relatively calibrated about progress in their own field of research? How surprised should we be if AI researchers are not actually very good at forecasting how soon human level machine intelligence will come?

Katja Grace: I don’t know about other experts and how good they are at forecasting things. My guess is not very good, just based on my general sense of people’s ability to forecast things by default, in combination with things like people’s ability to forecast how long their own projects will take, which I think is clearly not good. Speaking for myself, I make predictions in a spreadsheet and I label them as either to do with my work or not. And I’m clearly better calibrated on the non-work-related ones.

Daniel Filan: Are the not work ones about things to do with your life or about things in the world that don’t have to do with your life?

Katja Grace: I think they’re a bit of both, but I think that the main difference between the two categories is the ones to do with work are like, “I’m going to get this thing done by the end of the week.” And the ones not to do with work are like, “If I go to the doctor, he will say that my problem is blah.” So is it to do with my own volition.
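The kind of calibration check Katja describes - comparing stated confidence to observed frequency, split by category - can be sketched roughly like this. This is an illustrative simplification, not her actual spreadsheet, and the sample data is made up:

```python
# Check calibration separately for categories of predictions. Each
# prediction is a (stated probability, outcome, category) triple; within
# each category we compare mean stated probability to observed frequency.

from collections import defaultdict

def calibration_by_category(predictions):
    """predictions: iterable of (prob, outcome_bool, category) triples.
    Returns {category: (mean stated probability, observed frequency)}."""
    grouped = defaultdict(list)
    for prob, outcome, category in predictions:
        grouped[category].append((prob, outcome))
    result = {}
    for category, items in grouped.items():
        mean_prob = sum(p for p, _ in items) / len(items)
        freq = sum(1 for _, o in items if o) / len(items)
        result[category] = (mean_prob, freq)
    return result

preds = [
    (0.9, True, "work"), (0.9, False, "work"),   # overconfident on work
    (0.8, True, "other"), (0.7, True, "other"),  # roughly calibrated elsewhere
]
print(calibration_by_category(preds))
```

A large gap between mean stated probability and observed frequency in one category but not the other is the pattern she reports for work versus non-work predictions.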

Daniel Filan: Yeah. So I’ve done a similar thing and I’ve noticed that I’m much less accurate in forecasting questions about my own life. Including questions like at one point I was wrong about what a doctor would diagnose me with. I lost a lot of prediction points for that, but I think part of the effect there is that if I’m forecasting about something like how many satellites will be launched in the year 2019 or something I can draw on a bunch of other people answering that exact same question. Whereas there are not a lot of people trying to determine what disease I may or may not have.

Katja Grace: I guess I almost never forecast things where it would take me much effort to figure out what other people think about it to put into the prediction I think. I don’t tend to do that. So I think it’s more between ones where I’m guessing.

Daniel Filan: Yeah, that seems fair. A different thing about your survey is that you ask people about the sensitivity of AI progress to various inputs, right? So if you’d halved the amount of computation available, what that would do to progress, or if you’d halved the training data available, or the amount of algorithmic insights. And it seems to be that one thing some enterprising listener could do is check if that concords with work done by people like Danny Hernandez on scaling laws or something.

Daniel Filan: I guess the final thing I’d like to ask about the survey, perhaps two final things. The first is aside from the things that I’ve mentioned, what do you think the most interesting findings were that didn’t quite make it into the main paper, but are available online?

Katja Grace: I guess the thing that we’ve talked about a little bit, but that I would just emphasize much more as a headline finding is just that the answers are just very inconsistent and there are huge framing effects. So I think it’s both important to remember that when you’re saying, “People say AI in however many years.” I think the framing that people usually use for asking about this is the one that gets the soonest answers out of all the ones we tried, or maybe out of the four main ones that we tried. So that seems sort of good to keep in mind. I think another important one was just that the median probability that people put on outcomes in the vicinity of extinction level of bad is 5%, which, I don’t know, seems pretty wild to me that among sort of mainstream ML researchers the median answer is 5% there.

Daniel Filan: Does that strike you as high or low?

Katja Grace: I think high. I don’t know, 5% that this project destroys the world, or does something similarly bad, it seems unusual for a field.

Daniel Filan: They also had a pretty high probability that it would be extremely good. Right? It’s unclear if those balance out.

Katja Grace: Yeah. That’s fair. I guess as far as should we put attention on trying to steer it toward one rather than the other, it at least suggests there’s a lot at stake, or a lot at risk, and more support for the idea that we should be doing something about that than I think I thought before we did the survey.

Work related to forecasting AI takeoff speeds

Daniel Filan: So now I’d like to move on to asking you some questions about AI Impacts’ work on, roughly, takeoff speeds. So takeoff speeds refers to this idea that in-between the time when there’s an AI that’s about as good as humans perhaps generally, or perhaps in some domain, and the time when AI is much, much smarter or more powerful or more dominant than humans, that might take a very long time, which would be a slow takeoff, or it might take a very short time, which would be a fast takeoff. AI Impacts has done, I think, a few bits of work relevant to this that I’d like to ask about.

How long it takes AI to cross the human skill range

Daniel Filan: So the first that seemed relevant to me is this question about how long it takes for AI to cross the human skill range of various tasks. So there’s a few benchmarks I found on the site. So for classification of images on the ImageNet dataset, it took about three years to go from beginner to superhuman. For English draughts [i.e. checkers] it took about 38 years to go from beginner to top human, 21 years for StarCraft, 30 years for Go, and 30 years for chess. And in Go, it’s not just that the beginners were really bad. Even if you go from what I would consider to be an amateur who’s good at the game to the top human, that was about 15 years roughly. Why do you think it takes so long for this to happen? Because there’s this intuitive argument that, look. Humans, they’re all pretty similar to each other relative to the wide range of cognitive algorithms one could have. Why is it taking so long to cross the human skill range?

Katja Grace: I’m not sure overall, but I think the argument that humans are all very similar so they should be in a similar part of the range. I think that doesn’t necessarily work because I guess if you have, consider wheelbarrows, how good is this wheelbarrow for moving things? Even if all wheelbarrows are basically the same design, you might think there might be variation in them just based on how broken they are in different ways.

Katja Grace: So if you think of humans as being, I don’t know if this is right, but if you think of them as having sort of a design and then various bits being more or less kind of broken relative to that in the sense of having mutations or something, or being more encumbered in some way, then you might expect that there’s some kind of perfect ability to function that nobody has, and then everyone just has going down to zero level of function, just different numbers of problems. And so if you have a model like that, then it seems like even if a set of things basically have the same design, then they can have an arbitrarily wide range of performance.

Daniel Filan: I guess there’s also some surprise that comes from the animal kingdom. So I tend to think of humans as much more cognitively similar to each other than to all of the other animals. But I would be very surprised if there’s a species of fish that was better than my friend at math, but worse than me, or if there was some monkey that had a better memory than my friend, but worse memory than the guy who is really good at memory. Do you think this intuition is mistaken? Or do you think that despite that we should still expect these pretty big ranges?

Katja Grace: I guess it seems like there are types of problems where other animals basically can’t do them at all and then humans often can do them. So that’s interesting. Or that seems like one where the humans are clearly above, though - I guess chess is one like that perhaps, or I guess I don’t know how well any animal can be taught to play chess. I assume it’s quite poorly.

Daniel Filan: Aren’t there chickens that can play checkers apparently? I have heard of this. I don’t know if they’re good.

Katja Grace: My impression is that as soon as we started making AI to play, I think checkers or chess, that it was similar to amateur humans, which sort of seems like amateur humans are kind of arbitrarily bad in some sense, but maybe there’s some step of do you understand the rules? And can you kind of try a bit or something? And both the people writing the first AIs to do this and the amateurs who’ve just started doing it have got those bits under control by virtue of being humans. And, I don’t know, the fish don’t have it or something like that. I think it’s interesting that, like you mentioned, what was it, ImageNet was only three years or something.

Katja Grace: I think that’s very vague and hard to tell by the way, because we don’t have good measures for people’s performance at ImageNet that I know of. And we did some of it ourselves and I think it depends a lot on how bored you get looking at dogs or something. But yeah, it seems like it was plausibly pretty fast. And it’s interesting that ImageNet or recognizing images is in the class of things where humans were more evolved for it. And so most of them are pretty good, unless they have some particular problem, and I’m not sure what that implies, but it seems like it wouldn’t surprise me that much if things like that were different. Whereas chess is a kind of thing where by default we’re not good at it. And then you can sort of put in more and more effort to be better at it.

Daniel Filan: So the first thing I wanted to say is that as could probably be predicted from passing familiarity with chickens, they’re able to peck a checkers board perhaps to move pieces, but they are not at all good at playing checkers. So I would like to say that publicly.

Daniel Filan: Maybe the claim of short, small human ranges is something like: there are people who are better or worse at chess, but a lot of that variance can just be explained by some people having bothered to try to get good at chess. And some people might have the ability to be really good at chess, but they’ve chosen something better to do with their lives. And I wonder if there’s instead a claim of, look, if everybody tried as hard as the top human players tried to be really good at chess, then that range would be very small, and somehow that implies a short crossing time.

Katja Grace: I think that we probably have enough empirical data to rule that out for at least many activities that humans do where there are enthusiasts at them, but they never reach anywhere near peak human performance.

Daniel Filan: Yeah, and with the case of Go I think maybe good at Go - for Go enthusiasts, I mean like the rank of 1d - I think good at Go is roughly the range where maybe not everybody could get good at Go, or I wouldn’t expect much more out of literally everybody, but it still takes 15 years to go from that to top human range. It still seems like an extremely confusing thing to me.

Katja Grace: I was going to ask why it would seem so natural for it to be the other way that it’s very easy to cross the human range quickly?

Daniel Filan: Something like just the relative - humans just seem more cognitively similar to each other than most other things. So I would think well, it can’t be that much range. Or human linguistic abilities. I think that there’s no chicken that can learn languages better than my friend, but worse than me.

Katja Grace: Also chickens can’t learn languages at all, or I assume pretty close to nothing. So it seems like for the language range or something, depending on how you measured it, you might think humans cover a lot of the range at least up until the top humans. And then there are a bunch of things that are at zero.

Daniel Filan: I mean, there still are these African grey parrots or something, or this monkey can use sign language, or this ape I guess.

Katja Grace: There was GPT-3, which is not good in other ways.

Daniel Filan: Yeah. It still seems to me that basically every adult - or perhaps every developmentally normal adult - can use language better than those animals, but I’m not certain. I wonder if instead my intuition is that humans are just - the range in learning ability is not so big? But then that’s further away from the question you want to predict, which is how long from when AIs can’t destroy us until they can destroy us.

Katja Grace: Assuming that when there are AIs that are smarter than any human they can destroy us, which seems pretty non-obvious. But yeah.

Daniel Filan: No, I’m not assuming that. I think the reason that people are interested in this argument, or one part of this argument that I’m trying to focus on, is how long from when you have something that’s about as smart as a human, which can probably not destroy the human race, until you’re at some degree of smartness at which you can just destroy all of humanity, whether or not you might want to.

Katja Grace: Because it’s super, super human type thing.

Daniel Filan: Yeah. I still find this weird, but I’m not sure if I have any good questions about it.

Katja Grace: I agree it’s weird. I guess at the moment we’re doing some more case studies on this and I’m hoping that at the end of them to sit down and think more about what we have found and what to make of it.

Daniel Filan: Yeah. What other case studies are you doing? Because the ones that have been looked at there’s ImageNet, which is image classification, then there’s a bunch of board games and one criticism of this has been, well, maybe that’s too narrow a range of tasks. Have you looked at other things?

Katja Grace: Yeah. I forget which ones are actually up on the website yet. I think we do have StarCraft up probably. And maybe that was like 20 years or something.

Daniel Filan: 21 years.

Katja Grace: We have some that are further away from AI, like clock stability. How well can you measure a period of time using various kinds of clocks or your mind. Where that one was about 3000 years, apparently according to our tentative numbers.

Daniel Filan: 3000 years between…

Katja Grace: For our automated time measuring systems to go between the worst person at measuring time and the best, according to our own, somewhat flawed efforts to find out how good different people are at measuring time. I think it wasn’t a huge sample size. We just got different people to try and measure time in their head, and we tried to figure out, I think, how good professional drummers were at this or something.

Daniel Filan: That seems shockingly long.

Katja Grace: Well, this all happened quite a long time ago when everything was happening more slowly.

Daniel Filan: When did we hit superhuman timekeeping ability?

Katja Grace: I think in the 1660s when we got well adjusted pendulum clocks, according to my notes here, but we haven’t finished writing this and putting it up. So I’m not sure if that’s right.

Daniel Filan: So you’ve looked at timekeeping ability. Are there any other things that we might expect to see on the website soon?

Katja Grace: Frequency discrimination in sounds where it tentatively looks like that one’s about 30 years. Speech transcription, which is, I think quite hard because there aren’t very good comparable measures of human or AI performance.

Daniel Filan: I will say with my experience with making this podcast and using AI speech transcription services, it seems to me to be that the commercially available ones are not yet at the Daniel quality range, although they do it much faster.

Katja Grace: There are also things that we haven’t looked at, but it’s not that hard to think about, like robotic manipulation it seems has been within the range of not superhuman at all sorts of things, but probably better than some humans at various things for a while I think, though I don’t know a huge amount about that. And I guess creating art or something sort of within the human range.

Daniel Filan: Yeah. It could be that we’re just not sophisticated enough to realize how excellent AI art truly is. But yeah, my best guess is that it’s in the human range.

How often technologies have discontinuous progress

Daniel Filan: I’d next like to talk about discontinuities in technological progress and this question of arguments for and against fast takeoff. So how common are discontinuities in technological progress and what is a discontinuity?

Katja Grace: We were measuring discontinuities in terms of how many years of progress at previous rates happened in one go, which is a kind of vague definition. It’s not quite clear what counts as one go, but we’re also open to thinking about different metrics. So if it was like, “Ah, it wasn’t one go, because there were like lots of little gos, but they happened over a period of 10 minutes.” Then we might consider the metric of looked at every 10 minutes did this thing see very sudden progress. We’re basically trying to ask when is there a massive jump in technology on some metric? And we’re trying to explicate that in a way that we can measure. At a high level, how often do they happen? It’s sort of not never, but it’s not that common.

Katja Grace: We didn’t do the kind of search for them where you can easily recover the frequency of them. We looked for ones that we could find by asking people to report them to us. And we ended up finding 10 that were sufficiently abrupt and clearly contributed to more progress on some metric than another century would have seen on the previous trend. So 100 years of progress in one go, and it was pretty clear and robust. Where there were quite a few more where maybe there were different ways you could have measured the past trend or something like that.
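The discontinuity metric described here - expressing a jump as years of progress at the previous rate - can be sketched as follows. This is my simplification of the AI Impacts approach, assuming a roughly linear prior trend; the names and numbers are illustrative:

```python
# Express a one-off jump in a metric as the number of years of progress
# it would have taken at the previous (assumed linear) rate of improvement.

def years_of_progress_in_jump(history, jump):
    """history: list of (year, value) points before the jump, assumed
    roughly linear; jump: size of the one-off improvement.
    Returns the jump expressed in years of progress at the prior rate."""
    (y0, v0), (y1, v1) = history[0], history[-1]
    rate = (v1 - v0) / (y1 - y0)  # units of the metric per year
    return jump / rate

# e.g. a metric improving by 2 units/year that suddenly jumps 200 units
# represents 100 years of progress in one go
history = [(1900, 0), (1950, 100)]  # 2 units per year
print(years_of_progress_in_jump(history, 200))  # -> 100.0
```

In practice the hard part, as discussed below, is establishing the prior trend at all: whether something counts as a discontinuity depends heavily on how much data exists leading up to the purported jump.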

Daniel Filan: Are there any cases where people might naively think that there was really big, quick progress or something where there wasn’t actually?

Katja Grace: I guess it’s pretty hard to rule out there being a discontinuity of some sort, because we’re looking for a particular trend that has one in it. So if you take the printing press, usually the intuition people have is this was a big deal somehow. So if we’re like, “Ah, was it a discontinuity in how many pages were printed per day or something?” And if it’s not, it might still be that it was in some other nearby thing. I think a thing that was notable to me was Whitney’s cotton gin. Our best guess in the end was that it was a moderate discontinuity, but not a huge one. And it didn’t look to me like it obviously had a huge effect on the American cotton industry, in that the amount of cotton per year being produced was already shooting up just before it happened. So that still does count as a discontinuity, I think. But yeah, it looks like much less of a big deal than you would have thought, much more continuous.

Katja Grace: I think maybe penicillin for syphilis didn’t seem discontinuous in terms of the trends we could find. And one reason it didn’t, I think, there were some things where we could get clear numbers for how many people were dying of syphilis and that sort of thing. And there it’s kind of straightforward to see it doesn’t look like it was a huge discontinuity. It sounds like it was more clearly good on the front of how costly was it to get treatment. It seems like for the previous treatment, people just weren’t even showing up to get it because it was so bad.

Daniel Filan: Bad in terms of side effects?

Katja Grace: Side effects. Yeah. But interestingly, it was nicknamed the magic bullet because it was just so much better than the previous thing. Over the ages syphilis has been treated in many terrible ways, including mercury and getting malaria because malaria, or I guess the fever, helps get rid of the syphilis or something.

Daniel Filan: And those are bad ways to treat syphilis?

Katja Grace: I think they’re both less effective and also quite unpleasant in themselves or they have bad side effects. So the thing prior to penicillin was kind of already quite incredible in terms of it working and not being as bad as the other things I think. And so then even if penicillin was quite good on that front comparably it didn’t seem like it was out of the distribution for the rate at which progress was already going at that sort of thing.

Daniel Filan: So I guess the reason that you would be interested in this is you want to know are big discontinuities the kind of thing that ever happens? And if they are the kind of thing that ever happens, then it shouldn’t take that much evidence to convince us that it’ll happen in AI. And so in assessing this I could look at the list of things that you guys are pretty sure were discontinuous on some metric. I think there’s this additional page which is things that might be discontinuities that you haven’t checked yet. And there are a lot of entries on that page, it kind of looked like to me. I’m wondering if you could forecast, what’s your guess as to the percentage of those that you would end up saying, “oh yeah, this seems quite discontinuous.”

Katja Grace: I guess among the trends that we did look into, we also carefully checked that some of them didn’t seem to have discontinuities in them. So I guess for trying to guess how many there would be in that larger set, I think maybe something like that fraction, where [inaudible 00:48:14] you’re looking at these very big robust discontinuities, we found 10 of them in 38 different trends where some of those trends were for the same technology, but there were different ways of measuring success or something like that. And some of the trends, they could potentially have multiple discontinuities in, but yeah so it’s roughly something like 10 really big ones in 38 trends. Though that’s probably not quite right for the fraction in the larger set, because I think it’s easier to tell if a thing does have a big discontinuity than to show that it doesn’t probably.

Katja Grace: Although, well, maybe that’s not right. I think showing that it does is actually harder than you might think, because you do have to find enough data leading up to the purported jump to show that it was worse just beforehand, which is often a real difficulty. Or something looks like a discontinuity, but then it’s not because there were things that just weren’t recorded that were almost as good or something. But yeah, I guess you could think more about what biases there are in which ones we managed to find. I think the ones that we didn’t end up looking into, I think are a combination of ones that people sent in to us after we were already overwhelmed with the ones we had, and ones where it was quite hard to find data, if I recall. Though some of this work was done more in like 2015. So I may not recall that well.

Daniel Filan: I think my takeaway from this is that discontinuities are the kind of thing that legitimately just do happen sometimes, but they don’t happen so frequently that you should expect them. For a random technology you shouldn’t necessarily expect one, but if you have a decent argument it’s not crazy to imagine there being a discontinuity. Is that a fair…

Katja Grace: I think that’s about right. I’m not quite sure what you should think over the whole history of a technology, but there’s also a question of suppose you think there’s some chance that there’s a discontinuity at some point, what’s the chance of it happening at a particular level of progress. It seems like that’s much lower.

Daniel Filan: Yeah that seems right. At least on priors.

Arguments for and against fast takeoff of AI

Daniel Filan: I guess I’d like to get a bit into these arguments about fast takeoff for artificial intelligence, and in particular this argument that it won’t take very long to hit the human range, to go through the human range to very smart. Somehow something about that will be quite quick. And my read of the AI Impacts page about arguments about this is that it’s skeptical of this kind of fast takeoff. Is that a fair summary of the overall picture?

Katja Grace: I think that’s right. Yeah. I think there were some arguments where they didn’t seem good and some where it seemed like they could be good with further support or something. And we don’t know of the support, but maybe someone else has it, or it’d be a worthwhile project to look into it and see if we could make it stronger.

Daniel Filan: I’d like to discuss a few of the arguments that I think I, and maybe some listeners might have initially found compelling and we can talk about why they may or may not be. I think one intuitive argument that I find interesting is this idea of recursive self-improvement. So we’re going to have AI technology, and the better it is, the higher that makes the rate of progress of improvement in AI technology, and somehow this is just the kind of thing that’s going to spiral out of control quickly. And you’ve got humans doing AI research and then they make better humans doing better AI research. And this seems like the kind of thing that would go quickly. So I’m wondering why you think that this might not lead to some kind of explosion in intelligence?

Katja Grace: I do think that it will lead to more intelligence and increasing intelligence over time. It seems like that sort of thing is already happening. I’d say we’re sort of already in an intelligence explosion that’s quite slow moving, or these kinds of feedback effects are throughout the economy. And so there’s a question of, should you expect this particular one in the future to be much stronger and take everything into a new regime? And so I think you need some sort of further argument for thinking that that will happen. Whereas currently we can make things that make it easier to do research and we do. And I guess there’s some question of whether research is going faster or slower or hard to measure perhaps the quality of what we learn, but this kind of feedback is the norm. And so I guess I think you could make a model of these different things, a quantitative model, and say, “Oh yeah. And when we plug this additional thing in here it should get much faster.” I haven’t seen that yet. But maybe someone else has it.

Daniel Filan: One reason that this seems a little bit different from other feedback loops in the economy to me is that with AI it seems like, I don’t know, one feedback loop is you get a bit better at making bread, then everybody grows up to be a little bit stronger. And everyone’s a little bit more healthy, or the population’s a little bit bigger and you have more people who are able to innovate on making bread. And that seems like a fairly circuitous path back to improvement. Whereas with machine learning, it seems like making machine learning better is somehow very closely linked to intelligence. Somehow. If you can make things generally smarter then somehow that might be more tightly linked to improving AI development than other loops in the economy, and that’s why this might - this argument is less compelling now that it’s out of my head than when it was in it. But I’m wondering if you have thoughts about that style of response.

Katja Grace: All things equal, it seems like tighter loops like that are probably going to go faster. I think, I guess, I haven’t thought through all of that, but I think also that there are relatively tight ones where, for instance, people can write software that immediately helps them write better software and that sort of thing. And I think you could make a fairly similar argument about softwarey computery things that we already have. And so far that hasn’t been terrifying. Perhaps you might say we’ve already seen the start of it, and it seems like it’s a slow takeoff so far. I don’t necessarily think that it will continue to be slow - I think maybe continuing economic growth long-term has been speeding up over time. And if it hadn’t slowed down in the ’50s or something, I guess, maybe we would expect to see a singularity around now anyway. So maybe, yeah, the normal feedback loops in technology and so on, eventually you do expect them to get super fast.

Katja Grace: I think there’s a different question of whether you expect them to go very suddenly from much slower to much faster. I guess I usually try to separate these questions of very fast progress at around human level AI into fast progress leading up to, I guess maybe Nick Bostrom calls it crossover or something? So some point where the AI becomes good enough to help a bunch with building more AI. Do you see fast progress before that? And do you see fast progress following that where the intelligence explosion would be following that?

Katja Grace: And the discontinuity stuff that we’ve done is more about before that, where I think a strong intelligence explosion would be a sufficiently weird thing that, I wouldn’t just say, “Oh, but other technologies don’t see big discontinuities.” I think that would be a reasonable argument to expect something different, but for an intelligence explosion to go from nothing to like very fast that seems like there has already been a big jump in something. It seems like AI’s ability to contribute to AI research kind of went from meh to suddenly quite good. So part of the hope with the discontinuity type stuff is to address should we expect an intelligence explosion to appear out of nowhere or to more gradually get ramped up?

Daniel Filan: Yeah. And I guess that the degree of gradualness matters because you sort of have a warning beforehand.

Katja Grace: And maybe the ability to use technologies that are improving across the board perhaps to make the situation better in ways.

Daniel Filan: Yeah. Whereas if it were just totally out of the blue then on any given day, could be tomorrow, so we just better be constantly vigilant. I guess I’d like to talk a bit about the second argument that I’m kind of interested in here, which is sort of related to the human skill range discussion. So as is kind of mentioned, on an evolutionary timescale it didn’t take that long to get from ape intelligence to human intelligence. And human intelligence seems way better than ape intelligence. At least to me. I’m biased maybe, but we build much taller things than apes do. We use much more steel. It seems better in some objective ways or more impactful.

Daniel Filan: And you might think - from this, you might conclude, it must be the case that there’s some sort of relatively simple, relatively discrete genetic code or something. There’s this relatively simple algorithm that once you get it, you’re smart. And if you don’t get it, you’re not smart. And if there is such a simple intelligence algorithm that is just much better than all of the other things, then maybe you just find it one day, and on that day, before you didn’t have it and you were like 10 good, and now you do have it and you’re like 800 good or something, or 800 smart. I’m wondering - so this is sort of related, it’s maybe exactly the same thing as one of the considerations that you have that there are counter-arguments to - but I’m wondering if you could say what seems wrong to you about that point of view?

Katja Grace: I guess I haven’t thought about this in a while, but off the top of my head at the moment, a thing that seems wrong is, I guess there’s kind of an alternate theory of how it is that we’re smart, which is more like we’re particularly good at communicating, and building up cultural things and other apes aren’t able to take part in that, and more gave us an ability to build things up over time, rather than the particular mind of a human being such that we do much better. And I guess I’m not sure what that says about the situation with AI.

Daniel Filan: Perhaps it means that you can’t make strong inferences of the type “humans developed evolutionarily quickly, therefore there’s a simple smartness algorithm”. Yeah. So it sounds like to the extent that you thought that human ability was because of one small algorithmic change, if it’s the case that it was actually due to a lot of accumulated experience and greater human ability to communicate and learn from other people’s experience, that would block the inference from human abilities to quick takeoffs.

Katja Grace: That seems right. It seems like it would suggest that you could maybe quickly develop good communication skills, but if an AI has good communication skills it seems like that would merely allow it to join the human giant pool of knowledge that we share between each other. And if it’s a bit smarter than humans or even if it’s a lot smarter, I think on that model maybe it can contribute faster to the giant pool of knowledge, but it’s hard for it to get out ahead of all of humanity based on something like communication skills.

Katja Grace: But I also have a different response to this, which is you might think that whatever it is that the way that humans are smart is sort of different from what monkeys are trying to do or something. It wasn’t like evolution was pushing for better building of tall buildings or something. It seemed more like if we have to fight on similar turf to monkeys, doing the kinds of things that they have been getting better at, fighting in the jungle or something, it’s just not clear that individual humans are better than gorillas. I don’t know if anyone’s checked. So you might have a story here that’s more like, well, we’re getting gradually better at some cognitive skills. And then at some point we sort of accidentally used them for a different kind of thing.

Katja Grace: I guess an analogy is: a place I do expect to see discontinuous progress is where there was some kind of ongoing progress in one type of technology and it just wasn’t being used for one of the things it could be used for. And then suddenly it was. If after many decades of progress in chess someone decides to write a program for shmess, which is much like chess but different, then you might see shmess progress suddenly go from zero to really good. And so you might think that something like that was what happened with human intelligence, to the extent that it doesn’t seem to be what evolution was optimizing for. It seems like it was more accidental, but is somehow related to what monkeys are doing.

Daniel Filan: But does that not still imply that there’s some like core insight to doing really well at shmess that you either have or you don’t?

Katja Grace: Maybe most of doing well at shmess is the stuff you got from doing well at chess. And so there’s some sort of key insight in redirecting it to shmess, but if you’re starting from zero and trying to just do shmess the whole time, it would have also taken you about as long as chess. There, I guess, in terms of other apes, a lot of what is relevant to being intelligent they do have, but somehow it’s not well oriented toward building buildings or something.

Daniel Filan: So, one thing that I’m interested about with this work is its relationship to other work. So I think the previous most comprehensive thing that was trying to answer this kind of question is Intelligence Explosion Microeconomics. It’s unfortunate in that it’s trying to talk very specifically about AI, but it was published, I think, right before deep learning became at all important. But I’m wondering if you have thoughts about that work by Eliezer Yudkowsky, and what you see as the relationship between what you’re writing about and that?

Katja Grace: I admit that I don’t remember that well what it says at this point, though I did help edit it as a, I guess, junior MIRI employee at the time. What I mostly recall is that it was sort of suggesting a field of looking into these kinds of things. And I do think of some of what AI Impacts does at least as hopefully answering that to some extent, or being work in that field that was being called for.

Daniel Filan: I guess a related question that you are perhaps in a bad position to answer is that one thing that was notable to me is that you published this post, and I think a lot of people still continued believing that there would be fast progress in AI, but it seemed like the obvious thing to do would have been to write a response, and nobody really seemed to. I’m wondering, am I missing something?

Katja Grace: I guess I am not certain about whether anyone wrote a response. I don’t think I know of one, but there are really a lot of written things to keep track of. So that’s nice. Yeah. I think I don’t have a good idea of why they wouldn’t have written a response. I think maybe at least some people did change their minds somewhat as a result of reading some of these things. But yeah, I don’t think people’s minds change that much probably overall.

Coherence arguments

Daniel Filan: So I guess somewhat related to takeoff speeds, I guess I’d like to talk about just existential risk from AI more broadly. So you’ve been thinking about this recently a bit, for instance, you’ve written this post on coherence arguments and whether they provide evidence that AIs are going to be very goal directed and trying to achieve things in the world. I’m wondering what have you been thinking about recently, just about the broader topic of existential risk from AI?

Katja Grace: I’ve been thinking a bunch about this specific sort of sub-sub-question of, is this a reason to think that AI will be agentic where it’s not super clear that it needs to be agentic for there to be a risk, but in the most common kind of argument, it being agentic plays a role. And so I guess things I’ve been thinking about there are, I’m pretty ignorant about this area, I think other people who are much more expert, and I’m just sort of jumping in to see if I can understand it and write something about it quickly for AI Impacts basically. But then I got sidetracked in various interesting questions about it to me which seemed maybe resolvable.

Katja Grace: But I guess one issue is I don’t quite see how the argument goes that says that if you’re incoherent then you should become coherent. It seems like logically speaking if you’re incoherent, I think it’s a little bit like believing a false thing. You can make an argument that it would be good to change in a way to become more coherent. But I think you can also sort of construct a lot of different things being a good idea once you have circular preferences or something like that. So I think when I’ve heard this argument being made, it’s made in a sort of hand-wavy way, that’s like, “Ah.” I don’t know if listeners might need more context on what coherence arguments are.

Daniel Filan: Yeah let’s say that. What is a coherence argument?

Katja Grace: Yeah. Well, I think a commonly mentioned coherence argument would be something like: a way that you might have incoherent preferences is if your preferences are circular, say. If you want a strawberry more than you want a blueberry, and you want a blueberry more than you want a raspberry, and you want a raspberry more than you want a strawberry. And the argument that that is bad is something like, “Well, if you have each of these preferences, then someone else could offer you a trade where you pay some tiny sum of money to get the one you want more, and you will go around in a circle and end up having spent money.” And so that’s bad - but the step where that’s bad, I think, is kind of relying on you having a utility function that is coherent or something. Or I think you could equally say that having lost money there is good, due to everything being similarly good or bad basically.
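The money-pump dynamic described here can be made concrete with a small sketch. This is a hypothetical illustration (the names `prefers`, `accepts_trade`, and `run_money_pump` are invented for this example, not from the conversation): an agent with circular preferences accepts each "upgrade" trade for a small fee and ends up back where it started, poorer.

```python
# A sketch of the money-pump argument against circular (intransitive)
# preferences. The agent prefers strawberry > blueberry > raspberry >
# strawberry, so a trader can cycle it around for a fee each step.

# Pairwise preferences: (preferred, dispreferred) -> True.
prefers = {
    ("strawberry", "blueberry"): True,
    ("blueberry", "raspberry"): True,
    ("raspberry", "strawberry"): True,
}

def accepts_trade(have, offered):
    """The agent trades whenever it prefers the offered item to what it has."""
    return prefers.get((offered, have), False)

def run_money_pump(start_item, fee=0.01, rounds=3):
    """Offer the cycle of fruits repeatedly, charging a fee per trade."""
    item, money = start_item, 0.0
    offers = ["blueberry", "strawberry", "raspberry"] * rounds
    for offered in offers:
        if accepts_trade(item, offered):
            item = offered
            money -= fee  # pays a small fee for each "upgrade"
    return item, round(money, 2)

# Starting with a raspberry, the agent accepts every trade in the cycle
# and returns to a raspberry, having paid nine fees along the way.
final_item, final_money = run_money_pump("raspberry")
```

The point of the sketch is the last line: the agent's holdings are unchanged but its money is strictly lower, which is the step Katja notes only counts as "bad" given some further assumption about valuing money.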

Daniel Filan: Well, hang on. But if you value money, it seems like this whole chain took you from some state and having some amount of money to the same state, but having less money. So doesn’t that kind of clearly seem just bad if your preferences are structured such that at any state you’d rather have more money than less?

Katja Grace: I guess I’m saying that if your basic preferences say are only these ones about different berries relative to each other, then you might say, “All right. But you like money because you can always buy berries with money.” But I’m saying yeah, you could go that way and say, “Money equals berries.” But you can also say, “Oh, negative money is the same as this cycle that I like.” Or whatever. I guess I’ve been thinking of it as, well you have indifference curves across different sets of things you could have. And if you have incoherent preferences in this way, it’s sort of like you have two different sets of indifference curves and they just cross and hit each other. And then it’s just a whole web of things that are all equivalent, or there’s some way of moving around the whole web. But these are somewhat incoherent thoughts. Which is like, “What have I been thinking about lately?”

Katja Grace: It seems like in practice, if you look at humans, say, I think they do become more coherent over time. They do sort of figure out the logical implications of some things that they like, or other things that they now think they like. And so I think probably that does kind of work out, but it would be nice to have a clearer model of it. So I guess I’ve been thinking about what is a better model for incoherent preferences since it’s not really clear what you’re even talking about at that point.

Daniel Filan: Do you have one?

Katja Grace: A tentative one, which is, I guess it sort of got tied up with representations of things. So my tentative one is, instead of having a utility function you have something like a function that takes representations of states, it takes a set of representations of states and outputs an ordering of them. And so the ways you can be incoherent here are, you don’t recognize that two representations are of the same state, and so you sort of treat them differently. Also because it’s just from subsets to orderings, it might be that if you had a different subset your response to that would not be coherent with your responses to other subsets, if that makes sense.
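This tentative model can also be sketched in code. The following is my own hypothetical rendering of what Katja describes (all names and the particular scoring are invented): a choice function takes a set of *representations* of states and outputs an ordering, and incoherence shows up when two representations of the same underlying state get ranked differently.

```python
# A sketch of the representation-based model of incoherent preferences:
# the agent ranks *representations*, not states, and may not recognize
# that two representations denote the same state.

# Two representations, "morning star" and "evening star", both denote
# the same underlying state (Venus), but the agent doesn't know that.
DENOTES = {"morning star": "venus", "evening star": "venus", "mars": "mars"}

def choose(representations):
    """Given a set of representations, return them ordered from most to
    least preferred. The scores here are arbitrary and
    representation-sensitive, which is the source of incoherence."""
    score = {"morning star": 3, "mars": 2, "evening star": 1}
    return sorted(representations, key=lambda r: -score[r])

ranking = choose({"morning star", "evening star", "mars"})
# "morning star" ranks above "mars", which ranks above "evening star",
# yet the first and last denote the same state - an incoherent ordering.
incoherent = DENOTES[ranking[0]] == DENOTES[ranking[-1]]
```

Because the function maps each *subset* of representations to an ordering, it can also be incoherent in the second way Katja mentions: its ordering on one subset need not agree with its ordering on another.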

Daniel Filan: And then in, in this model why would you become more coherent? Or would you?

Katja Grace: I guess the kind of dynamic I’m thinking of where you become more coherent is - you’re continually presented with options and some of your options affect what your future representation to choice function is. Or I guess it doesn’t really matter whether you call it yours or some other creature’s in the future, but you imagine there are some choices that bear on this. And so if I’m deciding what will I do in this future situation? And I’m deciding it based on - if I currently have a preference, say currently I like strawberries more than raspberries or something, and I see that in the future when I’m offered strawberries, raspberries, or bananas, I’m going to choose raspberries instead. That if I can do the logical inference that says this means I’m going to get raspberries when I would have wanted strawberries or something.

Katja Grace: I guess there’s some amount of work going on where you’re making logical inferences that let you equate different representations with each other, and then having equated them when you have a choice to change your future behavior, or your future choice function, then some of the time you’ll change it to be coherent with your current one, or your one where you’re making a different set of choices. And so very gradually I think that would bring things into coherence with one another is my vague thought.

Daniel Filan: It’s relatively easy to see how that should iron out dynamic inconsistency, where your future self wants different types of things than your current self. One thing that seems harder to me is, suppose right now if you say, “Which do you prefer a strawberry or blueberry?” And I say, “I’d rather have a strawberry.” And then you say, “Okay, how do you feel about strawberries, blueberries, bananas?” And then I say, “Blueberries are better than strawberries, which are better than bananas.” Then when I’m imagining my future self it’s not obvious what the force is, pushing those to be the same. Right?

Katja Grace: I think I was imagining that when you’re imagining the future thing, there are different ways you can represent the future choice. And so you’re kind of randomly choosing one between it. So sometimes you do see it as, “oh, I guess I’m choosing strawberry over raspberry here.” Or whatever it was. Or sometimes you represent them in some entirely different way. Sometimes you’re like, “Oh, I guess I’m choosing the expensive one. Is that what I wanted?” And so sometimes you randomly hit one where the way you’re representing it, your current representations of things, or I guess, sorry, they’re all your current ones, but when you’re choosing what your future one will be, and you’re representing the future choice in a different way to the way you would have been at the time, then you change what it would be at the time, or change what the output of the function is on the thing that it would be at the time, which I think has some psychological realism. But yeah, I don’t know if other people have much better accounts of this.

Arguments that AI might cause existential catastrophe, and counter-arguments

Katja Grace: But I guess you originally asked what I’ve been thinking about lately in the area of arguments about AI risk, I guess this is one cluster of things. Another cluster is what is the basic argument for thinking that AI might kill everyone? Which is quite a wild conclusion, and is it a good argument? What are counter arguments against it? That sort of thing. So I could say more about that if you’re interested.

Daniel Filan: Yeah. I am looking to talk about this a bit more, so yeah. What are your thoughts on why AI might or might not do something as bad as killing everyone?

Katja Grace: I take the basic argument that it might to be something like: it seems quite likely that you could have superhuman AI, AI that is quite a lot better than any human. I could go into details - I guess I could go into details on any of these things, but I’ll say what I think the high-level argument is. It seems pretty plausible that we could develop this relatively soon. Things are going well in AI. If it existed, it’s quite likely that it would want a bad sort of future, in the sense that there’s a decent chance that it would have goals of some sort. There’s a decent chance its goals would not be the goals of humans, and goals that are not the goals of humans are sort of bad, often. Or that’s what we should expect: if they’re not the same as our goals, then we would not approve of the outcome.

Katja Grace: And so I guess I would put those, it has goals, they’re not human goals. Non-human goals are bad, all sort of under superhuman AI would by default want a future that we hate. And then also if it wants a future that we hate, it will cause the future to be pretty bad, either via some sort of short-term catastrophe where maybe it gets smart very fast, and then it’s able to take over the world or something, or via longer term reduction of human control over the future. So that would be maybe via economic competition or stealing things well, but slowly, that sort of thing.

The size of the super-human range of intelligence

Katja Grace: So arguments against these things, or places this seems potentially weak. I think superhuman AI is possible. I guess in some ways it seems pretty clear, but I think there’s a question of how much headroom there is in the most important tasks. There are clearly some tasks where you can’t get much better than humans just because they’re pretty easy for us or something. Tic Tac Toe is an obvious one, but you might think, all right, well if things are more complex though, obviously you can be way better than humans. I think that just sort of seems unclear for particular things, how much better you can be in terms of value that you get from it or something.

Daniel Filan: I guess it depends on the thing. One way you could get evidence that there would be headroom above present human abilities is if humans have continued to get better at a thing and aren’t stopping anytime soon. So for instance, knowing things about math, that seems like a case where humans are getting better and it doesn’t look like we’ve hit the top.

Katja Grace: It seems like there you could distinguish between how many things you can know, and how fast you can make progress at it, or how quickly you can accrue more things. It seems pretty likely you can accrue more things faster than humans. Though you do have to compete with humans equipped with whatever kind of technology there is to use, in some form, though not in their own brains potentially. AI has to compete with humans who have calculators, or have the best software or whatever, in a non-agentic form.

Daniel Filan: Yeah. And I guess another task at which I think there’s been improvement is, how well can you take over the world against people at a certain level of being armed? Or maybe not the whole world, I think that’s unusual, but at least take over a couple of countries. That seems like the kind of thing that happens from time to time in human history. And people have kind of got better armed, and more skilled at not being taken over, I think in some ways. And yet people still keep on occasionally being taken over, which to me suggests a degree of progress in that domain that I’m not sure is flattening out.

Katja Grace: That seems right. I definitely agree that there should be a lot more tech progress. It seems like, again, there’s a question of how much total progress is there ever, and there’s how fast - the thing that humans are doing in this regard is adding to the tech. And so there’s a question of how much faster than a human can you add to the tech? Seems pretty plausible to me that you can add to it quite fast, though again compared to humans who also have various non-agentic technology it’s less clear, and then it’s not clear how much of the skills involved in taking over the world or something are like building better tech. I guess you might think that just building better tech alone is enough to get a serious advantage, and then using your normal other skills maybe you could take the world. Yeah. And it seems plausible to me that you can knock out this counter argument, but it just sort of needs better working out I think, or it’s a place I could imagine later on being like, “Oh, I guess we were wrong about that. Huh?”

Daniel Filan: Yeah. I guess maybe moving on. So the first step in the argument was we can have smarter than human intelligences.

Katja Grace: I guess the next one was we might do it soon. But yeah, I guess that was just “might” anyway, but I don’t have any particularly interesting counter arguments to that except maybe not.

The dangers of agentic AI

Katja Grace: But yeah, the next one after that was, if superhuman AI existed it would by default threaten the whole future where that is made up of: it would by default want a future that we hate and it would get it, or it would destroy everything as a result. And so I guess there there’s “it has goals”. I think that’s one where again, I could imagine in 20 years being like, “Oh yeah, somehow we were just confused about this way of thinking of things.”

Katja Grace: In particular, I don’t know, it seems like as far as I know we’re not very clear on how to describe how to do anything without destroying the world. It seems like if you describe anything as maximizing something, maybe you get some kind of wild outcome. It seems like in practice that hasn’t been a thing that’s arisen that much. Or, you know, we do have small pigs or something, and they just go about being small pigs and it’s not very terrifying. You might think that with these kinds of creatures that we see in the world, it’s possible to make a thing like that that is not aggressively taking over the world.

Daniel Filan: I think people might be kind of surprised at this claim that it’s very hard to write down some function that maximizing it doesn’t take over the world. Can you say a little bit more about that?

Katja Grace: I guess one thing to say is, “Well, what would you write down?”

Daniel Filan: I don’t know, like make me a burrito, please. It seems like a thing I might want a thing to do. It doesn’t seem naively perhaps likely to destroy the world. I often ask people to make burritos for me, and maybe they’re not trying hard enough but…

Katja Grace: Maybe the intuition behind this being hard would be something like, well, if you’re maximizing that in some sense, are you making it extremely probable that you make a burrito? Or making a burrito very fast? Or very many burritos or something? Yeah. I’m not sure what the strongest version of this kind of argument is.

Daniel Filan: Make the largest possible burrito maybe. I guess it sort of depends how you code up what a burrito is.

Katja Grace: I guess maybe if you tried to write it as some sort of utility function, is it one utility if you have a burrito and zero otherwise or something? Then it seems like maybe it puts a whole bunch of effort into making really sure that you have the burrito.

Daniel Filan: Yeah. To me, that seems like the most robust case where utility functions always love adding more probability. It’s always better.

Katja Grace: It seems like maybe in practice building things is more like: you can just say, “Give me a burrito,” and it turns out to be not that hard to have a thing get you a burrito without any kind of strong pressure for doing something else. Although you might think, “Well, all right, sure. You can make things like that. But surely we want to make things that are more agent-like.” Since it seems like there’s at least a lot of economic value in having agents - as in, people who act like agents often get paid for that.

Daniel Filan: As long as they don’t destroy the world.

Katja Grace: Maybe even if they do slowly they can be paid for that. You could imagine a world where it’s actually quite hard to make a thing that’s a strong agent in some sense. And that really deep down, everything is kind of like a sphex that’s just responding to things. It’s like, yep. When I see a stoplight, then I stop. And when I see a cookie on a plate, I put my hand out to fit it in my mouth. And you can have more and more elaborate things like that that sort of look more and more agent-like, and you can probably even make ones that are more agent-like than humans. And that is destructive in ways or dangerous, but it’s not like they’re arbitrarily, almost magically seeming agent-like where they think of really obscure things instantly and destroy the world. It’s more like every bit of extra agenty-ness takes effort to get, and it’s not perfect.

Daniel Filan: Although on that view you could still think, “Well, we make things that are way more agent-like than people. And they have the agenty problems way more than people have them.” I guess in that case, maybe you just think that it’s quite bad, but not literally destroying the world. And then you move on to something else that might destroy the world actually.

Katja Grace: Yeah. Something like that. Or once you’re in the realm of this is a quantitative question of what it’s like and how well the forces that exist in society for dealing with things like that can fight against them, then it’s not like automatically the world gets destroyed at least. If looking back from the future I’m like, “Oh, the world didn’t get destroyed. And why was that?” I think that’s another class of things. The broader class being something like, I am just sort of confused about agency and how it works and what these partial agency things are like, and how easy it is to do these different things.

Daniel Filan: So, yeah, it seems to be that one inside view reason for expecting something like agency in AI systems, is that the current way we build AI systems is we are implicitly getting the best thing in a large class of models at achieving some task. So you have neural nets that can have various weights, and we have some mechanism of choosing weights that do really well at achieving some objective. And you might think, “well, okay, it turns out that being an agent is really good for achieving objectives.” And that’s why we’re going to get agents with goals. I’m wondering what you think about that.

Katja Grace: Do you think there is a particular reason to think that agents are the best things for achieving certain goals?

Daniel Filan: Yeah. I mean, I guess the reason is, what do you do if you’re an agent? You sort of figure out all the ways you could achieve a goal and do the one that achieves it best.

Katja Grace: Yeah. That seems right.

Daniel Filan: And that’s kind of a simple to describe algorithm, both verbally in English and in terms of if I had to write code to do it. It’s not that long. But to hard code the best way of doing it would take a really long time. It’s easier to program something like a very naive for-loop over all possible ways of playing Go, and say, “The best one, please.” I think that’s easier than writing out all the if statements of what moves you actually end up playing.
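The contrast Daniel is drawing - a short generic search program versus a long hard-coded policy - can be illustrated with a toy example. This is a hypothetical sketch (the toy game and all names are invented here, and Go is swapped for something trivially small):

```python
from itertools import product

# The generic "agent" program is short: enumerate every plan, keep the
# best one. Hard-coding the best behavior directly would instead need
# an entry (or if-statement) per situation.

# Toy objective: choose a 3-step plan of moves whose net displacement
# lands closest to a target position.
MOVES = {"L": -1, "R": +1, "S": 0}  # left, right, stay
TARGET = 2

def score(plan):
    """Higher is better; 0 means the plan hits the target exactly."""
    return -abs(sum(MOVES[m] for m in plan) - TARGET)

def best_plan():
    """The naive for-loop agent: try all 3**3 plans, return the best."""
    return max(product(MOVES, repeat=3), key=score)

# The hard-coded alternative: a table that would need one entry per
# possible situation (target, number of steps, ...), growing without bound.
HARD_CODED = {("target=2", "steps=3"): ("R", "R", "S")}  # ...and so on
```

The search version stays the same size as the task grows (only the loop runs longer), which is one way of cashing out the claim that the "agent" description is simpler than the equivalent lookup table.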

Katja Grace: I mean, I’m not really an expert on this, but it seems like there’s a trade-off where for the creature running in real time it takes it a lot longer to go through all of the possibilities and then to choose one of them. So if you could have told it ahead of time what was good to do, you might be better off doing that. In terms of maybe you put a lot of bits of selection initially into what the thing is going to do, and then it just carries it out fast. And I would sort of expect it to just depend on the environment that it’s in, where you want to go on this.

Daniel Filan: Yeah. Well, I mean, in that case, it’s like you’re-

Katja Grace: Maybe you’re right, even if I was right here there would be a lot of environments probably where the best thing is the agent-like one. Or maybe many more where it’s some sort of intermediate thing where there’re some things that get hard-coded because they’re basically always the same. You probably don’t want to build something that as soon as you turn it on, it just tries to figure out from first principles what the world is or something. If you just already know some stuff about the world, it’s probably better to just tell it.

Daniel Filan: Yeah. And you definitely don’t want to build a thing that does that every day when it wakes up.

Katja Grace: But you might think then that within the things that might vary, it’s good for it to be agentic and therefore it will be.

Daniel Filan: Yeah. Or also I kind of think that when you’re looking over possible specific neural networks to have, it’s just easier to find the relatively agent-like ones, because they are somehow simpler, or there are more of them roughly. And then once you get an agent-like thing you can tweak it a bit. Or more of them for a given performance level maybe. It’s a bit hard to be exact about this without an actual definition of what an agent is.

Katja Grace: Yeah. That seems right. So you were just saying that maybe agents are just much easier to find in program space than other things.

Daniel Filan: Yeah. I kind of think that.

Katja Grace: That seems plausible. I haven’t given it much thought. It seems like you would want to think more about [inaudible 01:25:38].

Daniel Filan: And then I think the next step in the argument was that the agent’s goals are typically quite bad or did we already talk about that?

The difficulty of human-compatible goals

Katja Grace: No, we were just talking about it has goals and then they’re quite bad, which I was organizing it as it’s goals are not human goals, and then by default non-human goals are pretty bad. Yeah, I guess I think it’s sort of unclear. Human goals are not a point. There’s sort of some larger cloud of different goals that humans have. It’s not very clear to me how big that is compared to the space of goals. And it seems like in some sense, there are lots of things that particular humans like having for themselves. It seems like if you ask them about utopia and what that should be like it might be quite different. At least to me, it’s like pretty unclear whether for a random person, if they got utopia for them, whether that would clearly be basically amazing for me, or whether it has a good chance of being not good.

Katja Grace: I guess there’s a question of AI, if you tried to make it have human goals, how close does it get, is it basically within that crowd of that cluster of human things, but not perfectly the human you are trying to get? Or is it far away such that it’s much worse than anything we’ve ever dealt with? Where if it’s not perfectly your goals, if we were trying to have it be your goals, but it’s way closer than my goals are to your goals, then you might think that this is a step in a positive direction as far as you’re concerned. Or maybe not, but it’s at least not much worse than having other humans, except to the extent that maybe it’s much more powerful or something and it isn’t exactly anyone’s goals.

Daniel Filan: So I think the idea here is twofold. Firstly, there’s just some difficulty in specifying what I mean by human goals. If I’m trying to load that into an AI, what kind of program would generate those? Okay, there are three things. One of the things is that. The second thing is some concern that, look, if you’re just searching over a bunch of programs to see which program does well on some metric, and you think the program you’re getting is doing some kind of optimization, there are probably a few different objectives that fit equally well with succeeding. But if you have objective A, you would probably have objectives B, C, D and E that also motivate you to do good stuff in the couple of environments that you’re training your AI on. But perhaps B, C, D and E might imply very different things about the rest of the future. For instance, one example of a goal B would be: play nice until you see that it’s definitely the year 2023, in which case go wild. I think the claim has to be firstly, that there is such a range.

Katja Grace: Sorry, such a range?

Daniel Filan: A range of goals such that if an optimizer had that goal it would still look good when you’re testing it. Which is basically how we’re picking AI algorithms. And you have to think that range is big, and that it’s much larger than the range you’d want, than an acceptable range of within human error tolerance or something. Yeah. I think there, listeners can refer back to episode 4 with Evan Hubinger for some discussion of this, but I’m wondering what do you think about that line of argument?

Katja Grace: So it sounds like you’re saying, “All right, there are sort of two ways this could go wrong, where one of them is we try to get the AI to have human goals, and we do it sort of imperfectly so then it’s bad.” Which is more like what people were concerned about further in the past. And then more recently there’s Evan’s line of argument that’s, “Well, whatever goals you’re even trying to give it, it won’t end up with those goals because you can’t really check which goals it has.” It will potentially end up deceiving you and having some different goals.

Daniel Filan: I see those as more unified than they are often presented as actually. Now you’re interviewing me and I get to go on a rant. I feel like everyone keeps on talking about, “Oh, everybody’s totally changed their mind about how AI poses a risk” or something. And I think some people have, or I don’t know, I’ve changed my mind about some things, but both of the things there seem to me that they fit into like the difficulty of writing a program that gets an agent to have the motivations that you want it to have.

Katja Grace: That seems right.

Daniel Filan: There’s this problem of specification. Part of the problem of specification is you can’t write it down, and part of the problem of specification is there are a few blocks in learning it. Like, firstly, it’s hard to come up with a formalism that works on-distribution, and then it’s hard to come up with a formalism that also works in worlds that you didn’t test on. To me these seem more unified than they’re typically presented as. But when lots of people agree with me often I’m the one who’s wrong.

Katja Grace: To me it seems like they’re unified in the sense that there’s a basic problem that we don’t know how to get the values that we like into an AI, and they’re maybe different in that well, we considered some different ways of doing it and each one didn’t seem like it would work. One of them was somehow write it down or something. I guess maybe that one is also try to get it to learn it without paying attention to this other problem that would arise. Yeah. I don’t know. They seem like sort of slightly different versions of a similar issue I would say. In which case maybe I’m mostly debating the non-mesa optimizer one, and then maybe the mesa-optimizer one, being Evan’s one.

Daniel Filan: That being the one about optimizers found by search.

Katja Grace: Yeah. Originally you might think, “All right, if you find an optimizer by search that looks very close to what you want, but maybe it’s not exactly what you want then is it that bad?” If you make a thing that can make faces, do they just look like horrifying things that aren’t real human faces once you set them out making faces? No, they basically look like human faces except occasionally I guess. So that’s a counter argument.

Daniel Filan: I mean, sometimes they have bizarre hats, but that doesn’t seem too bad.

Katja Grace: I think sometimes they’re kind of melting and green or something, but I assume that gets better with time. But that’s like, “Oh, you’re missing the mark a little bit. And how bad is it to miss the mark a little bit.” Where I guess there’s some line of argument about value is fragile and if you might miss the mark a little bit it’ll be catastrophic. Whereas the mesa-optimizer thing is more like, “No, you’ll miss the mark a lot because in some axis you were missing it a little in that it did look like the thing you wanted, but it turns out there are things that are just extremely different to what you wanted that do look like what you wanted by your selection process or something.” And I guess that seems like a real problem to me. I guess, given that we see that problem I don’t have strong views on how hard it is to avoid that and find a selection process that is more likely to land you something closer to the mark.

Daniel Filan: Yeah. I should say, we haven’t seen the most worrying versions of this problem. There have been cases where we train something to play Atari and it sort of hacks the Atari simulator. But there have not really been cases where we’ve trained something to play Atari, and then it looks like it’s playing Atari, and then a month later it is trying to play chess instead or something like that. And yeah, so it’s not quite the case that I’m just talking about a sensible problem that we can all tell exists.

Katja Grace: It’s intrinsically a hard-to-know-it-exists problem, though. If the problem is that your machine would pretend it was a different thing to what it really is, it’s sort of naturally confusing.

Daniel Filan: I guess we’ve covered… We’ve now gone into both the questions of AI having different goals to humans and slight differences being extremely bad.

The possibility of AI destroying everything

Katja Grace: Yeah, then perhaps there’s the other prong here of the AI being terrible, which is, if it wanted a future that we hate, it would destroy everything.

Daniel Filan: Or somehow get a future that we hate.

Katja Grace: Where maybe I have more to say on the slow, getting a future that we hate… Where I guess, a counter argument to the first one is it’s just not obvious whether you should expect it to just rapidly be extremely amazing, but perhaps there’s some reason there’s some chance for that. But as far as counter arguments to the other thing goes: it seems like in the abstract, it kind of rests on a simple model that’s, more competent actors tend to accrue resources either via economic activity or stealing.

Katja Grace: And that seems true, but it’s all things equal and there are lots of other things going on in the world. And I think just a very common error to make in predicting things is you see that there is a model that is true, and you sort of don’t notice that there are lots of other things going on. I think in the world at the moment, it doesn’t seem the case that the smartest people are just far and away winning at things. I mean they’re doing somewhat better often, but it’s very random and there are lots of other things going on.

Daniel Filan: It does basically seem to me that people who are the most competent, not all of them are succeeding amazingly, but it seems like they have a way higher rate of amazing success than everyone else. And I don’t know, I hear stories about very successful people and they seem much more competent than me or anybody I know. And some people are really amazingly, seem to me to be really amazingly successful.

Katja Grace: I don’t know if that matches my experience? But also I guess it could be true. And also just a lot of very competent people also aren’t doing that well, where yeah, it does increase your chance, but it’s such that the people who are winning the most, everything has gone right for them. They’re competent and also they have good social connections and things randomly went well for them. Where still, each of these is not just going to make you immediately win.

Daniel Filan: Yeah. But I mean, if you think that there’s some sort of relationship here and that we might get AI that’s much more competent than everyone else. Then shouldn’t we just follow that line and be, “Ah, well it might just get all the stuff?”

Katja Grace: Okay. It seems like maybe that’s what the model should be, but there are sort of other relationships that it might be. You might imagine that you just have to have a certain amount of trust and support from other people in order to be allowed to be in certain positions or get certain kinds of social power. And that it’s quite hard to be an entity who no one is willing to extend that sort of thing to, who is quite smart and still to get very far.

Daniel Filan: Yeah. I mean, the hope is that you can trick people. Right?

Katja Grace: That’s true. It seems like for smart people trying to trick people in our world, I think they often do quite badly or, I mean, sometimes they do well for a while-

Daniel Filan: Yeah, I guess.

Katja Grace: We do try to defend against that sort of thing. And that makes it… it’s not just like the other things are not helping you do well, but people’s lack of trust of you is causing them to actively try to stop you.

Daniel Filan: Yeah. I mean, maybe this gets to the human range thing. To me, it seems like the question… the analogy that seems closer to me is not the most competent humans compared to the least competent humans; but the most competent humans compared to dogs or something, it’s not actually very hard to trick a dog. I’m not that good at tricking people and I’ve tricked dogs, you know?

Katja Grace: Yeah.

Daniel Filan: I mean, dogs are unusually trusting animals. But I don’t know, I think other animals can also be tricked.

Katja Grace: That’s fair. I guess when we were talking about the human range thing, it also seemed reasonable to describe some animals’ cognitive abilities of some sorts as basically zero or something on some scales. So I somewhat wonder whether yeah, you can trick all manner of animals that basically can’t communicate and don’t have much going on in terms of concepts and so on. And you can trick humans also, but it’s not just going to be trivially easy to do so. I mean, I think at some level of intelligence or something, or some level of capability and having resources and stuff, maybe it is. But also if you’re hoping to trick all humans or something, or not get caught for this… I guess clearly humans can trick humans pretty often, but if you want it to be like a long run, sustainable strategy-

Daniel Filan: Yeah, you need a combination of tricking some and out running the others, right?

Katja Grace: Something like that. I agree that it might be that with intelligence alone, you can quite thoroughly win. But I just think it’s less clear than it’s often thought to be or something, that the model here should have a lot more parts in it.

Daniel Filan: And so candidate parts are something like the relationship between people trying to not have all their stuff be stolen, and your agent that’s trying to steal everyone’s stuff and amass all the resources, or something.

Katja Grace: I think I’m not quite sure what you meant by that one.

Daniel Filan: Oh. Agents that are trying to trick everyone to accrue resources versus people who don’t like to be tricked sort of.

Katja Grace: Yeah. So I guess maybe I think of that as there are a bunch of either implicit or explicit agreements and norms and stuff that humans have for dealing with one another. And if you want to do a thing that’s very opposed to existing property rights or expectations or something, that’s often a lot harder than doing things that are legal or well looked upon by other people. I think, within the range of things that are not immediately going to ring alarm bells for people, it’s probably still easier to get what you want, if you’re more trusted or have better relationships with different people or are in roles where you’re allowed to do certain things.

Katja Grace: I’m not sure how relevant, or it seems like AI might just be in different roles that we haven’t seen so far, or treated in different ways by humans, but I think for me, even if I was super smart and I wanted to go and do some stuff, it would sort of matter a lot how other people were going to treat me. Or on what things they were just going to let me do them and what things they were going to shoot me, or just pay very close attention or ask me to do heaps of paperwork or-

Daniel Filan: Yeah. I think the argument has to be something like, look, there are some range of things that people will let you do, and if you’re really smart, you can find really good options within that range of things. And the reason that that’s true is that, somehow when human norms are constructing the range of things that we let people do, we didn’t think of all the terrible things that could possibly be done. And we aren’t just only allowing the things that we are certain are going to be fine.

Katja Grace: Yeah. That seems right. I guess it seems partly we do sort of respond dynamically to things. If someone finds a loophole, I think we’re often not just, oh, well I guess you’re right, that was within the rules. Often we change the rules on the spot or something is my impression, but you know, you might be smart enough to get around that also to come up with a plan that doesn’t involve anyone ever managing to stop you.

Daniel Filan: Yeah. You somehow need to go very quickly from things looking like you haven’t taken over the world to things looking like you have.

Katja Grace: Or at least very invisibly in-between or something. And maybe that’s possible, but it seems like a sort of quantitative question. How smart you have to be to do that and how long it actually takes for machines to become that smart. And I guess, different kind of counter argument here, or at least a different complication… I think the similar kind of model of, if you’re more competent, you get more of the resources and then you take over the world, it was sort of treating share of resources as how much of the future do you get, which is not quite right.

Daniel Filan: So yeah, the question is, if you have X percent of the stuff, do you get X percent of the future or something?

Katja Grace: It seems a way that’s obviously wrong is, suppose you’re just the only person in the universe. And so it’s sort of all yours in some sense, and you’re just sitting on earth and this asteroid coming toward you, you clearly just don’t have much control of the future. So there’s also, you could then maybe model it as, oh yeah, there’s a bunch of control of the future that’s just going to no one. And maybe you can kind of get more or something like that. It seems like it would be nice to be clearer about this model. It seems like also in the usual sort of basic economics model or the argument for say, having more immigrants and that not being bad for you or something… It’s sort of like, yeah, well you’ll trade with them and then you’ll get more of what you wanted.

Katja Grace: And so you might want to make a similar argument about AI. If it wasn’t going to steal stuff from you, if it was just going to trade with you and over time it would get more wealth, but you would get more wealth as well, it would be a positive sum thing. And you’d be like, “well, how could that possibly be bad?” Let’s say it never violates your property rights or anything. It’s more like, “well, there was a whole bunch of the future that was going to no one, because you were just sitting there probably going to get killed sometime, and then you both managed to get more control of the future, wasn’t that good for you?” And then it seems, I guess the argument would be something like, “Ah, but you were hoping to take the whole future at some point. You were hoping that you could build some different technology that wasn’t going to also have some of the future.” But yeah, and maybe that works out, but I guess this just seems like a more complicated argument that it seems good to be clear on.

Daniel Filan: Yeah. I mean, to me the response is - so there’s two things there: firstly, there’s a question of… I guess they’re kind of both about widening the pie, to use the classic economist metaphor here. So one thing I think is that, suppose I have all of the world’s resources, which are, a couple of loaves of bread and that’s all the world has sadly, then I’m not going to do very well against the asteroid. Right? Whereas right now, even though I don’t have all the world’s resources, I’m going to do better because there’s sort of more stuff around. Even I personally have more stuff. So in the AI case, yeah, I guess the thing is if AI technology in general has way more resources than me and is not cooperative, then it seems the case that that’s clearly going to be worse than a world in which there’s a similar amount of resources going around, but I and my friends have all of it instead of them being under the control of an AI system.

Daniel Filan: Yeah, and as to the second thing, I don’t know, my take is that the thing we should worry about is AI theft. I don’t know. I don’t have a strong reason to believe that these terrible AI systems are going to respect property rights, I don’t know. Some people have these stories where they do and it’s bad anyway, but I don’t know. It seems to me they’d just steal stuff.

Katja Grace: I guess you might think about the case where they respect property rights either because you are building them. So there’s some chance you’ll succeed at some basic aspect of getting them to act as you would want. Where respecting the law is a basic one, that being our usual way of dealing with creatures who are otherwise not value-aligned.

Daniel Filan: I mean, that’s not how we deal with dogs, right? Or it is, a little bit, but they’re treated very differently by the law than most things. And if I think about what’s our control mechanism for dangerous animals or something, it mostly isn’t law. I guess maybe the problem is well, they just can’t understand law. But it seems to me that, “obey the law”, in my mind, once you’ve solved “get a thing that robustly obeys the law”, you’ll have solved most to all of the problem.

Katja Grace: That seems plausible. It’s not super obvious to me.

Daniel Filan: And in particular, that seems like a hard thing to me.

Katja Grace: I agree it seems plausibly hard. I guess the other reason it seems maybe worth thinking about is just that, if you did get to that situation, it still seems like you might be doomed. I don’t know that I would be happy with a world where we trade with the AIs and they are great. And we trade with them a lot and they get a lot of resources and they’re really competent at everything. And then they go on to take over the universe and we don’t do much of it. Maybe I would prefer to have just waited and hoped to build some different technology in the future or something like that.

Daniel Filan: Yeah. I guess there’s this question of, is it a thing that’s basically like human extinction and even out of the AI outcomes that are not basically human extinction, perhaps some can be better than others and perhaps we should spend some time thinking about that.

Katja Grace: That’s not what I was saying, but yeah, that seems potentially good.

Daniel Filan: Oh, to me, that seems almost the same as what you were saying. Because in the outcome where we have these very smart AIs that are trading with us and where we end up slightly richer than we currently are, to me, that doesn’t seem very much like human extinction.

Katja Grace: I was thinking that it does, depending on how we… Or if we end up basically on one planet and we’re a bit richer than we currently are, and we last until the sun kills us. Maybe I wouldn’t describe it as human extinction in that we didn’t go extinct. However, as far as how much of the possible value of the future did we get supposing that AI is doing things that we do not value at all, it seems like we’ve basically lost all of it.

Daniel Filan: Yeah, that seems right. I think this is perhaps controversial in the wider world, but not controversial between you and me. I’m definitely in favor of trying to get AI systems that really do stuff that we think is good and don’t just fail to steal from us.

Katja Grace: Nice.

Daniel Filan: Yeah. I think maybe part of my equanimity here is thinking, if you can get “obey the law”, then maybe it’s just not too much harder to get, “be really super great for us” rather than just merely lawful evil.

Katja Grace: I really have to think more about that to come to a very clear view. I guess I am more uncertain. I guess we were talking earlier about the possibility that humans are mostly doing well because of their vast network of socially capable culture accumulators. I think in that world, it’s sort of less clear what to think of super AIs joining that network. It seems like the basic units then are more big clusters of creatures who are talking to each other. It’s sort of not clear what role super AI plays in it. Why doesn’t it just join our one? Is there a separate AI one that’s then fighting ours? I don’t know why would you think that?

Katja Grace: It’s just kind of - a basic intuition I think people have is like, ah, well, so far there have been the humans, the basic unit, and then we’re going to have this smarter thing, and so anything could happen. But if it’s more like, so far the basic unit has been this giant pool of humans interacting with each other in a useful way. And there’ve been like more and fewer of them and they have different things going on and now we’re going to add a different kind of node that can also be in a network like this.

Katja Grace: I guess it’s just like the other argument, I feel like it doesn’t… Or the other intuition, doesn’t go through for me at least.

The future of AI Impacts

Daniel Filan: I think I’m going to move on now. Yeah. So I guess I’d like to talk a bit more about you and AI Impacts’ research. Yeah, sort of circling back. Hopefully AI Impacts - I don’t know if you hope this, I hope that AI Impacts will continue for at least 10 more years. I’m wondering, what do you think the most important question is for AI Impacts to answer in those 10 years? And a bit more broadly, what would you like to see happen?

Katja Grace: I guess on the most important question, I feel like there are questions that we know what they are now: when will this happen? How fast will it happen? And it seems like if we could become more certain about timelines, say, such that we are, ah it’s twice as likely to happen in the next little while than we thought it was. And it’s probably not going to happen in the longer term. As far as guiding other people’s actions, I feel like it’s maybe not that helpful in that you’re sort of giving one bit of information about, what should people be doing? And I think that’s kind of true for various of these high level questions, except maybe, is this a risk at all? Where if you manage to figure out that it wasn’t a risk, then everyone could just go and do something else instead, which sounds pretty big.

Katja Grace: But I feel like there are things where it’s less clear what the question is, where it’s more like maybe there’s details of the situation where you could just see better. Like how exactly things will go down, where it’s currently quite vague or currently people are talking about fairly different kinds of scenarios happening: are the AIs all stealing stuff? Is there just one super AI that immediately stole everything? Is it some kind of long-term AI corporations competing one another into oblivion or something? It seems just getting the details of the scenarios down, that would be pretty good. Yeah. I guess this isn’t my long-term well considered answer. It’s more like right now, which thing seems good to me.

Daniel Filan: And I guess another question, what work do you think is the best complement to AI Impacts’ research? So what thing that is really not what AI Impacts does, is the most useful for what AI Impacts does?

Katja Grace: It’s hard to answer the most useful, but a thing that comes to mind is sort of similar questions or also looking around for empirical ways to try to answer these high level questions, but with more of an AI expertise background. I think that’s clearly a thing that we are lacking.

AI Impacts vs academia

Daniel Filan: All right. Sort of a meta question, is there anything else I should have asked about AI Impacts and your research and how you think about things?

Katja Grace: You could ask how it differs from academia? What is this kind of research?

Daniel Filan: I guess being in academia, that was very obvious to me.

Katja Grace: I’m curious how that seems different to you then?

Daniel Filan: Oh, well, to me it seems like academics - so probably there’s the deliverable unit, right? There are some pages on this website or one page in particular that I’m just remembering that was about the time to solve various math conjectures. And the title is, how long does it take to solve mathematical conjectures? And the body is, we used this estimator on this dataset, here’s the graph, the answer is this. And that’s the page. It seems to me that academics tend to like producing longer pieces of work that are sort of better integrated, that are fancier in some sense.

Katja Grace: Yeah. That seems right.

Daniel Filan: In machine learning, people love making sure that you know that they know a lot about math.

Katja Grace: I’m curious what you meant by better integrated? You mean into the larger literature?

Daniel Filan: Yeah. It seems to me AI Impacts has a bunch of pages and they’re kind of related to each other. But an academic is going to write a bunch of papers that are relatively, I think, more closely related to each other. Whereas, the kind of clusters of related stuff… It seems like there’s going to be more per cluster than maybe AI Impacts does. I think AI Impacts seems to me to be happier-

Katja Grace: Jumping around-

Daniel Filan: To just answer one question and then move on to something totally different.

Katja Grace: Yeah. I think that’s right.

Daniel Filan: It does occur to me that maybe… Normally in these shows, I don’t normally tell people what their research is.

Katja Grace: That’s helpful though, because I mean, I think it’s true that AI Impacts is different in those ways. And that is intentional, but I haven’t thought about it that much in several years. So it’s good to be reminded what other things are like.

Daniel Filan: I guess there’s also the people who do it. Academics tend to like research being done or supervised by people who have done a PhD in a relevant field and spend a lot of time on each thing. Whereas AI Impacts has more, I guess maybe more generalist is a way to summarize a lot of these things. More of it seems to be done by philosophy PhD students taking a break from their programs, which sound terrible to me [to be clear, I’m saying the programs sound terrible, not the students or their practice of taking a break to work at AI Impacts].

Katja Grace: That seems right. Yeah. Again, I think part of the premise of the project is something like, there is a bunch that you can figure out from kind of a back of the envelope calculation or something often that is not being made use of. And so often what you want to add value to a discussion is not an entire lifetime of research on one detailed question or something. And so it’s sort of, okay, having done a back of the envelope calculation for this thing, the correct thing to do is, now move on to another question. And if you ever become so curious about that first question again, or you really need to be sure about it or to get a finer-grained number or something, then go back and do another level of checking.

Katja Grace: Yeah. I think that was part of it. I think I don’t quite understand why it is that writing papers takes so long compared to writing blog posts or something. It seems like in my experience it does, even with a similar level of quality in some sense. So I was partly hoping to just do things where at least the norm is something like, okay, you’re doing the valuable bit of some sort of looking things up or calculation, then you’re writing that down and then you’re not spending six months, whatever else it is I do when I spend six months straight writing a paper nicely.

Daniel Filan: Yeah. Yeah. It does seem easier to me. I think maybe I care a lot more about the… Let’s see, when I write a paper, I don’t know how it is in your subfield, but in my subfield in machine learning, papers sort of have to be eight pages. They can’t be eight pages and one line. So there’s some amount of time being spent on that. Yeah. When I write blog posts, they’re just worse than the papers I write and I just spend 30 minutes writing the first thing I thought of. And this is especially salient to me now because a few days ago I published one that got a lot of pushback that seems to have been justified. Yeah. There does seem to be something strange.

Katja Grace: I think another big difference is that we’re often trying to answer a particular question and then we’re using just whatever method we can come up with, which is often a bad method, because there’s just no good way of answering the question really well. Whereas, I think academia is more methods oriented. They know a good method and then they find questions that they can answer with that method. At least it’s my impression.

Daniel Filan: This is not how it’s supposed to work. My advisor Stuart Russell, has the take that, look, the point of a PhD, and I think maybe he would also say the point of a research career, is to answer questions about artificial intelligence and do the best job you can. And it’s true that people care more… I think people do care more about good answers to bad questions than bad answers to good questions. Stuart’s also a little bit unusual in his views about how PhD theses should be.

Katja Grace: I’m probably also being unfair. It might be that in academia, you have to both - you’re at least trying to ask a good question and answer it well, whereas we’re just trying to answer a good question.

Daniel Filan: Yeah. I think there’s also this thing of, who’s checking what? In academia, most of the checking is, did you answer the question well? When your paper goes through peer review, nobody says, the world would only be $10 richer if you answered this question. They sometimes talk about interest, which is kind of a different thing. So those are some differences between AI Impacts and academia. Do more come to mind?

Katja Grace: I guess maybe one that is related to your saying it’s not well integrated, I guess I think of academia as being not intended to be that integrated in that it’s sort of a bunch of separate papers that are kind of referring to each other maybe, but not in any particular - there’s not really a high level organizing principle or something. Whereas, I guess with AI Impacts, it somewhat grew out of a project that was to make really complicated Workflowys of different arguments for things.

Daniel Filan: What’s a Workflowy?

Katja Grace: Oh, it’s software for writing bulleted lists that can sort of go arbitrarily deep in their indenting.

Daniel Filan: The levels of indentation.

Katja Grace: Right. And you can sort of zoom in to further-in levels of indentation, and just look at that. So you can just have a gigantic list that is everything in your life and so on, but I guess Paul Christiano and I were making what we called structured cases that were supposed to be arguments for, for instance, AI is risky or something, or AI is the biggest problem or something. And the idea was there’d be this top level thing, and then there’ll be some nodes under it. And it would be yeah, if you agree with the nodes under it, then you should agree with the thing above it. And so you could look at this and be, oh, I disagree with the thing at the top, which of the things underneath do I disagree with? Ah, it’s that one. All right, what do I disagree with under that, and so on.

Katja Grace: This is quite hard to do in WorkFlowy somehow. And it was very annoying to look at. And so, yeah, I guess one idea was to do a similar thing where each node is a whole page, where there’s some kind of statement at the top and then the support for it is a whole bunch of writing on the page. Instead of that just being a huge jumble of logically, carefully related statements. And so I guess, AI Impacts is still somewhat like that. Though not super successfully at the top in that it’s missing a lot of the top level nodes that would make sense of the lower level nodes.

Daniel Filan: Is that roughly a comprehensive description of the differences between AI Impacts and academia? AI Impacts is much smaller than academia.

Katja Grace: Also true. I imagine it’s not a comprehensive list, but it’ll do for now.

What AI x-risk researchers do wrong

Daniel Filan: All right. So I guess the final question that I’d like to ask, oh, I forgot about one… Another one is, by working on AI Impacts you sort of look in to a bunch of questions that a lot of people, I guess, have opinions on already. What do you think people in AI x-risk research are most often, most importantly wrong about?

Katja Grace: I’m pretty unsure, and places where it seems like people are wrong, I definitely don’t have the impression that, obviously they’re wrong rather than, I’m confused about what they think. I think we have a sort of meta disagreement maybe where I think it’s better to write down clearly what you think about things than many people do, which is part of why I don’t know exactly what they think about some things. Yeah. I mean, it’s not that other people don’t ever write things down clearly, but I guess it’s a priority for me to try and write down these arguments and stuff, and really try and pin down whether they’re sloppy in some way. Whereas, it seems like in general, there’s been less of that than I would have expected overall, I think.

Daniel Filan: I mean, I think people write a lot of stuff. It’s just about sort of taking a bunch of things for granted.

Katja Grace: Or maybe the things that were originally laying out the things that are being taken for granted, were not super careful, more evocative, which is maybe the right thing for evoking things. That was what was needed. But yeah, I guess I’m more keen to see things laid out clearly.

How to follow Katja’s and AI Impacts’ work

Daniel Filan: All right. So if people are interested in following you and your research and AI Impacts’ research, how can they do so?

Katja Grace: Well, the aiimpacts.org website is a particularly good way to follow it. I guess we have a blog which you can see on the website. So you could subscribe to that if you want to follow it. Otherwise we just sort of put up new pages sometimes, some of which are interesting or not, which you can also subscribe to see if you want. If you just want to hear more things that I think about stuff, I have a blog, worldspiritsockpuppet.com.

Daniel Filan: All right, great. Thanks for appearing on the show. And to the listeners, I hope you join us again.

Katja Grace: Thank you for having me.

Daniel Filan: This episode is edited by Finan Adamson. The financial costs of making this episode are covered by a grant from the Long Term Future Fund. To read a transcript of this episode, or to learn how to support the podcast, you can visit axrp.net. Finally, if you have any feedback about this podcast, you can email me at feedback@axrp.net.