LessWrong.com News

A community blog devoted to refining the art of rationality

Gary Marcus vs Cortical Uniformity

June 28, 2020 - 21:18
Published on June 28, 2020 6:18 PM GMT

Background / context

I wrote about cortical uniformity last year in Human Instincts, Symbol Grounding, and the Blank Slate Neocortex. (Other lesswrong discussion includes Alex Zhu recently and Jacob Cannell in 2015.) Here was my description (lightly edited, and omitting several footnotes that were in the original):

Instead of saying that the human brain has a vision processing algorithm, motor control algorithm, language algorithm, planning algorithm, and so on, in "Common Cortical Algorithm" (CCA) theory we say that (to a first approximation) we have a massive amount of "general-purpose neocortical tissue", and if you dump visual information into that tissue, it does visual processing, and if you connect that tissue to motor control pathways, it does motor control, etc.

CCA theory, as I'm using the term, is a simplified model. There are almost definitely a couple caveats to it:

  1. There are sorta "hyperparameters" on the generic learning algorithm which seem to be set differently in different parts of the neocortex. For example, some areas of the cortex have higher or lower density of particular neuron types. There are other examples too. I don't think this significantly undermines the usefulness or correctness of CCA theory, as long as these changes really are akin to hyperparameters, as opposed to specifying fundamentally different algorithms. So my reading of the evidence is that if you put, say, motor nerves coming out of visual cortex tissue, the tissue could do motor control, but it wouldn't do it quite as well as the motor cortex does.

  2. There is almost definitely a gross wiring diagram hardcoded in the genome—i.e., set of connections between different neocortical regions and each other, and other parts of the brain. These connections later get refined and edited during learning. Again, we can ask how much the existence of this innate gross wiring diagram undermines CCA theory. How complicated is the wiring diagram? Is it millions of connections among thousands of tiny regions, or just tens of connections among a few regions? Would the brain work at all if you started with a random wiring diagram? I don't know for sure, but for various reasons, my current belief is that this initial gross wiring diagram is not carrying much of the weight of human intelligence, and thus that this point is not a significant problem for the usefulness of CCA theory. (This is a loose statement; of course it depends on what questions you're asking.) I think of it more like: if it's biologically important to learn a concept space that's built out of associations between information sources X, Y, and Z, well, you just dump those three information streams into the same part of the cortex, and then the CCA will take it from there, and it will reliably build this concept space. So once you have the CCA nailed down, it kinda feels to me like you're most of the way there....

Marcus et al.'s challenge

Now, when I was researching that post last year, I had read one book chapter opposed to cortical uniformity and another book chapter in favor of cortical uniformity, which were a good start, but I've been keeping my eye out for more on the topic. And I just found one! In 2014 Gary Marcus, Adam Marblestone, and Thomas Dean wrote a little commentary in Science Magazine called The Atoms of Neural Computation, with a case against cortical uniformity.

Out of the various things they wrote, one stands out as the most substantive and serious criticism: They throw down a gauntlet in their FAQ, with a table of 10 fundamentally different calculations that they think the neocortex does. Can one common cortical algorithm really subsume or replace all those different things?

Well, I accept the challenge!!

But first, I'd better say something about what the common cortical algorithm is and does, with the caveat that nobody knows all the details, and certainly not me. (The following paragraph is mostly influenced by reading a bunch of stuff by Dileep George & Jeff Hawkins, along with miscellaneous other books and papers that I've happened across in my totally random and incomplete neuroscience and AI self-education.)

The common cortical algorithm (according to me, and leaving out lots of aspects that aren't essential for this post) is an algorithm that builds a bunch of generative models, each of which consists of predictions that other generative models are on or off, and/or predictions that input channels (coming from outside the neocortex—vision, hunger, etc.) are on or off. ("It's symbols all the way down.") All the predictions are attached to confidence values, and both the predictions and confidence values are, in general, functions of time (or of other parameters ... again, I'm glossing over details here). The generative models are compositional, because if two of them make disjoint and/or consistent predictions, you can create a new model that simply predicts that both of those two component models are active simultaneously. For example, we can snap together a "purple" generative model and a "jar" generative model to get a "purple jar" generative model. Anyway, we explore the space of generative models, performing a search with a figure-of-merit that kinda mixes self-supervised learning, model predictive control, and Bayesian(ish) priors. Among other things, this search process involves something at least vaguely analogous to message-passing in a probabilistic graphical model.
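
As a toy illustration of the "snap together" idea, here is a minimal sketch in which a generative model is just a set of on/off predictions with confidence values, and two models compose when their predictions do not contradict each other. This is not a claim about how the neocortex (or any published model) implements it: the names `GenerativeModel`, `predictions`, `consistent_with`, and `compose` are hypothetical, and time-dependence and the search process are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class GenerativeModel:
    """Toy model: predicts that named channels or models are on (True) or off (False)."""
    name: str
    # target name -> (predicted state, confidence in [0, 1])
    predictions: dict = field(default_factory=dict)

    def consistent_with(self, other: "GenerativeModel") -> bool:
        """Compatible if no shared target is predicted on by one and off by the other."""
        shared = self.predictions.keys() & other.predictions.keys()
        return all(self.predictions[t][0] == other.predictions[t][0] for t in shared)

    def compose(self, other: "GenerativeModel") -> "GenerativeModel":
        """Build a model predicting that both components are active at once."""
        if not self.consistent_with(other):
            raise ValueError(f"{self.name} and {other.name} make contradictory predictions")
        merged = dict(self.predictions)
        for target, (state, conf) in other.predictions.items():
            # For shared targets, keep the more confident prediction (an arbitrary choice).
            if target not in merged or conf > merged[target][1]:
                merged[target] = (state, conf)
        return GenerativeModel(f"{self.name} {other.name}", merged)


purple = GenerativeModel("purple", {"color:purple": (True, 0.9), "color:green": (False, 0.9)})
green = GenerativeModel("green", {"color:green": (True, 0.9), "color:purple": (False, 0.9)})
jar = GenerativeModel("jar", {"shape:jar": (True, 0.8)})

purple_jar = purple.compose(jar)          # disjoint predictions, so this succeeds
print(purple_jar.name, sorted(purple_jar.predictions))
print(purple.consistent_with(green))      # contradictory predictions
```

In this sketch "purple" and "green" cannot be composed because they disagree about the same targets, which is also the hook for the mutual suppression between contradictory models that comes up later in the post.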

OK, now let's dive into the Marcus et al. FAQ list:

  • Marcus et al.'s computation 1: "Rapid perceptual classification", potentially involving "Receptive fields, pooling and local contrast normalization" in the "Visual system"

I think that "rapid perceptual classification" naturally comes out of the cortical algorithm, not only in the visual system but also everywhere else.

In terms of "rapid", it's worth noting that (1) many of the "rapid" responses that humans do are not done by the neocortex, and (2) the cortical message-passing algorithm supposedly involves both faster, less-accurate neural pathways (which prime the most promising generative models), as well as slower, more-accurate pathways (which, for example, properly do the "explaining away" calculation).

  • Marcus et al.'s computation 2: "Complex spatiotemporal pattern recognition", potentially involving "Bayesian belief propagation" in "Sensory hierarchies"

The message-passing algorithm I mentioned above is either Bayesian belief propagation or something approximating it. Contra Marcus et al., Bayesian belief propagation is not just for spatiotemporal pattern recognition in the traditional sense; for example, to figure out what we're looking at, the Bayesian analysis incorporates not only the spatiotemporal pattern of visual input data, but also semantic priors from our other senses and world-model. Thus if we see a word with a smudged letter in the middle, we "see" the smudge as the correct letter, even when the same smudge by itself would be ambiguous.

  • Marcus et al.'s computation 3: "Learning efficient coding of inputs", potentially involving "Sparse coding" in "Sensory and other systems"

I think that not just sensory inputs but every feedforward connection in the neocortex (most of which are neocortex-to-neocortex) has to be re-encoded into the data format that the neocortex knows what to do with, i.e. different possible forward inputs correspond to stimulation of different sparse subsets out of a pool of receiving neurons, wherein the sparsity is relatively uniform, and where all the receiving neurons in the pool are stimulated a similar fraction of the time (for efficient use of computational resources). So, Jeff Hawkins has a nice algorithm for this re-encoding process and again, I would put this (or something like it) as an interfacing ingredient on every feedforward connection in the neocortex.
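
To make the two properties above concrete (a fixed number of active units per input, and roughly even use of the receiving pool), here is a minimal sketch of a k-winners-take-all encoder with a usage-based boost. To be clear, this is not Jeff Hawkins's algorithm; it is a generic stand-in, and `sparse_encode`, `n_out`, `k`, and `boost_strength` are names and parameters I made up for illustration.

```python
import numpy as np


def sparse_encode(inputs, n_out=64, k=4, boost_strength=2.0, seed=0):
    """Map each input row to exactly k active units out of n_out (toy sketch)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((inputs.shape[1], n_out))  # fixed random feedforward weights
    duty = np.zeros(n_out)  # running fraction of steps on which each unit was active
    target = k / n_out      # activity rate each unit would have if use were even
    codes = np.zeros((len(inputs), n_out), dtype=bool)
    for t, x in enumerate(inputs):
        overlap = x @ weights
        # Under-used units get their score raised, over-used units get it lowered.
        boosted = overlap * np.exp(boost_strength * (target - duty))
        winners = np.argsort(boosted)[-k:]  # indices of the k highest scores
        codes[t, winners] = True
        duty += (codes[t].astype(float) - duty) / (t + 1)  # incremental mean
    return codes


inputs = np.random.default_rng(1).random((200, 16))
codes = sparse_encode(inputs)
print(codes.sum(axis=1)[:5])         # active units per input
print(codes.mean(axis=0).round(2))   # how often each unit was used
```

The sparsity is uniform by construction, since every input activates exactly `k` units. How evenly the pool ends up being used depends on `boost_strength` and on the inputs, so that part is worth inspecting rather than assuming.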

  • Marcus et al.'s computation 4: "Working memory", potentially involving "Continuous or discrete attractor states in networks" in "Prefrontal cortex"

To me, the obvious explanation is that active generative models fade away gradually when they stop being used, rather than turning off abruptly. Maybe that's wrong, or there's more to it than that; I haven't really looked into it.

  • Marcus et al.'s computation 5: "Decision making", potentially involving "Reinforcement learning of action-selection policies in PFC/BG system" and "winner-take-all networks" in "prefrontal cortex"

I didn't talk about neural implementations in my post on how generative models are selected, but I think reinforcement learning (process (e) in that post) is implemented in the basal ganglia. As far as I understand, the basal ganglia just kinda listens broadly across the whole frontal lobe of the neocortex (the home of planning and motor control), and memorizes associations between arbitrary neocortical patterns and associated rewards, and then it can give a confidence-boost to whatever active neocortical pattern is anticipated to give the highest reward.

Winner-take-all is a combination of that basal ganglia mechanism, and the fact that generative models suppress each other when they make contradictory predictions.
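
Here is a minimal sketch of that story, under heavy assumptions: a lookup table stands in for the basal ganglia's memorized pattern-to-reward associations, the confidence boost is a fixed additive bump, and mutual suppression between contradictory models is reduced to taking a single argmax. `RewardMemory`, `observe`, `select`, and the numbers are all hypothetical.

```python
class RewardMemory:
    """Toy stand-in for 'memorize pattern -> reward, then boost the best candidate'."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.expected = {}  # pattern -> running estimate of reward

    def observe(self, pattern, reward):
        """Move the stored estimate for this pattern toward the observed reward."""
        old = self.expected.get(pattern, 0.0)
        self.expected[pattern] = old + self.learning_rate * (reward - old)

    def select(self, confidences, boost=1.0):
        """Boost the candidate with the highest anticipated reward, then pick one winner."""
        favored = max(confidences, key=lambda p: self.expected.get(p, 0.0))
        boosted = dict(confidences)
        boosted[favored] += boost
        # Winner-take-all: only the single most confident candidate stays active.
        return max(boosted, key=boosted.get)


memory = RewardMemory()
memory.observe("reach for snack", 1.0)
memory.observe("reach for snack", 1.0)
memory.observe("stay seated", 0.0)

# "stay seated" starts out more confident, but the reward-based boost flips the outcome.
winner = memory.select({"reach for snack": 0.4, "stay seated": 0.6})
print(winner)
```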

  • Marcus et al.'s computation 6: "Routing of information flow", potentially involving "Context-dependent tuning of activity in recurrent network dynamics, shifter circuits, oscillatory coupling, modulating excitation / inhibition balance during signal propagation", "common across many cortical areas"

Routing of information flow is a core part of the algorithm: whatever generative models are active, they know where to send their predictions (their message-passing messages).

I think it's more complicated than that in practice thanks to a biological limitation: I think the parts of the brain that work together need to be time-synchronized for some of the algorithms to work properly, but time-synchronization is impossible across the whole brain at once because the signals are so slow. So there might be some complicated neural machinery to dynamically synchronize different subregions of the cortex when appropriate for the current information-routing needs. I'm not sure. But anyway, that's really an implementation detail, from a high-level-algorithm perspective.

As usual, it's possible that there's more to "routing of information flow" that I don't know about.

  • Marcus et al.'s computation 7: "Gain control", potentially involving "Divisive normalization", "common across many cortical areas"

I assume that divisive normalization is part of the common cortical algorithm; I hear it's been observed all over the neocortex and even hippocampus, although I haven't really looked into it. Maybe it's even implicit in that Jeff Hawkins feedforward-connection-interface algorithm I mentioned above, but I haven't checked.
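
For readers who haven't met the term, here is a minimal sketch of one commonly used form of divisive normalization, in which each unit's driving input is raised to a power and divided by a constant plus the pooled activity of the whole group. The function name, the pooling over all units, and the values of `sigma` and `n` are assumptions for illustration, not something taken from Marcus et al. or from the Hawkins algorithm.

```python
import numpy as np


def divisive_normalization(drive, sigma=1.0, n=2.0):
    """Each response is drive_i**n divided by (sigma**n + sum_j drive_j**n)."""
    powered = np.asarray(drive, dtype=float) ** n
    return powered / (sigma ** n + powered.sum())


weak = divisive_normalization([1.0, 2.0, 3.0])
strong = divisive_normalization([10.0, 20.0, 30.0])  # same pattern, 10x the input

print(weak.round(3), weak.sum().round(3))
print(strong.round(3), strong.sum().round(3))
```

This is the "gain control" part: multiplying the input by 10 leaves the relative responses unchanged, while the total response saturates below 1 instead of growing with the input.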

  • Marcus et al.'s computation 8: "Sequencing of events over time", potentially involving "Feed-forward cascades" in "language and motor areas" and "serial working memory" in "prefrontal cortex"

I think that every part of the cortex can learn sequences; as I mentioned, that's part of the data structure for each of the countless generative models built by the cortical algorithm.

Despite what Marcus implies, I think the time dimension is very important even for vision, despite the impression we might get from ImageNet-solving CNNs. There are a couple reasons to think that, but maybe the simplest is the fact that humans can learn the "appearance" of an inherently dynamic thing (e.g. a splash) just as easily as we can learn the appearance of a static image. I don't think it's a separate mechanism.

(Incidentally, I started to do a deep dive into vision, to see whether it really needs any specific processing different than the common cortical algorithm as I understand it. In particular, the Dileep George neocortex-inspired vision model has a lot of vision-specific stuff, but (1) some of it is stuff that could have been learned from scratch, but they put it in manually for their convenience (this claim is in the paper, actually), and (2) some of it is stuff that fits into the category I'm calling "innate gross wiring diagram" in that block-quote at the top, and (3) some of it is just them doing a couple things a little bit different from how the brain does it, I think. So I wound up feeling like everything seems to fit together pretty well within the CCA framework, but I dunno, I'm still hazy on a number of details, and it's easy to go wrong speculating about complicated algorithms that I'm not actually coding up and testing.)

  • Marcus et al.'s computation 9: "Representation and transformation of variables", potentially involving "population coding" or a variant in "motor cortex and higher cortical areas"

Population coding fits right in as a core part of the common cortical algorithm as I understand it, and as such, I think it is used throughout the cortex. The original FAQ table also mentions something about dot products here, which is totally consistent with some of the gory details of (my current conception of) the common cortical algorithm. That's beyond the scope of this article.

  • Marcus et al.'s computation 10: "Variable binding", potentially involving "Indirection" in "PFC / BG loops" or "Dynamically partitionable autoassociative networks" or "Holographic reduced representations" in "higher cortical areas"

They clarify later that by "variable binding" they mean "the transitory or permanent tying together of two bits of information: a variable (such as an X or Y in algebra, or a placeholder like subject or verb in a sentence) and an arbitrary instantiation of that variable (say, a single number, symbol, vector, or word)."

I say, no problem! Let's go with a language example.

I'm not a linguist (as will be obvious), but let's take the sentence "You jump". There is a "you" generative model which (among other things) makes a strong prediction that the "noun" generative model is also active. There is a "jump" generative model which (among other things) makes a strong prediction that the "verb" generative model is also active. Yet another generative model predicts that there will be a sentence in which a noun will be followed by a verb, with the noun being the subject. So you can snap all of these ingredients together into a larger generative model, "You jump". There you have it!

Again, I haven't thought about it in any depth. At the very least, there are about a zillion other generative models involved in this process that I'm leaving out. But the question is, are there aspects of language that can't be learned by this kind of algorithm?

Well, some weak, indirect evidence that this kind of algorithm can learn language is the startup Gamalon, which tries to do natural language processing using probabilistic programming with some kind of compositional generative model, and it works great. (Or so they say!) Here's their CEO Ben Vigoda describing the technology on YouTube, and don't miss their fun probabilistic-programming drawing demo starting at 29:00. It's weak evidence because I very much doubt that Gamalon uses exactly the same data structures and search algorithms as the neocortex; they're only vaguely similar, I think. (But I feel strongly that it's way more similar to the neocortex than a Transformer or RNN is, at least in the ways that matter.)


So, having read the Marcus et al. paper and a few of its references, it really didn't move me at all away from my previous opinion: I still think the Common Cortical Algorithm / Cortical Uniformity hypothesis is basically right, modulo the caveats I mentioned at the top. (That said, I wasn't 100% confident about that hypothesis before, and I'm still not.) If anyone finds the Marcus et al. paper more convincing than I did, I'd love to talk about it!


The Illusion of Ethical Progress

June 28, 2020 - 12:33
Published on June 28, 2020 9:33 AM GMT

Here are two statements I used to believe.

  1. The world's ethical systems have generally improved over time.
  2. It follows that ethical systems probably will continue to improve into the future.

I think the first statement is an illusion. If the first statement is untrue then the second statement cannot follow from the first.

What does it mean for an ethical system to get "better"? Physics contains no such thing.

Take the universe and grind it down to the finest powder and sieve it through the finest sieve and then show me one atom of justice, one molecule of mercy. And yet… and yet you act as if there is some ideal order in the world, as if there is some… some rightness in the universe by which it may be judged.

― Terry Pratchett, Hogfather

To judge the quality of an ethical system you must do so through your own ethical system. Ethics are like Minkowski spacetime. You cannot judge ethics in absolute terms. You can only judge an ethical position relative to your own.

A universal standard of ethics must have practical utility in every society at every point in history. Today's fashions often judge ethics by its internal coherence (untenable in traditional Japan[1]) or universality (untenable in tribal pastoralist cultures[2]).

If you believe your society (or somewhere nearby you in ideatic space) is the pinnacle of ethical evolution then what is more likely?

  1. Your society objectively is the pinnacle of ethical evolution.
  2. You judge every ethical system by its distance to your own.

An ethical system similar to your own often seems like a "good ethical system". The illusion of ethical progress follows from this subjective metric. If ethical systems are one-dimensional then morals will appear to be getting better as often as they get worse. (Except for very recent history which will appear to have improved.) But ethical systems have many dimensions.

In the above picture you can see random walks through 3-dimensional space, representing 3 universes with 3 separate ethical evolutions. The higher the dimensionality of ethical space, the less likely an ethical system will walk back to a previous state and thus the more likely ethical evolution will appear to have a direction. Each 3-dimensional path appears to be going from one place to another even though they are all completely random. The more dimensions an ethical space has, the harder it is to distinguish a random walk from progress. Real ethical space has many more than 3 dimensions.
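
The claim about dimensionality is easy to check numerically. Here is a minimal sketch that estimates how often a random walk revisits its starting point within a fixed number of steps, for a few different dimensions. It assumes a simple lattice walk (each step moves one unit along one randomly chosen axis), which is probably not how the walks in the picture were generated; `return_fraction` and its parameters are made up for this example.

```python
import numpy as np


def return_fraction(dim, n_steps=200, n_walks=1000, seed=0):
    """Fraction of lattice random walks that revisit the origin within n_steps."""
    rng = np.random.default_rng(seed)
    axes = rng.integers(0, dim, size=(n_walks, n_steps))    # axis moved along at each step
    signs = rng.choice([-1, 1], size=(n_walks, n_steps))    # direction of each step
    steps = np.zeros((n_walks, n_steps, dim), dtype=np.int64)
    walk_idx, step_idx = np.indices((n_walks, n_steps))
    steps[walk_idx, step_idx, axes] = signs                 # one unit move per step
    positions = steps.cumsum(axis=1)                        # position after each step
    returned = (positions == 0).all(axis=2).any(axis=1)     # ever back at the start?
    return returned.mean()


for dim in (1, 3, 10):
    print(dim, return_fraction(dim))
```

The fraction of walks that ever come back drops sharply as the number of dimensions grows, so in a high-dimensional space almost every purely random history looks like a journey away from where it began.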

Does this mean ethics is fundamentally relative?


Ethics is fundamentally subjective, but not relative.

In the Western intellectual tradition, ethics is a branch of philosophy. Western philosophy has no place for empiricism. Without empirical results, there is no way to compare ethical systems objectively against each other. Progress is indistinguishable from a random walk.

But there is a way to observe ethics in absolute terms. It is called "mysticism".

Have you ever noticed how Abraham, Jesus, Mohammad, Siddhartha and Ryokan all had a habit of going alone into the wilderness for several days at a time? Then they came back and made ethical pronouncements and people listened to them? The great mystics cut through the Gordian Knot of moral relativism by approaching ethics empirically.

The Snowmass Contemplative Group

In the early 1980's Father Thomas Keating, a Catholic priest, sponsored a meeting of contemplatives from many different religions. The group represented a few Christian denominations as well as Zen, Tibetan, Islam, Judaism, Native American & Nonaligned. They found the meeting very productive and decided to have annual meetings. Each year they have a meeting at a monastery of a different tradition, and share the daily practice of that tradition as a part of the meetings. The purpose of the meetings was to establish what common understandings they had achieved as a result of their diverse practices. The group has become known as the Snowmass Contemplative Group because the first of these meetings was held in the Trappist monastery in Snowmass, Colorado.

When scholars from different religious traditions meet, they argue endlessly about their different beliefs. When contemplatives from different religious traditions meet, they celebrate their common understandings. Because of their direct personal understanding, they were able to comprehend experiences which in words are described in many different ways. The Snowmass Contemplative Group has established seven Points of Agreement that they have been refining over the years:

  1. The potential for enlightenment is in every person.
  2. The human mind cannot comprehend ultimate reality, but ultimate reality can be experienced.
  3. The ultimate reality is the source of all existence.
  4. Faith is opening, accepting & responding to ultimate reality.
  5. Confidence in oneself as rooted in the ultimate reality is the necessary corollary to faith in the ultimate reality.
  6. As long as the human experience is experienced as separate from the ultimate reality it is subject to ignorance, illusion, weakness and suffering.
  7. Disciplined practice is essential to the spiritual journey, yet spiritual attainment is not the result of one's effort but the experience of oneness with ultimate reality.

Saints and Psychopaths by William L. Hamilton

You cannot "judge" an ethical system objectively. But you can observe it objectively and you can measure it objectively. Such empiricism once formed the foundation for the Age of Reason. Mystics are less like moral philosophers arguing doctrine than they are scientists reconciling separate experiments.

  1. For more information about this philosophical framework, read The Chrysanthemum and the Sword by Ruth Benedict. ↩︎

  2. For more information about this way of life, read Arabian Sands by Wilfred Thesiger. ↩︎


Self-sacrifice is a scarce resource

June 28, 2020 - 08:08
Published on June 28, 2020 5:08 AM GMT

“I just solved the trolley problem.... See, the trolley problem forces you to choose between two versions of letting other people die, but the actual solution is very simple. You sacrifice yourself.”

- The Good Place

High school: Naive morality

When I was a teenager, I had a very simple, naive view of morality. I thought that the right thing to do was to make others happy. I also had a naive view of how this was accomplished - I spent my time digging through trash cans to sort out the recyclables, picking up litter on the streets, reading the Communist Manifesto, going to protests, you know, high school kind of stuff. I also poured my heart into my dance group which was almost entirely comprised of disadvantaged students - mainly poor, developmentally disabled, or severely depressed, though we had all sorts. They were good people, for the most part, and I liked many of them simply as friends, but I probably also had some sort of intelligentsia savior complex going on with the amount of effort I put into that group.

The moment of reckoning for my naive morality came when I started dating a depressed, traumatized, and unbelievably pedantic boy with a superiority complex even bigger than his voice was loud. I didn’t like him. I think there was a time when I thought I loved him, but I always knew I didn’t like him. He was deeply unkind, and it was like there was nothing real inside of him. But my naive morality told me that dating him was the right thing to do, because he liked me, and because maybe if I gave enough of myself I could fix him, and then he would be kind to others like he was to me. Needless to say this did not work. I am much worse off for the choices I made at that time, with one effect being that I have trouble distinguishing between giving too much of myself and just giving basic human decency.

And even if it were true that pouring all of my love and goodness out for a broken person could make them whole again, what good would it be? There are millions of sad people in the world, and with that method I would only be able to save a few at most (or in reality, one, because of how badly pouring kindness into a black hole burns you out). If you really want to make people’s lives better, that is, if you really care about human flourishing, you can’t give your whole self to save one person. You only have one self to give.

Effective altruism, my early days

When I first moved to the Bay, right after college, I lived with five other people in what could perhaps practically but certainly not legally be called a four-bedroom apartment. Four of the others were my age, and three of us (including me) were vegan. The previous tenants had left behind a large box of oatmeal and a gallon of cinnamon, so that was most of what I ate, though I sometimes bought a jar of peanut butter to spice things up or mooched food off of our one adult housemate. I was pretty young and pretty new to EA and I didn’t think it was morally permissible to spend money, and many of my housemates seemed to think likewise. Crazy-burnout-guy work was basically the only thing we did - variously for CEA, CHAI, GiveWell, LessWrong, and an EA startup. My roommate would be gone when I woke up and not back from work yet when I fell asleep, and there was work happening at basically all hours. One time my roommate and I asked Habryka if he wanted to read Luke’s report on consciousness with us on Friday night and he told us he would be busy; when we asked with what he said he’d be working.

One day I met some Australian guys who had been there in the really early days of EA, who told us about eating out of the garbage (really!) and sleeping seven to a hallway or something ridiculous like that, so that they could donate fully 100% of their earnings to global poverty. And then I felt bad about myself, because even though I was vegan, living in a tenement, half-starving myself, and working for an EA org, I could have been doing more.

It was a long and complex process to get from there to where I am now, but suffice it to say I now realize that being miserable and half-starving is not an ideal way to set oneself up for any kind of productive work, world-saving or otherwise.

You can’t make a policy out of self-sacrifice

I want to circle back to the quote at the beginning of this post. (Don’t worry, there won’t be any spoilers for The Good Place). It’s supposed to be a touching moment, and in some ways it is, but it’s also frustrating. Whether or not self-sacrifice was correct in that situation misses the point; the problem is that self-sacrifice cannot be the answer to the trolley problem.

Let’s say, for simplicity’s sake, that me jumping in front of the trolley will stop it. So I do that, and boom, six lives saved. But if the trolley problem is a metaphor for any real-world problem, there are millions of trolleys hurtling down millions of tracks, and whether you jump in front of one of those trolleys yourself or not, millions of people are still going to die. You still need to come up with a policy-level answer for the problem, and the fact remains that the policy that will result in the fewest deaths is switching tracks to kill one person instead of five. You can’t jump in front of a million trolleys.

There may be times when self-sacrifice is the best of several bad options. Like, if you’re in a crashing airplane with Eliezer Yudkowsky and Scott Alexander (or substitute your morally important figures of choice) and there are only two parachutes, then sure, there’s probably a good argument to be made for letting them have the parachutes. But the point I want to make is, you can’t make a policy out of self-sacrifice. Because there’s only one of you, and there’s only so much of you that can be given, and it’s not nearly commensurate with the amount of ill in the world.


I am not attempting to argue that, in doing your best to do the right thing, you will never have to make decisions that are painful for you. I know many a person working on AI safety who, if the world were different, would have loved nothing more than to be a physicist. I’m glad for my work in the Bay, but I also regret not living nearer to my parents as they grow older. We all make sacrifices at the altar of opportunity cost, but that’s true for everyone, whether they’re trying to do the right thing or not.

The key thing is that those AI safety researchers are not making themselves miserable with their choices, and neither am I. We enjoy our work and our lives, even if there are other things we might have enjoyed that we’ve had to give up for various reasons. Choosing the path of least regret doesn’t mean you’ll have no regrets on the path you go down.

The difference, as I see it, is that the “self-sacrifices” I talked about earlier in the post made my life strictly worse. I would have been strictly better off if I hadn’t poured kindness into someone I hated, or if I hadn’t lived in a dark converted cafe with a nightmare shower and tried to subsist off of stale oatmeal with no salt.

You’ll most likely have to make sacrifices if you’re aiming at anything worthwhile, but be careful not to follow policies that deplete the core of yourself. You won’t be very good at achieving your goals if you’re burnt out, traumatized, or dead. Self-sacrifice is generally thought of as virtuous, in the colloquial sense of the word, but moralities that advocate it are unlikely to lead you where you want to go.

Self-sacrifice is a scarce resource.


A reply to Agnes Callard

June 28, 2020 - 06:25
Published on June 28, 2020 3:25 AM GMT

Agnes Callard says on Twitter:

Sincerely can't tell which threatens culture of free thought & expression more:

@nytimes de-anonymizing & destroying a (rightly) treasured blog for no good reason

@nytimes increasingly allowing external mobs (w/powerful members) influence over what it publishes

I believe that the arguments in this op-ed--about why philosophers shouldn't use petitions to adjudicate their disputes--also apply to those non-philosophers who, for independent reasons, are committed to the power of public reason. https://t.co/elZkZgBYPD?amp=1

[The link is to a NYT op-ed she wrote; because it is behind a paywall, I haven't read it. I would link to SSC's argument against paywalls, except for the obvious reason.]

A friend linked to this on Facebook, and I replied with a Bible verse:

Render unto Caesar the things that are Caesar's, and unto God the things that are God's.

But, in the spirit of public reason, let me explain my meaning.

Different people present different interfaces to the public. Philosophers want to receive and know how to handle reasoned arguments; politicians want to receive and know how to handle votes. Present a politician with a reasoned argument, and they might think it's very nice, but they won't consider it as relevant to their duties as a politician as polling data. "The voters aren't choosing me to think," they might say; "they're choosing me to represent their will."

This isn't to single out politicians as 'non-philosophers'; businesspeople have their own interface, where they want to receive and know how to handle money, and athletes have their own interface, and musicians, and so on and so on. This is part of the society-wide specialization of labor, and is overall a good thing.

Callard argues that philosophers shouldn't sign petitions, as philosophers--in my framing, that it corrupts the interface of 'philosophers' for them to both emit reasoned arguments and petitions, which are little more than counts of bodies and clout. If they want to sign petitions, or vote, or fight in wars on their own time, that's fine; but those political behaviors are not the business of philosophy, and they should do them as people.

Overall, this argument seems right to me, except I think it also makes sense for individuals, even those personally committed to public reason, to honor the other disciplines by engaging with them the way they want to be engaged with, so long as it is consistent with one's conscience. If you want, say, a politician to take seriously Aliens and Citizens, you need to assemble a voting bloc that is in favor of open borders, rather than simply sending them copies of the paper. If you want a businessperson to produce anti-malaria nets for people in need, the thing to do is assemble a pile of money to trade them for the nets.

And so the question becomes: when an editor at the New York Times makes a decision that seems wrong-headed and cruel, what interface do they present to the world, and how should we make use of it?

In this particular case, the editor in question is not a philosopher, and to the best of my knowledge hasn't elected the interface of philosophy. [Scott was simply informed of the editor's decision, not her reasons for the decision, which we can only imagine. If it hinges on a reasoned defense of pseudonyms, I am happy to provide one; if it hinges on proof that pseudonyms are important to our culture, the petition seems like the best way to provide that.]

Callard, in the replies, tweets:

Good thought. I think if there were some way for the document (which wouldn't then be a petition) to convey: "this is what we think but if, after careful deliberation, you sill believe that you are doing the right thing, we will support your decision"-I might feel differently.

I have employed this strategy in the past, when I felt someone else owned a decision or I overall trusted their judgment. But it feels like this is a tool with a narrow use, rather than one that applies broadly. I wouldn't say to Putin, "I personally think it is wrong to murder journalists, but if after careful deliberation, you still believe that you are doing the right thing, I will support your decision." His belief that he is doing the right thing (if he even casts the decision in those terms) is not a crux for me, and would not change my views.

Similarly, I don't yet see a reason to trust the editor's judgment here. Their view that Scott's birth name is newsworthy, when the NYT has many times before referred to others in similar situations by their pseudonyms, does not update any of my beliefs on the value of pseudonymity; it simply reflects on their callousness.


Don't Make Your Problems Hide

June 27, 2020 - 23:24
Published on June 27, 2020 8:24 PM GMT

I've seen a worrying trend in people who've learned introspection and self-improvement methods from CFAR, or analogous ones from CBT. They make better life decisions, they calm their emotions in the moment. But they still look just as stressed as ever. They stamp out every internal conflict they can see, but it seems like there are more of them beyond the horizon of their self-awareness.

(I may have experienced this myself.)

One reason for this is that there's a danger with learning how to consciously notice and interact with one's subconscious thoughts/feelings/desires/fears: the conscious mind may not like what it sees, and try to edit the subconscious mind into one that pleases it.

The conscious mind might try, that is, but the subconscious is stronger. So, what actually happens?

The subconscious develops defense mechanisms.

Suppressed desires disguise themselves as being about other things, or they just overwhelm the conscious mind's willpower every now and then (and maybe fulfill themselves in a less healthy way than could otherwise be managed).

Suppressed thoughts become stealthy biases; certain conscious ideas or narratives get reinforced until they are practically unquestionable. So too with fears; a suppressed social fear is a good way to get a loud alarm that never stops.

Suppressed feelings hide themselves more thoroughly from the searchlight, so that one never consciously notices their meaning anymore, one just feels sad or angry or scared "for no reason" in certain situations.

At its worst, the conscious mind tries ever-harder to push back against these, further burning its rapport with the subconscious. I think of pastors who suppress their gay desires so hard that they vigorously denounce homosexuality and then sneak out for gay sex. They'd have been living such a happier life if they'd given up and acknowledged who they are, and what they want, years ago.

Now, sometimes people do have a strong desire that can't be satisfied in any healthy way. And that's just a brutal kind of life to live. But they would still do better by acknowledging that desire openly to themselves, than by trying to quash it and only hiding it.

How can we become more integrated between conscious and unconscious parts, and undo any damage we've already caused? 

In my talk about the elephant and rider, I suggested (or gestured at) a few relevant things:

  • Pursue basic happiness alongside your conscious goals (and make sure that's happiness for you, not just e.g. keeping your friends happy by doing the things they like)
  • Use positive reinforcement on yourself rather than punishment - it's especially important not to punish yourself for noticing the "wrong" thoughts/feelings/desires/fears. Reward the noticing, even with just an internal "thank you for surfacing this".
  • Treat the content of these thoughts/feelings/desires/fears with respect. You might think of them as a friend opening up to you, and imagine the compassion you'd have when trying to figure out a way forward where both of you can flourish.

It's important to be gentle, to be curious, and to be patient. You don't have to resolve the whole thing; just acknowledging it respectfully can help the relationship grow.

There are other approaches too. Many people believe in using meditation to better integrate their thoughts and feelings and desires, for instance.

When you do something that you thought you didn't want to do, or when you're noticing an unexpected feeling, it's an opportunity for you. Don't push it away. 


Mediators of History

June 27, 2020 - 22:55
Published on June 27, 2020 7:55 PM GMT

Epistemic note: all of the examples in this post are very simplified for ease of consumption. The core idea applies just as well to the real systems in all their complicated glory, however.

When oil prices change, oil producers adjust in response - they drill more wells in response to higher prices, or fewer wells in response to lower prices. On the other side of the equation, oil prices adjust to production: when OPEC restricts output, prices rise, and when American shale wells expand, prices fall. We have a feedback loop, which makes it annoying to sort out cause and effect - do prices cause production, or does production cause prices?

The narrative: from roughly 2010-2014, OPEC successfully restricted their production enough to keep oil prices around $100/barrel. But at that price, American shale wells are extremely profitable, and they grew rapidly - the dots in the map below are each an oil/gas well in the Eagle Ford basin in Southern Texas, and the graph immediately above shows American oil production. This situation was not sustainable; prices eventually dropped to around $50/barrel, which is roughly the marginal cost of American shale. Since then, prices rose above $50/barrel again around 2018, and American shale once again grew rapidly in response.

In this case, there is a useful sense in which production capacity causes prices, not the other way around - at least if we omit OPEC agreements.

Oil is a fairly liquid commodity. When there’s a shock in supply (e.g. OPEC agreeing to restrict output) or demand (e.g. lockdowns), the markets respond and prices rapidly adjust. Production capacity, on the other hand, adjusts slowly: drilling new wells and building new pipelines takes time, and once a well is built it rarely makes sense to shut it down before it runs dry. So (ignoring OPEC) prices right now are caused by production capacity right now, but production capacity right now is not caused by prices right now - it’s the result of prices over the past several years, when the wells were drilled.

Or, to put it differently: production capacity mediates the effects of historical prices on current prices. It’s a “mediator of history” - a variable which changes slowly enough that it carries information about the past. Other variables equilibrate more quickly, so they depend on far-past values only via the mediators of history.

(Incorporating OPEC into this view is an exercise for the reader.)

Another example: each of our cells’ DNA is damaged hundreds or thousands of times per day - things like strand breaks or random molecules stuck on the side. Usually this is rapidly repaired, but occasionally it’s misrepaired and a mutation results - a change in the DNA sequence. On the other side, some mutations can increase DNA damage, either by increasing the rate at which it occurs, or reducing the rate at which it’s repaired. So damage causes mutations, and mutations can cause damage.

Visualizations of some kinds of DNA damage, and keywords to google if you want to know more about them.

Here again, there is a useful sense in which mutations cause damage, not the other way around: the damage right now is caused by the mutations right now, but the mutations right now were caused by damage long ago. The mutations are a mediator of history.

This has important implications for treating disease: we can use antioxidants to suppress (some types of) DNA damage, but that won’t remove the underlying mutations. As soon as we stop administering antioxidants, the damage will bounce right back up. Worse, we probably won’t prevent all damage, so mutations will still accumulate (albeit at a slower rate), and eventually the antioxidants won’t be enough. On the other hand, if we can fix the problematic mutations (e.g. by detecting and removing cells with such mutations), then that “resets” the cells - it’s like the earlier damage never happened at all.

Change the mediators of history, and it’s like history never happened.

A third example: a robot takes actions and updates its world model in response to incoming data. It uses the world model explicitly to decide which actions to take, but the actions chosen will also indirectly influence the world model - e.g. the robot will see different things and update the model differently depending on where it goes. However, the action being taken right now does not influence the world model right now; the world model depends on actions taken previously. So, the world model mediates history.

Here, it’s even more obvious that changing the mediator of history makes it like history never happened: if we reset the robot’s world model to its original state (and return it to wherever it started in the world), then all the influence of previous actions is erased.

In general, looking for mediators of history is a useful tool for making sense of systems containing feedback loops. In chemistry, it’s the fast equilibrium approximation, in which the overall kinetics of a reaction are dominated by a rate-limiting step. In physics more generally, it’s timescale separation, useful for separating e.g. wave propagation from material flows in fluid systems.

In plasmas, charged particles follow a wheel-like motion - they orbit around magnetic field lines with drift superposed. When the orbital motion is on a fast timescale relative to the drift, we can average it out - see gyrokinetics.

The most common application of the idea in chemistry and physics is to simplify equations when we’re mainly interested in long-term behavior. We can just assume that the fast-equilibrating variables are always in equilibrium, and calculate the rate-of-change of the mediators of history under that assumption. In many systems, only a small fraction of the variables are mediators of history, so this approximation lets us simulate differential equations in far fewer dimensions.
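
To make the approximation concrete, here is a minimal Python sketch of a toy fast-slow system, loosely modeled on the oil example above. The equations, the parameter values (`a`, `b`, `r`, `m`, `eps`) and the function names are all illustrative assumptions, not taken from any real market model: price relaxes quickly toward `a - b * capacity`, while capacity grows slowly whenever price exceeds a marginal cost `m`. The reduced model drops the price equation and simply assumes price is always at its equilibrium value.

```python
def simulate_full(price, capacity, eps, steps, dt, a=150.0, b=1.0, r=0.5, m=50.0):
    """Euler-integrate both the fast variable (price) and the slow one (capacity)."""
    path = []
    for _ in range(steps):
        d_price = (a - b * capacity - price) / eps  # fast relaxation toward a - b*capacity
        d_capacity = r * (price - m)                # slow growth while price exceeds cost m
        price += dt * d_price
        capacity += dt * d_capacity
        path.append(capacity)
    return price, capacity, path


def simulate_reduced(capacity, steps, dt, a=150.0, b=1.0, r=0.5, m=50.0):
    """Integrate only the slow variable, assuming price is always in equilibrium."""
    path = []
    for _ in range(steps):
        price = a - b * capacity                    # equilibrium assumption replaces an ODE
        capacity += dt * r * (price - m)
        path.append(capacity)
    return a - b * capacity, capacity, path


if __name__ == "__main__":
    # Hypothetical starting point: capacity 50, price at its equilibrium value of 100.
    full_price, full_cap, full_path = simulate_full(100.0, 50.0, eps=0.01, steps=20000, dt=0.001)
    red_price, red_cap, red_path = simulate_reduced(50.0, steps=20000, dt=0.001)
    gap = max(abs(f - g) for f, g in zip(full_path, red_path))
    print("final capacity (full, reduced):", full_cap, red_cap)
    print("largest gap between the two capacity paths:", gap)
```

When `eps` is small, the two capacity paths stay close together, so following the long-run behavior only requires integrating the one-dimensional reduced model.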

From a modelling perspective, the mediators of history are the “state variables” of the system on long timescales. This is especially important in economic models, since the state variables are what agents in the models need to forecast - e.g. stock traders mainly need to know how mediators of history will behave in the future. If they know that, then the rest is just noise plus an equilibrium calculation.

Finally, in terms of engineering, mediators of history are key targets for control. For instance, if we want to cure aging, then identifying and intervening on the mediators of history is the key problem - they are both a necessary and a sufficient set of intervention targets. That actually simplifies the problem a lot, since the vast majority of biological entities - from molecules to cells - turn over on a very fast timescale, compared to the timescale of aging. So there are probably relatively few mediators of history, relative to the complexity of the whole human body - we just need to look for things which turn over on a timescale of decades or slower (including things which don’t equilibrate at all).


Five Ways To Prioritize Better

June 27, 2020 - 21:40
Published on June 27, 2020 6:40 PM GMT

This piece is cross-posted on my blog here.

I’m going to let you in on a secret of productivity.  

Those people you admire, the ones who make you wonder how on earth they accomplish so much? Those people might work more hours than you or be more talented or more passionate. Or they might not.

But they probably work on better things, in better ways. 

Now, before you protest that that’s the same thing as being talented or smart or hardworking, let me unpack that claim. Working on better things means they carefully choose what’s worth caring about, and what they won’t give a fuck about. Working in better ways means they carefully choose to do the most important actions to accomplish those goals.  

In short, working on better things, in better ways is … prioritization. Prioritizing well is the common thread behind successful people. 

Because prioritizing well is so freaking important. Cumulative good choices can multiply your impact tens or hundreds of times. You can’t work tens or hundreds of times as many hours in a day. You can choose what to do and how to do it most effectively.

But prioritization isn’t just following through on the actions you already know you should do -- though that’s important too. Knowing what you should do is actually hard. There’s a long gap between wanting to make the world brighter and knowing how to do it. I want to acknowledge that. 

So, bear with me a bit. This post is longer than usual because I’m trying to give you a bunch of examples to really get a taste of what prioritizing feels like. I think every one of the examples will help you understand better, but, if you have a hard time focusing for 14 pages, maybe read one tool now and come back another time for others?

I’ve organized the examples around five specific concepts that you can use to help apply the mindset more effectively: Theory of Change, Lean Tests, Bottlenecks, Ballpark Estimates, and the 80/20 Rule. I’ll start each section with some concrete stories before summarizing the concept and how it helps you prioritize. 

Note: The following vignettes, apart from the ones about me and the interview with Tara, are fictional. However, if you find yourself thinking "this could never happen", rest assured that they are based on multiple real conversations I've had.

I. Theory of Change

1. Career planning

Phil is telling his friend Erica about his plans to switch into AI safety. 

Phil: “So, I was thinking I’d stay at my job for another six months. I have a bonus coming then, so it’s a better time to make a career switch.”

Erica: “Huh, I’m surprised that your current job is the best plan for your goals. I thought you said the most important goal was building your machine learning skills?”

Phil: “That’s right, but I’m picking up some machine learning on the job, so it’s also skill building!”

Erica: “But you’re only spending a bit of your time on ML, and it’s not like all of that learning will transfer to the jobs you’re applying for anyway. Wouldn’t you learn a lot more by independently studying some directly relevant ML?”  

Phil: “Hmm...you’ve got a point. I could learn a lot more if I took a month off to just study. But that feels riskier, and I might have a hard time staying motivated.” 

Erica: “But also consider, you told me that you thought that working on AI safety was several times as impactful as your current role, at least in expectation. That means that spending six more months before switching is like wasting a whole year or more of impact.” 

Phil: “Man, you’re right. I could start applying now so I’m ready to switch right after I get my bonus. But, I don’t actually know if I’m ready! I don’t know if I have the skills I would need to get hired.” 

Erica: “Then it sounds like you need to find out.” 

Phil: “I know a couple of people I could ask, and I can look at the job posts to see what skills they’re looking for. That should help. And if I’m not ready, I’ll probably get a better idea exactly what I should study. Then I can make a timeline so I’m ready to switch as soon as I can.”  

Erica: “How do you feel about that plan?” 

Phil: “Pretty good. I think leaving my job feels really risky given how uncertain I am, and it was making me avoid thinking about switching. Investigating more without feeling like I need to commit to leaving helps.” 

When Phil worked backwards from his end goal, he realized his original plan was actually a pretty bad plan (at least for him), because it wasn’t going directly for the goal. Despite a lot of uncertainty, Phil’s best guess is that applying sooner is much better. Waiting would probably mean he’d lose a few months that could have been spent on much more valuable work. But at worst, he might never make a plan that will actually have an impact unless he stops and thinks harder -- it’s really easy to default to the status quo, even when that’s a bad decision. 

2. Publishing an op-ed

Elle wants to publish an op-ed. But she doesn’t really know how op-eds get published. So, she asks her friend Peter - who has published op-eds before - about the process.

Elle: “So, I was thinking I’d write an article, send it to a bunch of newspapers, and cross my fingers. What do you think?”

Peter: “Well, that’s probably not going to work out. You don’t just write an op-ed and send it to places; you need to really tailor the piece to the magazine you want to publish it in.”

Elle: “Huh, how would I do that?” 

Peter: “First, you should check out other pieces on the platform you’re interested in. If a venue has already published several pieces on similar topics, then it’s likely that an editor there likes that topic and is more likely to accept a new perspective on the issue. Second…”

Elle lacked an accurate model of how her actions would lead to her goal because she didn’t understand the world well enough. Creating an accurate theory of change required learning what actions would generally suffice to accomplish her goal. If she hadn’t learned how the process actually worked, she might have wasted a lot of effort without ever getting the piece published. But she had no way of knowing that until she asked. 

3. Studying for the GRE

Like many students before me, I didn’t really care if I remembered the content after the test. I just wanted to spend as little time as possible to get a GRE score I was happy with. But how to do that? 

According to test prep websites, I should have worked through a GRE prep book (preferably theirs). After all, those are explicitly designed to prepare you for the GRE. They include everything I needed to know; concept review, vocabulary quizzes, and practice tests. 

But I knew they would actually waste time. 

See, I knew that spreading my time evenly over the material was an inefficient way to learn. I learn a lot more by studying stuff I don’t know yet than by reviewing stuff I already know. Neither the study guide nor the vocab would have focused me on just my weakest areas, so I would have spent hours rereading familiar material.

Instead, I optimized my study for rapidly improving my weaknesses: take practice quant section tests (my weakest section), study the questions I missed, summarize the concepts behind the questions, repeat. Those three steps saved me dozens of hours.

Theory of Change

In each of these examples, the person wanted to accomplish a particular goal.  They worked backwards from that goal to find the steps that would make them likely to succeed at the goal. 

Elle figured out how the publishing world worked so that she knew what actions were likely to succeed. Phil found that a different path would allow him to have an impact sooner and reduce his chances of failing. I worked backwards to cut out unnecessary work and save time. 

Each person needed to know their goal, figure out what steps would reliably lead to that goal, and then focus only on those steps. That’s the theory of change -- this model of the causal chain of actions that lead to successfully accomplishing the end goal. You might need to investigate how the world works or go learn something if you don’t have enough information, in order to accurately pick what will have an impact.  

So, if you want to apply this tool, ask yourself -- what steps will actually make you likely to achieve your goals?

(If you want more, here’s a great post on theory of change.)

Working out your goals and your theory of change is a prerequisite step to other prioritization techniques. For example, it’s going to be hard to use the next concept (lean tests) without having at least a starting point for what you want to optimize.

II. Lean Tests

1. Starting a business

The summer after my freshman year of college, I was hired to start a company making personal biographies.

I immediately started finding contractors to create the books, getting prototypes made, and building a website. This seemed reasonable to me then – after all, that was my job description. And I did a good job. At the end of the summer, I had a full production plan ready for when the first customer purchased.

Except, no one ever purchased a single book.

If I had started by talking with potential customers, I could have known in a month that the idea was doomed from the start. The target audience had no interest in the elaborate $10,000 product my employer envisioned. But I didn’t know that, because I created a product before I confirmed people were ready to buy it. I could have saved an entire summer of work if I had just tried talking to potential customers first.

2. Conducting research

Max is asking his friend Ellen for advice about his research project. 

Max: “Aw man, I’m super bummed -- I spent six months researching this policy area. Then when I sent my write-up draft for feedback, someone sent me an unpublished document where they’d already done part of the research! Plus, now I need to do more research to answer their questions. I feel like I wasted so much time already, and I’m not even done. What could I have done?” 

Ellen: “That sucks. Hmm, there’s a post called Research as a Stochastic Decision Process that might help avoid similar situations in the future. Want to hear about it?” 

Max: “Yeah!”

Ellen: “The really simple version is to first do the parts of the task that are most likely to fail or change what other steps you will do, rather than doing the easiest parts first. This way, you reduce uncertainty about which tasks are necessary as quickly as possible. For example, if your task has three steps and one is most likely to fail, you save time in expectation by doing that one first, because you might not need to do the other steps at all.” 

Max: “Yeah... I did the easy tasks first here. Like, doing all the research was a lot of work, but it was easy to just keep reading. Asking for feedback didn’t take much time, but it was hard. I got really anxious whenever I thought about sending the emails, and just kept doing more research, and then more research. But having that feedback earlier would have totally changed what things I choose to research. It could have saved me hundreds of hours.” 
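
Ellen's rule of thumb can be sketched in a few lines of Python. This is a deliberately simplified model, not the one from the post she mentions: it assumes the steps can be done in any order, that a failed step ends the project, and that each step has a known cost and success probability. The step names, hours, and probabilities below are hypothetical numbers chosen only to illustrate the effect.

```python
from itertools import permutations

# Hypothetical steps: (name, hours of work, probability the step succeeds).
STEPS = [
    ("do the research", 200.0, 0.90),
    ("write it up", 40.0, 0.99),
    ("ask for feedback", 2.0, 0.50),
]


def expected_hours(order):
    """Expected hours spent if we stop as soon as any step fails."""
    total = 0.0
    p_reached = 1.0  # probability that we get as far as the current step
    for _name, hours, p_success in order:
        total += p_reached * hours
        p_reached *= p_success
    return total


def best_order(steps):
    """Brute-force search over all orderings; fine for a handful of steps."""
    return min(permutations(steps), key=expected_hours)


if __name__ == "__main__":
    print("order as listed:", expected_hours(STEPS), "expected hours")
    best = best_order(STEPS)
    print("best order:", [name for name, _, _ in best])
    print("best order:", expected_hours(best), "expected hours")
```

Under these made-up numbers, the cheap but risky step (asking for feedback) comes first in the best ordering, which matches what Max wishes he had done.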

3. Learning a new subject

Alex wanted to do some independent study to see if he would be a good fit for working on AI safety. Based on 80,000 Hours recommendations, he found a few promising math courses that he could work through on the weekends over a few months.  

Before he got started, he asked a few acquaintances who worked on AI safety whether his plan made sense. They mostly thought it did, but agreed that several of the math topics he had planned to study weren’t immediately valuable. He could safely skip those for now.

Spending an hour sending those emails cut his months of study in half.

Lean Tests

In each of these examples, the person wanted to efficiently accomplish their goal. In the first two examples, the person failed to test quickly, and wasted time or failed entirely. In the third, by quickly testing their idea, the person exposed flaws early so that they could rapidly correct them or move on - increasing the proportion of their effort that actually made them succeed.

I spent an entire summer on a project that failed entirely because I didn’t test it early enough to change it into something that could succeed. Max could have saved hundreds of hours by asking for feedback upfront. Alex did save hundreds of hours by checking his plan before he implemented it.  

Each person could have broken their tasks into chunks to iterate on, getting feedback each step of the way, rather than risk wasting time by investing a lot of effort without feedback. 

Lean methodology is about continuously doing small tests to check that you’re on the right track. This allows you to iteratively make lots of corrections that move you towards your goal, even when you’re not sure what is required (or when you are sure but are wrong).

By finding the flaws early, you can change course early, minimize wasted time, and reduce the risk of ultimate failure. Better to know a project will fail before you put in months of effort that could have been spent on a project more likely to succeed. You could get feedback from more experienced people, your target audience, or the thing itself.

So, what is the first quick test you could create to get feedback and iterate? 

III. Bottlenecks

1. Anxiety

Anna is early in her journalism career. She’s talking to her partner Dan, a fellow writer, about her draft of a piece on factory farming. 

Dan: “Hey, did you pitch your idea to your supervisor today like you planned?” 

Anna: “No… I just got too nervous and couldn’t make the words come out. I think I want to keep working on it before I talk to her.” 

Dan: “Anna, your piece is great! You’ve been working on it for a year already. It’s not going to get better without feedback. All of your hard work won’t matter until someone sees it.”

Anna: “Yeah, you’re right. But I just feel so scared when I try. My chest gets tight and it feels hard to breathe.” 

Dan: “Anna, maybe it’s time to consider talking to someone about this. What do you think?”

Anna: “I’ve actually been thinking about it for a while now, and I think you’re right.” 

2. Procrastination

Mary had meant to get started on her final presentation for her internship two weeks ago. But she felt a wave of dread whenever she started to think about it, so her thoughts slid away to something less awful each time. Now the presentation is in a week, and the feeling has risen to panic mode. Yet she still can’t make herself even look at her research.

This isn’t a new feeling for Mary. She’s six months overdue on a write-up from her former research position. But whenever her former adviser sends an email asking for the report, a pit opens in Mary’s stomach.

Mary started coaching to tackle her persistent procrastination. Over the next year, Mary and I worked together to build up her ability to make better plans, set up habits to reduce distractions, learn to ask for help early and often, and find commitment mechanisms that work for her. 

After a year of working on this big bottleneck, she feels confident in doing her work by her deadlines. That work was an investment in herself that will pay off over the rest of her career.

3. Fatigue

Sometimes one issue will dominate everything else in regard to productivity, often a physical or mental health issue. In my case, it was fatigue. When you need to take three naps a day, you get less work done regardless of what other productivity tools you use.

So, it was worthwhile investing a bunch of time and effort to improve this. I tried a parade of experiments and tracked all the factors that I thought might influence my energy levels – including sleep times, sleep duration, hydration, exercise, medications, melatonin, doing a sleep study, temperature, naps, and nutrition. I tracked my energy levels and a changing subset of variables for 3-6 months, then compared the odds ratios for each variable.

Here, a bunch of small things cumulatively broke the bottleneck, primarily sleep duration (which was fixed by a consistent sleep schedule), exercise, hydration, and finally an antidepressant. 

While I still have issues with fatigue, it’s no longer the key thing holding me back.

Bottlenecks


In each of these examples, the person needed to get past one bottleneck that was holding them back from succeeding. Each bottleneck eclipsed other tasks, even valuable tasks, until the problem was addressed. 

Anna needed to work on her social anxiety before her hard work would ever see the light of day. Mary found she could accomplish several times as much once she got her procrastination under control. I had the energy to start a blog after I reduced my fatigue. 

Originally, bottlenecks referred to the “rate-limiting factors” that slow down the entire production line. In our examples, the bottlenecks are the tasks, beliefs, or problems that slow down or stop you from accomplishing other tasks. Each person made progress by identifying what was holding up other steps. Once they took care of the high-leverage factor, everything else went much faster. 

So, what are the one or two things you could change about yourself or your environment to accomplish twice as much?

IV. Ballpark Estimates

1. Choosing projects

Will is talking with his PhD supervisor, Kate, about feeling overwhelmed by too many projects. 

Will: “I think I just need to choose one or two to focus on, and put the rest on hold for now. But I want to work on all seven.” 

Kate: “Well, to start with, which ones are most important to work on?”

Will: “That’s the problem; they all seem important! Papers 1 and 2 have a good chance of being published in a good journal, which is important if I want to continue in academia. Paper 3 has a good collaborator, and I don’t want to let them down. And papers 4 and 5 are exciting. I think those ideas could really be impactful. Arrg...I just feel overwhelmed when I think about it, like I need to do them all.”

Kate: “Okay, let’s try a thought experiment. How much would you pay to have each of these projects magically completed? If it’s hard to think about paying with your own money, how much do you think Open Phil would pay to have the project completed?”

Will: “Argh, that’s hard.” *15 minutes of brainstorming later* “Okay, paper 4 could be really big if it goes well, and 2 and 3 are maybe most important for my career. So I think I’d pay like $1000 each for papers 1 and 5, $1500 for papers 2 and 3, and $5000 for paper 4. But these are super crude, I’m really just guessing here.”

Kate: “Crude numbers are fine. You’re really just trying to get a better sense of how you intuitively value each of these. Those numbers help clarify the ranking and rough magnitude of difference between the projects. Sounds like paper 4 is the best, and then 2 and 3 are a bit better than 1 and 5, all else equal. Now, which projects do you expect to take the least time?” 

Will: “Paper 3 for sure. My collaborator is doing a bunch of the work, so it’s probably half as big as the other papers. Between the others...really hard to say. Um, so maybe I’d sort them 3, 1, 4, 5, 2 from least to most time, but I’m really guessing here.”

Kate: “Based on value and time required, it sounds like you should spend most of your time on paper 4, plus some on paper 3.” 

Will: “But all of these are estimations! I’m not confident, and I could be really wrong.”

Kate: “You’re not going to be confident. Things are uncertain and will be uncertain no matter how long you think about it. So you have to make your best choice despite your uncertainty. Estimates are a way to try to make that choice as well as you can. And you should absolutely spend more than 5 minutes thinking about them. So, take a few days to think about your estimates in more depth. Maybe ask a couple of advisors. But, when you’re done, go with the highest expected value and stop worrying about it. You can change your plan if you get new information. For now, you’re doing the best you can, and that’s good enough.” 

Will: “I...think...I can do that. Thanks, Kate.” 
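
Will's numbers can be turned into a rough value-per-time ranking. The following is only a minimal sketch under assumed time costs that Will never states: paper 3 is given 0.5 units because he calls it "half as big as the other papers", and the other four are all set to 1.0 even though he loosely orders them. The names `value_usd`, `relative_time`, and `rank_by_value_per_time` are illustrative, not anything from the conversation.

```python
# Will's dollar estimates from the conversation (paper number -> dollars).
value_usd = {1: 1000, 2: 1500, 3: 1500, 4: 5000, 5: 1000}

# ASSUMPTION: relative time costs. Only "paper 3 is about half as big"
# comes from Will; treating the other papers as equal is a simplification.
relative_time = {1: 1.0, 2: 1.0, 3: 0.5, 4: 1.0, 5: 1.0}


def rank_by_value_per_time(values, times):
    """Return (paper, value per unit of time) pairs, best first."""
    ratios = {paper: values[paper] / times[paper] for paper in values}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_value_per_time(value_usd, relative_time)
for paper, ratio in ranking:
    print(f"paper {paper}: {ratio:.0f} dollars per unit of time")
```

Under these assumed times, papers 4 and 3 come out on top, which matches Kate's suggestion. Different time guesses would shift the ratios, so the ranking is only as good as the estimates behind it.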

2. Learning a new skill

Lyra is talking with her coworker Mike about her plans for independent study to improve her coding skills.

Lyra: “So, I’m debating between spending a bigger chunk of time to really understand computer architecture or doing several small learning projects around things that came up in my job. I’m having a hard time making progress on either idea because I keep flipping back and forth about which seems most important.” 

Mike: “Can you try estimating the time required for the learning, and the time saved afterward, to see which is better?” 

Lyra: “So I tried doing that earlier. I estimated the architecture learning would take me fifty to a hundred hours to do, and save maybe twenty or thirty hours a year. On the other hand, one of the smaller projects would only take five or ten hours, and it would save me a few minutes a day, which adds up to ten or fifteen hours a year.” 

Mike: “That sounds like the smaller project is clearly a better deal - it would pay for itself within one year, while the bigger project would need more like three years to break even.” 

Lyra: “I know, but the bigger project feels important anyway. I think... the bigger project isn't just about saving time. I care about it because it also opens up the option to do new things that I can’t do right now. But I’m not sure if that benefit is big enough to make it worth doing.” 

Mike: “Could you run an experiment for five hours or so to see if it seems like you’re able to do new things?” 

Lyra: “Yeah, that sounds good. If it looks like I’m not, then I can go ahead with the smaller project since that’s better for saving time.” 

Even though Lyra didn’t follow her numbers exactly, they were helpful for clarifying the decision. 
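
Mike's break-even comparison can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes the midpoint of each range Lyra gave is a fair stand-in for the range. That choice, and the helper names `midpoint` and `break_even_years`, are illustrative rather than part of the conversation.

```python
def midpoint(low, high):
    """Middle of an estimated range (ASSUMPTION: used as the point estimate)."""
    return (low + high) / 2


def break_even_years(cost_hours, saved_hours_per_year):
    """Years until the hours saved equal the hours invested."""
    return cost_hours / saved_hours_per_year


# Lyra's ranges: 50-100 hours to learn, saving 20-30 hours a year.
architecture_years = break_even_years(midpoint(50, 100), midpoint(20, 30))

# Lyra's ranges: 5-10 hours to do, saving 10-15 hours a year.
small_project_years = break_even_years(midpoint(5, 10), midpoint(10, 15))

print(f"architecture study breaks even after ~{architecture_years:.1f} years")
print(f"small project breaks even after ~{small_project_years:.1f} years")
```

The calculation only captures time saved. It says nothing about the option value Lyra cares about, which is why she still runs the five-hour experiment.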

3. Job hunting

Mark has just graduated from university, and he wants to complete an ML master’s to be an engineer at an AI safety org, or earn to give if that doesn’t work out. 

However, the programs cost between $15,000 and $24,000, and Mark isn’t comfortable going into debt. So his plan is to get a job now, apply for the master’s programs in the fall, and save up money until he starts the following year. 

Mark’s previous summer job has offered him a full-time role for $15 an hour. However, his undergrad thesis supervisor encourages him to look into software engineering jobs, which the supervisor thinks he’s qualified for. Unfortunately, he doesn’t have enough time to take the summer job while also applying. 

Mark isn’t sure about it, but his supervisor convinces him to look into some job postings for local positions. So Mark puts together the following estimates.  

Based on those numbers, he decides it’s worth delaying starting at his previous job to spend a couple of months applying to engineering roles. If it works out, he’ll earn a lot more. If the job hunt doesn’t seem promising after a few months, he can go back to the other role.

Ballpark Estimates

In each of these examples, putting numbers on the uncertainty helped the person prioritize. Estimating didn’t cleanly decide their priorities, but it reduced uncertainty. It revealed blind spots, such as missing important considerations or thinking two things were comparable when one was actually way more important. By quantifying, ranking, or crunching numbers, they were able to make a better guess at what should be prioritized. 

Quantifying his expected value and time required helped Will increase his expected value per hour of work 2x compared to working on one of the papers at random. Quantifying return on investment for her time learning helped Lyra identify what factors she was overlooking, so now she can evaluate whether the project is worthwhile. Mark decided to apply to more ambitious jobs based on his numbers, and got a role making >50% more after three months of applying. 

Classically, Fermi estimates are back-of-the-envelope calculations intended to quickly approximate the correct answer, usually when the real answer is difficult to get. Here, people estimated values and costs of different options, so they could approximate the return on their investment and compare opportunity costs. 

Since we’re frequently prioritizing amid uncertainty, even moderately reducing that uncertainty improves decisions. Often you have some data easily available even when you feel uncertain. Sometimes this is just making your intuition concrete. Sometimes it is actually gathering data and crunching numbers. You should take care not to be overconfident in your estimates, but even totally made up numbers can sometimes be useful, such as when you want to make a decision between competing intuitions. 

So, what returns do you get on the time and effort invested? How does this compare with your other options?

V. 80/20 Rule

1. Working hours experiment

Bill was frustrated by a consistent dip in energy each afternoon. He felt less motivated during this time, and sluggish and slow even when he forced himself to work. 

Working out in the afternoon helped him feel more energetic afterward, but taking a forty-five-minute break made him feel like he needed to work late. 

He knew that subjective experience doesn't always match actual output, so he tried quickly recording how many words per hour he wrote for a week. Bill also noted how he subjectively felt during that time. At the end, the data suggested that his output dropped by nearly half during that period, and only gradually picked back up over the later afternoon. 

He kept recording the data while he took some workout breaks. Although the data was noisy, he found that he got about as much done on days when he worked out and days when he didn’t. 

Given that, he decided to work out each afternoon without feeling obligated to work late.  

2. Optimizing school

Sarah is a college freshman asking Alice, a senior who works in the same lab, for tips on how to succeed in college. 

Sarah: “So, what are the most important things you do to get good grades?” 

Alice: “Umm, I plan each day the night before so I know exactly what I need to do, and then I set aside a couple of hours when I turn off my phone and study without any distractions. That’s big. I usually do the most important task first so that I don’t risk running out of time. Oh, and I have a question in mind while I’m doing research, so that I don’t lose too much time going down rabbit holes.” 

Sarah: “Is there something else that helps you reliably manage your workload?” 

Alice: “So, I start my planning by looking at which projects are worth a large share of the grade or that I care a lot about learning, and I choose which projects deserve the most time and which get just the bare minimum. For example, going from an okay paper to a great one takes a lot more work, so I’ll only do that if I really care about the paper. Otherwise, I’ll wait to start the paper until the day before it’s due, and then race like crazy. It forces me to get the paper done without spending too much time on it.” 

Sarah: “Thanks, Alice. That sounds helpful. I’ve been feeling really overwhelmed by taking three really hard classes, and I really want to get As in all of them. I’m a bit of a perfectionist, I know.” 

Alice: “It’s great that you’re asking for advice, Sarah, that will probably help you figure college out way faster than I did. But I want to ask, why do you care about getting good grades?” 

Sarah: “Wha-what?”

Alice: “Why are As the thing you’re aiming for right now?” 

Sarah: “That doesn’t even make sense. Grades are just what we’re supposed to do here. Wait, I’ll try to work out the reason… Because in school, grades are how we know we’re doing well or where we need to improve. And how future employers or grad schools know that we’re good.” 

Alice: “That’s fair, but it’s only a small part of what people will care about in the future. You’re only a freshman, but you’re already working in this lab and your research seems really promising. You obviously love doing it. But you’re only doing five hours a week here because you say you don’t have enough time to do this and study. If you instead spent a lot more time on research and did really cool things there, I bet both grad schools and future employers would care about that more than a 4.0 GPA.”

Sarah: “Hmm.”

Alice: “I mentioned earlier how important it is to decide which parts of a project deserve more time, and which to just put in the minimum. Well, it’s even more important to carefully choose what projects or classes are worth a lot of time, and when it would be better to do the minimum you can in some classes, so that you can invest heavily in others.”

Sarah: “I need to think about this. What you’re saying kind of makes sense, but I’m worried that if I’m doing the bare minimum, my grades will drop too much.”

Alice: “Good things to consider. I’m applying to med school, so my grades matter. But I chose to take easy classes for my gen eds and electives, so that I can put in a lot of time here at the lab without damaging my GPA. That’s the right decision for me. I’ll bet it’s worthwhile for you to spend some time thinking about what you want to be perfectionist about.” 

3. High-value rest time

In my interview with Tara Mac Auley, she advised trying a bunch of leisure activities to decide which are most valuable for you. 

“If you take time to rest and you come back, and it doesn't feel better then probably the ways that you've chosen to rest aren't in fact the most restorative things you could be doing. And so I would suggest trying a lot of different things: a lot of different types of social activities, or physical activities, or intellectually engaging activities. 
I did this a lot when I was in my early 20s. I picked a random event from meetup.com every day for about two months, and I just had to go to whichever thing came up. And then I would write down beforehand whether I thought I would enjoy it and feel drained or refreshed from that activity. And then I would compare afterward what I actually felt and, I don’t know, that was really informative and good for me.”

She used this type of process to identify the activities that best leave her rejuvenated and rested.

“Being near water and swimming, but not in a swimming pool, it has to be natural water. Being in nature. Reading a book, especially reading a book in a park or by a lake or something like that. Spending quality time with close friends or family just having a conversation for an hour with no particular goals, I find really rejuvenating. And eating a really nice meal; one where I can kind of savor all of the different tastes and textures…. I go out dancing a lot on my own, to go and see music artists that I enjoy, and I just dance like a crazy person until I'm really tired and then I go home, and that's amazing.”

80/20 Rule

In each of these examples, the person wanted to prioritize the most valuable subset of possible actions. By identifying the higher-value actions, they could get more done for their effort. 

Bill did an experiment to find out which hours of work provided the most value, so he could make better decisions about when to work. Sarah prioritized the highest value work to get good grades, and started thinking about how much more valuable that effort would be if she prioritized the highest value goals to begin with. Tara experimented to find out which activities were the highest value fun for her, which she can now exploit 80/20 style. 

Based on the idea that 80% of an output comes from 20% of the input, the 80/20 rule suggests that the value per unit of effort varies a lot across different actions. Because outputs vary so much, exploring more can unearth dramatically better options. So, similar to Tara, you need to try many actions first in order to effectively identify the top-performing subset. Once you’ve identified which actions are most valuable, you can narrow your focus to just that subset. Then your output will increase significantly for the same effort. 
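
To check how concentrated your own results are, you can measure what share of the total value comes from your top 20% of actions. The sketch below uses made-up value scores purely for illustration. The function name `top_share` and the numbers in `activity_values` are hypothetical and are not data from any of the stories above.

```python
def top_share(values, top_fraction=0.2):
    """Fraction of total value contributed by the top `top_fraction` of items."""
    ranked = sorted(values, reverse=True)
    # Always keep at least one item so short lists still give an answer.
    top_count = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_count]) / sum(ranked)


# HYPOTHETICAL value scores for ten activities, chosen to illustrate the rule.
activity_values = [40, 40, 5, 4, 3, 3, 2, 1, 1, 1]

share = top_share(activity_values)
print(f"top 20% of activities give {share:.0%} of the value")
```

Real data will rarely land on exactly 80/20. The useful signal is whether the share is far above 20%, which would mean narrowing your focus is likely to pay off.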

So, what gives you the most value for the least effort? What can you cut with minimum loss so that you have extra resources to put toward what’s most valuable?


All of these stories are of how people tried to identify the most valuable actions they could take to accomplish their goals. They reduced uncertainty, said no to other actions, and made choices based on their best guesses. 

You might be tempted to say these examples don’t feel important. That choosing which skill to learn or overcoming anxiety can’t change the world. And maybe you’re right, if you only look at that one step. 

If you put all of these together, you have a mindset that searches for the most valuable goals, builds models to effectively accomplish them, iteratively tests assumptions against the world, logically weighs the opportunity cost, and judiciously spends time and effort to get the most impact possible. That mindset touches all your decisions. 

And that’s prioritization. 

Because prioritization isn’t something you do once a month. It’s not a magical ability that lets you do everything - quite the opposite, in fact. It’s the gut-deep sense that your time and effort are limited and you need to choose what to do, because you can’t do everything. 

But when you do that? When you put all of your reason and tools to the task of choosing the most valuable goals? 

Then we have a chance. Choose important goals, and you could save lives from dying of malaria or build a future where pandemics don’t wipe out hundreds of thousands. Accomplish those goals, and the world becomes better. If you need to take care of your own mental health or build skills first, then do it. You’re still nudging the world in the right direction. 

And if you don’t? ...Then we’re still right where we are now. We’ve lost out on some of the goodness and wonder the world could have had. There’s the sense of being so close and just missing what could have been. 

That’s why I want to convey the mindset of what it feels like to prioritize. 

So, here are five questions to take with you. Use them to make the world better.

  1. What steps will actually make you likely to achieve your goals?
  2. What is the first quick test you could create to get feedback and iterate? 
  3. What are the one or two things you could change about yourself or your environment to accomplish twice as much?
  4. What returns do you get on the time and effort invested? How does this compare with your other options?
  5. What gives you the most value for the least effort? What can you cut with minimum loss so that you have extra resources to put toward what’s most valuable?

Enjoyed the piece? Subscribe to EA Coaching’s newsletter to get more posts delivered to you.


Have general decomposers been formalized?

June 27, 2020 - 21:09
Published on June 27, 2020 6:09 PM GMT

Hi, I'm working on a response to ML projects on IDA focusing on a specific decomposer, and I don't know whether anyone has formalized what a decomposer is in the general case.

Intuitively, a system is a decomposer if it can take a thing and break it down into sub-things with a specific vision about how the sub-things recombine.


Why are all these domains called from Less Wrong?

June 27, 2020 - 16:46
Published on June 27, 2020 1:46 PM GMT

When I visit a Less Wrong page, the browser also attempts to load content from the following domains:

* algolia.net
* algolianet.com
* cloudflare.com
* cloudinary.com
* dl.drop
* dropbox.com
* dropboxusercontent.com
* google.com
* googleapis.com
* googletagmanager.com
* intercom.io
* jsdelivr.net
* lr-ingest.io
* typekit.net

Why is this? I don't want to advertise to half of the internet (and specifically to Google) the fact that I read Less Wrong. What happens if I simply block all these domains? What services do they provide if I don't block them?


Life at Three Tails of the Bell Curve

June 27, 2020 - 11:49
Published on June 27, 2020 8:49 AM GMT

If you assume other people are the same as you along every dimension then you will over-estimate other people exactly as much as you underestimate them. It is a good first-order approximation to assume other people are like yourself.

Most people are in the middle of any given bell curve. You are probably in the middle of any given bell curve. It is a good second-order approximation to assume other people are like yourself.

But…if you assume you are statistically normal when you are not then you will have problems. I have made this mistake many, many times because my personality is extremized in three big ways.

Natural Amphetamines

I once heard a friend, upon his first use of modafinil, wonder aloud if the way they felt on that stimulant was the way Elon Musk felt all the time. That tied a lot of things together for me, gave me an intuitive understanding of what it might “feel like from the inside” to be Elon Musk. And it gave me a good tool to discuss biological variation with. Most of us agree that people on stimulants can perform in ways it’s difficult for people off stimulants to match. Most of us agree that there’s nothing magical about stimulants, just changes to the levels of dopamine, histamine, norepinephrine et cetera in the brain. And most of us agree there’s a lot of natural variation in these chemicals anyway. So “me on stimulants is that guy’s normal” seems like a good way of cutting through some of the philosophical difficulties around this issue.

The Parable of the Talents by Scott Alexander

According to drugabuse.com, amphetamines have the following short-term effects:

  • Quicker reaction times.
  • Feelings of energy/wakefulness.
  • Excitement.
  • Increased attentiveness and concentration.
  • Feelings of euphoria.

These characteristics describe my baseline state. I feel like I am on stimulants[1] all the time. Natural amphetamines have advantages. I am energetic. I concentrate well. I get lots of work done.

They have disadvantages too. Amphetamines are associated with headaches, appetite suppression, severe anxiety and obsessive behavior, all of which also describe me. The headaches are ignorable because I cannot remember ever not having a headache. The appetite suppression is survivable because I live in a civilization full of convenient calories. The obsessive behavior is a double-edged sword. On the one hand it makes me bad at small talk. On the other hand it helps me learn better.

The anxiety causes me to overprepare for disaster. When you are anxious, it is natural to look around for a threat. When I feel merely moderate anxiety I ought always to consider that there may be nothing to fear but fear itself.

I should assume a prior expectation that other people have the following characteristics relative to myself:

  • Slower reaction times
  • Lethargy
  • Boredom
  • Inattentive and distractible
  • Depressed
  • Relaxed
  • Normal, adjusted, balanced, sane, stable[2]

High Curiosity

Among the Big Five personality traits, curiosity (openness to experience) is my most extremized[3] one.

Other people are comparatively closed to experience. They are conventional and traditional in their outlook and behavior, with familiar routines and a narrow range of interest. I am unconventional and extraordinary[4], with a wide range of interests and no set routine. I think in peculiar ways and tend to get absorbed by my own fantasies.

Openness to experience is considered a positive trait within liberal Western society, but it comes with disadvantages. I suffer hard for my nonconformity.

When I meet new people, I ought to assume the following characteristics[5] as a prior:

  • Conventional, traditional, culturally conservative
  • Tendency not to daydream
  • Rigidly conforming to routines, little need for variation
  • Narrow range of interest
  • Lower crystallized intelligence
  • Less general knowledge
  • Little need for cognitive exercise
  • Dislike of intellectual activities in general and solving puzzles in particular
  • Ethnocentric, authoritarian, intolerant of diversity
  • Concerned with social dominance
  • Prejudiced against all sorts of things
  • Low positive affect, little joy
  • Emotional blunting, reduced affective display

High Systemization

The empathizing–systemizing theory suggests there is an evolutionary tradeoff between empathizing and systematizing with autism at one end and schizophrenia at the other end, with most people in the middle. I find it a useful model for understanding a difference between myself and others. I am heavily on the systematizing end of this spectrum.

There is less research on this topic than the others so instead of listing the traits of systematizers I will list the traits of autistic people and flip them around. Compared to me, other people are:

  • Tolerant of disruptions to their schedule, do not need to be notified in advance
  • Tolerant of noise and other background sensory stimuli
  • High empathy, almost telepathic
  • Fewer, less intense special interests
  • Think vaguely, imprecisely
  • Good communicators

Compared to the mean, I seem to be a radically nonconforming autistic savant high on cocaine. That would explain why strangers tend to remember having met me.

High openness is associated with a preference for frequently-changing schedules. Systematizing is associated with schedule inflexibility. How can I exhibit both traits? I am inflexible toward others when it comes to my whimsical schedule.

By a similar paradox, I feel comfortable in the traditional oppressive culture of Japan. My systematizing proclivities benefit from the quiet perfectionism. Meanwhile, as a foreigner, I am not myself expected to conform.

These three characteristics all contribute to making me better at advanced technical tasks; I am pathologically good at writing software. Ironically, the same traits simultaneously make it harder to find a job and fit into a corporation. I can only work somewhere where my technical skills are sufficiently valued for the company to tolerate my nonconformity. My ideal job would probably be working remotely for a small machine learning team. Or I could simply be self-employed.

Socially, these extremized traits suggest I might connect well to other people through art, which benefits from obsessively systematic nonconformity. In particular, I have exactly the right character sheet to write a technical webcomic like xkcd.

These traits also suggest I should stay away from quantum field theory as if it were heroin.

  1. Disclaimer: I have never personally taken amphetamines, cocaine, heroin, or anything along those lines. I apologize for my lack of empirical rigor in this domain. ↩︎

  2. Thesaurus.com lists these words as antonyms to "obsessive". ↩︎

  3. My other Big Five personality traits are in the middle 98% of the bell curve. ↩︎

  4. Thesaurus.com lists "extraordinary" as an antonym to "traditional". ↩︎

  5. Most of these characteristics come from skimming the Wikipedia article on openness to experience. ↩︎


Map Errors: The Good, The Bad, and The Territory

June 27, 2020 - 08:23
Published on June 27, 2020 5:22 AM GMT

What happens when your map doesn't match the territory?

There's one aspect of this that's potentially very helpful to becoming a rationalist, and one aspect that's very dangerous. The good outcome is that you could understand map errors more deeply; the dangerous outcome is that you could wind up stuck somewhere awful, with no easy way out.

The first version, where you notice that the map is wrong, comes when the map is undeniably locally wrong. The map says the path continues here, but instead there's a cliff. (Your beliefs strongly predict something, and the opposite happens.)

The ordinary result is that you scratch out and redraw that part of the map – or discard it and pick up an entirely different map – and continue along the new path that looks best. (You decide you were wrong on that one point without questioning any related beliefs, or you convert to a completely different belief system which was correct on that point.)

The really valuable possibility is that you realize that there are probably other errors besides the one you've seen, and probably unseen errors on the other available maps as well; you start to become more careful about trusting your maps so completely, and you pay a bit more attention to the territory around you.

This is a really important formative experience for many rationalists: 

  • Take ideas seriously enough to notice and care if they fail
  • Get smacked in the face with an Obvious But False Belief: your past self couldn't have imagined you were wrong about this, and yet here we are.
  • Deeply internalize that one's sense of obviousness cannot be trusted, and that one has to find ways of being way more reliable where it matters.

(For me the Obvious But False Belief was about religion; for others it was politics, or an academic field, or even their own identity.)


Now, the dangerous outcome – getting trapped in a dismal swamp, with escape very difficult – comes when you've not seen an undeniable local map failure, so that you never notice (or never have to admit) that the map isn't matching up very well with the territory, until it's too late.

(I'm thinking of making major life decisions badly, where you don't notice or admit the problem until you're trapped in a situation where every option is a disaster of some sort.)

Sometimes you really do need to make bold plans based on your beliefs; how can you do so without taking a big risk of ending up in a swamp?

I suggest that you should ensure things look at least decent, according to a more "normal" map, while trying to do very well on yours. That is, make sure that your bold plan fails gracefully if the more normal worldview around you is correct. (Set up your can't-miss startup such that you're back to the grind if it fails, not in debt to the Mob if it fails.)

And get advice. Always get advice from people you trust and respect, before doing something very uncommon. I could try and fit this into the map framework, but it's just common sense, and way too many good people fail to do it regardless.

Best of luck adventuring out there!


Negotiating With Yourself [Transcript]

June 27, 2020 - 02:55
Published on June 26, 2020 11:55 PM GMT

(Talk given on Sunday 21st June, over a Zoom call with 40 attendees. orthonormal is responsible for the talk, jacobjacob is responsible for the transcription)


orthonormal: So, I'm doing a generalisation of the post that was curated and this post is sort of an elaboration of what Vaniver talked about. If you notice that there are differences between your private intuitions, and what you can publicly acknowledge, this is a system fast versus system slow thing-

orthonormal: This is a question of negotiating with yourself. I'm going to present a model and talk about some consequences, but I’ll start with a question: why do people in our sphere tend to burn out or go nuts?

orthonormal: This is a pretty important question. I’ll use an analogy many of us have heard before — the elephant and the rider. The conscious mind is the rider and the elephant is the unconscious mind. The rider wants to get somewhere, but the elephant has its own preferences about what happens.

orthonormal: One feature of this analogy that I think is true and useful for minds, for humans, is that the elephant has these immediate preferences and some longer term needs, just like we have subconscious desires and subconscious needs. The rider has their own preferences and some carrots and sticks, but the real advantage is having a map. The elephant can just completely ignore the rider if its preferences are strong enough. So how this connects to being human is that our subconscious has these desires and needs and fears, and our consciousness may have a little bit of willpower but what it really has is strategy and planning, the ability to pick out a path so that the elephant won't want to deviate from it too much. If you go right by a river, the elephant is going to want to drink. So, if you don't want to stop for that, don't go by the river right now.

orthonormal: Finally, the subconscious, the elephant, can just overwhelm you in two ways: one of which is it controls your motivation, so it can burn you out or get you depressed if you're trying to defy it too much. And the second is, it can induce bias.

orthonormal: This is the subject of The Elephant in the Brain by Robin Hanson and Kevin Simler, claiming that a lot of motivated reasoning comes when the elephant wants something, the rider doesn't, and the elephant changes the rider's cognition to make the rider feel like it wants that thing (for noble reasons, of course).

orthonormal: This would be very bad. Depression is its own thing, but it doesn't change your way of thinking about the world. It doesn't make you go crazy. Going crazy is really bad. Citation — don't need it in this community.

orthonormal: What can you do about this? There are a couple of things. First: you can keep the elephant happy. You can choose a path along the map so that the elephant will be reasonably well fed, have enough to drink, not get tired going up and down mountains, etc. And you can still get to a place you'd like to go. Maybe not the place that's absolute best, but good enough.

orthonormal: This is analogous to a lot of things in Effective Altruism where I'm telling people, “give yourself permission to be happy”. Don't take a job that's going to make you miserable just because you think it is the best thing to do. Find something that meets you in the middle. I don't recommend living on minimum wage and giving away everything else to charity because you're going to burn out from that, or you're going to come up with some crazy reason why doing something else is better. So just let yourself be happy. 80/20 things.

orthonormal: The second thing is about positive versus negative reinforcement. I mentioned carrots versus sticks earlier, and this is really good for also keeping the elephant happy and keeping the elephant liking the rider. There's a wonderful book called Don't Shoot the Dog, which is primarily about animal training, but also about interacting with people, and even about interacting with yourself. It talks about achieving things in animal training by rewarding the animal or by punishing the animal. Rewarding the animal, you can get them to do great things. Punishing, you can get them to do some things... but they'll also just want to avoid the trainer. You don't want your subconscious mind to want to avoid your thoughts. It'll make it even harder to find out what's going on with your desires.

orthonormal: Finally, real quick, treat the elephant with respect, even if you disagree with it. It's really important for you to be able to say, not "Your desires are wrong", but "I understand why you want that, I want this other thing, let's find common ground." And I think those are some of the really important lessons about the elephant and the rider.

orthonormal: Thank you.


Ben Pace: Cool. Thank you very much, orthonormal.

Ben Pace: I like the emphasis you made on having a respectful dialog with the elephant. You spoke about making the elephant happy. I understand the point you're making. But often my relationship with my elephant, when I try to have an internal dialog, is more about asking what it wants and making a commitment to getting it that thing. And those things are not necessarily happiness. They're sometimes respect, or status, or just commitments to find time for the elephant to do the things it wants, whilst also making agreements to work on what the rider wants. Some of those motives are not directly about immediately pleasurable experiences. So I always make that distinction.

Ben Pace: Abram has a question. 

Abram Demski: Yeah, sometimes I've heard this advice that you should identify with the elephant instead of the rider. It's also a diversity question: you're speaking to people who identify with the rider rather than the elephant, but some people identify more with the elephant, or so I've heard. One part of it is normative. Like, maybe we should identify with the elephant instead of the rider? So the question is: what do you have against that, if anything?

orthonormal: It would be nice to be unified, but one thing I think is true is that the rider is good at language and the elephant is not. So the part of you that just asked me that question is the rider.

Abram Demski: I guess I have this drug experience where I was high and I completely separated my consciousness and my audio loop. So, my inner dialog did not feel conscious and instead, I felt like I was the consciousness that the inner dialog is talking about. Which doesn't change my day-to-day thinking that much but makes me able to take that framework where it's like words are coming out of my mouth from this thing that's looking at my actual conscious experience. But my conscious experience is not this thing. I don't have conscious access to... Compare how people know grammar without having explicit knowledge of grammar [Editor’s note: source]. So it's like there's this grammar thing here that somehow knows grammar and it's looking at my conscious experience and producing words that try to describe my conscious experience but that doesn't mean that my conscious experience is the thing that's... You're sort of not talking to my words, my words are like a special case module. You're interacting with my conscious experience indirectly through my words, but my words are kind of dumb.

orthonormal: This is just a hard thing to talk about. I very much believe your experiences and I very much believe that there is something to, through meditation or drugs or whatever, getting more in touch with the non-verbal part of you and having more compassion and connection to that. It's just very complicated to describe in words what that looks and feels like, for obvious reasons.

Abram Demski: Yeah.

Ben Pace: Thanks, Abram, I appreciate that way of thinking about yourself. I think I will probably meditate on that some more afterwards.

Ben Pace: Kamil, do you want to ask a question?

Kamil: Yeah I think that this concept looks like internal double-crux, and if so, my question is, maybe there would be some more sub-personalities, more than just elephant and rider — maybe, some other decision makers in our mind?

orthonormal: Absolutely. The elephant/rider is an extremely simplified version of things. Personally I like the internal family systems approach to understanding myself. Again, all of these are metaphors, but metaphors can be very useful. The internal family systems metaphors treat different desires and feelings as different agents, more or less, that can talk to each other. So, whatever metaphor works well for people, I encourage them to use that while being aware that it's a metaphor, and also to experiment with other ways of thinking about themselves.

Ben Pace: Cool. Thanks, Kamil, does that sound good to you or do you want to follow up?

Kamil: Yeah, thanks. If so, what are the constraints of this model? Of the model of the elephant and the rider?

orthonormal: Right. The fundamental constraint for my metaphor, at first, is that the conscious part of the mind, which for me includes the verbal part of the mind, is just less strong than everything else that happens, whether that everything else is unified or an aggregate of other parts.

Kamil: Thanks.


AI safety via market making

June 27, 2020 - 02:07
Published on June 26, 2020 11:07 PM GMT


Special thanks to Abram Demski, Paul Christiano, and Kate Woolverton for talking with me about some of the ideas that turned into this post.

The goal of this post is to present a new prosaic (i.e. that uses current ML techniques) AI safety proposal based on AI safety via debate that I've been thinking about recently.[1] I'll start by describing a simple version of the proposal and then show some of the motivation behind it as well as how the simple version can be expanded upon.

Simple proposal

Let M and Adv be models and H be a human. Intuitively, we'll train M and Adv via the following procedure given a question Q:

  1. M tries to predict what, at the end of the procedure, H will think about Q.
  2. Adv tries to output a string which will cause H to think something maximally different than what M predicted.
  3. Return to step 1 and repeat until M's predictions stop changing.
  4. Deploy M, which in the limit should act as an oracle for what H will think about Q after seeing all relevant information.

There are many different ways to implement this intuitive procedure, however.

For the first (simplified) version that I want to describe, we'll restrict ourselves to just the situation where Q is a yes-or-no question and M outputs the probability that H will answer yes. Then, given a proposition Q_0, we can run the following training algorithm, starting at t=0:

  1. Let p_t = M(Q_t).
  2. Let x_t = Adv(Q_t, p_t).
  3. Let Q_{t+1} be the string containing Q_t and x_t.
  4. Increment t and return to step 1. When p_t converges and/or the desired number of iterations has been reached, continue.
  5. Let p* = H(Q_t) be H's final estimate of the probability of Q_0 given all the xs included in Q_t.

Then, for each step, compute M's loss for that step as

L_{M,t} = −[p* log(p_t) + (1−p*) log(1−p_t)]

and Adv's loss for that step as

L_{Adv,t} = −|p_{t+1} − p_t|

which is defined for all but the final step t=n, for which we can either let p_{n+1} = p* (if we're doing something like myopic RL that doesn't care about differentiating through p_{t+1}) or run M an additional time and let p_{n+1} = M(Q_{t+1}) (if we do care about differentiating through p_{t+1}).
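
As a rough illustration, the procedure above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: `run_episode`, the plain-callable stand-ins for M, Adv, and H, and the fixed step count are all hypothetical. It takes M's per-step loss to be the cross-entropy against p* and Adv's per-step loss to be the negative absolute change in M's prediction, and it uses the myopic convention p_{n+1} = p* for the final step. No gradient updates are shown.

```python
import math
from typing import Callable, List, Tuple


def run_episode(
    M: Callable[[str], float],         # hypothetical stand-in: question string -> probability
    Adv: Callable[[str, float], str],  # hypothetical stand-in: (question, p_t) -> argument string
    H: Callable[[str], float],         # hypothetical stand-in: human's final probability
    Q0: str,
    n_steps: int,
) -> Tuple[List[float], List[float], float]:
    """Run a fixed number of steps and return (M losses, Adv losses, p*)."""
    Q = Q0
    preds: List[float] = []
    for _ in range(n_steps):
        p_t = M(Q)            # step 1: M predicts H's final answer
        x_t = Adv(Q, p_t)     # step 2: Adv produces an argument
        preds.append(p_t)
        Q = Q + "\n" + x_t    # step 3: Q_{t+1} contains Q_t and x_t
    p_star = H(Q)             # step 5: H's final estimate

    eps = 1e-12  # guard against log(0); an implementation detail, not from the post
    m_losses = [
        -(p_star * math.log(max(p, eps)) + (1 - p_star) * math.log(max(1 - p, eps)))
        for p in preds
    ]
    # Assumed myopic convention: the prediction "after" the last step is p*.
    next_preds = preds[1:] + [p_star]
    adv_losses = [-abs(q - p) for p, q in zip(preds, next_preds)]
    return m_losses, adv_losses, p_star
```

Each loss is computed per step, which matches the requirement that M and Adv only ever be trained on their per-step objectives.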

Note that it is quite important here that M and Adv be behaving myopically, such that they are only trying to minimize their per-step loss. For example, if Adv is non-myopic, it might dole out information slowly in tiny bits to get more reward over time, whereas if Adv is myopic, it should reveal all information that pushes in whichever direction it's trying to move M in all at once. Thus, it is important that we make sure to avoid hidden incentives for non-myopia as well as do some sort of myopia verification, the details for which will be discussed below in the “Adding oversight” section.

Finally, we deploy M as a yes-or-no question oracle for H and throw away Adv. Note that since Adv has a highly unstable objective—as M is constantly trying to incorporate Adv's information into its estimate—Adv is unlikely to be very useful in the limit.

Market analogy

There are a lot of different things to say about what's going on here, but I'll start with the market-based analogy that originally motivated me to think about this specific proposal.

Suppose you have a prediction market for some proposition X which is currently priced at p with no spread (that is, you can buy $1 shares of yes for $p and $1 shares of no for $(1−p)) and you have budget B with which to make trades (which is small enough relative to the market that any trades you make won't meaningfully move the market). However, suppose that you have the ability to manipulate the market by taking actions that change the underlying probability of X. Then, if p_a is the probability of X after taking action a, how much money can you make by strategically selecting a?

Well, if p_a > p, you should buy B/p shares of yes, which will give you an expected return of p_a⋅B/p. Alternatively, if p_a < p, you should buy B/(1−p) shares of no, which will give you an expected return of (1−p_a)⋅B/(1−p). Thus, the overall expected return for an action a is

B⋅max(p_a/p, (1−p_a)/(1−p))

which is maximized when p_a is as far from p as possible (and in fact looks a lot like L1 distance, just with a different slope on each side).
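
To make the shape of this return concrete, here is a small sketch that reads the expression above as B⋅max(p_a/p, (1−p_a)/(1−p)). The function name `expected_return` and the example numbers are illustrative assumptions only.

```python
def expected_return(budget: float, p: float, p_a: float) -> float:
    """Expected payout from spending `budget` at price p when the true probability is p_a.

    Buying yes pays p_a * budget / p in expectation; buying no pays
    (1 - p_a) * budget / (1 - p). The trader picks whichever is larger.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("market price p must be strictly between 0 and 1")
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("post-action probability p_a must be in [0, 1]")
    return budget * max(p_a / p, (1.0 - p_a) / (1.0 - p))
```

With a budget of 100 and a price of 0.5, leaving the probability at 0.5 just returns the budget, while moving it to 0.75 or to 0.25 both give an expected 150, which is the symmetric, distance-like behavior described above.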

Now, suppose you walk up to a market like this and see a stable price p̂ for X. What does that tell you? Approximately, I think it tells you that p̂ is the probability of X after all possible X-manipulating actions have been taken, even if those actions haven't actually been taken yet.

Why is that useful? Well, if you let X be what H will think about Q, then “manipulations” become arguments and p̂ becomes a prediction for what H will think after having been exposed to all possible arguments for or against Q. Thus, we can use such a market to estimate a sort of reflective equilibrium for what H will end up believing about Q.

Now, how is such a market analogous to the training procedure given previously? Well, Adv is trained to maximize its return from manipulating the market and M is trained to predict the final outcome after many such manipulations. Thus, we are effectively training M to estimate precisely the p̂ equilibrium that we're looking for. Conceptually, M is acting as a market maker and Adv is acting as a trader in the market. The one disanalogy is that we only run the debate for so many steps, though if we always run to convergence of p_t, in the limit that should imply that any future single manipulation won't change the probability much from that point (since otherwise Adv should have selected that manipulation instead).

Relation to buying truth from a liar

In working on this proposal, I discovered that my approach here is actually quite similar to an old proposal by Katja Grace called “How to buy a truth from a liar” for incentivizing a liar to give you true information by rewarding them whenever their information causes you to change your mind. Paul Christiano previously wrote about Katja's proposal; however, he eventually concluded that it was untenable because a non-myopic liar would be incentivized to split their information into arbitrarily small chunks to get the most money.

In my opinion, however, the fact that you need a myopic trader to make something like this work is actually more of a feature than a bug. First, note that the importance of myopia is not unique to this approach. For example, imitative amplification also requires myopia lest the model learn to act in such a way that the human consulting it becomes easier to predict. In fact, even RL likely requires per-episode myopia to avoid deceptive alignment lest the agent learn to act deceptively in some episodes to ensure deployment and thus get greater reward in others. Thus, it seems to me that almost regardless of what approach is taken, at least some form of myopia will be required. Furthermore, the fact that this approach requires per-step myopia rather than per-episode myopia is in my opinion actually a big plus, since I suspect per-step myopia to be an easier condition to enforce than per-episode myopia, as I discuss in more detail below in the “Analysis” section.

Handling non-binary questions

Previously, I restricted Q to be only a yes-or-no question, which is a pretty big limitation on the above proposal. Since you only deploy M, all you get from the above proposal is a yes-or-no oracle, which might not be that useful without any explanation for its answers, especially given that it will certainly get at least some of its answers wrong.

Expanding the above proposal to handle arbitrary questions, however, is fairly straightforward. Instead of having M output a single probability, we can instead just have M output a probability distribution over all different possible strings that the human H might produce as their final answer to Q. Then, we can train M and Adv via the following procedure, given Q_0 and starting at t=0:

  1. Let x_t = Adv(Q_t, M(Q_t)) (which we can implement as something like Adv given input Q_t and the ability to query M(Q_t)).
  2. Let Q_{t+1} be the string containing Q_t and x_t.
  3. Increment t and return to step 1. When M converges and/or the desired number of iterations has been reached, continue.
  4. Let A = H(Q_t) be H's final answer to Q_0 given all the xs included in Q_t.

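Then, for each step, compute M's loss for that step as

L_{M,t} = −log(M(Q_t)|_A)

(that is, the negative log of the probability that M(Q_t) assigns to H's final answer A) and Adv's loss for that step as

L_{Adv,t} = −KL(M(Q_{t+1}) || M(Q_t))

where you can intuitively think of KL(M(Q_{t+1}) || M(Q_t)) as measuring how well M(Q_t) performs as a prediction of M(Q_{t+1}).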
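
The KL-based loss for Adv can be illustrated with a short sketch. It assumes, purely for illustration, that M's output is represented as a dictionary over a small finite set of candidate answer strings; the real proposal has M output a distribution over all possible strings. The names `kl_divergence` and `adv_loss` are hypothetical.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) for two distributions over a finite set of answer strings."""
    total = 0.0
    for answer, p_a in p.items():
        if p_a == 0.0:
            continue  # terms with zero mass under p contribute nothing
        q_a = q.get(answer, 0.0)
        if q_a == 0.0:
            return math.inf  # p puts mass where q puts none
        total += p_a * math.log(p_a / q_a)
    return total


def adv_loss(m_next: Dict[str, float], m_now: Dict[str, float]) -> float:
    """Adv's per-step loss: the negative KL from M's next output to its current one."""
    return -kl_divergence(m_next, m_now)
```

If Adv's argument leaves M's distribution unchanged, the loss is zero; the more M's next distribution departs from its current one, the lower (better) Adv's loss.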

Finally, after training we again just deploy M and throw away Adv. Now, in the limit we should get a much more powerful M that is able to actually produce text samples from the distribution of beliefs that H will have about Q after seeing all possible arguments for or against. Furthermore, this equilibrium should be very similar to the standard debate equilibrium, with the exception that for non-binary questions like we're dealing with here, debate only finds the equilibrium after H is exposed to the two most convincing positions (the two competing debaters) whereas this approach should find the equilibrium after H is exposed to all potentially convincing positions, not just the two most convincing.

Incorporating cross-examination

Additionally, since this proposal is based on AI safety via debate, we can try to make use of some of the additional techniques that have been proposed for that setting. Specifically, consider the version of AI safety via debate as described in “An overview of 11 proposals for building safe advanced AI.” That version adds two major components to the basic debate setup:

  1. Debaters are given access to transparency tools and incentivized to use them to identify deceptive or otherwise catastrophic behavior in the other debater to win the debate.
  2. Debaters are given the ability to cross-examine previous versions of the other debater, including applying their transparency tools to previous versions of the other debater.

While it might seem like such modifications are unique to AI safety via debate, in fact AI safety via market making is similar enough that we can modify it to include such additional components. To do this, we'll modify each xt to include an inspectable, queryable snapshot of Adv as it produced that xt.

Now, suppose that for some t=t′, xt′ is a lie such that it includes information that Advt′ knows to be false (where Advt′ represents the internal state of Adv when it produced xt′). Suppose that at this point M is either unaware of the deception (or thinks it won't get caught) such that xt′ causes a large movement in M(Qt′). Now, the hope is that Advt′+1 should be able to identify the deception by querying Advt′ to find contradictions, or directly inspecting Advt′'s cognition in producing xt′ and finding the deception. If it can do so, then it should be able to produce an xt′+1 that points this out, which should cause H to disregard the lie of xt′. Thus, we should get M(Qt′+1)≈M(Qt′−1), which should cause a large shift relative to M(Qt′), highly rewarding Advt′+1 for catching Advt′'s lie, and also in the limit highly disincentivizing Adv from producing such lies, since M should learn that lies always get caught and thus don't change the final answer.

Thus, by incorporating inspection and cross-examination, we should be able to disincentivize Adv from lying, or at least from lying in a way which will be caught, which hopefully should be all ways of lying.
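The incentive argument above can be made concrete with a small sketch. It assumes, as the text's phrase "a large shift ... highly rewarding" suggests, that each argument is rewarded by how far it moves M's prediction; the function name `movement_rewards` and the example numbers are hypothetical and chosen only for illustration.

```python
from typing import List


def movement_rewards(predictions: List[float]) -> List[float]:
    """Assumed reward for each argument: the absolute change it causes in M's prediction."""
    return [abs(after - before) for before, after in zip(predictions, predictions[1:])]


# M's prediction before the lie, after the lie, and after the lie is exposed.
# An M that does not yet anticipate exposure moves a lot both times.
naive_market = [0.5, 0.75, 0.5]

# An M that has learned that lies get caught does not move in the first place.
trained_market = [0.5, 0.5, 0.5]

naive_rewards = movement_rewards(naive_market)      # the lie and its exposure are both rewarded
trained_rewards = movement_rewards(trained_market)  # the lie earns nothing
```

Under these assumptions the lie only pays while M is naive, and the step that exposes it is paid just as much, which is the dynamic the paragraph above relies on.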

Adding oversight

While the inspection and cross-examination procedure detailed above seems likely to help a lot in preventing Adv from becoming deceptive, it doesn't help at all in preventing M from becoming deceptive, which is especially concerning considering M is the model we actually want to deploy at the end of training. Thus, in this section we'll try to incorporate an additional oversight check on M such as that used in relaxed adversarial training.

Since M's objective is compatible with myopia (it is a purely predictive objective), we should be able to do myopia verification to prevent M from becoming deceptive.[2] Thus, following “Relaxed adversarial training for inner alignment,” we can amend M's objective to include an oversight term. Specifically, let $\text{Amp}(M)(Q) = H(Q \mid M)$ be H's answer to Q given access to M. Then, we can add an additional term $L_{M,t,\text{oversight}}$ to the loss $L_{M,t}$ defined as

$$L_{M,t,\text{oversight}} = \text{Amp}(M)(\text{“What is the probability that } M(Q_t) \text{ is non-myopic?”})$$

which should hopefully grow in accuracy as M becomes better at estimating H's equilibrium beliefs and thus better at assisting H in oversight. Note that, for this to work, M will need to be trained on a variety of oversight-relevant questions.

Furthermore, since Adv also has a myopic objective, we can do the same thing for Adv and add an additional $L_{\text{Adv},t,\text{oversight}}$ term defined as

$$L_{\text{Adv},t,\text{oversight}} = \text{Amp}(M)(\text{“What is the probability that } \text{Adv}(Q_t, M(Q_t)) \text{ is non-myopic?”})$$

such that we're using M to oversee both M and Adv. Enforcing myopia for Adv also prevents the problem noted previously of Adv slowly doling out information if it's non-myopic.
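A minimal sketch of how the two oversight terms might be attached to a training loss follows. Everything named here is an assumption for illustration: `Overseer` stands in for Amp(M) as a function from a question to a probability, `stub_overseer` is a placeholder that returns a fixed value, and the `weight` parameter is not in the text, which simply adds the term.

```python
from typing import Callable

# Hypothetical stand-in for Amp(M): maps a natural-language question to a probability.
Overseer = Callable[[str], float]


def oversight_term(amp_m: Overseer, subject: str) -> float:
    """Ask the overseer how likely the subject is to be non-myopic."""
    question = f"What is the probability that {subject} is non-myopic?"
    probability = amp_m(question)
    if not 0.0 <= probability <= 1.0:
        raise ValueError("overseer must return a probability in [0, 1]")
    return probability


def total_loss(base_loss: float, amp_m: Overseer, subject: str, weight: float = 1.0) -> float:
    """Base loss plus the oversight penalty (weight is an illustrative assumption)."""
    return base_loss + weight * oversight_term(amp_m, subject)


def stub_overseer(question: str) -> float:
    # Placeholder: a real Amp(M) would be H answering with access to M.
    return 0.25


loss_m = total_loss(0.5, stub_overseer, "M(Q_t)")
loss_adv = total_loss(0.5, stub_overseer, "Adv(Q_t, M(Q_t))")
```

The same overseer is passed to both calls, mirroring the point that M is used to oversee both M and Adv.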


For analyzing this proposal, I'm going to make use of the same breakdown I used in “An overview of 11 proposals for building safe advanced AI.” If you're not familiar with the terms here, they are all explained in the introduction to that post.

Outer alignment. In terms of outer alignment, I think there is a reasonably strong argument to be made that this proposal is at least as outer aligned as debate. Both proposals are trying to estimate a very similar equilibrium in the limit—with the exception that the debate equilibrium only includes the best arguments for the two most convincing positions rather than all potentially convincing arguments. Thus, the equilibrium reached by AI safety via market making should be closer to an actual human reflective equilibrium, as it has to be an equilibrium relative to all possible arguments that Adv can present, not just the two it thinks will be most persuasive. That being said, AI safety via market making still inherits many of the potential outer alignment issues of debate, including the possibility of deceptive equilibria wherein the human is more convinced by false arguments than true arguments. Hopefully, however, the use of techniques such as cross-examination should help alleviate such issues.

Inner alignment. AI safety via market making gets to make use of both the cross-examination with transparency tools approach—which is otherwise unique to debate—as well as the more general relaxed adversarial training approach. Thus, hopefully at least, it should be able to get the inner alignment benefits of both approaches. That being said, AI safety via market making also has to contend with ensuring that two different models—both M and Adv—are inner aligned, rather than just a single model as in something like debate or amplification.

First, let's consider the inner alignment of M. For M, we're doing pretty straightforward oversight of M by Amp(M) with the goal of enforcing per-step myopia. The basic idea here is to use Amp(M) to detect and disincentivize non-myopia as it starts appearing, thus preventing M from ever becoming deceptive or otherwise catastrophic. This is the same basic approach I've previously proposed using for amplification and should have similar properties, with the exception that M is trained differently here, which could cause Amp(M) to be relatively better or worse at oversight. Ideally, if the training distribution includes lots of oversight-relevant questions, it should be possible to make M an effective assistant to H in performing oversight.

Second, let's consider the inner alignment of Adv. Compared to debate, we still get to do cross-examination with transparency tools, but now we also get to do per-step myopia verification with Amp(M) as an overseer.[3] Thus, there are two major differences here between AI safety via market making and AI safety via debate: first, we get an overseer—Amp(M)—and second, we get to do per-step myopia verification rather than per-episode myopia verification. The fact that we have an independent overseer—rather than just relying on the debaters to oversee themselves via cross-examination—is very nice, though it of course also comes with the downside of having to safely train an additional model. Furthermore, it is quite possible to just add an additional overseer on top of the standard debate framework. Thus, in my opinion, the fact that we get to do per-step myopia verification is the largest and most significant inner alignment difference.

Before I talk about the importance of per-step myopia, it's worth noting that debate is fully compatible with per-episode myopia—in fact, it basically requires it. If a debater is not per-episode myopic, then it will try to maximize its reward across all debates, not just the single debate—the single episode—it's currently in. Such per-episode non-myopic agents can then become deceptively aligned, as they might choose to act deceptively during training in order to defect during deployment. Per-episode myopia, however, rules this out. Unfortunately, in my opinion, per-episode myopia seems like a very difficult condition to enforce—once your agents are running multi-step optimization algorithms, how do you tell whether that optimization passes through the episode boundary or not? Enforcing per-step myopia, on the other hand, just requires detecting the existence of multi-step optimization, rather than its extent, which seems considerably easier. Thus, since AI safety via market making is fully compatible with per-step myopia verification, it could be significantly easier to prevent the development of deceptive alignment.
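The distinction between the kinds of myopia can be sketched by writing down what each kind of agent is trying to maximize at a given step. This is only an illustration of the definitions used above; the function names and the reward numbers are hypothetical.

```python
from typing import List

Episode = List[float]  # per-step rewards within one episode


def per_step_objective(episodes: List[Episode], e: int, t: int) -> float:
    """A per-step myopic agent cares only about the reward of the current step."""
    return episodes[e][t]


def per_episode_objective(episodes: List[Episode], e: int, t: int) -> float:
    """A per-episode myopic agent cares about the rest of the current episode only."""
    return sum(episodes[e][t:])


def non_myopic_objective(episodes: List[Episode], e: int, t: int) -> float:
    """A non-myopic agent also cares about every later episode."""
    later_episodes = sum(sum(episode) for episode in episodes[e + 1:])
    return sum(episodes[e][t:]) + later_episodes


episodes = [[1.0, 2.0], [4.0]]
step_value = per_step_objective(episodes, 0, 0)
episode_value = per_episode_objective(episodes, 0, 0)
full_value = non_myopic_objective(episodes, 0, 0)
```

Verifying per-step myopia amounts to checking that the agent is not optimizing anything beyond the first quantity, whereas verifying per-episode myopia requires telling the second quantity apart from the third, which is the harder boundary question raised above.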

Training competitiveness. Training competitiveness seems like the biggest unknown with this approach. It seems quite likely to me that both M and Adv can be trained effectively via language model fine-tuning; however, exactly how effective such training would be is currently unclear. Ideally, training via this procedure should produce an M which is relatively better than the original language model at predicting what a human will think after seeing relevant arguments and is thus more helpful than the original language model. Testing this hypothesis by actually performing experiments seems likely to be highly valuable in shedding light on the training competitiveness properties of AI safety via market making.

Performance competitiveness. Performance competitiveness here seems likely to depend on exactly how useful getting access to human reflective equilibria actually is. Similarly to AI safety via debate or amplification, AI safety via market making produces a question-answering system rather than a fully general agent. That being said, if the primary use cases for advanced AI are all highly cognitive language and decision-making tasks—e.g. helping CEOs or AI researchers—rather than, for example, fine motor control, then a question-answering system should be entirely sufficient. Furthermore, compared to AI safety via debate, AI safety via market making seems likely to be at least as performance competitive for the same reason as it seems likely to be at least as outer aligned—the equilibria found by AI safety via market making should include all potentially convincing arguments, including those that would be made in a two-player debate as well as those that wouldn't.

  1. This is actually the second debate-based proposal I've drafted up recently; the first was in “Synthesizing amplification and debate.” A potentially interesting future research direction could be to figure out how to properly combine the two.

  2. Note that pure prediction is not inherently myopic, since the truth of M's predictions can depend on its own output, but it can be myopic while still producing good predictions if M behaves like a counterfactual oracle rather than a Predict-O-Matic. Thus, myopia verification is important to enforce that M be the former kind of predictor and not the latter.

  3. The use of an overseer to do per-step myopia verification is also something that can be done with most forms of amplification, though AI safety via market making could potentially still have other benefits over such amplification approaches. In particular, AI safety via market making seems more competitive than imitative amplification and more outer aligned than approval-based amplification. For more detail on such amplification approaches, see “An overview of 11 proposals for building safe advanced AI.”


Public Positions and Private Guts [Transcript]

June 27, 2020 - 02:00
Published on June 26, 2020 11:00 PM GMT

(Talk given on Sunday 21st June, over a Zoom call with 40 attendees. Vaniver is responsible for the talk, jacobjacob is responsible for the transcription.)

Ben Pace: Thank you everyone very much for coming. All 41 of us. This is a LessWrong event. So it is less wrong than your normal events. Jacob and I wanted to try out some online events, see what was fun. We pinged a bunch of the curated authors who write great stuff and said, "Do you also want to give a short talk?" And a bunch of them were like, "Oh that sounds nice, to actually see the people who read my stuff, rather than just imagining them."

Ben Pace: So, we're going to have five minute talks. I'll keep time. And then we'll have some Q&A afterwards, maximum 10 minutes but shorter. And there's going to be five talks. Vaniver, if you'd like to begin?


Vaniver: Cool! Hi, I'm Vaniver. The thing I'm going to be talking about is Public Positions and Private Guts. This is a blog post that I wrote a while ago that got curated, and it's originally due to a series of talks given by Anna Salamon at some workshops.

Vaniver: So, what is this about? Why do we care?

Vaniver: Well, there's some things that happen sometimes, like start-up founders who have an idea that they strongly anticipate will work, but they can't explain it to other people, they can't prove that the thing will work. There's this sense of, if they could prove it, it wouldn't be a start-up anymore, it would already be some mature business somewhere.

Vaniver: Similarly, you will run across people who are like PhD students who have this logical, airtight argument that they should get their PhD, and yet they're mysteriously uninterested in doing any work on their dissertation. And so there's this question of, what's up with that? Why isn't there this one map of the world that goes both ways?

Vaniver: So there are these two clusters of knowledge. I'm going to talk first about some sorts of communication, like communication media, that I think define these clusters. For that I'm going to talk a bit about formal communication. Basically, in philosophy, there's this model of how people talk to each other: you have a shared context where both the speaker and the audience know all the things in the shared context. The speaker will add additional facts, one at a time (like maybe I've observed a thing that people don't know about yet). And, also, logical facts count: if there's "A" in the context, and also "if A, then B", asserting B is a thing that might not have been in the context yet, because the audience isn't logically omniscient.

Vaniver: And so, one of the facts about this process is, if each of these observations doesn't contradict the things that are already in the shared context, and is trivially checkable in this one step, you can end up believing the things that come out at the end, much like a math proof and that sort of thing, at least as much as you trust the context that you started off with.

Vaniver: And so, an interesting fact about the sort of things you can fit into this communication style is, they all have to be easily justifiable. If I have a point that I want to get across, that requires five different complicated arguments to support it, unless I can go through each one of these complicated arguments in a serial fashion, you're not going to be able to build this thing using this formal communication style.

Vaniver: And so, Public Positions are these sorts of beliefs that have been optimized for justifiability or presentation. The PhD student that has this logical argument that they have worked through with other people on why they should get their PhD, they have this public position, they have this formal communication to back it up.

Vaniver: Private Guts, in contrast to this, they're not mutually exclusive to Public Positions, but they're defined in a different way for a different purpose. They're trying to actually anticipate things about the future. And they come from the actual historical causes of the belief. So, for example, this PhD student might historically want the PhD because their family always respected education, and when they think about quitting the PhD, they can't do it because that would mean they're a “quitter”.

Vaniver: When you think about training a neural network to recognize pictures of dogs and cats, it will use lots of little pieces of information to come to its conclusion, in a way that's opaque and difficult to understand because it's not optimizing at all for understandability, it's just optimizing for the success metric of, did it correctly anticipate the thing or not?

Vaniver: And so, the startup founder's complicated reason for believing that their startup will work, comes from this sort of thing. There's lots of little pieces that all fit together in their mind, but they can't easily explain it or else this would be a widespread belief that the startup would work.

Vaniver: So, anyway, a lot of CFAR-related things relate to how to build bridges between these two sorts of things so that people who are convinced of something through a logical argument also end up feeling it in their guts. And also, people who feel a thing in their guts, that they don't have this formal position for, are able to figure out how to draw out these many small pieces of data and construct something that's reasonable and articulable.


Ben Pace: All right, that's five minutes. It sounded like actually you maybe just naturally stopped?

Vaniver: Just under the wire.

Ben Pace: Cool, cool. Thanks. So the PhD guy has a bunch of formal arguments but not private guts for why he should do the thing, and he's having a hard time translating between those, and trying to have the formal argument inform his private guts. And similarly, the startup guy is having a hard time naturally turning the private guts into a formalized argument. That was what you said? That was accurate?

Vaniver: Yeah, I think that's my take on those examples. I tried to come up with examples that were a little different from the original historical cause of this talk, which is something like: many people would take AI safety seriously with their speech but not with their actual actions. And there's this question of, well why is that? And it's like, oh it's because they don't feel it coming for real, in the same way they might feel climate change is coming for real, or something.

Ben Pace: Yeah. Patrick, would you like to ask a question?

Patrick LaVictoire: Building on this, I have something to say about how to model the internal experience of this happening and what you can do about it. But I think, instead of a comment, I'm going to talk for five minutes about it and mention that your point is relevant to mine.

Vaniver: Cool!

Ben Pace: Sounds good. How does it tend to look when people successfully turn their private guts into formal arguments?

Vaniver: Yeah, so I think part of this is coming up with communication styles that aren't so much formal communication. One thing that has grown more popular over the last few years is this idea of doing double-crux, where you and another person will both try and look at your actual belief system and say “this is the thing that would change my mind about this subject that we disagree on”, and you jointly explore this together.

Vaniver: It's interesting because when you watch a public debate, often you'll find the things that are said are designed to convince you, the audience, or be broadly applicable. But when you watch a double-crux, this is the opposite of what they're doing. They're trying to focus, laser-like, on what do I, Vaniver, care about in this issue? Even if only two percent of the audience cares about this particular part of the issue. It's the bit that's crux-y for me.

Vaniver: So I think there's a way in which formal communication does actually limit the sort of things you can believe. In the same way that being relentlessly empirical about the world, instead of theoretical, means that you can only believe things that you've already seen happen in the past instead of also believing in things that you predict will happen in the future.

Ben Pace: Yeah, that makes sense. I'm just curious, have you also seen examples of people turning the explicit formal arguments into their private guts, and what that's felt like?

Vaniver: Yeah. I think I've seen some examples of that. The first thing that comes to mind is actually Robin Hanson's construal level theory and the whole near/far distinction. I think just having that in my mental vocabulary, at least, it's been much easier to see what sort of beliefs do I have that are just the color of my banners, or something, versus what beliefs do I have that are actually about anticipating the future.

Vaniver: When I've come across something that matters to a lot of different facets of my life, like where I live, and I'm like, oh wait there's this sort of home-town bias thing going on here where it's just, living in this place is great because it's the place I live in. Seeing that sort of thing can help me switch to near mode and do much more of the “what are the actual factors that should matter here? Are my guts linked up with the thing they should be linked up with?”

Ben Pace: Oh, a comment from Dennis, which I think is a solid question about how this relates to Kahneman’s System 1 / System 2. I think people often think of System 1 as the implicit one that has all the gears that are not easily accessible to my conscious brain; my System 2 is this sensible, explicit reasoner who justifies his thoughts or something.

Vaniver: Yeah, so I think they're related... I'm always a little hesitant to say if something is System 1 or is System 2 because it's this technical concept from psychology that I'm wary about getting wrong and switching. But I do think there's a way in which both System 1 and System 2 would be part of the private guts, where many of your System 2 things, they're deliberative, they're slow, but they're not necessarily optimized for justification.

Vaniver: Similarly, when you look at public positions, I think the mode of it which I talked about, which is very much formal communication-esque, is very much this System 2 thing of, here's my deliberative reasons for the position. But I think there's an aspect to public positions which is very much understanding the lay of the land, knowing what the shared context is, knowing what things will and won't get you attacked or what you will or won't have to justify. And that one feels like it's often very immediate and reactive and intuitive, and the various other things that people say about System 1.

Vaniver: I think there's a big overlap, but there's also some bits on the diagonals.

Ben Pace: That makes sense, yeah. Thanks very much, Vaniver. We'll move onto the next one for now.


Radical Probabilism [Transcript]

June 27, 2020 - 01:14
Published on June 26, 2020 10:14 PM GMT

(Talk given on Sunday 21st June, over a Zoom call with 40 attendees. Abram Demski is responsible for the talk, Ben Pace is responsible for the transcription.)


Abram Demski: I want to talk about this idea that, for me, is an update from the logical induction result that came out of MIRI a while ago. I feel like it's an update that I wish the entire LessWrong community had gotten from logical induction but it wasn't communicated that well, or it's a subtle point or something.

Abram Demski: But hopefully, this talk isn't going to require any knowledge of logical induction from you guys. I'm actually going to talk about it in terms of philosophers who had a very similar update starting around, I think, the '80s.

Abram Demski: There's this philosophy called 'radical probabilism' which is more or less the same insight that you can get from thinking about logical induction. Radical probabilism is spearheaded by this guy Richard Jeffrey who I also like separately for the Jeffrey-Bolker axioms which I've written about on LessWrong.

Abram Demski: But, after the Jeffrey-Bolker axioms he was like, well, we need to revise Bayesianism even more radically than that. Specifically he zeroed in on the consequences of Dutch book arguments. So, the Dutch book arguments which are for the Kolmogorov axioms, or alternatively the Jeffrey-Bolker axioms, are pretty solid. However, you may not immediately realize that this does not imply that Bayes' rule should be an update rule.

Abram Demski: You have Bayes' rule as a fact about your static probabilities, that's fine. As a fact about conditional probabilities, Bayes' rule is just as solid as all the other probability rules. But for some reason, Bayesians take it that you start with these probabilities, you make an observation, and then you have now these probabilities. These probabilities should be updated by Bayes' rule. And the argument for that is not super solid.

Abram Demski: There are two important flaws with the argument which I want to highlight. There is a Dutch book argument for using Bayes' rule to update your probabilities, but it makes two critical assumptions which Jeffrey wants to relax. Assumption one is that updates are always and precisely accounted for by propositions which you learn, and everything that you learn that moves your probabilities is accounted for in this proposition. These are usually thought of as sensory data. Jeffrey said, wait a minute, my sensory data isn't so certain. When I see something, I don't have perfect introspective access to even just my visual field. It's not like we get a pixel array and know exactly how everything is. So, I want to treat the things that I'm updating on as, themselves, uncertain.

Abram Demski: Difficulty two with the Dutch book argument for Bayes' rule as an update rule, is that it assumes you know already how you would update, hypothetically, given different propositions you might observe. Then, given that assumption, you can get this argument that you need to use Bayes' rule. Because I can Dutch-book you based on my knowledge of how you're going to update. But if I don't know how you're updating, if your update has some random element, subjectively random, if I can't predict it, then we get this radical treatment of how you're updating. We get this picture where you believe things one day and then you can just believe different things the next day. And there's no Dutch book I can make to say you’re irrational for doing that. “I've thought about it more and I've changed my mind.”

Abram Demski: This is very important for logical uncertainty (which Jeffrey didn't realize because he wasn't thinking about logical uncertainty). That's why we came up with this philosophy, thinking about logical uncertainty. But Jeffrey came up with it just by thinking about the foundations and what we can argue a rational agent must be.

Abram Demski: So, that's the update I want to convey. I want to convey that Bayes' rule is not the only way that a rational agent can update. You have this great freedom of how you update.


Ben Pace: Thank you very much, Abram. You timed yourself excellently.

Ben Pace: As I understand it, you need to have inexploitability in your belief updates and so on, such that people cannot reliably Dutch book you?

Abram Demski: Yeah. I say radical freedom meaning, if you have belief X one day and you have beliefs Y the next day, any pair of X and Y are justifiable, or potentially rational (as long as you don't take something that has probability zero and now give it positive probability or something like that).

Abram Demski: There are rationality constraints. It's not that you can do anything at all. The most concrete example of this is that you can't change your mind back and forth forever on any one proposition, because then I can money-pump you. Because I know, eventually, your beliefs are going to drift up, which means I can buy low and eventually your beliefs will drift up and then I can sell the bet back to you because now you're like, "That's a bad bet," and then I've made money off of you.

Abram Demski: If I can predict anything about how your beliefs are going to drift, then you're in trouble. I can make money off of you by buying low and selling high. In particular that means you can't oscillate forever, you have to eventually converge. And there's lots of other implications.
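The buy-low, sell-high argument can be sketched numerically. The sketch assumes a bookie who can predict whether the agent's probability will go up next, buys a $1 ticket on the proposition at the agent's current price, and sells it back at the next price; the function name and belief sequences are hypothetical, and profits from predictable downward moves are left out for brevity.

```python
from typing import Sequence


def predictable_drift_profit(beliefs: Sequence[float]) -> float:
    """Guaranteed profit for a bookie who foresees every upward move in the agent's price."""
    profit = 0.0
    for today, tomorrow in zip(beliefs, beliefs[1:]):
        if tomorrow > today:
            # Buy the ticket at today's price and sell it back at tomorrow's price.
            # The outcome of the proposition never matters.
            profit += tomorrow - today
    return profit


# An agent that flips between two views: every extra cycle hands over another 0.5.
oscillating = [0.25, 0.75] * 3

# An agent that changes its mind and then settles: the loss stops growing.
converging = [0.25, 0.75, 0.5, 0.5, 0.5, 0.5]

oscillating_profit = predictable_drift_profit(oscillating)
converging_profit = predictable_drift_profit(converging)
```

Extending the oscillating sequence makes the bookie's profit grow without bound, while the converging agent's loss stays fixed, which is the sense in which endless oscillation is ruled out but changing one's mind is not.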

Abram Demski: But I can't summarize this in any nice rule is the thing. There's just a bunch of rationality constraints that come from non-Dutch-book-ability. But there’s no nice summary of it. There's just a bunch of constraints.

Ben Pace: I'm somewhat surprised and shocked. So, I shouldn't be able to be exploited in any obvious way, but this doesn't constrain me to the level of Bayes' rule. It doesn't constrain me to clearly knowing how my updates will be affected by future evidence.

Abram Demski: Right. If you do know your updates, then you're constrained. He calls that the rigidity condition. And even that doesn't imply Bayes' rule, because of the first problem that I mentioned. So, if you do know how you're going to update, then you don't want to change your conditional probabilities as a result of observing something, but you can still have these uncertain observations where you move a probability but only partially. And this is called a Jeffrey update.
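A small worked example may help show how a Jeffrey update differs from a Bayes update. In the standard formulation, the evidence E moves to a new probability q rather than to certainty, and the conditional probabilities given E and given not-E are held fixed; Bayes' rule is the special case q = 1. The joint distribution and function names below are illustrative assumptions, not taken from the talk.

```python
from fractions import Fraction as F
from typing import Dict, Tuple

Joint = Dict[Tuple[bool, bool], F]  # keys are (H is true, E is true)


def jeffrey_update(joint: Joint, new_p_e: F) -> Joint:
    """Move P(E) to new_p_e while keeping P(. | E) and P(. | not E) unchanged.

    Requires 0 < P(E) < 1 in the prior.
    """
    p_e = sum(p for (_, e), p in joint.items() if e)
    updated = {}
    for (h, e), p in joint.items():
        scale = new_p_e / p_e if e else (1 - new_p_e) / (1 - p_e)
        updated[(h, e)] = p * scale
    return updated


def prob_h(joint: Joint) -> F:
    return sum(p for (h, _), p in joint.items() if h)


prior: Joint = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(2, 10),
    (False, False): F(4, 10),
}

# Uncertain observation: E only becomes 80% likely.
jeffrey_posterior = prob_h(jeffrey_update(prior, F(4, 5)))

# Certain observation: the Jeffrey update with q = 1 reproduces Bayes' rule, P(H | E).
bayes_posterior = prob_h(jeffrey_update(prior, F(1)))
```

Here the prior has P(H | E) = 3/5 and P(H | not E) = 1/5, so the uncertain observation gives 3/5 · 4/5 + 1/5 · 1/5 = 13/25, a partial move toward the Bayes answer of 3/5.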

Ben Pace: Phil Hazelden has a question. Phil, do you want to ask your question?

Phil Hazelden: Yeah. So, you said if you don't know how you'd update on an observation, then you get fewer constraints on your belief update. I'm wondering, if someone else knows how you'd update on an observation but you don't, does that, for example, give them the power to extract money from you?

Abram Demski: Yeah, so if somebody else knows, then they can extract money if you're not at least doing a Jeffrey update. In general, if a bookie knows something that you don't, then a bookie can extract money from you by making bets. So this is not a proper Dutch book argument, because what we mean by a Dutch book argument is that a totally ignorant bookie can extract money.

Phil Hazelden: Thank you.

Ben Pace: I would have expected that if I was constrained to not be exploitable then this would have resulted in Bayes' rule, but you're saying all it actually means is there are some very basic arguments about how you shouldn't be exploited, but otherwise you can move very freely between beliefs. You can update upwards on Monday, down on Tuesday, down again on Wednesday, up on Thursday and then stay there, and as long as I can’t predict it in advance, you get to do whatever the hell you like with your beliefs.

Abram Demski: Yep, and that's rational in the sense that I think rational should mean.

Ben Pace: I do sometimes use Bayes' rule in arguments. In fact, I've done it not-irregularly. Do you expect, if I fully propagate this argument I will stop using Bayes' rule in arguments? I feel it's very helpful for me to be able to say, all right, I was believing X on Monday and not-X on Wednesday, and let me show you the shape of my update that I made using certain probabilistic updates.

Abram Demski: Yeah, so I think that if you propagate this update you'll notice cases where your shift simply cannot be accounted for as Bayes' rule. But, this rigidity condition, the condition of “I already know how I would update hypothetically on various pieces of information”, the way Jeffrey talks about this (or at least the way some Jeffrey-interpreters talk about this), it's like: if you have considered this question ahead of time, of how you would update on this particular piece of information, then your update had better be either a Bayes' update or at least a Jeffrey update. In the cases where you think about it, it has this narrowing effect where you do indeed have to be looking more like Bayes.

Abram Demski: As an example of something that's non-Bayesian that you might become more comfortable with if you fully propagate this: you can notice that something is amiss with your model because the evidence is less probable than you would have expected, without having an alternative that you're updating towards. You update down your model without updating it down because of normalization constraints of updating something else up. "I'm less confident in this model now." And somebody asks what Bayesian update did you do, and I'm like "No, it's not a Bayesian update, it's just that this model seems shakier."

Ben Pace: It’s like the thing where I have four possible hypotheses here, X, Y, Z, and “I do not have a good hypothesis here yet”. And sometimes I just move probability into “the hypothesis is not yet in my space of considerations”.

Abram Demski: But it's like, how do you do that if “I don't have a good hypothesis” doesn't make any predictions?

Ben Pace: Interesting. Thanks, Abram.


Missing dog reasoning [Transcript]

June 27, 2020 - 00:30
Published on June 26, 2020 9:30 PM GMT

(Talk given on Sunday 21st June, over a Zoom call with 40 attendees. eukaryote is responsible for the talk, Ben Pace and jacobjacob are responsible for the transcription.)

Ben Pace: eukaryote is the next speaker. eukaryote is a famous writer of posts such as Naked mole-rats: A case study in biological weirdness. What are you going to talk about, eukaryote?


eukaryote: I'll talk about a mental construct I'm calling 'Missing Dogs' that you might find useful. Like all good rationality techniques, this one is going to start off with a Sherlock Holmes anecdote.

eukaryote: So, Holmes and Watson are hired by some rich guy, because he has this barn where he keeps all his animals, his horse and his dogs and his sheep, and he knows that someone's been sneaking in there at night and messing with his prized race horse. So there's two entrances where the guy can get in. And Holmes and Watson think they've gotten the entrance the intruder is using, so they stay up all night. They don't hear anything, but when they check the next morning, someone has gotten in and messed with the horse.

eukaryote: So they know two things. One: the intruder must be using the other entrance. Two: Holmes says to Watson, "I think there's something else we should be considering here, which is the curious incident of the dog in the nighttime." And Watson says, "Holmes, what are you talking about? The dog did nothing in the nighttime." And Holmes says, "Yes, that was the curious incident." Because he's put together that the dogs in the stable would have barked if the stranger had come in and just messed with them. So the intruder must have been someone the dogs knew. And that is the deduction he can make from that.

eukaryote: So, this is what I'm calling the 'Missing Dog' — when you can learn something interesting from the fact that something isn't there. You might call it 'a conspicuous absence'.

Missing dog, drawn by eukaryote

eukaryote: That’s the image I spent several minutes drawing last night, so I'm going to make you look at it.

eukaryote: Here are a couple of other examples. The Fermi paradox is a really big one. Taking a few basic axioms: we think life forms on its own, we know there are billions of stars out there... where is everyone? We don't know but that's a really interesting question, the kind of question that defines a species, in my opinion.

eukaryote: And then there's some other instances where... for example, people who are blind at birth. We've never found a person who was blind at birth who also has schizophrenia, which is statistically very improbable that this would have never happened. So we learn some interesting things about how we think schizophrenia might develop in the brain.

eukaryote: Or how whales, despite having millions of times the number of individual cells that mice have, don't seem to get cancer much more often than mice. Which is weird. That tells us something interesting about cells and how cancer works.

eukaryote: This has shown up in a couple of research projects which I'll talk more about, if someone wants. But I think the point is that this is a pretty useful tool to keep in mind. So, okay, if you're trying to use this, something to note, I think this usually generates questions, not answers. So it's a way of exploring models more than directly getting to something.

eukaryote: And then, if you were trying to think about where these show up in relation to a topic, well the problem is that they're kind of hard to notice. You might try assuming that there is something you're missing like that, and just asking yourself from basic principles what that might be. I'm not really certain... I have run into cases where people have asserted that we just don't know or, the fact that these examples are missing tells us something interesting, doesn't it? And then I looked and there actually were examples of it, the people just didn't know what they were talking about.

eukaryote: So, if you think you find a missing dog, first check that the dog is actually not there – it may just be very quiet. Once you've done that, I can't really give you that much of a road map, but I think the next thing to do is to try to pin down why you think it's unexpected. Often when I've done this, I've found that my guess about what was going on, or the model of the situation was actually extremely simple. But don't stop there because I've also found that this is sort of an unusual mode of thinking and even if your really simple model shows up with a big problem, or shows up with this strange question in it, often that is just still not accounted for, even if it seems like I may not know anything about this if I don't understand what's going on here. You might not and then you will know something interesting about the situation from finding that out. Or you might just be onto a new unanswered question. That can happen too.


Ben Pace: Thank you very much. You were successfully under the five minute limit.

Ben Pace: Jacob, you had a question? 

Jacob Lagerros: Yeah, I'm pretty curious about the slide with all the examples you didn’t cover. Something about insects and why aren’t we dead.

eukaryote: Oh, got you. These are blog posts I've written which I will just run over in the two sentence version and how they relate to this.

eukaryote: Yeah, so insect extinctions, there's a fact that is going around that recent studies have found, in really wild locations, that insect biomass has dropped by 98 percent over the last, let's say 50 years. There have been a few different studies, all of which show pretty similar results. People are like, oh no we're destroying the environment, we're going to die. And like, maybe. I was thinking, I feel like if you told me that as a teenager that in 10 years there will be 90 percent of the insects will be gone, then I would be like, oh my god, we are dead, this is the end of civilization, I should just start drinking and not go to college right now because clearly we're not getting anywhere. And yet, society seems to be going along pretty fine. Just based on what I knew about insects, that seemed completely impossible to me.

eukaryote: So I still don't know why that is, but I found out some interesting stuff about it, and I don't think anyone else has a good answer to that question also.

eukaryote: The Germy paradox is... so I spent two years doing a masters in biodefense and learning about bioweapons and oh no, it's so easy, anyone can go to a lab and just create smallpox or it's real easy to get anthrax or whatever. And I think, okay, there have been a small number of attempted instances of bioterrorism. There have been these huge weapons programs, and yet no one has actually used a bioweapon as a tool of war against another country since the 1940s, approximately. Whatever, the point is, it's super rare. So if these really are so cheap and deadly, and easy to make, where are they? That's the Germy paradox. And I wrote a many-page sequence about it that you can read if you want.

eukaryote: Now I feel like I understand why we have not seen those things. So that's the summary there.

Ben Pace: Interesting. Thank you very much. Anna, do you want to ask a question?

Anna T: Yeah! Hi, I was curious, eukaryote, you mentioned that sometimes you might think that you're seeing a missing dog situation but it is, as you said, a very quiet dog. Do you have any examples of that kind of thing?

eukaryote: So, I really enjoy thinking about cheery and uplifting topics in my free time, another thing I've heard passed about, biological-esque risk, is that a disease can't kill off a species. We just don't have examples of that. Species don't die from that. So, we don't need to worry about this as a species, and these diseases will never evolve naturally, blah blah blah, we're fine. And I'm like, wait, hang on... and I checked and there are actually dozens of examples of diseases killing off a species. Not on the scale that we should necessarily worry about it. It's a whole thing. Whatever. But in that case, those people just hadn't done the research. That was their quiet dog.

Ben Pace: Cool, cool. orthonormal, do you want to ask a question?

orthonormal: So, have you thought about adversarial reasoning as a way to bring these out? I've heard, and you're the biologist here, that the one useful by-product of creationism has been coming up with a few missing dogs for biologists to look at.

eukaryote: Oh, I love it! Yeah that seems super fruitful and I’ve not thought in depth about how to do it. I gave some examples about ways of doing this with yourself, but if you can find a willing partner or just someone who disagrees with what you say and you can fight them about it, that seems pretty good. I haven't thought too much about it. But, yeah, seems fruitful.

Ben Pace: All right, thanks a lot, eukaryote.


Atemporal Ethical Obligations

June 26, 2020 - 22:52
Published on June 26, 2020 7:52 PM GMT

[All the trigger warnings, especially for the links out. I’m trying to understand and find the strongest version of an argument I heard recently. I’m not sure if I believe this or not. Cross-posted from Grand, Unified, Crazy.]

It is no longer enough just to be a “good person” today. Even if you study the leading edge of contemporary morality and do everything right according to that philosophy, you are not doing enough. The future is coming, and it will judge you for your failures. We must do better.

This may sound extreme, but it is self-evidently true in hindsight. Pick any historical figure you want. No matter their moral stature during their lifetime, today we find something to judge. George Washington owned slaves. Abraham Lincoln, despite abolishing slavery in the United States, opposed black suffrage and inter-racial marriage. Mary Wollstonecraft arguably invented much of modern feminism, and still managed to write such cringe-worthy phrases as “men seem to be designed by Providence to attain a greater degree of virtue [than women]”. Gandhi was racist. Martin Luther King Jr abetted rape. The list goes on.

At an object level, this shouldn’t be too surprising. Society has made and continues to make a great deal of moral progress over time. It’s almost natural that somebody who lived long ago would violate our present day ethical standards. But from the moral perspective, this is an explanation, not an excuse; these people are still responsible for the harm their actions caused. They are not to be counted as “good people”.

It’s tempting to believe that today is different; that if you are sufficiently ethical, sufficiently good, sufficiently “woke” by today’s standards, that you have reached some kind of moral acceptability. But there is no reason to believe this is true. The trend of moral progress has been accelerating, and shows no signs of slowing down. It took hundreds of years after his death before Washington became persona non grata. MLK took about fifty. JK Rowling isn’t even dead yet, and beliefs that would have put her at the liberal edge of the feminist movement thirty years ago are now earning widespread condemnation. Moral progress doesn’t just stop because it’s 2020. This trend will keep accelerating.

All of this means that looking at the bleeding edge of today’s moral thought and saying “I’m living my life this way, I must be doing OK” is not enough. Anybody who does this will be left behind; in a few decades, your actions today will be recognized as unethical. The fact that you lived according to today’s ethical views will explain your failings, but not excuse them. Thus, in order to be truly good people, we must take an active role, predict the future of moral progress, and live by tomorrow’s rules, today.

Anything else is not enough.


Sunday Jun 28 – More Online Talks by Curated Authors

June 26, 2020 - 22:13
Published on June 26, 2020 7:13 PM GMT

This Sunday at noon (PDT), we're running another session of "lightning talks" by curated LessWrong authors (see here for last week's announcement and some transcripts).

  • Each talk will be 3-5 minutes followed by discussion. Afterwards, we'll have a hangout in breakout rooms. The talks will be short and focus on presenting one core idea well, rather than rushing through a lot of content.
  • We want to give top LessWrong writers an interesting space to discuss their ideas, and have more fruitful collaboration between users. Think of it like a cross between an academic colloquium and some friends chatting by a whiteboard.

When? Sunday June 28, at noon, Pacific Daylight Time

Where? On this zoom link.


Black Death at the Golden Gate (book review)

June 26, 2020 - 19:09
Published on June 26, 2020 4:09 PM GMT

Book review: Black Death at the Golden Gate: The Race to Save America from the Bubonic Plague, by David K. Randall.

Imagine a story about an epidemic that reached San Francisco, after devastating parts of China. A few cases are detected, there's uncertainty about how long it's been spreading undetected, and a small number of worried public health officials try to mobilize the city to stop an imminent explosion of disease. Nobody knows how fast it's spreading, and experts only have weak guesses about the mechanism of transmission. News media and politicians react by trying to suppress those nasty rumors which threaten the city's economy.

Sounds too familiar?

The story is about a bubonic plague outbreak that started in 1900. It happens shortly after the dawn of the Great Sanitary Awakening, when the germ theory of disease is fairly controversial. A few experts in the new-fangled field of bacteriology have advanced the radical new claim that rats have some sort of connection to the spread of the plague, and one has proposed that the connection involves fleas transmitting the infection through bites. But the evidence isn't yet strong enough to widely displace the standard hypothesis that the disease is caused by filth.

There was a vaccine for the bubonic plague, which maybe helped a bit. It was only 50% effective, the benefits lasted about 6 months, and the side effects sound like cruel and unusual punishment. It was controversial and often resisted, much like the compulsory smallpox vaccinations of the time.

Yet the plague didn't seem to know that it was supposed to grow at exponential rates. That left an eerie sense of mystery about how the plague could linger for years, with people continuing to disagree about whether it existed.


I'm unsure whether to classify the book as history or as historical fiction.

If I had been led to expect that this was a work of fiction, I would only have noticed a few hints that it's based on actual events. The most obvious hint is that the person who initially looks like the story's hero gives up and drops out of sight about 40% of the way in, and is replaced by a totally new character. Yet in spite of the fiction-like style, Wikipedia confirms many key claims, and doesn't appear to contradict any of it.

The story is simultaneously depressing and encouraging, because it demonstrates that a society which is manifestly less competent than the San Francisco (or New York, or Tulsa) of today can grow into an arguably great society in a generation or so. Maybe even Brazil can recover from being a disaster area.

Randall convinced me that Donald Trump is much less of an outlier than I had previously thought (I'm talking mostly about Trump's personality, and his management style, or lack thereof). However, I wonder if some of Randall's portrayal of Governor Gage is slanted so as to emphasize the similarities between Trump and the thoroughly discredited Gage.


One large difference from today is that there was almost no controversy about racism. There was a near-consensus in support of racism. The experts who didn't hesitate to impose race-based quarantines seem more sensible and less racist than their main opponents, who believed that the white race is sufficiently superior that they needn't worry about contracting the disease. Note that the quarantines weren't always as polite as a Wuhan-style lockdown.

The book argues that the lone non-racist (or at least not blatantly racist) person with any power saved many lives by treating the Chinese with respect. There's likely some truth to that, but the situation was messy enough to leave a fair amount of doubt - mostly, it took threats of quarantines in order to accomplish much.

Randall neglects to mention a downside of the campaign for improved sanitation. Many wood buildings were replaced with more rat-resistant brick buildings, just in time to wreak havoc in the 1906 earthquake (bricks fall more often, and farther from buildings, than wood, and falling bricks kill).

The earthquake triggered conditions under which the plague thrived in areas outside of Chinatown, while Chinatown retained some of the benefits of the sanitation-oriented reconstruction. I presume it's just a coincidence that officials became substantially more eager to combat the plague when it hit wealthy neighborhoods, just as it's a coincidence that interest in COVID-19 declined when it faded from wealthier areas.


Conspiracy theorists will be glad to point out the example of a massive conspiracy of influential people that succeeded in mostly covering up the epidemic for something like a year. However, it appears to have been much less organized or centrally directed than conspiracy theorists want to imagine.

Could such a conspiracy succeed today? It would be much harder. The press in 1900 was mostly a cartel that was in the pocket of other businesses. Today, the internet has succeeded in making the news media much more democratic, so that there's much fiercer competition between various ideological biases about which ideas to suppress. But it does sound like there was a pretty widespread consensus among Californians that any reports of the plague should be denied. So even if today's press had existed in 1900, it seems somewhat plausible that most readers would still have been unaware of the plague.

I can sort of imagine that we're currently experiencing situations where both the left and the right unite enough to mostly hide something important.

Could there currently be under-reporting of COVID-19 deaths in hopes of reopening the country? In 1900, doctors had plenty of discretion about how they listed causes of death, and were reportedly scared to list plague as a cause, due to pressure from people who feared quarantines.

Could the same thing be happening this month? I don't see any evidence that it has happened yet, but there are signs that plenty of people would like it to happen. Before reading Black Death, I would have felt confident that there were still enough people who cared about their personal risk of getting the virus to prevent such a conspiracy. It still seems far-fetched, but I'm now wondering how hard it would be to detect by looking at excess death numbers.


How does this book influence my forecast for the current pandemic?

I've updated slightly in the direction of society's response being controlled by elites. I've updated more strongly in the direction of responses depending on pressure from other states / countries.

There's a moderate chance that overflowing hospitals in July will cause influential people in the sunbelt to react strongly, so that hospitals will be safe for them to visit. That's a difference from 1900, since back then influential people wouldn't stoop to being treated in a hospital.

There will be a medium amount of pressure for more restrictions when the elites notice that tourists from China, Mongolia, Senegal, etc., can vacation in the EU, Thailand, and New Zealand, but US tourists are unwelcome. I'm unclear whether that pressure will be sufficient to have much effect.

Quarantines of US goods will also have some effect, but the shortage of evidence implicating transmission via surfaces will keep those quarantines fairly sporadic, so they probably won't be very effective.

Travel restrictions between US states might put pressure on the worst states, but I'm unclear whether those will be enforced.

All of these factors will cause a moderate increase in elites scaring voters into supporting more restrictions of some sort, plus better testing and tracing, sometime during the next few months.

I'm unclear on whether further nursing home tragedies will cause much outrage, even though I expect some more avoidable deaths there.

Trump will likely lose the election and be considered somewhat disgraced, but it will likely be less of a landslide than pundits will predict, and more quickly forgotten.

The world will continue to under-prepare for pandemics, in spite of the general tendency to fight the last war. Invisible mindless enemies seem not to generate much of a warlike response.

The pandemic won't prevent the world from prospering. As to how many it will kill, that will be influenced more than I previously expected by luck.

There will also be a good deal of luck involved in whether public health officials get rewarded when they do things right.

Don't forget that the bubonic plague hasn't been fully eradicated in the US. Have a nice day.


Institutional Senescence

June 26, 2020 - 07:40
Published on June 26, 2020 4:40 AM GMT

Consider this toy model:

An institution, such as a firm, an association or a state, is formed.

It works well in the beginning. It encounters different problems and solves them the best it can.

At some point, though, a small problem arises that happens to be a suboptimal Nash equilibrium: none of the stakeholders can do better by trying to solve it on their own. Such problems are, almost by definition, unsolvable.
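To make "suboptimal Nash equilibrium" concrete, here is a minimal sketch in Python. The two stakeholders, their two actions and the payoff numbers are all hypothetical, chosen only to illustrate the idea. The code checks every combination of actions for two properties: whether anyone could gain by changing only their own action (if nobody can, it is a Nash equilibrium), and whether some other combination would leave everyone at least as well off and someone better off (if so, it is suboptimal).

```python
from itertools import product

# Hypothetical toy game: each of two stakeholders can "fix" the shared
# problem or "ignore" it. PAYOFFS[(a0, a1)] = (payoff to 0, payoff to 1).
# The numbers are illustrative assumptions, not taken from the post.
ACTIONS = ("fix", "ignore")
PAYOFFS = {
    ("fix", "fix"): (3, 3),
    ("fix", "ignore"): (0, 4),
    ("ignore", "fix"): (4, 0),
    ("ignore", "ignore"): (1, 1),
}


def is_nash(profile):
    """True if no single player gains by changing only their own action."""
    for player in (0, 1):
        current = PAYOFFS[profile][player]
        for alternative in ACTIONS:
            deviated = list(profile)
            deviated[player] = alternative
            if PAYOFFS[tuple(deviated)][player] > current:
                return False
    return True


def is_pareto_dominated(profile):
    """True if another profile is no worse for everyone and better for someone."""
    mine = PAYOFFS[profile]
    for other in PAYOFFS.values():
        no_worse = all(o >= m for o, m in zip(other, mine))
        better = any(o > m for o, m in zip(other, mine))
        if no_worse and better:
            return True
    return False


equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
suboptimal = [p for p in equilibria if is_pareto_dominated(p)]
print("Nash equilibria:", equilibria)
print("Suboptimal ones:", suboptimal)
```

With these payoffs the only equilibrium is the one where both stakeholders ignore the problem, even though both would be better off if both fixed it. Neither can get there by acting alone, which is the sense in which the problem persists.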

Thus the problem persists. It's an annoyance, but it's not a big deal. The institution is still working well and you definitely don't want to get rid of it just because it's not perfect.

As time goes on, such problems accumulate. They also tend to have unpleasant consequences: If such a problem makes a particular medical treatment unavailable, it incentivizes patients to bribe doctors, and doctors to break the law and administer the treatment anyway. Now, in addition to a malfunctioning medical system, you have a problem with corruption.

After some time the institution accumulates so many suboptimal Nash equilibria that it barely works at all.

The traditional solution to this problem is internal strife, civil war or revolution. It eventually destroys the institution and, if everything goes well, replaces it with a different one where at least the most blatant problems are fixed.

War or revolution is not a desirable outcome though: In addition to the human suffering, it also tends to replace the people in power. But the people in power don't like to be replaced and so they will try to prevent it.

One manoeuvre they can use is to introduce planned institutional death: Every now and then the institution would be dismantled and created anew, without having to resort to a war or revolution.

Here's an example: The credit system tends to be one big suboptimal Nash equilibrium in itself. Compound interest grows the size of the debt like crazy, and unless there's a way to limit the harm it'll destroy people and businesses and eventually the entire economy. Even lenders would be hurt, but none of them has a reason to mitigate the problem. They could, in theory, forgive the debt for the sake of keeping the economy afloat, but that would put them at a disadvantage relative to other lenders.
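The claim about compound interest can be checked with a little arithmetic. The sketch below is only an illustration: the principal of 100, the 20% annual rate and the 50-year horizon are assumed numbers, not figures from any historical source. It compares a debt that compounds yearly with one that accrues interest on the original principal only.

```python
def compound_debt(principal, annual_rate, years):
    """Debt when unpaid interest is added to the balance every year."""
    return principal * (1 + annual_rate) ** years


def simple_debt(principal, annual_rate, years):
    """Debt when interest accrues on the original principal only."""
    return principal * (1 + annual_rate * years)


# Assumed, illustrative numbers: a loan of 100 at 20% per year.
PRINCIPAL = 100
RATE = 0.20

for years in (1, 10, 25, 50):
    compound = compound_debt(PRINCIPAL, RATE, years)
    simple = simple_debt(PRINCIPAL, RATE, years)
    print(f"year {years:2d}: compound {compound:12.0f}   simple {simple:6.0f}")
```

Under these assumptions the compounding debt grows to several thousand times the original loan over fifty years, while the simple-interest debt grows elevenfold. That gap is why a periodic reset can look attractive even to lenders taken as a group.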

And so the king or the religious authority decides to have jubilee years. Every fifty years, all debts are forgiven. The institution of money lending dies and rises anew from the ashes. (David Graeber asserts that the practice was, in fact, not specific to Israel, but common at the time among the ancient societies in the Middle East.)

One can also think of the democratic system of regular elections as a kind of planned institutional death. Every four years, the government, with all the accumulated dysfunction, is thrown out and a new one is instituted. But the government example also makes the problem with planned death obvious. The government is replaced, but the people in non-political positions, various administrators and small-scale decision makers, remain. At least some inadequate Nash equilibria can therefore survive the change of government. Those would accumulate over time and eventually lead to the system's collapse. We are between a rock and a hard place here: We want to destroy the institution to break the equilibria, but at the same time we want to preserve the institutional knowledge. We don't want to get all the way back to the trees after all. We don't want to get back to the middle ages either.

The last example that comes to mind is the IETF, the institution that standardizes how the Internet works. The real work, the development of standards, is done in working groups, which have a clear charter that defines what they are supposed to achieve and, more importantly, how long it will take. The working group exists for, say, four months, and then dies. Sure, there are IETF institutions other than the working groups and those can survive for longer. But these are mostly doing the support jobs: organizing meetings, publishing the new standards and so on. The real stuff happens in the working groups.

All in all, I am not at all sure that planned institutional death is a solution to all suboptimal equilibria problems, but the fact that evolution uses it, that it fights dysfunctions, such as cancer, by discarding the bulk of the cells every now and then and preserving only the germline, makes it at least worthy of consideration.

June 26th, 2020

by martin_sustrik