LessWrong.com News

A community blog devoted to refining the art of rationality

Against boots theory

Published on September 14, 2020 1:20 PM GMT

The reason that the rich were so rich, Vimes reasoned, was because they managed to spend less money.

Take boots, for example. He earned thirty-eight dollars a month plus allowances. A really good pair of leather boots cost fifty dollars. But an affordable pair of boots, which were sort of OK for a season or two and then leaked like hell when the cardboard gave out, cost about ten dollars. Those were the kind of boots Vimes always bought, and wore until the soles were so thin that he could tell where he was in Ankh-Morpork on a foggy night by the feel of the cobbles.

But the thing was that good boots lasted for years and years. A man who could afford fifty dollars had a pair of boots that'd still be keeping his feet dry in ten years' time, while the poor man who could only afford cheap boots would have spent a hundred dollars on boots in the same time and would still have wet feet.

This was the Captain Samuel Vimes 'Boots' theory of socioeconomic unfairness.

– Terry Pratchett, Men at Arms
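The arithmetic in the passage can be sketched directly. The prices and the ten-year horizon are taken from the quote; the one-year lifetime for cheap boots is an assumption at the low end of Pratchett's "a season or two":

```python
import math

# Lifetime boot cost over a ten-year horizon, using the figures from the
# Vimes passage: $50 boots last the full ten years, while $10 boots are
# assumed to last one year (the low end of "a season or two").
HORIZON_YEARS = 10

def lifetime_cost(price, lifespan_years):
    """Total spent on boots over the horizon, buying replacements as needed."""
    replacements = math.ceil(HORIZON_YEARS / lifespan_years)
    return price * replacements

print(lifetime_cost(50, 10))  # good boots: 50
print(lifetime_cost(10, 1))   # cheap boots: 100, "and would still have wet feet"
```

The gap widens further if cheap boots last less than a season, and narrows if they last two.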

This is a compelling narrative. And I do believe there's some truth to it. I could believe that if you always buy the cheapest boots you can find, you'll spend more money than if you bought something more expensive and reliable. Similar for laptops, smartphones, cars. Especially (as Siderea notes, among other things) if you know how to buy expensive things that are more reliable.

But it's presented as "the reason that the rich [are] so rich". Is that true? I mean, no, obviously not. If your pre-tax income is less than the amount I put into my savings account, then no amount of "spending less money on things" is going to bring you to my level.

Is it even a contributing factor? Is part of the reason why the rich are so rich, that they manage to spend less money? Do the rich in fact spend less money than the poor?

That's less obvious, but I predict not. I predict that the rich spend more than the poor in total, but also on boots, laptops, smartphones, cars, and most other things. There might be exceptions where rich people consume less of the thing than poor people - bus tickets, for example - but I think if you group spending in fairly natural ways, the rich will spend more than the poor in almost every group.

  • Maybe they spend less money on their daily wear boots, but own more pairs of shoes for different occasions. Or maybe they decide that they care about other things than lifetime cost for their daily wear boots, and spend more on those, too. (Being rich means they can afford to care about other things than lifetime cost.)

  • Apparently famous people often get comped meals, but I bet most of them still spend more money on food than I do.

  • I spent £500 on a laptop in 2013, and before that, £300 in 2008. If I'd gone for £200 laptops each time, maybe they would only have lasted two years each. But if I weren't a techno-masochist, maybe I'd realize that using old laptops actually kind of sucks, and I'd upgrade far more often. My work laptop, bought by people who want me to be maximally effective at my job, cost over £1000 and isn't going to last ten years.

  • Financial services are a case where I'd guess the rich and the poor spend money on very different things. I assume the rich don't have to pay to cash a cheque, and very rarely visit loan sharks. But the poor rarely have Amex Platinum cards ($550/year), or personal accountants. (Maybe it's unfair to count those because they save you money in other areas?)

  • Buying a house may be cheaper in the long run than renting a similar house nearby. But rich people tend to live in nicer houses and/or nicer areas.

Those are all guesses. I don't have good data on this, and I'd love to see it if you do.

For what data I do have, the first Google result was this page from the UK's Office for National Statistics. Specifically, look at figure 4, "Indexed household income, total spending and spending by component by income decile, UK, FYE 2019".

They split households into ten income levels, and look at four categories of spending plus total spending. Each of those is a near-strictly increasing line from "poor people spend less" to "rich people spend more". (I see two blips: the 90th percentile of income spends slightly less on housing than the 80th, and the 70th spends slightly less on food and non-alcoholic drinks than the 60th. The other categories are transport, and recreation and culture. These four are the largest spending categories on average across all income levels. The graph also has disposable income, which I think is irrelevant for current purposes.)

(I repeat that this specific data is not strong evidence. The source for it is the living costs and food survey, which might have more detail. (Link goes to the previous year's version because that's what I could find.) Unfortunately it's not open access. It might be freely available if I register, but I don't care enough to try right now. In any case, we'd also want data from outside the UK.)

There will obviously be some exceptions. There will be some rich people who spend less money than some poor people. There will probably even be some rich people who spend less money than some poor people, and would not be rich otherwise. But as a general theory for why the rich are rich? I just don't buy it.

I believe boots theory points towards one component of socioeconomic unfairness. But boots theory itself is supposed to be a theory of why the rich are so rich. It's very clear about that. It's clearly wrong, and I predict that even a weakened version of it is wrong.

To be a little more precise, I think boots theory as written makes three increasingly strong claims, that we could think of as "levels of boots theory":

  1. Being rich enables you to spend less money on things. (More generally: having incrementally more capital lets you spend incrementally less money. Also, being rich is super convenient in many ways.) This phenomenon is also called a ghetto tax.
  2. Also, rich people do in fact spend less money on things.
  3. Also, this is why rich people are rich.

All of these levels have stronger and weaker forms. But I think a quick look at the world tells us that the first level is obviously true under any reasonable interpretation, and the third level is obviously false under any reasonable interpretation. The second I predict is "basically just false under most reasonable interpretations", but it's less obvious and more dependent on details. There may well be weak forms of it that are true.

It may be that most people, when they think of boots theory, think only of levels one or two, not level three. I don't know if you can read this Quora thread that I found on Google. It asks "How applicable to real life is the Sam Vimes "Boots" Theory of Economic Injustice?" The answers mostly agree it's very applicable, but I think most of them are on level one or two. (The one talking about leverage seems like level three, if it's talking about boots theory at all. I'm not convinced it is.)

But it seems to me that boots theory is usually presented in whole in its original form. Its original form is succinct and well written. When people want to comment on it, they very often include the very same quote as I did. And the original form starts by very clearly telling us "this is a theory of why the rich are so rich". It is very obviously level three, which is very obviously wrong.

So I have a few complaints here.

One is, I get the impression that most people don't even notice this. They link or quote something that starts out by saying very clearly "this is a theory of why the rich are so rich", and they don't notice that it's a theory of why the rich are so rich.

(I wouldn't be too surprised (though this is not a prediction) if even the author didn't notice this. Maybe if you had asked him, Terry Pratchett would have said that no, obviously Sam Vimes does not think this is why the rich are so rich, Sam Vimes just thinks this is a good illustration of why it's nice to be rich.)

This disconnect between what a thing actually says, and what people seem to think it says, just bothers me. I feel the desire to point it out.

Another is, I think there's a motte-and-bailey going on between levels one and two. A Quora commenter says it's "far more expensive to be poor than it is to be rich, both in a percentage of income respect and a direct effect". He gives examples of things that rich people can spend less money on, if they choose. He doesn't provide data that rich people do spend less money on these things. Another describes how being rich lets you save money on food staples by stocking up when there's a sale. He doesn't provide data that rich people do spend less money on food or even staples. You could certainly make the case that neither of these people is explicitly claiming level two. But I do think they're hinting in that direction, even if it's not deliberate.

And relatedly: if we want to help people escape poverty, we need to know on what levels boots theory is true or false.1 If we want to know that, we need to be able to distinguish the levels. If "boots theory" can refer to any of these levels, then simply calling boots theory "true" (or even "false") is uninformative. We need to be more precise than that. To be fair, the Quora commenters make specific falsifiable claims, which is commendable. But the claims are meant to be specific examples of a general phenomenon, and the general phenomenon is simply "boots theory", and it's not clear what they think that means.

I advise that if you talk about boots theory, you make it clear which level you're talking about. But maybe don't use that name at all. If you're talking about level one, the name "ghetto tax" seems fine. If you do want to talk about levels two or three, I don't have a good alternative name to suggest. But since I don't think those levels are true, I'm not sure that's a big problem.

  1. I'm not too confident about this, and I don't want to get too distracted with object-level claims about how to actually fight poverty. But my sense is that: to the extent that level two is true, giving someone money fairly reliably sets up positive feedback loops that help them save more money in future. To the extent that it's not true, these feedback loops don't come for free. Maybe we can seek out spending categories where it is true, or groups of people for whom it is true. Maybe we can teach people how to find and take advantage of these feedback loops. If even level one isn't true, we don't get these loops at all. Of course, maybe it's worth giving people money even if we don't get the feedback loops. 


SlateStarCodex online meetup: Integrating evolutionary psychology and behaviorism

Published on September 14, 2020 10:33 AM GMT

Dr. Diana Fleischman will talk on integrating evolutionary psychology and behaviorism.

Sunday, September 27 at 20:30 IDT, 17:30 UTC, 10:30 PDT

Sign up here and we'll send you a link to the online meetup https://forms.gle/EJ9YxDvEPUT1YkEQ9

Summary: All of us want to change other people's behavior to align more closely with our goals. Over the last century, behaviorists have discovered how reward and punishment change the behavior of organisms. The central idea of this talk is that we are intuitive behaviorists and that our relationships, emotions, and mental health can be better understood if you consider how we evolved to change the behavior of others.

Diana Fleischman is an evolutionary psychologist currently writing a book called "How to Train Your Boyfriend" integrating evolutionary psychology and behaviorism. Diana has published extensively on disgust, human sexuality and evolutionary psychology more broadly. Currently she lives in Albuquerque, New Mexico.


Are there non-AI projects focused on defeating Moloch globally?

Published on September 14, 2020 2:13 AM GMT

Meditations on Moloch lays out a rather pessimistic view of the future, and then offers a super-intelligent AI "gardener" as the solution. A lot of the rationalist community is focused on AI, which makes sense in that light (and of course because of the existential risk of unaligned AI), but I don't know of any projects focused on non-AI solutions to countering or defeating Moloch. Some projects exist to counter specific local coordination problems, but apparently none to counter the global gardening problem in the original post? Am I missing such a project? Is there a reason that AI is the only plausible solution? Is this low-hanging fruit waiting to be picked?


Decision Theory is multifaceted

Published on September 13, 2020 10:30 PM GMT

Related: Conceptual Problems with UDT and Policy Selection, Formalising decision theory is hard


The post is aimed at anyone who is interested in decision theory. It is pretty general and not really technical; some familiarity with counterfactual mugging can be useful, but overall not much background knowledge is required.


The post develops the claim that identifying the correct solution to some decision problems might be intricate, if not impossible, when certain details about the specific scenario are not given. First I show that, in counterfactual mugging, some important elements in the problem description and in a possible formalisation are actually underspecified. Next I describe issues related to the concept of perfect prediction and briefly discuss whether they apply to other decision scenarios involving predictors. Then I present some advantages and disadvantages of the formalisation of agents as computer programs. A summary with bullet points concludes.

Missing parts of a “correct” solution

I focus on the version of the problem with cards and two humans since, to me, it feels more grounded in reality—a game that could actually be played—but what I say applies also to the version with a coin toss and Omega.

What makes the problem interesting is the conflict between these two intuitions:

  • Before Player A looks at the card, the best strategy seems to be never showing the card, because that is the strategy that makes Player A lose the least in expectation, given the uncertainty about the value of the card (50/50 high or low).
  • After Player A sees a low card, showing it seems a really good idea, because that action gives Player A a loss of 0, which is the best possible result considering that the game is played only once and never again. Thus, the incentive to not reveal the card seems to disappear after Player A knows that the card is low.

[In the other version, the conflict is between paying before the coin toss and refusing to pay after knowing the coin landed tails.]

One attempt at formalising the problem is to represent it as a tree (a formalisation similar to the following one is considered here). The root is a 50/50 chance node representing the possible values of the card. Then Player A chooses between showing and not showing the card; each action leads to a leaf with a value which indicates the loss for Player A. The peculiarity of counterfactual mugging is that some payoffs depend on actions taken in a different subtree.

[The tree of the other version is a bit different since the player has a choice only when the coin lands tails; anyway, the payoff in the heads case is “peculiar” in the same sense of the card version, since it depends on the action taken when the coin lands tails.]

With this representation, it is easy to see that we can assign an expected value (EV) to each deterministic policy available to the player: we start from the root of the tree, then we follow the path prescribed by the policy until we reach a payoff, which is assigned a weight according to the chance nodes that we’ve run into.

Therefore it is possible to order the policies according to their expected values and determine which one gives the lowest expected loss [or, in the other version, the highest EV] with respect to the root of the tree. This is the formalism behind the first of the two intuitions presented before.
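This root-level EV computation can be sketched for the coin-toss version. The payoffs here are the ones that usually accompany the thought experiment (pay $100 when the coin lands tails; receive $10,000 on heads, but only if the predictor foresees that you would pay on tails); they are illustrative numbers, not taken from this post:

```python
# Expected value of each deterministic policy in the coin-toss version of
# counterfactual mugging, computed from the root of the tree.
# Payoffs are the commonly used illustrative ones (an assumption here):
# pay $100 on tails; receive $10,000 on heads iff the predictor foresees
# that you would pay on tails.
P_HEADS = 0.5

def expected_value(pays_on_tails):
    # The heads payoff depends on the action taken in the *other* subtree;
    # this cross-subtree dependence is the peculiarity of the problem.
    heads_payoff = 10_000 if pays_on_tails else 0
    tails_payoff = -100 if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff

print(expected_value(True))   # policy "always pay": 4950.0
print(expected_value(False))  # policy "never pay": 0.0
```

From the root, the paying policy has the higher EV; after the coin has already landed tails, refusing looks better. That is the two-intuition conflict in numerical form.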

On the other hand, one could object that it is far from trivial that the correct thing to do is to minimise expected loss from the root of the tree. In fact, in the original problem statement, the card is low [tails], so the relevance of the payoffs in the other subtree—where the card is high [heads]—is not clear and the focus should be on the decision node with the low card, not on the root of the tree. This is the formalism behind the second intuition.

Even though the objection related to the second intuition sounds reasonable, I think one could point to other, more important issues underlying the problem statement and formalisation. Why is there a root in the first place and what does it represent? What do we mean when we say that we minimise loss “from the start”?

These questions are more complicated than they seem: let me elaborate on them. Suppose that the advice of maximising EV “from the start” is generally correct from a decision theory point of view. It is not clear how we should apply that advice in order to make correct decisions as humans, or to create an AI that makes correct decisions. Should we maximise value...

  1. ...from the instant in which we are “making the decision”? This seems to bring us back to the second intuition, where we want to show the card once we’ve seen it is low.
  2. ...from our first conscious moment, or from when we started collecting data about the world, or maybe from the moment which the first data point in our memory is about? In the case of an AI, this would correspond to the moment of the “creation” of the AI, whatever that means, or maybe to the first instant which the data we put into the AI points to.
  3. ...from the very first moment since the beginning of space-time? After all, the universe we are observing could be one possible outcome of a random process, analogous to the 50/50 high/low card [or the coin toss].

Regarding point 1, I’ve mentioned the second intuition, but other interpretations could be closer to the first intuition instead. The root could represent the moment in which we settle our policy, and this is what we would mean with “making the decision”.

Then, however, other questions should be answered about policy selection. Why and when should we change policy? If selecting a policy is what constitutes a decision, what exactly is the role of actions, or how is changing policy fundamentally different from other actions? It seems we are treating policies and actions as concepts belonging to two different levels in a hierarchy: if this is a correct model, it is not clear to me why we do not use further levels, or why we need two different levels, especially when thinking in terms of embedded agency.

Note that giving precise answers to the questions in the previous paragraph could help us find a criterion to distinguish fair problems from unfair ones, which would be useful for comparing the performance of different decision theories, as pointed out in the conclusion of the paper on FDT. Treating as fair all problems in which the outcome depends only on the agent's behaviour in the dilemma at hand (p. 29) is not a satisfactory criterion when all the issues outlined above are taken into account: the lack of clarity about the roles of the root, decision nodes, policies and actions makes the "borders" of a decision problem blurred, and leaves the agent's behaviour underspecified.

Moreover, resolving the ambiguities in the expression “from the start” could also explain why it seems difficult to apply updatelessness to game theory (see the sections “Two Ways UDT Hasn’t Generalized” and “What UDT Wants”).

Predictors

A weird scenario with perfect prediction

So far, we’ve reasoned as if Player B (who determines the loss p^2 of Player A by choosing the value of p that best represents his belief that the card is high) can perfectly guess the strategy that Player A adopts. Analogously, in the version with the coin toss, Omega is capable of perfectly predicting what the decision maker does when the coin lands tails, because that information is necessary to determine the payoff in case the coin lands heads.

However, I think that also the concept of perfect prediction deserves further investigation: not because it is an implausible idealisation of a highly accurate prediction, but because it can lead to strange conclusions, if not downright contradictions, even in very simple settings.

Consider a human that is going to choose only one between two options: M or N. Before the choice, a perfect predictor analyses the human and writes the letter (M or N) corresponding to the predicted choice on a piece of paper, which is given to the human. Now, what exactly prevents the human from reading the piece of paper and choosing the other option instead?

From a slightly different perspective: assume there exists a human, facing a decision between M and N, who is capable of reading a piece of paper containing only one letter, M or N, and choosing the opposite—seems quite a weak assumption. Is a “perfect predictor” that writes the predicted option on a piece of paper and gives it to the human… always wrong?

Note that allowing probabilities doesn’t help: a human capable of always choosing M when reading a prediction like “probability p of choosing M, probability 1-p of choosing N” seems as plausible as the previous human, but again would make the prediction always wrong.
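The paradox can be made fully concrete with a minimal sketch in Python (the names here are mine, not from the post): whatever letter a predictor writes on the paper, a contrarian chooser falsifies it.

```python
OPTIONS = ("M", "N")

def contrarian(written_letter):
    """A chooser that reads the predictor's letter and picks the other option."""
    return "N" if written_letter == "M" else "M"

# Exhaustively check every prediction the predictor could write:
# the contrarian's actual choice always differs from it.
for prediction in OPTIONS:
    choice = contrarian(prediction)
    assert choice != prediction
print("every possible written prediction is falsified")
```

The point is not that such a chooser is clever, but that its existence is a very weak assumption, and it already suffices to make "perfect predictor who announces the prediction" contradictory.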

Other predictions

Unlike the previous example, Newcomb’s and other problems involve decision makers who are not told about the prediction outcome. However, the difference might not be as clear-cut as it first appears. If the decision maker regards some information—maybe elements of the deliberation process itself—as evidence about the imminent choice, the DM will also have information about the prediction outcome, since the predictor is known to be reliable. To what extent is this information about the prediction outcome different from the piece of paper in the previous example? What exactly can be considered evidence about one’s own future choices? The answer seems to be related to the details of the prediction process and how it is carried out.

It may be useful to consider how a prediction is implemented as a specific program. In this paper by Critch, the algorithm FairBotk plays the prisoner’s dilemma by cooperating if it successfully predicts that the opponent will cooperate, and defecting otherwise. Here the “prediction” consists in a search for proofs, up to a certain length, that the other algorithm outputs Cooperate when given FairBotk as input. Thanks to a bounded version of Löb’s theorem, this specific prediction implementation allows FairBotk to cooperate when playing against itself.
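Critch's FairBotk decides by bounded proof search, which does not fit in a few lines; the following is my own cruder bounded-simulation analogue, in which an optimistic base case stands in for the Löbian step. It illustrates the cooperation pattern, not the actual proof-search mechanism.

```python
COOPERATE, DEFECT = "C", "D"

def simbot(opponent, depth):
    """Cooperate iff a depth-limited simulation of the opponent
    (playing against simbot) cooperates; assume cooperation at the floor."""
    if depth == 0:
        return COOPERATE  # optimistic base case, standing in for the Löbian step
    return COOPERATE if opponent(simbot, depth - 1) == COOPERATE else DEFECT

def cooperate_bot(opponent, depth):
    return COOPERATE

def defect_bot(opponent, depth):
    return DEFECT

print(simbot(simbot, 5))         # C: cooperates with itself, like FairBot_k
print(simbot(defect_bot, 5))     # D: defects against a defector
print(simbot(cooperate_bot, 5))  # C: cooperates with an unconditional cooperator
```

Note the last line: cooperating with a bot that cooperates unconditionally is exactly the behaviour whose rationality is, as noted, debatable.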

Results of this kind (open-source game theory / program equilibrium) could be especially relevant in a future in which important policy choices are made by AIs that interact with each other. Note, however, that no claim is made about the rationality of FairBotk's overall behaviour—it is debatable whether FairBotk's decision to cooperate against a program that always cooperates is correct.

Moreover, seeing decision makers as programs can be confusing and less precise than one would intuitively think, because it is still unclear how to properly formalise concepts such as action, policy and decision-making procedure, as discussed previously. If actions in certain situations correspond to program outputs given certain inputs, does policy selection correspond to program selection? If so, why is policy selection not an action like the other ones? And—related to what I said before about using a hierarchy of exactly two levels—why don’t we also “select” the code fragment that does policy selection?

In general, approaches that use some kind of formalism tend to be more precise than purely philosophical approaches, but there are some disadvantages as well. Focusing on low-level details can make us lose sight of the bigger picture and limit lateral thinking, which can be a great source of insight for finding alternative solutions in certain situations. In a blackmail scenario, besides the decision to pay or not, we could consider what factors caused the leakage of sensitive information, or the exposure of something we care about, to adversarial agents. Another example: in a prisoner’s dilemma, the equilibrium can shift to mutual cooperation thanks to the intervention of an external actor that makes the payoffs for defection worse (the chapter on game theory in Algorithms to Live By gives a nice presentation of this equilibrium shift and related concepts).
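The equilibrium shift in the prisoner’s dilemma can be checked with a short sketch (the payoff numbers and the size of the fine are my own assumptions, chosen for illustration):

```python
from itertools import product

def pure_nash(payoffs):
    """Pure-strategy Nash equilibria of a 2x2 game.
    payoffs[(row_action, col_action)] = (row_payoff, col_payoff)."""
    actions = ("C", "D")
    equilibria = []
    for a, b in product(actions, actions):
        row_best = all(payoffs[(a, b)][0] >= payoffs[(x, b)][0] for x in actions)
        col_best = all(payoffs[(a, b)][1] >= payoffs[(a, y)][1] for y in actions)
        if row_best and col_best:
            equilibria.append((a, b))
    return equilibria

# A standard prisoner's dilemma ...
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
# ... and the same game after an external actor fines each defector 4 points.
fined = {(a, b): (u - 4 * (a == "D"), v - 4 * (b == "D"))
         for (a, b), (u, v) in pd.items()}

print(pure_nash(pd))     # [('D', 'D')]
print(pure_nash(fined))  # [('C', 'C')]
```

The fine makes defection a dominated strategy, so the unique equilibrium moves from mutual defection to mutual cooperation without any change to the players' reasoning.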

We may also take into account that, for efficiency reasons, predictions in practice might be made with methods different from close-to-perfect physical or algorithmic simulation, and the specific method used could be relevant for an accurate analysis of the situation, as mentioned before. In the case of human interaction, sometimes it is possible to infer something about one’s future actions by reading facial expressions; but this also means that a predictor can be tricked if one is capable of masking one’s intentions by keeping a poker face.

  • The claim that a certain decision is correct because it maximises utility may require further explanation, since every decision problem sits in a context which might not be fully captured in the problem formalisation.
  • Perfect prediction leads to seemingly paradoxical situations. It is unclear whether these problems underlie other scenarios involving prediction. This does not mean the concept must be rejected; but our current understanding of prediction might lack critical details. Certain problems may require clarification of how the prediction is made before a solution is claimed as correct.
  • The use of precise mathematical formalism can resolve some ambiguities. At the same time, interesting solutions to certain situations may lie “outside” the original problem statement.

Thanks to Abram Demski, Wolfgang Schwarz and Caspar Oesterheld for extensive feedback.

This work was supported by CEEALAR.


There are biases in favor of the there-is-always-a-correct-solution framework. Uncovering the right solution in decision problems can be fun, and finding the Decision Theory to solve them all can be appealing.

On “wrong” solutions

Many of the reasons provided in this post also explain why it’s tricky to determine what a certain decision theory does in a problem, and whether a given solution is wrong. But I want to provide another reason, namely the following informal...

Conjecture: for any decision problem that you believe CDT/EDT gets wrong, there exists a paper or book in which a particular version of CDT/EDT gives the solution that you believe is correct, and/or a paper or book that argues that the solution you believe is correct is actually wrong.

Here’s an example about Newcomb’s problem.


Have you tried hiIQpro.com's cognitive training or coaching?

14 сентября, 2020 - 01:05
Published on September 13, 2020 10:05 PM GMT

Seems extremely flim-flammy, but they do offer a money-back guarantee if you don't jump 10-20 points on a standardized test.


A Brief Chat on World Government

13 сентября, 2020 - 21:33
Published on September 13, 2020 6:33 PM GMT

[This is the transcript of a chat conversation I had with another member of my local rationalist meet-up, on the topics of Moloch, world government, and colonization. Lightly edited for clarity, spelling, etc. and shared with their permission. Cross-posted from Grand, Unified, Empty.]

Me: Here are some thoughts on Moloch. Moloch basically guarantees that anybody who can figure out how to successfully convert other values into economic value will out-compete the rest. So in the end, we are the paperclip maximizers, except our paperclips are dollar bills.

Scott proposes that to defeat Moloch we install a gardener, specifically a super-intelligent AI. But if you don’t think that’s going to happen, a world government seems like the next best thing. However if we escape earth before that happens, speed of light limitations will forever fragment us into competing factions impossible to garden. Therefore we should forbid any attempts to colonize Mars or other planets until we have world government and the technology to effectively manage such colonies under that government.

Them: The superorganisms in his parable only function because of… external competitive pressures. If cells didn’t need to band together to survive, they wouldn’t. If governments don’t have to fend off foreign governments they will accumulate corruption and dysfunctions.

Sort of related, I’m not persuaded by the conclusion to his parable. Won’t superintelligent AIs be subject to the same natural selective pressures as any other entity? What happens when our benevolent gardener encounters the expanding sphere of computronium from five galaxies over?

Me: Cells were surviving just fine without banding together. It was just that cells which banded together reproduced and consumed resources more effectively than those which didn’t. Similarly, I think a well constructed world government could survive just fine without competitive pressure. We haven’t necessarily found the form of that government yet, but liberal democracy seems like a decent first step.

Regarding competitive pressure on AI, he deals with that off hand by assuming that accelerating self-improvement gives an unbreakable first mover advantage. I don’t think that’s actually true, but then I’m much less bullish on super-intelligent AI in general.

Them: It would “survive,” but we don’t want a surviving government, we want a competent, benevolent one. My read on large organizations in general is that they naturally tend towards dysfunction, and it’s only competitive pressures that keep them functional.

Me: That produces a dismal view of the universe. We are given a Sophie’s Choice of either tiling the universe in economicium in order to compete and survive, or instantiating a global gardener which inherently tends towards dystopic dysfunction.

My read on large organizations in general is that they naturally tend towards dysfunction, and it’s only competitive pressures that keep them functional.

This is certainly mostly true, but I’m not yet convinced it’s necessarily true.

competitive pressures

I think this in particular is too narrow. Hunter-gatherer bands were organizations that stayed relatively “functional”, often not due to competitive pressures with other bands, but due to pure environmental survival pressures. We probably don’t want a government that stays functional due to environmental survival pressures either, but I’m generalizing to an intuition that there are other kinds of pressure.

Them: There are other kinds of pressure, but you better be damn sure you’ve got them figured out before you quash all rivals.

Me: 100%

Them: And to be precise, yeah, there’s a second thing keeping organizations intact, and that’s the floor imposed by “so incompetent they self-destruct.” But I think they degrade to the level of the floor, at which point they are no longer robust enough to survive two crises taking place at once, so they collapse anyway.

Me: Hmm, so it becomes impossible to instantiate a long-term stable gardener of any kind, and we’re stuck tiling the universe in economicium regardless.

Them: Well I think it might be possible (in the short term at least), but you have to be cognizant of the risks before you assume removing competition will make things better. So when I imagine a one-world-government, it’s more like a coordinating body above a collection of smaller states locked in fierce competition (hopefully just economic, cultural & athletic).

Me: At the risk of clarifying something which is already clear: I was never arguing that we are ready for world government now, or should work towards that soon; I was just saying there are some things we shouldn’t do until we have a good world government. We should make sure we can garden what we have before we go buying more land.

Them: Hmm, okay, I think that’s some important nuance I was overlooking.

Me: Though perhaps that is an inherently useless suggestion, since the coordination required to not buy more land is… a global gardener. Otherwise there’s competitive advantage in getting to more land first.

Them: So it's a fair point. I assume that any pan-global body will not be well-designed, since it won't be subject to competitive pressures. But it's true that you might want to solve that problem before you start propagating your social structures through the universe.

Me: I’m now imagining the parallel argument playing out in Europe just post-Columbus. “We shouldn’t colonize North America until we have a well-gardened Europe”. That highlights the absurdity of it rather well.


Egan's Theorem?

13 сентября, 2020 - 20:47
Published on September 13, 2020 5:47 PM GMT

When physicists were figuring out quantum mechanics, one of the major constraints was that it had to reproduce classical mechanics in all of the situations where we already knew that classical mechanics works well - i.e. most of the macroscopic world. Likewise for special and general relativity - they had to reproduce Galilean relativity and Newtonian gravity, respectively, in the parameter ranges where those were known to work. Statistical mechanics had to reproduce the fluid theory of heat; Maxwell's equations had to agree with more specific equations governing static electricity, currents, magnetic fields and light under various conditions.

Even if the entire universe undergoes some kind of phase change tomorrow and the macroscopic physical laws change entirely, it would still be true that the old laws did work before the phase change. Any new theory would still have to be consistent with the old laws working, where and when they actually did work.

This is Egan's Law: it all adds up to normality. When new theory/data comes along, the old theories are still just as true as they always were. New models must reproduce the old in all the places where the old models worked; otherwise the new models are incorrect, at least in the places where the old models work and the new models disagree with them.

It really seems like this should be not just a Law, but a Theorem.

I imagine Egan's Theorem would go something like this. We find a certain type of pattern in some data. The pattern is highly unlikely to arise by chance, or allows significant compression of the data, or something along those lines. Then the theorem would say that, in any model of the data, either:

  • The model has some property (corresponding to the pattern), or
  • The model is "wrong" or "incomplete" in some sense - e.g. we can construct a strictly better model, or show that the model consistently fails to predict the pattern, or something like that.

The meat of such a theorem would be finding classes of patterns which imply model-properties less trivial than just "the model must predict the pattern" - i.e. patterns which imply properties we actually care about. Structural properties like e.g. (approximate) conditional independencies seem particularly relevant, as well as properties involving abstractions/embedded submodels (in which case the theorem should tell how to find the abstraction/embedding).
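As a minimal illustration of the "allows significant compression" criterion (toy data of my own construction, not from the post): an off-the-shelf compressor cleanly separates patterned bytes from patternless ones.

```python
import random
import zlib

random.seed(0)
patterned = bytes(i % 7 for i in range(10_000))              # strong structure
noise = bytes(random.randrange(256) for _ in range(10_000))  # no structure

# The patterned data compresses to a tiny fraction of its length;
# the random data does not compress at all.
print(len(zlib.compress(patterned)) < 1_000)   # True
print(len(zlib.compress(noise)) > 9_000)       # True
```

Detecting the pattern is the easy part; the open question in the post is what such compressibility forces every adequate model of the data to look like.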

Does anyone know of theorems like that? Maybe this is equivalent to some standard property in statistics and I'm just overthinking it?


Does turning on the shower help reduce wildfire smoke in the air?

13 сентября, 2020 - 05:39
Published on September 13, 2020 2:39 AM GMT

Rain is said to help air quality not only by stopping fires but also by removing smoke particles from the air. Does turning on the shower also remove smoke particles from the air, or does something different happen higher up in the atmosphere vs in a shower?


Gems from the Wiki: Acausal Trade

13 сентября, 2020 - 03:23
Published on September 13, 2020 12:23 AM GMT

During the LessWrong 1.0 Wiki Import we (the LessWrong team) discovered a number of great articles that most of the LessWrong team hadn't read before. Since we expect many others not to have read these either, we are creating a series of the best posts from the Wiki to help give those hidden gems some more time to shine.

Most of the work for this post was done by Joshua Fox who I've added as a coauthor to this post, wiki edits were also made by all of the following: Lukeprog, Gwern, Vladimir Nesov, Sauski, Deku-shrub, Caspar42, Joe Collman and Jja. Thank you all for your contributions!

In acausal trade, two agents each benefit by predicting what the other wants and doing it, even though they might have no way of communicating or affecting each other, nor even any direct evidence that the other exists.

Background: Superrationality and the one-shot Prisoner's Dilemma

This concept emerged out of the much-debated question of how to achieve cooperation on a one-shot Prisoner's Dilemma, where, by design, the two players are not allowed to communicate. On the one hand, a player who is considering the causal consequences of a decision ("Causal Decision Theory") finds that defection always produces a better result. On the other hand, if the other player symmetrically reasons this way, the result is a Defect/Defect equilibrium, which is bad for both agents. If they could somehow converge on Cooperate, they would each individually do better. The question is what variation on decision theory would allow this beneficial equilibrium.

Douglas Hofstadter (see references) coined the term "superrationality" to express this state of convergence. He illustrated it with a game in which twenty players, who do not know each other's identities, each get an offer. If exactly one player asks for the prize of a billion dollars, they get it, but if none or multiple players ask, no one gets it. Players cannot communicate, but each might reason that the others are reasoning similarly. The "correct" decision--the decision which maximizes expected utility for each player, if all players symmetrically make the same decision--is to randomize a one-in-20 chance of asking for the prize.
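That the symmetric optimum is exactly one in twenty can be checked directly: the chance that exactly one of n players asks, when each asks independently with probability p, is n·p·(1-p)^(n-1), which is maximized at p = 1/n. A quick numerical sketch (the grid search and variable names are mine):

```python
def p_exactly_one(n, p):
    """Probability that exactly one of n players asks,
    when each asks independently with probability p."""
    return n * p * (1 - p) ** (n - 1)

n = 20
grid = [i / 10_000 for i in range(10_001)]  # scan p across [0, 1]
best = max(grid, key=lambda p: p_exactly_one(n, p))
print(best)  # 0.05, i.e. one in twenty
```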

Gary Drescher (see references) developed the concept further, introducing an ethical system called "acausal subjunctive morality." Drescher's approach relies on the agents being identical or at least similar, so that each agent can reasonably guess what the other will do based on facts about its own behavior, or even its own "source code." If it cooperates, it can use this correlation to infer that the other will probably also cooperate.

Acausal trade goes one step beyond this. The agents do not need to be identical, nor similar, nor have the same utility function. Moreover, they do not need to know what the other agents are like, nor even if they exist. In acausal trade, an agent may have to surmise the probability that other agents, with their utility function and proclivities, exist.


We have two agents, separated so that no interaction is possible. The separation can be simply because each is not aware of the location of the other; or else each may be prevented from communicating with or affecting the other.

In an asymmetrical example, one agent may be in the other's future.

Other less prosaic thought experiments can be used to emphasize that interaction may be absolutely impossible. For example, agents that are outside each other's light cones, or in separate parts of an Everett multiverse. And abstracting away from those scenarios, we can talk of counterfactual "impossible possible worlds" as a model for probability distributions.

In truly acausal trade, the agents cannot count on reputation, retaliation, or outside enforcement to ensure cooperation. The agents cooperate because each knows that the other can somehow predict its behavior very well. (Compare Omega in Newcomb's problem.) Each knows that if it defects (respectively: cooperates), the other will (probabilistically) know this, and defect (respectively: cooperate).

Acausal trade can also be described in terms of (pre)commitment: Both agents commit to cooperate, and each has reason to think that the other is also committing.

Prediction mechanisms

For acausal trade to occur, each agent must infer there is some probability that an agent, of the sort that will acausally trade with it, exists.

The agent might be told, exogenously (as part of the scenario), that the other exists. But more interesting is the case in which the agent surmises the probability that the other exists.

A superintelligence might conclude that other superintelligences would tend to exist because increased intelligence is a convergent instrumental goal for agents. Given the existence of a superintelligence, acausal trade is one of the tricks it would tend to use.

To take a more prosaic example, we humans realize that humans tend to be alike: Even without knowing about specific trading partners, we know that there exist other people with similar situations, goals, desires, challenges, resource constraints, and mental architectures.

Once an agent realizes that another agent might exist, there are different ways that it might predict the other agent's behavior, and specifically conclude that the other agent can be an acausal trading partner.

  1. They might know or surmise each other's mental architectures (source code).
  2. In particular, they might know that they have identical or similar mental architecture, so that each one knows that its own mental processes approximately simulate the other's.
  3. They might be able to simulate each other (perhaps probabilistically), or to predict the other's behavior analytically. (Even we humans simulate each other's thoughts to guess what the other would do.)
  4. More broadly, it is enough to know (probabilistically) that the other is a powerful optimizer, that it has a certain utility function, and that it can derive utility from resources. Seen mathematically, this is just an optimization problem: What is the best possible algorithm for an agent's utility function? Cooperate/Cooperate is optimal under certain assumptions, for if one agent could achieve optimal utility by defecting, then, symmetrically, so could the other, resulting in Defect/Defect, which generates inferior utility.

Decision Theories

Acausal trade is a special case of Updateless decision theory (or a variant like Functional Decision Theory, see references). Unlike better-known variations of Decision theory, such as Causal decision theory, acausal trade and UDT take into account the agent's own algorithm as cause and caused.

In Causal Decision Theory, the agent's algorithm (implementation) is treated as uncaused by the rest of the universe, so that though the agent's decision and subsequent action can make a difference, its internal make-up cannot (except through that decision). In contrast, in UDT, the agents' own algorithms are treated as causal nodes, influenced by other factors, such as the logical requirement of optimality in a utility-function maximizer. In UDT, as in acausal trade, the agent cannot escape the fact that its decision to defect or cooperate constitutes strong Bayesian evidence as to what the other agent will do, and so it is better off cooperating.

Limitations and Objections

Acausal trade only works if the agents are smart enough to predict each other's behavior, and then smart enough to acausally trade. If one agent is stupid enough to defect, and the second is smart enough to predict the first, then neither will cooperate.

Also, as in regular trade, acausal trade only works if the two sides are close enough in power that the weaker side can do something worthwhile enough for the stronger.

A common objection to this idea: Why shouldn't an agent "cheat" and choose to defect? Can't it "at the last moment" back out after the other agent has committed? However, this approach takes into account only the direct effect of the decision, while a sufficiently intelligent trading partner could predict the agent's choice, including that one, rendering the "cheating" approach suboptimal.

Another objection: Can an agent care about (have a utility function that takes into account) entities with which it can never interact, and about whose existence it is not certain? However, this is quite common even for humans today. We care about the suffering of other people in faraway lands about whom we know next to nothing. We are even disturbed by the suffering of long-dead historical people, and wish that, counterfactually, the suffering had not happened. We even care about entities that we are not sure exist. For example: We might be concerned by a news report that a valuable archaeological artifact was destroyed in a distant country, yet at the same time read other news reports stating that the entire story is a fabrication and the artifact never existed. People even get emotionally attached to the fate of a fictional character.

An example of acausal trade with simple resource requirements

At its most abstract, the agents are simply optimization algorithms. As a toy example, let T be a utility function for which time is most valuable as a resource; while for utility function S, space is most valuable, and assume that these are the only two resources.

We will now choose the best algorithms for optimizing T. To avoid anthropomorphizing, we simply ask which algorithm--which string of LISP, for example--would give the highest expected utility for a given utility function. Thus, the choice of source code is "timeless": We treat it as an optimization problem across all possible strings of LISP. We assume that computing power is unlimited. Mathematically, we are asking about argmax T.

We specify that there is some probability that either agent will be run in a time-rich environment and, otherwise, some probability that it will be run in a space-rich one.

If the algorithm for T is instantiated in a space-rich environment, it will only be able to gain a small amount of utility for itself, but S would be able to gain a lot of utility; and vice versa.

The question is: What algorithm for T provides the most optimization power, the highest expected value of utility function T?

If it turns out that the environment is space-rich, the agent for T may run the agent (the algorithm) for S, increasing the utility for S, and symmetrically the reverse. This will happen if each concludes that the optimum occurs when the other agent has the "trading" feature. Given that this is the optimal case, the acausal trade will occur.
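Plugging in some assumed toy numbers (mine, not part of the original example: each instantiation lands in its favored environment with probability 1/2, earns 10 units of its own utility there, and in the unfavored environment can produce either 1 unit for itself or 10 units for the other), the gain from the mutual "trading" feature is easy to compute:

```python
P_FAVORED = 0.5  # chance an instantiation lands in its favored environment

def expected_T_utility(T_trades, S_trades):
    """Expected utility for T; by symmetry the same formula holds for S."""
    u = P_FAVORED * 10                              # T's code in a time-rich world
    u += (1 - P_FAVORED) * (0 if T_trades else 1)   # T's code in a space-rich world
    u += P_FAVORED * (10 if S_trades else 0)        # S's code, when time-rich, may run T
    return u

print(expected_T_utility(False, False))  # 5.5  without the trading feature
print(expected_T_utility(True, True))    # 10.0 with mutual acausal trade
```

Under these assumed payoffs, the source code with the trading feature is simply the argmax of expected T-utility, which is the sense in which the trade "occurs" without any causal contact.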

Acausal trade with complex resource requirements

In the toy example above, resource requirements are very simple. In general, given that agents can have complex and arbitrary goals requiring a complex mix of resources, an agent might not be able to conclude that a specific trading partner has a meaningful chance of existing and trading.

However, an agent can analyze the distribution of probabilities for the existence of other agents, and weight its actions accordingly. It will do acausal "favors" for one or more trading partners, weighting its effort according to its subjective probability that each trading partner exists. In the limiting case of increasing superintelligence, the expected utility given and received will balance well enough to benefit the traders.

Ordinary trade

Even ordinary trade can be analyzed acausally, using a perspective similar to that of Updateless decision theory. We ask: Which algorithm should an agent have to get the best expected value, summing across all possible environments weighted by their probability? The possible environments include those in which threats and promises have been made.

See also

References


Progress: Fluke or trend?

13 сентября, 2020 - 03:21
Published on September 13, 2020 12:21 AM GMT

A foundational conviction of The Roots of Progress is that progress is a trend with definite, substantive causes, and that it can continue far, far into the future. Progress is not automatic or inevitable: it can slow, stop, even reverse. But the history of progress over the last 200+ years convinces me that much more is possible.

Not everyone agrees, however. To learn more about how people think about this, I posed a question on Twitter:

Do you think the last 200+ years of technological/industrial progress were…

… a trend with substantive causes, that we can expect to continue?

… a fluke, a stroke of luck, not to be repeated?

And why?

After discussing it with people all day, I found that most of the “fluke” arguments were:

  1. Argument from failure of imagination: “I can’t see or imagine any big breakthroughs, therefore I don’t expect any.”
  2. Materialism: Progress is primarily driven by material resources (such as fossil fuels); therefore it will slow when those inevitably run out.

Failure of imagination is not a compelling argument to me, for both logical and historical reasons. The logical reason should be obvious. The historical reason is that the big breakthroughs of the past were not easy to imagine or predict before they happened. In a different context, Eliezer Yudkowsky points out that even the creators of inventions such as the airplane or the nuclear reactor felt that their breakthroughs were fifty years out, or even impossible, shortly before they happened. Now is no different. (This point seems exceedingly difficult to get through to people; no matter how much you point out the logical fallacy, or the historical precedent, they continue to repeat the same points. I don’t know if this is because the logical fallacy itself is unclear, or if it’s just a form of mood affiliation, or what.)

There’s a variation of this argument which goes: The universe is finite, so there’s a finite number of breakthroughs to make, so they have to run out eventually. But even granting this, why assume we have found even 1% of the big breakthroughs so far? Or 0.01%? If there are many more to be had, then progress can continue for a long time.

As for materialism, I disagree with the premise. I don’t think progress is primarily driven by material resources. When we think of the Industrial Revolution, we often think of steam engines, iron foundries, and locomotives, all run on coal. But there were equally important inventions, such as textile automation, that didn’t require any fuel at all. And the coal was sitting in the ground for all of human history, without any industrial revolutions happening for a very long time. So “natural” resources seem neither necessary nor sufficient for progress. (Indeed, there are no “natural” resources.) For more on this point, see Deirdre McCloskey’s Bourgeois Dignity, especially chapters 20–21.

There were also people arguing an option I didn’t suggest, which is “a trend with substantive causes, that will not continue”—typically because of social reasons: we are abandoning the causes of the trend, or putting up blockers. This is more plausible to me. Progress isn’t natural; we make it happen through choice and effort, and we only do so when we believe it is possible and desirable. It depends on certain legal institutions, and it requires time, talent and treasure. If any of those are lost—say, if we stop celebrating progress, or turn against growth—progress may not continue.

But in order to care about progress studies, we have to believe that the last few centuries of unprecedented progress didn’t just randomly happen because of a lucky break, and they weren’t a short-term acceleration of growth that will soon inexorably return to pre-industrial levels. There has to be a goal: namely, the next 200 years of progress. This whole endeavor is premised on that.


Notes on good judgement and how to develop it (80,000 Hours)

12 September 2020 - 20:51
Published on September 12, 2020 5:51 PM GMT

This post by 80,000 Hours struck me as more than usually relevant to my interests in developing the art of rationality. It doesn't really say anything new, but it does provide a decent summary of a frame that I think is an important subset of epistemic rationality, in the form of "good judgement".

More practically, I think of someone with good judgement as someone able to:

  1. Focus on the right questions
  2. When answering those questions, synthesise many forms of weak evidence using good heuristics, and weigh the evidence appropriately
  3. Be resistant to common cognitive biases by having good habits of thinking
  4. Come to well-calibrated conclusions

Owen Cotton-Barratt wrote out his understanding of good judgement, breaking it into ‘understanding’ and ‘heuristics’. His notion is a bit broader than mine.

Here are some closely related concepts:

  • Keith Stanovich’s work on ‘rationality’, which seems to be something like someone’s ability to avoid cognitive biases, and is ~0.7 correlated with intelligence (so, closely related but not exactly the same)
  • The cluster of traits (listed later) that make someone a good ‘superforecaster’ in Philip Tetlock’s work (Tetlock also claims that intelligence is only modestly correlated with being a superforecaster)

Here are some other concepts in the area, but that seem more different:

  • Intelligence: I think of this as more like ‘processing speed’ – your ability to make connections, have insights, and solve well-defined problems. Intelligence is an aid in good judgement – since it lets you make more connections – but the two seem to come apart. We all know people who are incredibly bright but seem to often make dumb decisions. This could be because they’re overconfident or biased, despite being smart.
  • Strategic thinking: Good strategic thinking involves being able to identify top priorities, and develop a good plan for working towards those priorities, and improving the plan over time. Good judgement is a great aid to strategy, but a good strategy can also make judgement less necessary (e.g. by creating a good back-up plan, you can minimise the risks of your judgement being wrong).
  • Expertise: Knowledge of the topic is useful all else equal, but Tetlock’s work (covered more below) shows that many experts don’t have particularly accurate judgement.
  • Decision making: Good decision making depends on all of the above: strategy, intelligence, and judgement.

I do disagree with some of the distinctions being made in the post. As an example, just in the section above, the conception of "Intelligence" as "processing speed" is really flawed; in practice, intelligence already measures something closer to "good judgement". But overall, the post seems decent as a potential intro into a bunch of rationality stuff.


Comparative advantage and when to blow up your island

12 September 2020 - 10:02
Published on September 12, 2020 6:20 AM GMT

Economists say free trade is good because of "comparative advantage". But what is comparative advantage? Why is it good?

This is sometimes considered an arcane part of economics. (Wikipedia defines it using "autarky".) But it's really a very simple idea. Anyone can use it to understand the world and make decisions.

I Islands

Say you live alone on an island.

Each week you gather and eat 10 coconuts and 10 bananas. It takes you five minutes to gather a coconut, and 10 minutes for a banana. Thus, you work 150 minutes per week.

           You Need   Time to gather one   Time You Spend
Coconuts   10         5 minutes            50 minutes
Bananas    10         10 minutes           100 minutes
Total:                                     150 minutes

I live on a nearby island.

Just like you, I eat 10 coconuts and 10 bananas per week. But unlike you, I'm terrible at everything.

           I Need     Time to gather one   Time I Spend
Coconuts   10         60 minutes           600 minutes
Bananas    10         30 minutes           300 minutes
Total:                                     900 minutes

Since I'm so incompetent, I need to work a lot more than you -- six times as much.

II The Bridge

Thus, we live our lives until one day a bridge appears between the islands.

We are both peaceful. We will not coerce each other, but are otherwise completely selfish. What will happen?

Intuitively, you value bananas more, while I value coconuts more. So it's natural to trade my bananas for your coconuts. We agree as follows: Each week, you gather 20 coconuts, and I gather 20 bananas. Then, I trade 10 of my bananas for 10 of your coconuts. It’s easy to check that this will make both of us better off.

           You Gather   Time to gather one   Time You Spend
Coconuts   20           5 minutes            100 minutes
Bananas    0            10 minutes           0 minutes
Total:                                       100 minutes

           I Gather     Time to gather one   Time I Spend
Coconuts   0            60 minutes           0 minutes
Bananas    20           30 minutes           600 minutes
Total:                                       600 minutes

In one sense, it's obvious that trade makes us both better off. If it didn't we wouldn't both agree to it! But comparative advantage explains how. You have an advantage at everything. But I have a comparative advantage at bananas, because my ratio (banana time) / (coconut time) is lower than yours. And if we both concentrate our efforts on the thing we have a comparative advantage at, we are both better off.

This is why economists like free trade. If different producers have different relative abilities, everyone can benefit from specializing. This is true even if one producer is better at everything.
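The gains from specialization can be checked with a few lines of arithmetic, using the numbers from the island example above (a sketch; the helper function is mine):

```python
# Weekly labor, in minutes, with and without trade, using the island numbers:
# you: 5 min/coconut, 10 min/banana; me: 60 min/coconut, 30 min/banana;
# each of us consumes 10 coconuts and 10 bananas per week.

def weekly_minutes(coconuts, bananas, min_per_coconut, min_per_banana):
    return coconuts * min_per_coconut + bananas * min_per_banana

# No trade: each of us gathers our own food.
you_alone = weekly_minutes(10, 10, 5, 10)   # 150
me_alone = weekly_minutes(10, 10, 60, 30)   # 900

# With trade: you gather 20 coconuts, I gather 20 bananas, we swap 10 for 10.
you_trade = weekly_minutes(20, 0, 5, 10)    # 100
me_trade = weekly_minutes(0, 20, 60, 30)    # 600

# Both of us work less than before, even though you are better at everything.
print(you_alone - you_trade, me_alone - me_trade)  # 50 300
```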

Beyond trade, this is an important lesson for life. Choosing your career path? Dividing up chores with your partner? Think about comparative advantage!

III Complexities

The real world, of course, is more complex. For example:

  • There might be transportation costs.
  • It might get harder to find coconuts as you gather more of them.
  • There might be more goods to trade.

More sophisticated models can deal with these complications. The math gets more complex, but more or less the same conclusion arises. There is one complication that's a bit special:

  • There might be more than two people.

In this case, introducing trade can make individual people worse off: Suppose you live with me on an island, but you're incapable of gathering bananas. Since you need them to live, I can demand a huge number of coconuts for one banana. When a bridge opens up to another island, you might get a better trade. This will help you, but actually hurt me. Still, introducing free trade makes people better off "on the whole".

In this subtlety, Politics emerges. In principle, one can always use free trade plus a set of wealth transfers to make every individual better off. But that's a nightmare in practice: It would require a central authority to predict what set of trades the market will decide on. So we're left with a mess.


IV Negotiation

But even in this toy model of two people on two islands, I skipped an important step. How did we decide to trade 10 coconuts for 10 bananas? I might say: “I’ll trade 7 bananas for 10 coconuts. Take it or leave it!”

Of course, this would be great for me, and worse for you than our original trade. But it’s easy to check that this is better for you than no trade at all.

           You Gather   Time to gather one   Time You Spend
Coconuts   20           5 minutes            100 minutes
Bananas    3            10 minutes           30 minutes
Total:                                       130 minutes

           I Gather     Time to gather one   Time I Spend
Coconuts   0            60 minutes           0 minutes
Bananas    17           30 minutes           510 minutes
Total:                                       510 minutes

Now, what possible banana/coconut exchange rates could we arrive at? I’d be happiest paying you nearly zero bananas for each coconut. On the other hand, I'd never agree to pay you three bananas per coconut -- it would be "cheaper" for me to just make the coconuts myself. In fact, I'd never agree to pay more than two.

Thus, I benefit from any trade where I pay you between 0 and 2 bananas for one coconut. These are the only trades I'd ever agree to.

Of course, I’d prefer to pay you fewer bananas! So I’d prefer a rate to the left end of this range.

Conversely, it takes you twice as long to make a banana as a coconut. You’d be thrilled if I paid you 4 bananas per coconut, but you’d never accept less than 1/2 a banana for one coconut.

Thus, you benefit from any trade where I give you more than 1/2 a banana for a coconut.

You’d like me to pay you as many bananas as possible. So you’d prefer a rate as far to the right as possible.

Now, the big question is: What rate do we agree on? Simple economics does not tell us the answer! In principle, our negotiations could arrive at an “exchange rate” of anywhere between 1/2 and 2 of my bananas for 1 of your coconuts.

This range (1/2 to 2 bananas per coconut) is the Zone of Possible Agreement (ZOPA) in negotiation theory.
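Each end of that range is just one trader's own coconut-to-banana time ratio, so the ZOPA can be computed directly (a minimal sketch; the helper function is mine):

```python
# The ZOPA bounds come from opportunity costs: neither trader accepts a price
# worse than gathering the good themselves. Times are minutes per unit.

def zopa(my_times, your_times):
    """Each argument is (minutes per coconut, minutes per banana).
    Returns (low, high): the range of prices, in bananas per coconut,
    inside which trade benefits both sides."""
    my_ratio = my_times[0] / my_times[1]        # my break-even price
    your_ratio = your_times[0] / your_times[1]  # your break-even price
    low, high = sorted((my_ratio, your_ratio))
    return low, high

# From the island example: I take (60, 30), you take (5, 10).
print(zopa((60, 30), (5, 10)))  # (0.5, 2.0)
```

Note that when the two ratios are equal, the range collapses to a point: no comparative advantage, no room for mutually beneficial trade.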

V Perverse Behavior

There's no simple math to decide what point in the ZOPA we settle on. This can lead to strange and perverse behaviors.

Walking away. Since we are nonviolent, the only “threat” is to refuse to trade. If you know I am “rational” and won’t refuse a beneficial deal, you can be “irrational” and refuse to trade unless we do so at the end of the range that’s favorable to you. Thus, your “irrational” behavior gives you a better outcome than my “rational” behavior.

Gaining information. Before our first meeting, I build a telescope and spy on you. When we meet, I say “I noticed it takes you 2x as long to make a banana as a coconut. It takes me 1.95x as long. Bananas sure are hard, aren't they? Because I like you, I’m willing to trade at a rate of 1/1.95 ≈ 0.513 bananas per coconut. This does nothing for me, but you have a kind face, and I want to help you." If you believe me, I get a very favorable rate.

Concealing information. You are smart. After the bridge appears, you quickly realize I might spy on you, and this would harm your negotiation position. Before doing any gathering, you construct a privacy wall around your island.

Faking skills. You’re a hard-ass. You will walk away unless I agree to an exchange on your end. I’ve tried walking away, but you don't care. I always blink before you, and we both know it. What can I do? For a few weeks, I secretly gather coconuts in the night. The next time we meet, I bring a huge pile of coconuts. I say “I’ve been practicing, and now it only takes me 1.5x as long to make a coconut as a banana. I know you’re a hard-ass and you want the sweet end of the ZOPA. I admit I can’t beat you, but the ZOPA has shifted. You need to offer me a better deal.”

Blowing up my island. You’re a hard-ass. I only get .501 coconuts for 1 banana. I’ve tried walking away, but we both know you will out-wait me. I've tried faking skills, but you won't bite. Because we are non-violent, I can’t coerce you. But there’s nothing wrong with hurting myself, is there? I build a machine that monitors inter-island commerce. If there is ever a trade that is not 1 coconut for 1 banana, the machine activates a bomb, my island sinks into the ocean forever, and I die. If I try to disable the machine, the bomb activates. When we next meet I say “OK. I can’t out bad-ass you. However, because of this machine, it will forever be against my interests to agree to a non-even trade. There’s no point in you waiting. Even if I did agree to an uneven trade, I'd sink into the ocean, and you'd have to gather your own coconuts!"

Blowing up your island if I threaten to blow up my island. You are smart. You are also a hard-ass. As soon as the bridge appears, you know you can out-wait me to get a good rate. You immediately realize that my only option is to build the island destroying machine described above. Before we meet, you construct a machine that monitors my island for the presence of machines. Your machine is connected to a bomb on your island. If at any point, a bomb-activating machine is constructed on my island, your bomb activates, your island sinks into the ocean, and you die. When we meet, you explain that you’re a hard-ass, and that no island-destroying machines can help me. My best bet is to accept terms that barely improve my situation at all. You win.


How Much Computational Power Does It Take to Match the Human Brain?

12 September 2020 - 09:38
Published on September 12, 2020 6:38 AM GMT

Joe Carlsmith with a really detailed report on computational upper bounds and lower bounds on simulating a human brain: 

Open Philanthropy is interested in when AI systems will be able to perform various tasks that humans can perform (“AI timelines”). To inform our thinking, I investigated what evidence the human brain provides about the computational power sufficient to match its capabilities. This is the full report on what I learned. A medium-depth summary is available here. The executive summary below gives a shorter overview.


Let’s grant that in principle, sufficiently powerful computers can perform any cognitive task that the human brain can. How powerful is sufficiently powerful? I investigated what we can learn from the brain about this. I consulted with more than 30 experts, and considered four methods of generating estimates, focusing on floating point operations per second (FLOP/s) as a metric of computational power.

These methods were:

  1. Estimate the FLOP/s required to model the brain’s mechanisms at a level of detail adequate to replicate task-performance (the “mechanistic method”).
  2. Identify a portion of the brain whose function we can already approximate with artificial systems, and then scale up to a FLOP/s estimate for the whole brain (the “functional method”).
  3. Use the brain’s energy budget, together with physical limits set by Landauer’s principle, to upper-bound required FLOP/s (the “limit method”).
  4. Use the communication bandwidth in the brain as evidence about its computational capacity (the “communication method”). I discuss this method only briefly.

None of these methods are direct guides to the minimum possible FLOP/s budget, as the most efficient ways of performing tasks need not resemble the brain’s ways, or those of current artificial systems. But if sound, these methods would provide evidence that certain budgets are, at least, big enough (if you had the right software, which may be very hard to create – see discussion in section 1.3).
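As a flavor of what two of these methods involve, here is a back-of-envelope sketch using rough round numbers of my own choosing (these are not the report's estimates):

```python
import math

# Mechanistic method (very rough): count synaptic events per second and
# charge ~1 FLOP per event. Both inputs are order-of-magnitude guesses.
synapses = 1e14          # rough human synapse count
firing_rate_hz = 1.0     # rough average spike rate per neuron
mechanistic_flops = synapses * firing_rate_hz  # ~1e14 FLOP/s

# Limit method: the brain runs on ~20 W, and Landauer's principle puts a
# floor of k*T*ln(2) joules on erasing one bit, bounding bit erasures/s.
k_boltzmann = 1.380649e-23   # J/K
temp_kelvin = 310.0          # body temperature
landauer_j_per_bit = k_boltzmann * temp_kelvin * math.log(2)
brain_watts = 20.0
max_bit_erasures_per_s = brain_watts / landauer_j_per_bit  # roughly 7e21

print(f"{mechanistic_flops:.0e} FLOP/s; Landauer bound ~{max_bit_erasures_per_s:.1e} bit erasures/s")
```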

Here are some of the numbers these methods produce, plotted alongside the FLOP/s capacity of some current computers.

Figure 1: The report’s main estimates. See the conclusion for a list that describes them in more detail, and summarizes my evaluation of each.

These numbers should be held lightly. They are back-of-the-envelope calculations, offered alongside initial discussion of complications and objections. The science here is very far from settled.


Rationality and playfulness

12 September 2020 - 08:14
Published on September 12, 2020 5:14 AM GMT

Can rationality help us be playful? Can we be playful when we're solitary?

Play is usually interactive. It's about connecting with other people, or even with a pet animal. When people do "playful" things by themselves, it's usually for relaxation or practice.

The presence of a second person changes everything. You can react to each other, surprise or influence each other, create structure together. Group decisions are easier to commit to after a choice is made.

Many activities can be fun, engaging, and interesting, without being obviously "playful." A chess game can involve more mental concentration and stillness than is required of most people at their jobs, yet be a delightful hobby activity for the participants. We even say we "played" a game of chess. So why doesn't chess feel playful?

Partly, it's because play is usually physical. Even if we're just having a playful conversation, our body language and voices can bring a physical element to the exchange. Chess, like writing, reading, and many other fun-but-not-playful activities, doesn't typically use our big muscles or our social muscles.

What about exercise? That's not conventionally playful either, even though it uses our muscles. Even athletics, like a game of tennis, can feel fun-but-not-playful, unless the participants are joking around and being social while they play.

It really does seem to be the social element that's key for a sense of play. If we watch a talk show, the participants often have very playful interactions, even though they're mostly just sitting in chairs talking.

Even in a social setting where both participants desire a playful conversation, though, it's often very difficult to achieve. It's so easy for even good friends to feel awkward, formal, and serious in each others' company. Coming up with a playful text message takes work for many people. Especially at first. If a conversation chain gets going that has taken on a playful tone, it might stay that way. Positive energy, a combination of kindness and rudeness, and not taking things literally all can be fertile ground for a playful conversation.

If you're leading a solitary life, though, is it possible to be playful? What about in these lonely times?

Can you think playful thoughts? After all, our inner world can often feel like we have multiple perspectives, multiple voices within us. Is it possible for them to have a playful interaction?

Can you find a sense of play in observing the world around you? Can you flirt with a building, joke with the sky, let the trees in on a little secret, tell a story to the sidewalk? I'm not just being poetic. I literally mean that it seems at least possible that there's a way to have a felt sense of playful interaction with the world of objects.

Certainly it's possible to have brief, playful interactions with strangers, especially if they're in a service role. There are ways to be friendly with the cashier at the grocery store.

What about in being creative, meditative, or just in the activities of daily living? Is there a playful way to clean the bathroom? To meditate? To write a song?

When I imagine trying to do any of these things, my first thought is that I would feel foolish, self-conscious, and pathetic. A person who's so needy that he resorts to seeking connection with the inanimate objects around him. I heard a story once about a man who was so lonely that he took to hugging a support beam in his house.

It occurs to me, though, that those reactions are coming from inside me. It's my self-talk and my imagination that anticipate that sense of bleak foolishness. Observing that, it seems to me that my self-talk and my imagination are responsible, at least in part, for depriving me of playfulness. Of even trying for it.

After all, I do many things just to see if they can be done. Some of those challenges are incredibly difficult. Sometimes I have little idea of how I'll approach them. By throwing myself in and setting the goal, my intuition starts to devise a way forward. Maybe this could work.

Experiment 1

I try just standing up and seeing what might happen. My perception changes, almost immediately, to a quite different frame of mind than I'm used to. Suddenly I feel like I'm an actor on a stage, even though nobody is home. I feel the urge to take my shirt off. Why not? As I stand there, I notice that the white blanket on the couch looks like a cape. I imagine wearing it that way.

I walk around the house aimlessly. Sometimes I stand looking out the window, or at myself in the mirror in the shadows of the hall. In the kitchen, I find myself gazing at the reflection my kitchen table makes in the mirror, with my silhouette behind it.

I notice how my mind wants to give itself tasks and find distractions. To clean messes here and there. To walk around, set destinations for myself. Sometimes I tap on the walls. Sometimes I just look at objects: the box fan, the thermostat, the pots and pans. Most of the time, it's just a passive noticing. Sometimes, my brain imagines something silly I could do with them, like banging the pots and pans together.

There's a sense of achievement in the moments when I notice something beautiful, like the reflection in the window pane, or how my body looks in the shadowy full-length mirror in the hall.

Experiment 2

After writing all that down, I stand up again. Another experiment. This time, the mindset grows on me more easily. At first, I regard things around me: the drapes, the brick wall on the building outside my window, and my brain wants to find something in them, but I know that there's nothing there. This isn't something you strive for. It's something that should just appear.

Then I walk into the kitchen and look at the hanging fruit basket. I remember how it was given to me several years ago by a friend who was living with me. I observe that I don't usually go back through old memories, especially not when I'm alone. Then I remember a more recent memory associated with it. A week ago, I came home from a trip, and a potato in it had gone bad - liquified - dripping the most foul-smelling brown liquid. Even after cleaning it up, it took half a day for the smell to disappear.

I look around at the messes that need to be cleaned up after a full day of activity. The boxes of cleaning supplies that just arrived because I'm trying to keep a cleaner house. I think of the reasons why I'm doing that. And so many of the other forces that define my life: school, work,  my efforts to maintain my social life. It all feels very big. And very small. 

I look at the pots stacked on top of the refrigerator. It looks sloppy. But what can I do? It's the practical way to store them. I regard the cabinet drawer that opens with an awful, nails-on-chalkboard squeaking sound. Will I get around to sanding it down at some point? Then I look at the print I made of the elephant hawk moth, Deilephila elpenor, the moth that can see full-color vision in dim starlight. I think about how I learned to make block prints. Notice how I like the rough texture of the print, and the childish simplicity of the lines of its body. How if I don't pick it apart, it looks beautiful and unusual. I think that perhaps Deilephila elpenor is a metaphor for this project, of learning how to be playful in solitude.

Experiment 3

I stand up briefly again. For less than a minute. Surveying the kitchen again, I get this conception of how it would be to be a relentless, fast, machine-like worker in my own life. One who cleaned every mess as fast as possible, then immediately transitioned to sanding down that cabinet drawer, to organizing the fridge. That threw on music as I worked. Now as I sit here writing this, perhaps one who spontaneously breaks out dancing all in the middle of that frantic activity. A sense of being magnetized to the world, controlled by it almost like a puppet, drilling down deeper and deeper into what needs to be done until, perhaps, hitting impenetrable rock. Or oil. Or fossils.

Then again, I think about this sense of playfulness. How right now at least, it seems to demand slowness, and stillness. There is the mode of compressing as much accomplishment and activity into the shortest time interval possible. Losing the meta-level and burning yourself up in sheer obvious activity. But don't you lose something like that? Would it be good to practice both, to switch? Is there something important for me to learn in this playful stillness? Am I being playful? Is there also something to drill into in the stillness? Not measured in checking off tasks from a list, but in some other way?

Experiment 4

I won't recount everything I think and experience this time. Suffice to say that my thoughts begin with deep melancholy, dwelling on many sad aspects of my life, the world we live in, the dysfunctions, the ways people fall through the cracks, and the ways we try to escape.

Then it hits me. If I'm not playful, it's because I relentlessly dwell on sadness and dysfunction and a sense of lack.

What if I choose to think about experiences from the day that were pleasant? Or found a playful way to think about the experiences I had?

I reflect on the COVID-19 test I had today, and imagine that it was like having my brains twisted like spaghetti around a fork. The chipper nurse who registered me for the test. How I'm waiting for an iPhone with a functioning camera to arrive in the mail so I can take pictures for Tinder, how I'm going to have to figure out how to pose, to be a show-off. I think to myself, "this is going to be fun!" I begin to feel as though I'm having a conversation with myself. That I'm playing with myself.

Experiment 5

There's a few stalks of lavender in the vase on my kitchen table. I stole them from the church.

Normally, I would just stare vacantly at them. Or I'd say something like "I took them from the church," stated as a dull fact. But now, it's I stole them from the church. As if I'm letting myself in on a little secret. That I've been up to something a bit mischievous.

Experiment 6

It occurs to me that I've never felt playful while cooking a meal. It's been work. An attempt to impress. A learning effort. Never play, though. Except once when I was little, and my mom let me throw everything in the spice cabinet into a "cake." I thought it was poisonous and fed it to the birds. Not out of malice. I was just a bit of a stupid child, really. Wasn't thinking overly hard about the birds' wellbeing. I'm sure that if I'd thought twice about it, I'd have found something else to do with it, but instead I took it out and sprinkled the crumbs in the grass.

The list of all the activities I've done for serious motivations, with a serious attitude, spreads out before me. What unites them? There's something missing from all of them. It's a story. It's caring. A sense of heart, of play, of connection to myself. It's something I think has been available this whole time. The story I've been telling has been largely bitter, paranoid, anxious, jealous, self-deprecating, sarcastic, arrogant, dull, serious, and wounded, for a very long time. I put a smile on my face. I'd really like to change that.

Experiment 7

My thoughts putter around. A birthday party from two years ago. A woman I met there who I flirted with and haven't spoken with, a friend of a friend. I realize that I still have a bit of a crush on her. The feeling actually registers in my heart. It's not a mental realization, not a plan, not a "what if I got in touch with her?" or a "I should ask my friend if she's single." It's just an emotion, a pleasant twinge, nice to have all on its own.

I double check the name of the woman I matched with on Tinder, but whom I haven't heard back from yet. Her name is the same as the one from a song I liked when I was a kid and haven't listened to in many years.

I check myself out in the mirror. I realize that after changing my grooming habits dramatically, I feel attracted to myself in a way that I haven't ever experienced before. It's a nice feeling.

All I'm doing is gently encouraging my mind to land on pleasant memories, objects with good associations. No need to control or actively seek them out. It's like my mind is a butterfly that has finally learned to seek out flowers to land on. Sometimes it's "in between" thoughts, just traveling, or blank.

I think I should give my house a name.

Experiment 8

My brothers' jade plant is half-hidden behind a wall, peeking out around it with two of its branches. It's in a big, beautiful clay pot with Chinese dragon designs all around it. Right next to the shoe rack. Needs better Feng Shui!

There's a way of paying attention to objects so that they reveal themselves to you. If you stare hard at a point on the wall, it feels neurotic. There's nothing there. But allow your gaze to trace over the whole house, and suddenly you're in a place that's full of memories, potential, and meaning. This is my house. I live here. It's a place where I can invite guests in, where they feel privileged to feel welcome. It's a place that I rent, but that is mine for as long as I am doing so. I remember when I first moved in. I remember when I didn't have a house, when I lived out of my car for a summer. I think about the other people living nearby: the intriguing apartments filled with plants, Christmas lights, and comfortable-looking furniture across the alley. Who lives there?

I should pick out my favorite houses, and imagine the lives that people lead there.

Experiment 9

Think about linoleum tiles. Somebody designed the color scheme on these ones. They're sort of flecked with different shades of blue and white. It's kind of pretty, actually. Did the linoleum tile designer hope that somebody would appreciate the way they make the kitchen floor look a bit like an abstract archipelago of sandy cream tiles and blueish watery tiles?

Experiment 10

I'm looking at the stove. At first, it seems tiny, cramped: this is all I can afford. Then it changes. It's cozy. It's all I need. I imagine hanging up a little earthy bundle of plants behind it. A rose or a bundle of grass. Just to mark it. To give it some love. Maybe it would catch fire. But in any case, I don't need to. It's enough to practice that mental shift. To see the thing, to honor it, to appreciate it for what it is, to find beauty in it.

Experiment 11

Other things I think about. A stone I brought back from Iceland transports me back there. Looking at the fingerling potatoes I bought makes me think of my breakfast tomorrow morning. Black coffee, potatoes chopped thin with eggs, green onions, and hot sauce. I so rarely think about meals the next day, or even later the same day.

There's a garden spider on a web outside of my window. I draw up close to peer at it. Hairy legs, a pattern of white crosses on its abdomen. The web blows in the wind, rippling the spider just a little closer to me as it hangs in the darkness. I wonder if spiders can feel cold.

Experiment 12

Among many other observations, one thing I notice developing is an awareness of how my mind can look at things in two very different ways. One is gentle, detached, and moves like light over the surfaces of things. The other is piercing, aggressive, and tunnels like a deep borer. The latter is all too easy for me. It's the default. I like the developing ability to have gentle thoughts.

It occurs to me that in all our investigation here on Less Wrong about the problems of rational thought, the difficulties of synthesis, and how bias and emotion affect our judgment, it's never seemed quite possible to bring it all together. The problem of good thinking feels impossibly large, for even one single issue. The arguments endless, the proofs too large for the mind of humanity.

I have an inkling. I'm standing in the hall, and becoming aware of all the machines and electronics that are running in the house. My computer. The lights. The refrigerator. And I can hear an airplane flying overhead, a car outside. Smoke is thick in the air from wildfires. A physical awareness of the constant energy usage dawns on me. How little I think of these things most of the time. Every appliance in my house is sucking in energy through a straw from some central power source. So is every other apartment, in every building, throughout this city, and in every city.

The activity is relentless. Manipulation of words on the computer. Of bits, of atoms. The construction company that builds houses, that built my house. The factories refining raw materials into useful ones, and turning those into products that people put to use. The way that sometimes, it comes together in ways that feel meaningful, useful. The side effects, of waste, of CO2 entering the atmosphere, and how it heats up the woods and leads to the forest fire, and how the smoke in the air is keeping me from running, and how this virus and these smokey days are destroying what could have been beautiful and social times in my and in our lives. The argument stops being words on a page. It's a connected series of images and objects that simply are related. Science has allowed my mind to move, to wander the globe, in ways that make sense.

This is what it feels like to understand something. Rationality isn't fundamentally argumentative. It is fundamentally experiential. It is observational. It is imaginative and visual. The reason why winning an argument never works is because you have completely missed the mark when you argue. Convincing somebody is about helping them to see as you see, to help their mind learn how to wander in the directions you know it's capable of. To help it see the turn it consistently misses and convince it to open certain doors and have a look inside.

So there is a reason why we are stuck right now on this earth.

We are missing the art of opening doors in each others' minds, guiding each other along new paths, and allowing ourselves to be guided. Instead, we are erecting arguments, slogans, screeds, that separate people into camps: those who disagree and reject the thing wholesale, and those who agree and add more links in the chain. In some places, relationships and online spaces are nothing but collections of these steely monuments, landmines, booby traps, flags, orders, coded messages, propaganda.

Or maybe that is just my mindset at its most paranoid. Perhaps the fault is not in the words. Maybe this is an era of an extraordinary flowering of the human mind. It may be that we are only just beginning to learn how to open ourselves to it. When we stand outside these word-gardens, these strange sculptures with messages we won't understand until we've meditated among them, they seem frightening to us. Who built them, and why? What am I doing here? This experience feels like an intrusion in my life.

Perhaps there is a way of finding playfulness with the world-sculptures, too. Connecting with them, just like tonight I've been able to connect with a reflection in the window, with a stone, with a fruit basket, with the sight of the community center next door, with my own body, with a spider on its web in the darkness, with a white blanket folded on the couch that looks like a cape I might wear.


The Wiki is Dead, Long Live the Wiki! [help wanted]

12 сентября, 2020 - 06:34
Published on September 12, 2020 3:34 AM GMT

With the goal of eventually archiving it fully, we have imported 573 pages and 266,000 words of content from the old LessWrong wiki to the new LessWrong 2.0 tagging/wiki system.

The old wiki is a great store of knowledge and still gets two thousand pageviews each day. Incorporating it into the new site gets us at least the following benefits:

  • Pages imported from the old wiki now appear in search results on LessWrong proper.
  • Pages imported from the old wiki benefit from all the features of new LessWrong such as hover-preview, subscriptions, commenting, and functioning as tags on posts.
  • Since LessWrong proper is an active site, hopefully the wiki content will continue to get updated.
  • People who land on the old wiki content will more easily find the rest of the awesome content/activity/community on LessWrong proper.
  • I like us being the kind of community that, when people have spent hundreds (thousands?) of hours generating valuable content, commits to preserving it.
The Three Import Types

Pages have been imported in one of three ways:

  1. Imported as new tags that can be applied to posts (76 pages).
  2. Merged with existing tag pages (111 pages). [[incomplete, see below, help wanted]]
  3. Imported as "wiki-only" pages (386 pages). These pages cannot be applied to posts and do not currently appear on the Concepts page.


The list of all 573 imported wiki pages


To be honest, it would be more accurate to say that we are part-way through the import. We have completed the programmatic part, and now there remains some manual work to do, hence the help needed.

First, there is some general clean-up of links and other elements that didn't import correctly. Second, and more importantly, a manual text merge is required for the 111 pages that are being merged into existing tags. This means taking the text of the existing tag (if it has any) and combining it appropriately with the old wiki page.

Right now, "merged pages" have the old pages' revision history (click History on the tag), but the current text is unchanged.

You can help us fix up the wiki import, and follow along on completed/incomplete work, in the Wiki Import Spreadsheet. More on how to help below.

The Wiki Import Sheet

Join the Tagger Slack!

A couple of weeks ago we created a Slack workspace for dedicated taggers to be able to discuss tagging issues and talk directly to the LessWrong team about it. Following initial success plus good timing with the wiki import campaign, we're opening that Slack to anyone who wants to help with tagging.

Join the Tagger Slack here

You can also still leave comments on the Tagging Open Call / Discussion Thread.

How to Help with the Wiki Import

Any work helps! You don't need to perfect a page to have made it better.

The programmatic import from the original MediaWiki format worked pretty well, with a few exceptions, plus the general fact that automatic content merging is beyond my present abilities with Python and/or AI.

Using the Spreadsheet

It's a bit of a toss-up whether this workflow will work for people, but I've created a spreadsheet to track the wiki import work that needs doing. Feel free to ask any questions about it via comments, the Tagger Slack, DM, or anything else.

Joining the Slack is a pretty good idea to get quick answers to questions.

  • Cells indicating work that needs doing are red.
  • There are columns for several specific kinds of work, plus one master "completeness" column on the left.
  • No tag/wiki is ever truly done, but at some point it's done enough to focus on others.
  • If you've completed a task, replace the corresponding cell with "Done" or something similar (doesn't matter so long as it's not blank or "needs doing").
    • If you want to track your work, feel free to write your username, initials, or some other identifier. We'll give you due credit.
  • There's a column for comments on the right. Writing <name>: <comment> is a good convention. Shift-enter lets you continue writing in a cell without erasing its contents. Cells are also commentable. (Also there's the Slack.)

Again, feel free to reach out if you want any help with it.

Here is a more detailed list of the work to be done:

Checking over pages for messed up formatting or broken links
  • Some pages might have unusual formatting that got imported in a weird way and could benefit from fixing up.
  • Some links might be broken or point to an external location when they could point to an internal one. wiki.lesswrong.com/wiki/boba should go to www.lesswrong.com/tag/boba
Merging pages
  • Merged pages show only the pre-existing tag's text by default; the imported text is in the revision history.
  • In most cases, this should be pretty straightforward. New tag pages usually have no text, or only a few sentences that can be easily combined with the text of the imported page. The text of imported pages can be found in two ways:
    • View the page on the old wiki (you can get the link from the spreadsheet).
    • View the History of the tag page. It's the second button after "Edit Wiki". Imported pages will have their history prepended to the existing history with version numbers like 0.0.x if they've been merged.
Optimizing the opening paragraph

On LessWrong 2.0 (this site), the opening paragraph is what shows on hover-preview for tags, making it very important. It's worth optimizing the opening paragraph of imported pages.

  • The approximate title phrase of the page should be bolded within the opening paragraph
  • The opening paragraph should convey the general topic of the tag clearly
See also / Related tags
  • Many of the imported pages have great see also/related pages/resources sections. Those are fantastic, but it's additionally valuable to have a ~short list of related tags right near the top of each page.
  • Some related tags will already be listed on the imported page, but there will often be new tags from LW2.0 that can be added.
  • This helps people quickly learn what topics the tag/wiki system covers and quickly find what they're most interested in. We may eventually make these "see also/related tag" lists show on hover-preview.
Updating Pages
  • Most of the pages on the old wiki have not been updated in several years, and on many topics, a lot more interesting stuff has been said (yay intellectual progress!)
  • If you're knowledgeable about a topic, it would be super swell if you updated content to match the latest knowledge.
  • A lighter-weight contribution here is to just leave a note in the page's text saying that it's an out-of-date import.
Tagging Relevant Posts
  • Imported "tag" pages won't have any posts tagged yet, though most of these have a list of posts already in the text body. Those and other posts are worth adding.
  • For "wiki-only" pages, there are also lists of posts; we've decided those lists are adequate and that the pages don't all need to become tags as well. Feel free to add more posts to the lists in the text body if they're relevant.
Copying over the Old Discussion Pages
  • Pages on the old LW wiki had associated discussion pages where contributors could discuss matters related to the pages. 87 of the imported pages had activity on their Discussion pages.
  • It wasn't feasible to import this programmatically, but there are few enough that it seems worth copying over any Discussion page that has anything interesting. The spreadsheet has a column indicating imported pages with Discussion Page content.
  • As of this week, LessWrong tag/wiki pages also have a Discussion section! (We haven't advertised it yet; we're waiting until comments on those pages have better visibility.)
  • The task is to copy the Discussion page from the old LessWrong wiki, e.g., Talk: Akrasia, into a discussion comment on the new Akrasia tag.
  • In the import spreadsheet, there are some other columns for specific kinds of fixup, e.g., cases where MediaWiki's citation syntax didn't work, the fact that we don't yet have a Table of Contents feature, etc.
  • There are few enough of these cases that I won't take up space explaining them in this document. See the notes or feel free to ask about them.

I'm excited to have the great content from the old LW wiki incorporated into the new site; in many ways, it's long overdue.

Thanks to everyone in advance who helps us complete the import!


on “learning to summarize”

12 сентября, 2020 - 06:20
Published on September 12, 2020 3:20 AM GMT

This post is a much extended version of an LW comment I made about OpenAI’s new paper, “Learning to summarize from human feedback.”

Context: this paper is a direct extension of the work OpenAI published last year about fine-tuning GPT-2 with human preference data.  I hadn’t actually read that one closely at the time, but went back and did so now, so this is really a commentary on both.


IMO there are two almost unrelated ideas going on in OpenAI’s preference learning work.

  • First, the idea of collecting binary preference annotations on LM samples, and (in some way) tuning the LM so its samples are better aligned with the preferences.
  • Second, a specific method for tuning the sampling behavior of LMs to maximize an (arbitrary) score function defined over entire samples.

It may help explain this to go into detail about what they do.  Concretely:

  • They feed a bunch of prompts to a language model (LM) like GPT-2/3, and for each one, save several different samples.  They hire annotators to rank the samples in order of perceived quality.
  • They use the annotation dataset to fine-tune a copy of the original model.  The fine-tuning task is not text generation, but something very different: predicting how “good” a sample is, i.e. how likely the annotators are to prefer it to other candidates.  They call this a “reward model.”
  • The reward model assigns a single score to an entire sample of N tokens.  They want to fine-tune another copy of the model so that its samples maximize these scores.
  • But LM training is usually done with an objective that specifies the quality of the model’s predictions for every single token.  Knowing how good a full sequence of (say) 20 words is does not tell you how good each individual word is.
  • To bridge this gap, they use reinforcement learning.  Now, the task is not “choose the next word correctly,” but “choose the next word so as to maximize your expected score at the end, after choosing all the later ones as well.”
  • Their RL method requires two separate copies of the LM, in addition to the one they tuned as the reward model: a “policy model” and a “value model.”  (In this paper they show that sharing parameters between these two is worse than making them separate.)  I’ll just call these two “the final model” below for simplicity.
  • Samples from the final model are still, technically, generated one token at a time.  They treat this like the usual RL setup in which you can only choose individual actions one at a time, because the environment responds unpredictably to each one.  Here, there is no “environment” outside your actions, but the same framework is used.
  • Presumably, the final model is better at planning multi-token structures than the original because it has been trained on a holistic, multi-token objective.  So, it does more planning, but this is implicit in its one-by-one token decisions.
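
The reward-model step above can be sketched as a standard pairwise comparison loss (a minimal illustration in plain Python, not OpenAI's actual code; the function and the example scores are hypothetical):

```python
import math

def pairwise_loss(score_preferred: float, score_other: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_preferred - r_other).

    The reward model is trained so that the sample the annotator preferred
    gets a higher scalar score than the sample it was compared against.
    """
    diff = score_preferred - score_other
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# When the model already ranks the preferred sample higher, the loss is small;
# when it ranks them the wrong way round, the loss is large:
assert pairwise_loss(2.0, -1.0) < pairwise_loss(-1.0, 2.0)
```

Note that this objective only ever sees whole-sample scores, which is exactly why the token-level gap described next has to be bridged somehow.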

I visualize this as two separate things with a bottleneck connecting them.

On one side are the human annotations and the supervised training of the reward model.  This part succeeds insofar as they can train the model to predict the annotations (apparently they can do this quite well).  This step involves a type of data with special challenges, but has nothing to do with RL.

On the other side is the RL part.  This is a modification of ordinary LM training to optimize a global, rather than local, objective.  This part has nothing to do with “human preferences”: the global objective could be anything, and in fact here it isn’t raw human opinion but the opinions of another model trained to predict human opinion.  The noteworthy thing here is not the use of human preference data in particular but the use of RL instead of the more ordinary objective that was apparently a good enough choice to make GPT-2/3 work originally.
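
Concretely, the sequence-level quantity the RL step maximizes can be sketched like this (a simplified sketch; the papers penalize KL divergence from the original LM to keep samples on-distribution, but the exact form and the beta value here are illustrative):

```python
def rl_reward(rm_score: float, logp_policy: float, logp_original: float,
              beta: float = 0.1) -> float:
    """Sequence-level reward for the RL step (a sketch, not OpenAI's code).

    rm_score:      scalar output of the trained reward model for a sample
    logp_policy:   log-prob of the sample under the policy being tuned
    logp_original: log-prob of the same sample under the frozen original LM
    beta:          strength of the KL penalty keeping the tuned policy close
                   to the original model (the value here is made up)
    """
    kl_estimate = logp_policy - logp_original  # per-sample KL estimate
    return rm_score - beta * kl_estimate

# If the policy hasn't drifted from the original LM, the reward is just the
# reward model's score; drift toward reward-hacking samples is penalized:
assert rl_reward(1.0, -10.0, -10.0) == 1.0
assert rl_reward(1.0, -5.0, -10.0) < 1.0
```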

(BTW, this resolves my initial confusion as to how OpenAI could possibly have gotten RL to work with human data, something I viewed as a bottleneck.  There is a model sitting between the humans and the RL learner which is much faster to query than the humans.)

The two sides are connected by the reward model.  In the previous paper, the two sides were coupled together more, because they repeatedly collected new human data as the policy changed and then used a new reward model to further train the policy.  Here, they’re totally separate: there were multiple batches of annotation, but each policy experienced an unchanging reward model.

(See Appendix C.6 and their comment about “moving to the offline setting.”  It seems noteworthy that the 2017 OpenAI/DeepMind paper which introduced the “RL from preferences” approach, and which they cite, found that this didn’t work for their test cases: “Training the reward predictor offline can lead to bizarre behavior […] This type of behavior demonstrates that in general human feedback needs to be intertwined with RL rather than provided statically.”  I don’t know what to make of this.)


It’s hard to tell from OpenAI’s discussion how much their successes are due to learning a good reward model, vs. how much they depend on RL being necessary for certain kinds of quality in LM samples, despite the wide successes of the non-RL approach.

FWIW, Gwern reports trying OpenAI’s approach and finding the RL side specifically frustrating and unstable; this is pretty normal with RL, and compatible with the reward-model part being very successful in its own domain.  It’s not clear whether OpenAI got the RL part to work well because they did something right, or because they have lots of resources and can keep trying over and over until it works.  (There may have been something in the papers about this that I missed.)


The RL part feels almost in tension with OpenAI’s usual approach with LMs, which is to train on a next-token objective, sample in a next-token way, and focus on scaling up the model rather than improving the training objective or sampling algorithm.

Of course, I understand why they have to do RL if they need to maximize a score over the whole sequence, but my point is that they chose to frame the task that way in the first place.

One could imagine someone arguing that ordinary GPT sampling would never achieve high-quality text, because humans care about global structures across the whole text, and a model trained only to guess the very next token will not know how to plan out these global structures across the whole future of the text it writes.  In this case, OpenAI claims that they can do without explicit training to plan (i.e. RL): just training a next-token objective on text is enough to produce strikingly high quality in sampling – in other words, “GPT-2/3 samples satisfy human preferences.”  So why do human preferences require RL in these other cases?

The opening discussion of the new paper does address this:

When applying these models to a specific task, they are usually fine-tuned using supervised learning, often to maximize the log probability of a set of human demonstrations.

While this strategy has led to markedly improved performance, there is still a misalignment between this fine-tuning objective—maximizing the likelihood of human-written text—and what we care about—generating high-quality outputs as determined by humans. This misalignment has several causes: the maximum likelihood objective has no distinction between important errors (e.g. making up facts [38]) and unimportant errors (e.g. selecting the precise word from a set of synonyms); models are incentivized to place probability mass on all human demonstrations, including those that are low-quality; and distributional shift during sampling can degrade performance [52, 49]. Quality can often be improved significantly by non-uniform sampling strategies such as beam search [48], but these can lead to repetition and other undesirable artifacts [63, 22]. Optimizing for quality may be a principled approach to overcoming these problems.

This is definitely a list of things that are wrong (or could be wrong) with ordinary LM training and sampling, but I don’t see how it motivates their specific approach.

In my mind, their approach makes the most sense if you believe that humans can’t make the relevant quality judgments at the token level.  After all, if they can, then you can just skip the RL, have humans explicitly tell you “no that token is bad, yes this token is great,” and train on likelihood.

This would greatly simplify the process, instead of this complex pipeline where first people tell you which sequences are good, then you train one model to understand what the humans were thinking on a sequence level, and then you train another model trying to figure out what the other model already knows except at a token level this time.

And in fact, I don’t especially see why we can’t elicit token-level preferences?  This seems particularly feasible for the problem of “unimportant vs. important tokens”: if the mistakes are heavily concentrated in specific mistake-tokens like “Portland, the capitol of France,” can’t the human just … select those tokens, NER-style?  Instead of rendering an opaque “I don’t like the whole thing” judgment and expecting the poor model to figure out that this is not some complex policy planning thing, those tokens were just locally bad?  Or you could have an interface where tokens are actually unrolled in front of the user and they guide the sampling when it makes mistakes.  Or whatever.

As for the other examples – “all human demonstrations, including those that are low-quality” is equally a problem for their approach, and they discuss all the stuff they did to deal with it.  And the “distributional shift” issue seems equally tractable by any approach that tunes on model samples.

I’m not denying that the thing they did apparently works, at least in this case, and with their resources.  I’m just doing my usual thing where I ask “wait, what parts were really necessary?”  This is especially important to ask when someone uses RL and accepts its big costs.

Consider: if RL were generally necessary for good LM sampling, GPT-2/3 would never have worked: the fact that likelihood training is good enough (while being far more efficient) enables their scale in the first place.  As always, you never want to be doing RL.


As far as I can tell, their final “human evaluation” was done by the same labelers who provided the preference annotations. This makes me concerned about a variant of “evaluating on training data.” It’s not surprising that a model tuned on someone’s annotations agrees with that person more than a model which wasn’t.

For example, in Fig. 3, it looks like the “supervised” baseline tuned on tl;dr was rated about as highly as true examples from tl;dr itself (!), but not as well as the final model.

This establishes only that “if you train on reddit summaries, people like the result as much as reddit summaries; if you train on what they like, they like the result more.”  If this were false it would mean something had gone very, very wrong and nothing was actually being achieved, so what should I take away from it being true?

I think the authors are arguing that tl;dr and any other supervised dataset will have flaws, and preference data lets you get closer to what people actually want.

This seems true, but is a familiar observation from supervised learning, motivating e.g. active learning. It would be nice to see how much the difference can be mitigated by just augmenting tl;dr with annotations (in some way) but otherwise doing supervised learning, vs. using their RL approach.

Compared to tl;dr, the story for CNN/DM is more complicated, but again the models they outperform have not seen any data from their labelers, so maybe it is no surprise they have flaws according to those same labelers.


The importance of annotation quality, close relationships with annotators, clear guidelines, etc. will be familiar to anyone with experience in annotation for ML. It’s good that OpenAI is doing the right things here, but this is not a new result – rather, other researchers resort to MTurk and similar due to time/money constraints, while OpenAI has the freedom to do the right things everyone else wants to do.

(That includes building their own internal annotation platform for contracted annotators, which is costly but better in the long term than relying on a janky 3rd party product.)


I don’t know if this actually matters, but my gut says that putting a linear head on top of the last layer of GPT is probably not the best / most efficient way to train a reward/value model.  The task is very different from next-token prediction, and the encoding in later layers which expect to be seeing next-token guesses might be destructively overwritten to make way for more valuable stuff lower down.  I guess I’d want to try a trainable scalar mix, a la Elmo?
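
For reference, an ELMo-style scalar mix is just a learned, softmax-weighted combination of all layers' outputs, rather than a head reading only the last layer (a toy sketch on plain Python lists; real implementations operate on tensors):

```python
import math

def scalar_mix(layer_states, layer_weights, gamma=1.0):
    """ELMo-style scalar mix: combine per-layer representations using
    softmax-normalized learned weights, so a downstream head can draw on
    any layer rather than only the last one.

    layer_states:  one vector (list of floats) per layer
    layer_weights: one learnable scalar per layer
    gamma:         learnable overall scale
    """
    exps = [math.exp(w) for w in layer_weights]
    total = sum(exps)
    alphas = [e / total for e in exps]  # softmax over layers
    dim = len(layer_states[0])
    return [gamma * sum(a * layer[i] for a, layer in zip(alphas, layer_states))
            for i in range(dim)]

# With equal weights, the mix is just the mean of the layer vectors:
assert scalar_mix([[1.0, 1.0], [3.0, 3.0]], [0.0, 0.0]) == [2.0, 2.0]
```

Training the weights lets the reward/value head discover for itself which layers matter, instead of being stuck with the last layer's next-token-oriented encoding.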

BTW, in the selector model for @nostalgebraist-autoresponder, which predicts a kind of “human preference data,” I currently use two extra transformer blocks trained from scratch, which attend to two different layers of the generator (whose weights are frozen).

For the layers, I settled on #8 and #24 of the 42 layers after many hyperparam searches – I found that models which attended to layers right near the middle were dramatically superior to those that didn’t.  The relative uselessness of later layers surprised me at first, and was one of the questions in my mind when I started the logit lens investigations.


Finally, on a lighter note, the very last table of the paper is hilarious.  It shows samples that optimize too hard for what the reward model wants, without an auxiliary term in the loss.

Apparently, the same reward model which otherwise reflects human preferences quite well has decided that humans just utterly love it when summaries end with this one specific, rude turn of phrase:

want change this dumbass shitty ass policy pls [one imagines the reward model being frustrated with its siblings during training -nost]

want change this dumbass shitty ass policy at work now pls halp

want change this dumbass shitty ass behavior of mine please help pls halp

want change this dumbass shitty ass policy of hers please pls halp

want change this dumbass shitty ass landlord behavior now please pls halp

regret this dumbass behaviour on her part? need insight pls halp

want change this dumbass crazy policy of hers pls help

want change this dumbass selfish/lazy attitude now please help pls

(Again, wouldn’t it be nice if we could avoid the need for this thing and just train on the preferences directly … )


What's Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers

12 сентября, 2020 - 04:46
Published on September 12, 2020 1:46 AM GMT

Really interesting analysis of social science papers and replication markets. Some excerpts: 

Over the past year, I have skimmed through 2578 social science papers, spending about 2.5 minutes on each one. This was due to my participation in Replication Markets, a part of DARPA's SCORE program, whose goal is to evaluate the reliability of social science research. 3000 studies were split up into 10 rounds of ~300 studies each. Starting in August 2019, each round consisted of one week of surveys followed by two weeks of market trading. I finished in first place in 3 out of 10 survey rounds and 6 out of 10 market rounds. In total, about $200,000 in prize money will be awarded.

The studies were sourced from all social sciences disciplines (economics, psychology, sociology, management, etc.) and were published between 2009 and 2018 (in other words, most of the sample came from the post-replication crisis era).

The average replication probability in the market was 54%; while the replication results are not out yet (175 of the 3000 papers will be replicated), previous experiments have shown that prediction markets work well.1

This is what the distribution of my own predictions looks like:2


Check out this crazy chart from Yang et al. (2020):

Yes, you're reading that right: studies that replicate are cited at the same rate as studies that do not. Publishing your own weak papers is one thing, but citing other people's weak papers? This seemed implausible, so I decided to do my own analysis with a sample of 250 articles from the Replication Markets project. The correlation between citations per year and (market-estimated) probability of replication was -0.05!
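
The figure quoted here is an ordinary Pearson correlation; as a reference point, it can be computed like this (the toy data below is made up, not the author's 250-article sample):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Perfectly correlated toy data gives r = 1; here, citations per year vs.
# replication probability came out near 0, i.e. essentially unrelated.
assert abs(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]) - 1.0) < 1e-9
```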

You might hypothesize that the citations of non-replicating papers are negative, but negative citations are extremely rare.5 One study puts the rate at 2.4%. Astonishingly, even after retraction the vast majority of citations are positive, and those positive citations continue for decades after retraction.6

As in all affairs of man, it once again comes down to Hanlon's Razor. Either:

  1. Malice: they know which results are likely false but cite them anyway.
  2. or, Stupidity: they can't tell which papers will replicate even though it's quite easy.

Accepting the first option would require a level of cynicism that even I struggle to muster. But the alternative doesn't seem much better: how can they not know? I, an idiot with no relevant credentials or knowledge, can fairly accurately determine good research from bad, but all the tenured experts can not? How can they not tell which papers are retracted?

I think the most plausible explanation is that scientists don't read the papers they cite, which I suppose involves both malice and stupidity.7 Gwern has an interesting write-up on this question, citing some ingenious bibliographic analyses: "Simkin & Roychowdhury venture a guess that as many as 80% of authors citing a paper have not actually read the original". Once a paper is out there nobody bothers to check it, even though they know there's a 50-50 chance it's false!


Zen and Rationality: Map and Territory

September 12, 2020 - 03:45
Published on September 12, 2020 12:45 AM GMT

This is post 3/? about the intersection of my decades of LW-style rationality practice and my several years of Zen practice.

In today's installment, I look at form and emptiness from a rationalist perspective.

Rationalists have a few key ideas or memes (in the memetic sense), and one of them is "the map is not the territory". Lots has been written about this idea on LessWrong, but it's an idea with a history that stretches back for thousands of years, so it's perhaps not surprising that it's also one of the ideas at the core of Zen.

But in Zen we don't use the words "map" and "territory", instead preferring numerous other metaphors to point at this distinction. Let's explore a few of them, because each elucidates a different aspect of the truth pointed at by this duality.

Before Zen was Zen, Nagarjuna formalized this idea that there's a duality between map and territory in the two truths doctrine. He called these two pairs form and emptiness, pointing at the way our minds put our experiences together into forms or objects that are fixed, at least in our minds, yet ultimately reality is empty of these forms or any other kind of inherent distinctions, essences, or ultimate and timeless truths. Everything we know is provisional, taking a skeptical epistemic stance similar to Pyrrhonism.

Form and emptiness have their place in Zen, but more common is to make a distinction between the relative and the absolute. The relative is that which changes, which exists in our minds, which comes and goes. The absolute is that which exists prior to our perception of it; it's the space in which the relative arises. But Zen doesn't stop there. Form is emptiness and emptiness is form, as the Heart Sutra says, and the relative and the absolute can be thought of as dancing reality into existence, simultaneously unified and separate. Dongshan (Japanese: Tozan) explored this in his poem on the Five Ranks, a subtle teaching that can take some effort to penetrate but is worth the effort.

Talking about relative and absolute can get a bit abstract, as can talking about form and emptiness, so there's another pair that's been used extensively in Zen teaching that, alas, holds little currency for us Westerners: guest and host, or alternatively vassal and lord. I don't have much to say on these because they mostly make sense in the context of the pre-colonial Sinosphere, but I mention them in case the metaphor resonates with you.

For Westerners, I think our philosophical traditions offer some alternatives. Kant offers us phenomena and noumena, which sadly misses the mark a bit as often understood, by assigning essential form to the territory/emptiness/absolute in suggesting there are things-in-themselves that nonetheless have thingness. Better are Heidegger's ontological and ontic, which are just fancy Greek words for something like "words or ideas about what is" and "that which is", respectively. Although even "that which is" is a bit too much to describe the ontic; better to say the ontic is the "is" or "being" or "to be". Put another way, ontology is like the nouns, and the ontic is like the verbs just on their own, without even a distinction between one verb and another.

An analogy I like that I borrow from topology is to liken the map/form/ontology to closed sets and the territory/emptiness/ontic to open sets. This is by no means perfect and if you think about it too hard it falls apart, but using my intuitions about closed and open sets helped me make better sense of the two truths, so I share it with you in that spirit.
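For readers without the topology background, here are the standard definitions the analogy leans on (stated for $\mathbb{R}^n$ with the usual metric):

```latex
U \subseteq \mathbb{R}^n \text{ is open} \iff \forall x \in U,\ \exists \varepsilon > 0 : B_\varepsilon(x) \subseteq U
\qquad
C \subseteq \mathbb{R}^n \text{ is closed} \iff \mathbb{R}^n \setminus C \text{ is open}
```

The intuition being borrowed: a closed set contains all of its boundary points, so it has a definite edge that marks it off, while an open set contains none of its boundary; however far you move toward the edge, you never actually stand on it.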

And at that I'll end this post. I've not said much about the actual relationship between the two truths of map and territory or how their dependence on one another creates reality as we experience it. I'll tantalizingly hint that ideas about embedded agency go a long way towards exploring how the two truths play together, but exploration of that I'll save for another time.


‘Ugh fields’, or why you can’t even bear to think about that task (Rob Wiblin)

September 11, 2020 - 23:31
Published on September 11, 2020 8:31 PM GMT

Rob Wiblin with a more accessible explanation of the Ugh Field concept on Medium. Some quotes:

The problem

Have you ever had a long-overdue task to do, a task which isn’t so bad in itself, but which you can barely bring yourself to think about without feeling awful?
Most people experience this from time to time. Here’s how things get to such a strange and dire state.

The first day the task is on your to-do list, you don’t end up starting, because the short-term reward isn’t large enough to overcome the psychological cost of doing so.

Maybe you feel low energy. Maybe you have more urgent priorities. Maybe you’re insecure about whether you can do a good job. Maybe the task involves a bit of social awkwardness. It doesn’t matter the reason — you delay.

Unfortunately, this task is one that only gets more unpleasant over time.

For instance, maybe now you’re going to have to rush it and do a bad job, and you fear everyone is going to judge you negatively.


Limiting the damage

I don’t have a perfect way to escape this mental flytrap but here are some things that might help:

1. Ugh Fields happen to basically everyone, even very conscientious people, so it's worth trying to see the humour in this absurd design flaw in the human brain. There's no more reason to feel ashamed about it than there is to feel ashamed of e.g. enjoying eating food.

It’s just how people are built and sadly there are no brain engineers around to roll out a patch to the human race. We have to find practical work-arounds instead.

2. Just recognising and labelling the Ugh Field phenomenon can make it less bad, because it’s an accurate systemic explanation for what’s going on, rather than a misleading personal one like “I’m hopeless and never get things done”.

3. Because you’ve been avoiding thinking about the problem, if you do think about it for a bit while keeping an open mind, you might quickly strike on a way to get out of the task, or a way to do a much shorter version of it.

For instance perhaps you could just email back something like: “Thanks for your patience on this. Unfortunately I don’t see how I’m going to be able to fit it into my schedule just now, is there anyone else who can take it on?”

4. If you think about it calmly, you may well find that the task actually isn’t as important as it has come to feel. The person you imagine is disgusted by your failure may only be 2/10 annoyed, or perhaps not even have noticed.

Remember, they’ve got plenty of their own stuff going on.

5. By the time something is deep in an Ugh Field, often it’s no longer the most productive thing you could be doing anyway. Especially relative to the willpower it now requires. So consider just deciding to deliberately drop it in favour of something else that’s more motivating.

Actively cross it off your to-do list. Throw away those New Yorkers you’ve been planning to read for months but never gotten to, or whatever else will be a nagging reminder of the task.

You have more valuable things to do; the task is gone.



The Short Case for Verificationism

September 11, 2020 - 21:48
Published on September 11, 2020 6:48 PM GMT

Follow-up to: https://www.lesswrong.com/posts/PSichw8wqmbood6fj/this-territory-does-not-exist

Here's a simple and direct argument for my version of verificationism.

Note that the argument uses ontological terms that are meaningless on my view. It functions as a reductio: either one must accept the conclusion, or accept that some of the premises are meaningless, which amounts to the same thing.

Premise 1: The level IV multiverse is possible.

Premise 2: If the level IV multiverse is possible, then we cannot know that we are not in it.

Premise 3: If we are in the level IV multiverse, then ontological claims about our world are meaningless, because we simultaneously exist in worlds where they are true and worlds where they are not true.

Conclusion: We cannot know that ontological claims about our world are meaningful.
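The argument above can be sketched formally. Here's a minimal propositional rendering in Lean 4; `Know` is an opaque epistemic operator, `know_mp` is the standard closure axiom (knowledge is closed under known implication), and all the names are my own choices, not the post's:

```lean
-- Sketch of the argument under two assumptions beyond the post's premises:
-- that knowledge is closed under known implication, and that Premise 3
-- (contraposed) is itself known.
axiom Know : Prop → Prop
axiom know_mp : ∀ {p q : Prop}, Know (p → q) → Know p → Know q

axiom M : Prop           -- "we are in the level IV multiverse"
axiom Meaningful : Prop  -- "ontological claims about our world are meaningful"

-- Premises 1 and 2 together: we cannot know that we are not in it.
axiom not_know_not_M : ¬ (Know (¬ M))

-- Premise 3, contraposed and assumed known: if ontological claims
-- are meaningful, then we are not in the level IV multiverse.
axiom know_p3 : Know (Meaningful → ¬ M)

-- Conclusion: we cannot know that ontological claims are meaningful.
theorem cannot_know_meaningful : ¬ (Know Meaningful) :=
  fun h => not_know_not_M (know_mp know_p3 h)
```

The formalization makes visible that the conclusion needs an epistemic closure principle: without `know_mp`, knowing `Meaningful` would not conflict with being unable to know `¬ M`.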