LessWrong.com News

A community blog devoted to refining the art of rationality

Credit Cards for Giving

January 5, 2020 - 02:00
Published on January 4, 2020 11:00 PM UTC

The standard advice for how to physically make a donation is something like: if you're donating less than ~$1,000 use a credit card, otherwise use a check or other method with lower fees. For example, GiveWell writes:

We recommend that gifts up to $1,000 be made online by credit card. If you are giving more than $1,000, please consider one of these alternatives: Check, Bank Transfer, ...

And the Against Malaria Foundation writes:

However you prefer to make a donation is fine.
All other things being equal, we prefer:
  • For donations less than US$5,000, an online donation using credit or debit card
  • For donations more than US$5,000, an offline donation - by bank transfer or by mail (cheque/check) - to eliminate fees.

This makes sense: if the charity is paying 2.25% on your donation, that's $22.50 on a $1,000 donation, and $112.50 on a $5,000 one.
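
The fee arithmetic can be sketched in a few lines. This is a minimal illustration that assumes a flat percentage fee at the 2.25% rate mentioned above, with no fixed per-transaction charge (real processors often add one); `processing_fee` and `FEE_RATE` are names made up for this example.

```python
def processing_fee(donation, fee_rate):
    """Return the amount lost to card processing, assuming a flat percentage fee."""
    return donation * fee_rate


FEE_RATE = 0.0225  # assumed flat rate: the 2.25% figure from the text

for amount in (1_000, 5_000):
    fee = processing_fee(amount, FEE_RATE)
    print(f"${amount:,} donation: ${fee:,.2f} in fees, ${amount - fee:,.2f} reaches the charity")
```

The fee scales linearly with the donation, which is why the recommended cutoff for switching to a check or bank transfer is stated as a dollar amount.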

The ideal, though, would be to be able to donate with a credit card, get credit card rewards, but still have the charity get the full amount. I wouldn't expect this to exist, but it turns out that it does! There are platforms (e.g., Facebook) which will cover the credit card fees for charities, so that if you donate $1,000 the charity receives the full $1,000. Not only that, but your purchase is eligible for regular credit card rewards, so 1-3% savings over sending a check.

The main downside of this approach is that you generally can't direct your donation. You can make a donation to the Malaria Consortium, but not to their Seasonal Malaria Chemoprevention program that GiveWell recommends. Likewise, you can donate to the Centre for Effective Altruism but not a specific EA fund.

Again, however, there is an exception: the EA Giving Tuesday team has been coordinating with EA charities to create FB fundraisers for specific programs. At least for 2019 these were only open for a couple weeks leading up to Giving Tuesday, but during that window you could use your credit card for no-fee donations to specific programs at EA charities. Potentially the EA Giving Tuesday folks should advertise this as something to use their fundraisers for, and keep them open a bit longer in December?

There could also be other opportunities available on short notice. For example, we could find out in November 2020 that there's something like the 1% PayPal Giving Fund match that was offered in 2017 and 2018. Because these can appear on short notice, I think it's worth trying to have sufficient credit across multiple cards that you could potentially run your annual donations all through your open credit cards on a single day.

The two main ways to increase your available credit are opening new credit cards and asking for credit limit increases on your existing cards. For limit increases there's generally a link in the online banking UI like "Request a Credit Limit Increase". Typically it will bring up a form with fields for annual income and monthly mortgage/rent.

I went through my cards yesterday and asked for 100% limit increases on each. Here's what happened:

  • Citi Double Cash (2%): Got +45%, immediately
  • Capital One Quicksilver (1.5%): Got +6%, immediately
  • Barclays jetBlue (~1.5% equivalent): Got +100%, by email the next day
  • Alliant Cashback (2.5%): Couldn't find a link for requesting an increase; I should probably call them.

In 2019 I was able to run about 80% of our donations through credit cards, which was worth about $3k in rewards.

If your goal is to get rich, donating money, even with tax deductions and credit card rewards, is not going to help. But for money you're donating anyway, ~2% savings is worth optimizing a bit for.


Less Wrong Poetry Corner: Walter Raleigh's "The Lie"

January 5, 2020 - 01:22
Published on January 4, 2020 10:22 PM UTC

Followup to: Rationalist Poetry Fans, Unite!, Act of Charity

This is my favorite poem about revealing information about deception! It goes like this (sources: Wikipedia, Poetry Foundation, Bartleby)—

Go, Soul, the body's guest,
Upon a thankless arrant:
Fear not to touch the best;
The truth shall be thy warrant:
Go, since I needs must die,
And give the world the lie.

Say to the court, it glows
And shines like rotten wood;
Say to the church, it shows
What's good, and doth no good:
If church and court reply,
Then give them both the lie.

Tell potentates, they live
Acting by others' action;
Not loved unless they give,
Not strong, but by a faction:
If potentates reply,
Give potentates the lie.

Tell men of high condition,
That manage the estate,
Their purpose is ambition,
Their practice only hate:
And if they once reply,
Then give them all the lie.

Tell them that brave it most,
They beg for more by spending,
Who, in their greatest cost,
Seek nothing but commending:
And if they make reply,
Then give them all the lie.

Tell zeal it wants devotion;
Tell love it is but lust;
Tell time it is but motion;
Tell flesh it is but dust:
And wish them not reply,
For thou must give the lie.

Tell age it daily wasteth;
Tell honour how it alters;
Tell beauty how she blasteth;
Tell favour how it falters:
And as they shall reply,
Give every one the lie.

Tell wit how much it wrangles
In tickle points of niceness;
Tell wisdom she entangles
Herself in over-wiseness:
And when they do reply,
Straight give them both the lie.

Tell physic of her boldness;
Tell skill it is pretension;
Tell charity of coldness;
Tell law it is contention:
And as they do reply,
So give them still the lie.

Tell fortune of her blindness;
Tell nature of decay;
Tell friendship of unkindness;
Tell justice of delay;
And if they will reply,
Then give them all the lie.

Tell arts they have no soundness,
But vary by esteeming;
Tell schools they want profoundness,
And stand too much on seeming:
If arts and schools reply,
Give arts and schools the lie.

Tell faith it's fled the city;
Tell how the country erreth;
Tell, manhood shakes off pity;
Tell, virtue least preferreth:
And if they do reply,
Spare not to give the lie.

So when thou hast, as I
Commanded thee, done blabbing,—
Although to give the lie
Deserves no less than stabbing,—
Stab at thee he that will,
No stab the soul can kill.

The English is a bit dated; Walter Raleigh (probably) wrote it in 1592 (probably). "Give the lie" here is an expression meaning "accuse them of lying" (not "tell them this specific lie", as modern readers not familiar with the expression might interpret it).

The speaker is telling his soul to go to all of Society's respected institutions and reveal that the stories they tell about themselves are false: the court's shining standard of Justice is really about as shiny as a decaying stump; the church teaches what's good but doesn't do any good; kings think they're so powerful and mighty, but are really just the disposable figurehead of a coalition; &c. (I'm not totally sure exactly what all of the stanzas mean because of the dated language, but I feel OK about this.)

The speaker realizes this campaign is kind of suicidal ("Go, since I needs must die") and will probably result in getting stabbed. That's why he's telling his soul to do it, because—ha-ha!—immaterial souls can't be stabbed!

What about you, dear reader? Have you given any thought to revealing information about deception?!


Dec 2019 gwern.net newsletter

January 4, 2020 - 23:48
Published on January 4, 2020 8:48 PM UTC


Is there a previous LW discussion on ergodicity, like the difference of time-average vs. ensemble-average and its effect when working with expected values?

January 4, 2020 - 13:07
Published on January 4, 2020 10:07 AM UTC

I've recently stumbled across a blog-post explaining ergodicity, and if I gathered correctly, Taleb's book "Skin in the Game" is also explaining this (I haven't read that book).

It seems to be rather important. Some claim (I don't know if correctly) that decision theory and the social and economic sciences have, or had, quite a few errors because of not correctly dealing with non-ergodicity.

While I had some small bit of intuitive understanding of this (for instance, when fumbling around with personal finances), I hadn't seen a clear "LW treatise" about it. Maybe I've just forgotten it already, but a short site search didn't come up with anything sensible.

Does anyone have some pointers?


Underappreciated points about utility functions (of both sorts)

January 4, 2020 - 10:27
Published on January 4, 2020 7:27 AM UTC

In this post I'd basically like to collect some underappreciated points about utility functions that I've made in the comments of various places but which I thought were worth collecting into a proper, easily-referenceable post. The first part will review the different things referred to by the term "utility function", review how they work, and explain the difference between them. The second part will explain why -- contrary to widespread opinion on this website -- decision-theoretic utility functions really do need to be bounded.

(It's also worth noting that as a consequence, a number of the decision-theoretic "paradoxes" discussed on this site simply are not problems since they rely on unbounded decision-theoretic utility. An example is the original Pascal's Mugging (yes, I realize that term has since been applied to a bunch of things that have nothing to do with unbounded utility, but I mean the original problem).)

Anyway. Let's get on with it.

Part 1: "Utility function" refers to two different things that are often conflated and you should be sure you know which one you're talking about

The term "utility function" refers to two significantly different, but somewhat related, things, which are, due to the terminological and conceptual overlap, often conflated. This results in a lot of confusion. So, I want to cover the distinction here.

The two things called utility functions are:

  1. A function that describes the preferences of a given agent, assuming that agent's preferences satisfy certain rationality conditions, and does not depend on any particular ethical theory, but rather is useful in decision theory and game theory more generally. If I need to be unambiguous, I'll call this a decision-theoretic utility function.
  2. A function, used specifically in utilitarianism, that describes something like a person's preferences or happiness or something -- it's never been clearly defined, and different versions of utilitarianism suggest different ideas of what it should look like; these are then somehow aggregated over all people into an overall (decision-theoretic!) utility function (which is treated as if it were the decision-theoretic utility function describing what an ideal moral agent would do, rather than the preferences of any particular agent). If I need to be unambiguous, I'll call this an E-utility function.

(There's actually a third thing sometimes called a "utility function", which also gets confused with these other two, but this is a rarer and IMO less important usage; I'll get back to this in a bit.)

It's important to note that much discussion online conflates all of these and yields nonsense as a result. If you see someone talking nonsense about utility functions, before replying, it's worth asking -- are they mixing together different definitions of "utility function"?

So. Let's examine these in a bit more detail.

Decision-theoretic utility functions and their assumptions

Decision-theoretic utility functions describe the preferences of any consequentialist agent satisfying certain rationality conditions; by "it describes the agent's preferences", I mean that, given a choice between two options, the one yielding the higher expected utility is the one the agent chooses.

It's not obvious in advance that a rational agent's preferences need to be described by a utility function, but there are theorems guaranteeing this; Savage's theorem probably provides the best foundation for this, although the VNM theorem may be a little more familiar. (We'll discuss the difference between these two in more detail in the quick note below and discuss it further in the second part of this post.) Note that these functions are not entirely unique -- see below. Also note that these are conditions of rationality under uncertainty.

Again, a decision-theoretic utility function simply describes an agent's preferences. It has nothing to do with any particular idea of morality, such as utilitarianism. Although you could say -- as I've said above -- that it assumes a consequentialist agent, who cares only about consequences. So, any rational consequentialist agent has a decision-theoretic utility function; but only a utilitarian would admit the existence of E-utility functions. (While this doesn't exactly bear on the point here, it is worth noting that utilitarianism is a specific type of consequentialism and not identical with it!)

Note that real people will not actually obey the required rationality assumptions and thus will not actually have decision-theoretic utility functions; nonetheless, idealized rational agents, and therefore decision-theoretic utility functions, are a useful abstraction for a number of purposes.

Decision-theoretic utility functions are usually stated as taking values in the real numbers, but they're only defined up to positive affine transformations (scaling by a positive constant, and translation); applying such a transformation to a utility function for an agent will yield another, equally-valid utility function. As such they may be better thought of not as taking values in R, exactly, but rather in a sort of ordered 1-dimensional affine space over R. Outputs of a decision-theoretic utility function are not individually meaningful; in order to get meaningful numbers, with concrete meaning about the agent's preferences, one must take ratios of utility differences, (a-b)/|c-d|. (Note the absolute value in the denominator but not the numerator, due to the importance of order.)
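
To make the invariance concrete, here is a small sketch. The outcomes, the utility values, and the particular transformation (scale by 5, shift by -2) are all made up for illustration; the point is only that the ratio (a-b)/|c-d| survives a positive affine transformation while the individual outputs do not.

```python
def difference_ratio(u, a, b, c, d):
    """Return (u(a) - u(b)) / |u(c) - u(d)| for a utility function u."""
    return (u(a) - u(b)) / abs(u(c) - u(d))


# Hypothetical utility values over four made-up outcomes.
base = {"apple": 0.0, "banana": 1.0, "cherry": 3.0, "date": 7.0}


def u(outcome):
    return base[outcome]


def v(outcome):
    # A positive affine transformation of u: scale by 5, then shift by -2.
    return 5.0 * base[outcome] - 2.0


r_u = difference_ratio(u, "cherry", "apple", "date", "banana")
r_v = difference_ratio(v, "cherry", "apple", "date", "banana")

print(u("cherry"), v("cherry"))  # individual outputs differ between u and v
print(r_u, r_v)                  # the ratio of differences does not
```

The translation cancels inside each difference and the positive scale factor cancels between numerator and denominator, so only the sign-carrying numerator and the magnitude of the denominator are left.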

Decision-theoretic utility functions really need to be bounded -- a point seriously underappreciated on this website -- but I'll save discussion of that for the second part of this post.

A quick tangential note on probability and additivity

This is pretty tangential to the point of this post, but it's probably worth taking the time here to explain what the difference is between the Savage and VNM formalisms. (Well, one of two differences; the other will be discussed in the second part of this post, but as we'll see it's actually not such a difference.) The main difference is that the VNM theorem assumes that we already believe in the idea of probability -- it justifies decision-theoretic utility, but it does nothing to justify probability, it just assumes it. Savage's theorem, by contrast, provides a foundation for both probability and decision-theoretic utility simultaneously, based just on rationality axioms about preferences, which is why I think it's the better foundation.

However, the probability measure it constructs need not actually be a probability measure as such, as it need only be finitely additive rather than countably additive. It's not clear what to make of this. Maybe countable additivity of probability just isn't necessary for a rational agent? It's hard to say. (If I'm not mistaken, the limiting probabilities of MIRI's logical inductor are merely (the analogue of) finitely additive, not countably additive, but I could be wrong about that...) But this is really off the point, so I'm just going to raise the question and then move on; I just wanted to mention it to ward off nitpicks on this point. As we'll see below, the choice of formalism doesn't actually matter much to my point here.

E-utility functions and their assumptions

This is the older meaning of the term if I'm not mistaken, but there is mostly not a lot to say about these because they're fairly ill-defined. They are, as mentioned above, specifically a utilitarian notion (not a general consequentialist notion). How to define these, as well as how to aggregate them, remain disputed.

Utilitarians say that one should try to maximize the expected value of the aggregated utility function, which means that the aggregated function is actually a weird sort of decision-theoretic utility function (corresponding to an ideal moral agent rather than any particular agent), not an E-utility function. One does not attempt to maximize expected value of E-utility functions.

One thing we can say about E-utility functions is that while only idealized rational agents have decision-theoretic utility functions and not real people, real people are supposed to have E-utility functions. Or at least so I gather, or else I don't see how utilitarianism makes sense?

Actually, one could say that it is not only utilitarians who rely on these -- there is also the notion of prioritarianism; one sometimes sees the term "aggregative consequentialism" to cover both of these (as well as other potential variants). But, because E-utility functions are so ill-defined, there is, as best I can tell, not really any meaningful distinction between the two. For example, consider a utilitarian theory that assigns to each agent p a real-valued E-utility function U_p, and aggregates them by summing. Let's suppose further that each U_p takes values in the nonnegative reals; then if we change the aggregation rule to summing the square roots of the U_p, we have changed our utilitarian theory into a prioritarian one. Except, instead of doing that, we could define U'_p = sqrt(U_p), and call U'_p the E-utilities; because there's no precise definition of E-utilities, there's nothing stopping us from doing this. But then the utilitarian theory described by the U'_p, describes exactly the same theory as the prioritarian theory described by the U_p! The theory could equally well be described as "utilitarian" or "prioritarian"; for this reason, unless one puts further restrictions on E-utility functions, I do not consider there to be any meaningful difference between the two.
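
The relabeling argument can be checked numerically. In this sketch the three people and their E-utility values are invented for illustration; `prioritarian_value` sums square roots of the U_p, while `utilitarian_value` plainly sums whatever it is given.

```python
import math

# Hypothetical E-utilities U_p for three made-up people (nonnegative reals).
U = {"alice": 4.0, "bob": 9.0, "carol": 16.0}


def utilitarian_value(utilities):
    """Aggregate by plain summation."""
    return sum(utilities.values())


def prioritarian_value(utilities):
    """Aggregate by summing square roots, which favors the worse-off."""
    return sum(math.sqrt(x) for x in utilities.values())


# Redefine the E-utilities as U'_p = sqrt(U_p).
U_prime = {person: math.sqrt(x) for person, x in U.items()}

# The "prioritarian" theory on U and the "utilitarian" theory on U' agree.
print(prioritarian_value(U), utilitarian_value(U_prime))
```

Since the two aggregates coincide on every input, nothing observable distinguishes the two labels unless something further pins down what an E-utility is.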

As such, throughout this post I simply say "utilitarianism" rather than "aggregative consequentialism"; but if I'm wrong in identifying the two, well, whenever I say "utilitarianism" I really kind of mean "aggregative consequentialism". Hope that's OK.

Preference utilitarianism and Harsanyi's theorem (using decision-theoretic utility functions as E-utility functions)

Above I've made a point of emphasizing that decision-theoretic utility and E-utility functions are different things. But could there be cases where it makes sense to use one as the other? Specifically, to use decision-theoretic utility functions as E-utility functions? (The reverse clearly doesn't make much sense.)

Well, yes, that's basically what preference utilitarianism is! OK, precise formulations of preference utilitarianism may vary, but the idea is to use people's preferences as E-utility functions; and how are you going to encode people's preferences if not with decision-theoretic utility functions? (OK, this may only really work for a population of idealized agents, but it's still worth thinking about.)

Indeed, we can go further and formalize this with Harsanyi's theorem, which gives a series of moral assumptions (note: among them is that the agents in the population do indeed have decision-theoretic utility functions!) under which morality does indeed come down to maximizing a sort of aggregate of the population's decision-theoretic utility functions.

(Note that it also assumes that the population is fixed, which arguably assumes away a lot of the hard parts of utilitarianism, but it's still a useful starting point.)

But, what is this aggregation? If we think of the agents' utility functions as taking values in R, as they're usually thought of, then the aggregation consists of summing a utility function for each agent. But which one? As mentioned above, utility functions are only unique up to positive affine transformations. Harsanyi's theorem provides no guidance on which utility function to use for each agent -- how could it? They're all equally valid. And yet using different ones can yield very different (and meaningfully different) aggregated results, essentially letting you adjust weightings between agents! Except there's no meaningful notion of "equal weighting" to use as baseline. It's something of a problem.
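
A toy example shows how much rides on the choice of representative. Everything here is hypothetical: two agents, two options "X" and "Y", and made-up utility values. Doubling the second agent's utility function gives an equally valid description of that agent's preferences, yet it flips the aggregate choice.

```python
def aggregate(option, utility_functions):
    """Sum each agent's utility for the given option."""
    return sum(u[option] for u in utility_functions)


def best_option(options, utility_functions):
    """Return the option with the highest summed utility."""
    return max(options, key=lambda o: aggregate(o, utility_functions))


options = ["X", "Y"]

# Hypothetical agents: agent 1 prefers X, agent 2 prefers Y.
u1 = {"X": 1.0, "Y": 0.0}
u2 = {"X": 0.0, "Y": 0.6}

# Same preferences for agent 2, rescaled by a positive constant.
u2_rescaled = {o: 2.0 * value for o, value in u2.items()}

choice_before = best_option(options, [u1, u2])
choice_after = best_option(options, [u1, u2_rescaled])
print(choice_before, choice_after)
```

Neither `u2` nor `u2_rescaled` is the "right" one from the agent's point of view, which is exactly why the sum has no privileged baseline.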

(This is often discussed in terms of "weights", coefficients put in front of the utility functions; but I think this obscures the fundamental issue, in making it sound like there's a meaningful notion of "equal weights" when there really isn't.)

Still, despite these holes, preference utilitarianism and Harsanyi's theorem are definitely worth thinking about.

Brief note on that third sort of utility function

Finally, before we get to the second part of this post, I wanted to mention that third thing sometimes called a "utility function".

The term "utility function" is sometimes used for a real-valued function that describes an agent's deterministic preferences; i.e., if A and B are two options, and U the utility function, then the agent prefers A to B if and only if U(A) > U(B). Note the lack of any requirement here about expected value! This is a weaker sense than a decision-theoretic utility function as I described it above; any decision-theoretic utility function is one of these, but not vice-versa.

While you'll occasionally encounter this, it's frankly a useless and even counterproductive notion. Why? Because fundamentally, it's the wrong abstraction for the situation. If uncertainty isn't coming into play, and you're only applying deterministic rationality constraints, then the right structure for describing an agent's preferences is a total preorder. Why would you introduce real numbers? That just restricts what you can express! Not every total preorder will embed in the real numbers. So, there isn't any sensible set of rationality conditions that will lead to this notion of utility function; they'll lead you instead to the idea of a total preorder, and then oops maybe that total preorder will fail to embed in R and the agent won't have a "utility function" in this sense.
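
For a concrete total preorder that fails to embed in R, the standard textbook example (not something taken from the discussion above) is the lexicographic order on pairs of reals. A sketch of the usual argument, with U a hypothetical representing function:

```latex
% Lexicographic preference on \mathbb{R}^2:
\[
  (x_1, y_1) \succ (x_2, y_2)
  \iff
  x_1 > x_2 \ \text{ or } \ \bigl(x_1 = x_2 \ \text{ and } \ y_1 > y_2\bigr).
\]
% Suppose some U : \mathbb{R}^2 \to \mathbb{R} satisfied
% U(p) > U(q) \iff p \succ q. For each real x, put
\[
  I_x = \bigl(\, U(x, 0),\ U(x, 1) \,\bigr), \qquad U(x, 0) < U(x, 1).
\]
% If x < x' then (x, 1) \prec (x', 0), so U(x, 1) < U(x', 0) and the
% intervals I_x and I_{x'} are disjoint. Each I_x contains a rational,
% so picking one rational per interval gives an injection
\[
  \mathbb{R} \hookrightarrow \mathbb{Q},
\]
% which is impossible because \mathbb{R} is uncountable. Hence no such U exists.
```

The preference itself is perfectly well behaved as a total order; it is only the demand for real-valued outputs that fails.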

Such a function is of course only unique up to order-preserving functions on R, meaning it's not very unique at all (one more sign of it being the wrong abstraction).
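
The gap between this weaker notion and a decision-theoretic utility function shows up as soon as gambles enter. In the sketch below, `U` and `W` are made-up functions on the outcomes 0, 1, 2; `W` is just `U` cubed, so the two rank sure outcomes identically, but they disagree about which of two hypothetical gambles is better.

```python
def expected_utility(gamble, utility):
    """gamble is a list of (probability, outcome) pairs."""
    return sum(p * utility(x) for p, x in gamble)


def U(x):
    return float(x)


def W(x):
    # An order-preserving (strictly increasing) transformation of U.
    return float(x) ** 3


outcomes = [0, 1, 2]
same_ranking = sorted(outcomes, key=U) == sorted(outcomes, key=W)

# Two hypothetical gambles: outcome 1 for certain, versus a 40% shot at outcome 2.
sure_thing = [(1.0, 1)]
long_shot = [(0.6, 0), (0.4, 2)]

prefers_sure_under_U = expected_utility(sure_thing, U) > expected_utility(long_shot, U)
prefers_sure_under_W = expected_utility(sure_thing, W) > expected_utility(long_shot, W)
print(same_ranking, prefers_sure_under_U, prefers_sure_under_W)
```

Only positive affine transformations preserve the ranking of gambles, which is why the decision-theoretic notion is so much more tightly pinned down.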

Why were such functions ever even used, when they're clearly the wrong abstraction? I think basically it's because a lot of people lack familiarity with mathematical structures, or how to build an abstraction to suit a set of requirements, and instead tend to just immediately reach for the real numbers as a familiar setting to put things in. (Honestly, that's probably why decision-theoretic utility functions were initially defined as R-valued as well; fortunately, in that case, it turns out to be the correct choice! The real numbers can indeed be quite useful...)

Of course, as discussed above, if your agent not only obeys requirements of deterministic rationality, but also requirements of rationality under uncertainty, then in fact they'll have a decision-theoretic utility function, taking values in R, and so will have one of these. So in that sense the assumption of taking these values in R is harmless. But still...

Part 2: Yes, decision-theoretic utility functions really do need to be bounded

OK. Now for the main point: Contrary to widespread opinion on this site, decision-theoretic utility functions really do need to be bounded.

First, I'm going to discuss this in terms of Savage's theorem. I realize this is the less familiar formalism for many here, but I think it's the better one; if you're not familiar with it I recommend reading my post on it. I'll discuss the point in terms of the more familiar VNM formalism shortly.

OK. So under Savage's formalism, well, Savage's theorem tells us that (under Savage's rationality constraints) decision-theoretic utility functions must be bounded. Um, OK, hm, that's not a very helpful way of putting it, is it? Let's break this down some more.

There's one specific axiom that guarantees the boundedness of utility functions: Savage's axiom P7. Maybe we don't need axiom P7? Is P7 really an important rationality constraint? It seems intuitive enough, like a constraint any rational agent should obey -- what rational agent could possibly violate it? -- but maybe we can do without it?

Let's hold that thought and switch tracks to the VNM formalism instead. I mean -- why all this discussion of Savage at all? Maybe we prefer the VNM formalism. That doesn't guarantee that utility functions are bounded, right?

Indeed, as usually expressed, the VNM formalism doesn't guarantee that utility functions are bounded... except the usual VNM formalism doesn't actually prove that utility functions do everything we want!

The point of a decision-theoretic utility function is that it describes the agent's preferences under uncertainty; given two gambles A and B, the one with the higher expected utility (according to the function) is the one the agent prefers.

Except, the VNM theorem doesn't actually prove this for arbitrary gambles! It only proves it for gambles with finitely many possible outcomes. What if we're comparing two gambles and one of them has infinitely many possible outcomes? This is something utility functions are often used for on this site, and a case I think we really do need to handle -- I mean, anything could potentially have infinitely many possible outcomes, couldn't it?

Well, in this case, the VNM theorem by itself provides absolutely no guarantee that higher expected utility actually describes the agent's preference! Our utility function might simply not work -- might simply fail to correctly describe the agent's preference -- once gambles with infinitely many outcomes are involved!

Hm. How troublesome. OK, let's take another look at Savage and his axiom P7. What happens if we toss that out? There's no longer anything guaranteeing that utility functions are bounded. But also, there's no longer anything guaranteeing that the utility function works when comparing gambles with infinitely many outcomes!

Sounds familiar, doesn't it? Just like with VNM. If you don't mind a utility function that might fail to correctly describe your agent's preferences once infinite gambles get involved, then sure, utility functions can be unbounded. But, well, that's really not something we can accept -- we do need to be able to handle such cases; or at least, such cases are often discussed on this site. Which means bounded utility functions. There's not really any way around it.

And if you're still skeptical of Savage, well, this all has an analogue in the VNM formalism too -- you can add additional conditions to guarantee that the utility function continues to work even when dealing with infinite gambles, but you end up proving in addition that the utility function is bounded. I'm not so familiar with this, so I'll just point to this old comment by AlexMennen for that...

Anyway, point is, it doesn't really matter which formalism you use -- either you accept that utility functions are bounded, or you give up on the idea that utility functions produce meaningful results in the face of infinite gambles, and, as I've already said, the second of these is not acceptable.

Really, the basic reason should go through regardless of the particular formalism; you can't have both unbounded utility functions, and meaningful expected utility comparisons for infinite gambles, because, while the details will depend on the particular formalism, you can get contradictions by considering St. Petersburg-like scenarios. For instance, in Savage's formalism, you could set up two St. Petersburg-like gambles A and B such that the agent necessarily prefers A to B but also necessarily is indifferent between them; forcing the conclusion that in fact the agent's utility function must have been bounded, preventing this setup.
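
A rough numerical sketch of why St. Petersburg-like gambles cause trouble: this is not the two-gamble construction in Savage's formalism mentioned above, just the basic divergence behind it. The gamble pays 2^k with probability 2^-k, and `bounded` is a hypothetical bounded utility function chosen for illustration.

```python
def truncated_expected_utility(utility, n_terms):
    """Expected utility over the first n_terms branches of a
    St. Petersburg-style gamble: branch k has probability 2**-k
    and payoff 2**k, for k = 1, 2, ..., n_terms."""
    return sum(2.0 ** -k * utility(2.0 ** k) for k in range(1, n_terms + 1))


def unbounded(x):
    # Utility linear in the payoff: unbounded above.
    return x


def bounded(x):
    # Hypothetical bounded utility, taking values in [0, 1].
    return x / (1.0 + x)


for n in (10, 100, 1000):
    print(n, truncated_expected_utility(unbounded, n), truncated_expected_utility(bounded, n))
```

With the unbounded function every branch contributes the same amount, so the partial sums grow without limit and there is no expected utility to compare; with the bounded one the partial sums settle down to a finite value.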

I'd like to note here a consequence of this I already noted in the intro -- a number of the decision-theoretic "paradoxes" discussed on this site simply are not problems since they rely on unbounded decision-theoretic utility. An example is the original Pascal's Mugging; yes, I realize that term has since been applied to a bunch of things that have nothing to do with unbounded utility, but the original problem, the one Yudkowsky was actually concerned with, crucially does.

And I mean, it's often been noted before that these paradoxes go away if bounded utility is assumed, but the point I want to make is stronger -- that the only reason these "paradoxes" seem to come up at all is because contradictory assumptions are being made! That utility functions can be unbounded, and that utility functions work for infinite gambles. One could say "utility functions have to be bounded", but from a different point of view, one could say "expected utility is meaningless for infinite gambles"; either of these would dissolve the problem, it's only insisting that neither of these is acceptable that causes the conflict. (Of course, the second really is unacceptable, but that's another matter.)

Does normalization solve the weighting problem? (I wouldn't bet on it)

One interesting note about bounded utility functions is that it suggests a solution to the weighting problem discussed above with Harsanyi's theorem; notionally, one could use boundedness to pick a canonical normalization -- e.g., choosing everyone's utility function to have infimum 0 and supremum 1. I say it suggests a solution rather than that it provides a solution, however, in that I've seen nothing to suggest that there's any reason one should actually do that other than it just seeming nice, which, well, is not really a very strong reason for this sort of thing. While I haven't thought too much about it, I'd bet someone can come up with an argument as to why this is actually a really bad idea.
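
For what it's worth, the normalization itself is easy to sketch. The example below assumes a finite outcome set, so minimum and maximum stand in for infimum and supremum; the outcomes and values are made up. It shows that two affine-equivalent representations of the same preferences collapse to the same normalized function.

```python
def normalize(utility):
    """Rescale a finite utility table so its minimum is 0 and its maximum is 1."""
    lo, hi = min(utility.values()), max(utility.values())
    if hi == lo:
        # A completely indifferent agent has no spread to rescale.
        raise ValueError("cannot normalize a constant utility function")
    return {outcome: (value - lo) / (hi - lo) for outcome, value in utility.items()}


# Two hypothetical representations of one agent's preferences: v = 2*u + 1.
u = {"X": 3.0, "Y": 5.0, "Z": 4.0}
v = {outcome: 2.0 * value + 1.0 for outcome, value in u.items()}

print(normalize(u))
print(normalize(v))
```

This removes the arbitrariness in which representative gets summed, but it does nothing to show that range-equalized utilities are the morally right thing to add up.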

(And, again, this would still leave the problem of population ethics, as well as many others, but still, in this idealized setting...)

Some (bad) arguments against boundedness

Finally, I want to take a moment here to discuss some arguments against boundedness that have come up here.

Eliezer Yudkowsky has argued against this (I can't find the particular comment at the moment, sorry) basically on the idea that total utilitarianism in a universe that can contain arbitrarily many people requires unbounded utility functions. Which I suppose it does. But, to put it simply, if your ethical assumptions contradict the mathematics, it's not the mathematics that's wrong.

That's being a bit flip, though, so let's examine this in more detail to see just where the problem is.

Eliezer would point out that the utility function is not up for grabs. To which I can only say, yes, exactly -- except that this way of formulating it is slightly less than ideal. We should say instead, preferences are not up for grabs -- utility functions merely encode these, remember. But if we're stating idealized preferences (including a moral theory), then these idealized preferences had better be consistent -- and not literally just consistent, but obeying rationality axioms to avoid stupid stuff. Which, as already discussed above, means they'll correspond to a bounded utility function. So if your moral theory is given by an unbounded utility function, then it is not, in fact, a correct description of anyone's idealized preferences, no matter how much you insist it is, because you're saying that people's idealized (not real!) preferences are, essentially, inconsistent. (I mean, unless you claim that it's not supposed to be valid for infinite gambles, in which case it can I suppose be correct within its domain of applicability, but it won't be a complete description of your theory, which will need some other mechanism to cover those cases; in particular this means your theory will no longer be utilitarian, if that was a goal of yours, and so in particular will not be total-utilitarian.)

One could question whether the rationality constraints of Savage (or VNM, or whatever) really apply to an aggregated utility function -- above I claimed this should be treated as a decision-theoretic utility function, but is this claim correct? -- but I think we have to conclude that they do for the same reason that they apply to preferences of ideal agents, i.e., they're supposed to be a consistent set of preferences; an inconsistent (or technically consistent but having obvious perversities) moral system is no good. (And one could imagine, in some idealized world, one's ethical theory being programmed as the preferences of an FAI, so...)

Basically the insistence on unbounded utility functions strikes me as, really, backwards reasoning -- the sort of thing that only makes sense if one starts with the idea of maximizing expected utility (and maybe not distinguishing too strongly between the two different things called "utility functions"), rather than if one starts from agents' actual preferences and the rationality constraints these must obey. If one remembers that utility functions are merely meant to describe preferences that obey rationality constraints, there's no reason you'd ever want them to be unbounded; the math rules this out. If one reasons backwards, however, and starts with the idea of utility functions, it seems like a harmless little variant (it isn't). So, I'd like to encourage everyone reading this to beware of this sort of backwards thinking, and to remember that the primary thing is agents' preferences, and that good rationality constraints are directly interpretable in terms of these. Whereas "the agent has a decision-theoretic utility function"... what does that mean, concretely? Why are there real numbers involved, where did those come from? These are a lot of very strong assumptions to be making with little reason! Of course, there are good reasons to believe these strong-sounding claims, such as the use of real numbers specifically; but they make sense as conclusions, not assumptions.

Tangential note about other formalisms (or: I have an axe to grind, sorry)

One final tangential note: Eliezer Yudkowsky has occasionally claimed here that probability and decision-theoretic utility should be grounded not in Savage's theorem but rather in the complete class theorem (thus perhaps allowing unbounded utilities, despite the reasons above the particular formalism shouldn't matter?), but the arguments he has presented for this do not make any sense to me and as best I can tell contain a number of claims that are simply incorrect. Like, obviously, the complete class theorem cannot provide a foundation for probability, when it already assumes a notion of probability; I may be mistaken but it looks to me like it assumes a notion of decision-theoretic utility as well; and his claims about it requiring weaker assumptions than Savage's theorem are not only wrong but likely exactly backwards. Apologies for grinding this axe here, but given how this has come up here before I thought it was necessary. Anyway, see previous discussion on this point, not going to discuss it more here. (Again, sorry for that.)


Anyway I hope this has clarified the different things meant by the term "utility function", so you can avoid getting these mixed up in the future, and if you see confused discussion of them you can come in and de-confuse the issue.

...and yes, decision-theoretic utility functions really do need to be bounded.


Running and Optimizing

January 4, 2020 - 05:40
Published on January 4, 2020 2:40 AM UTC

Two years ago I decided I was going to start running between my house and the subway station each day on my commute, for a total of about five miles a week. Initially I was very interested in getting faster, and would time myself a couple days a week, trying to beat my previous best:

Unfortunately, after six months I was pushing myself too hard, and my knees started hurting. I stopped timing my runs and stopped running so hard, and they got better again. I'm still running, but at a gentler pace.

This experience was a good illustration of how optimizing for a metric can often bring you towards your overall goals for a while, but then continuing to optimize on it can start to bring you away from them again.


Meta-discussion from "Circling as Cousin to Rationality"

January 4, 2020 - 00:38
Published on January 3, 2020 9:38 PM UTC

A lot of the discussion on this post ended up being about LessWrong norms. I've moved that particular thread over to the comments here, and left a comment there pointing over here.


Illness anxiety disorder: how to become more rational?

January 4, 2020 - 00:16
Published on January 3, 2020 9:05 PM UTC

Disclaimer: When I think about dying I mostly think about cancer (I don't know much about other diseases). In the following, I'll be referring to "cancer" to point at deadly diseases where early detection is crucial.

[Epistemic status: thinking out loud; trying to debug my irrational mind.]

Situation I strongly believe that I have final stage cancer. When I touch my pectoral muscles, I can feel lumps. I have been suffering unbearable anxiety for the past year and a half or so, feeling that something very wrong is happening.

Medical Evidence One year ago, a cardiologist told me my heart was fine and didn't see anything on the ultrasound. In the past two months, I saw two GPs. The first one said "it's probably just muscular". The second one (extremely talented, known as Dr. House) recommended an additional ultrasound or computed tomography scan.

Internal Evidence It feels real. It hurts. It's cumbersome. It affects the left part of my chest and my left arm. Whenever I touch it I can feel that my left chest is of a very different consistency compared to my right one.

What happens I can't do any meaningful work. My mind is immersed in daily pain and the possibility of a final stage cancer. When I think about doing the additional tests, I'm TERRIFIED by the idea of having a doctor tell me "you have 4 months left". Yesterday, I managed to make an appointment for an ultrasound. However, I'm broke, and it turns out that the particular doctor is really expensive and has a really bad reputation online (3/5 Google reviews) so I ended up canceling it this morning. Also, I spend a lot of time in self-loathing, insulting myself for not having done any medical exams sooner, when the disease was "not yet deadly".

What I tell myself when I'm trying to be more rational I check the actual statistics of having some kind of cancer in my left chest. For males, breast cancer is 100 times less likely than for females. I'm in my 20s, so the probability of having a cancer is smaller than for old people. When I look at actual symptoms of breast cancer, I have none. Most importantly, after checking actual cancer statistics, there's the chance of having a thing and still surviving (probability depends on the type of cancer and stage).

Also, I've been feeling pain for about one year and a half. So if it was something like a cancer, the actual pain would only appear at the end, and wouldn't keep on going for a year. Plus, Dr. House not being impressed by my lumps is some evidence that they are not so impressive. And when I touch my right chest, I can also feel weird things when I search enough. The pain in the left chest could be something like a psychosomatic phenomenon (at least to some degree).

What I actually tell myself I'm dead. The lumps are actually there. Look, when you press here you can feel this really strange thing. It's gross. It's getting worse and worse. It's very large. Oh my god it's a final stage cancer. I won't survive. I'm such a piece of shit for not having gone to a doctor. It's been over a year. Fuck. I should be doing more tests. But I'm broke. And the outcome of doing more tests will be one of the following:

1. The doctors are not able to identify my disease and I'll keep being in pain / anxious / dying.

2. They identify something and tell me "you have four months to live".

What I'm asking you I need help. I know most of what I think doesn't make sense, but it FEELS very real. It's like having some really strong internal evidence that something bad is happening. I know that doing more tests is the way to go. But because it takes 2-3 weeks to get an appointment with a good GP, there's always the bias of trying to avoid thinking about death and procrastinating on the appointment. And there's this additional cost of having to borrow money to do any test to begin with, which is painful to think about.

In short It feels like I'm Pascal Mugging myself when thinking about death. But I'm also the complete opposite of a hypochondriac who would go see a ton of doctors because he is in distress. I FEAR going to a doctor because I'm afraid he will either not find anything or tell me I'm dead. It's unbearable, so I need to debug my brain.

What should I tell myself to become more rational?


Is cardio enough for longevity benefits of exercise?

January 3, 2020 - 22:57
Published on January 3, 2020 7:57 PM UTC

I guess the answer is that cardio alone is not optimal. But how non-optimal is it?


What are the best self-help book summaries you've read?

January 3, 2020 - 20:45
Published on January 3, 2020 5:45 PM UTC

There is an adage: "Every book should be a blog post."

I disagree with this adage as a general rule, but one specific context where it does seem fairly true is self-help books. Books in or close to the self-help domain seem reliably to be horribly padded, excessively anecdote-laden, and generally somewhat mawkish.

They're so damn attractive, though. They promise so much, and some seem to have the capacity to be generally transformative to those best-suited to hearing their advice. There's a decent list of 10 or so self-help-ish books whose insights I'd genuinely like to have (in expectation), if I didn't have to wade through a self-help book to get them.

This combination of traits makes self-help books prime candidates for blogification. But any old summary post won't do; a lot of blog-post summaries of books manage to be just as badly written and excessively hype-y as the original while also being too short, vague or unconvincing to be helpful.

What we ideally want is some resource accumulating longer-form, high-quality (think Slate-Star-Codex-level) summaries of self-help books, from trustworthy authors who we can expect to apply some basic due-diligence to the claims being made.

Some day I may get around to co-ordinating a project like this (with enough interested parties we could cover a lot of books in a fairly short space of time) but in the meantime: what are some particularly good summaries of self-help (or self-help-adjacent) books you think more LessWrong readers should read?


CFAR Participant Handbook now available to all

January 3, 2020 - 18:43
Published on January 3, 2020 3:43 PM UTC

Google Drive PDF

Hey, guys—I wrote this, and CFAR has recently decided to make it publicly available. Much of it involved rewriting the original work of others, such as Anna Salamon, Kenzie Ashkie, Val Smith, Dan Keys, and other influential CFAR founders and staff, but the actual content was filtered through me as single author as part of getting everything into a consistent and coherent shape.

I have mild intentions to update it in the future with a handful of other new chapters that were on the list, but which didn't get written before CFAR let me go. Note that such updates will likely not be current-CFAR-approved, but will still derive directly from my understanding of the curriculum as former Curriculum Director.


What cognitive biases feel like from the inside

January 3, 2020 - 17:24
Published on January 3, 2020 2:24 PM UTC

Building on the recent SSC post Why Doctors Think They’re The Best...

Why this is better than how we usually talk about biases

Communication in abstracts is very hard. (See: Illusion of Transparency: Why No One Understands You) Therefore, it often fails. (See: Explainers Shoot High. Aim Low!) It is hard to even notice communication has failed. (See: Double Illusion of Transparency) Therefore it is hard to appreciate how rarely communication in abstracts actually succeeds.

Rationalists have noticed this. (Example) Scott Alexander uses a lot of concrete examples and that should be a major reason why he’s our best communicator. Eliezer’s Sequences work partly because he uses examples and even fiction to illustrate. But when the rest of us talk about rationality we still mostly talk in abstracts.

For example, this recent video was praised by many for being comparatively approachable. And it does do many things right, such as emphasize and repeat that evidence alone should not generate probabilities, but should only ever update prior probabilities. But it still spends more than half of its runtime displaying mathematical notation that no more than 3% of the population can even read. For the vast majority of people, only the example it uses can possibly “stick”. Yet the video uses its single example as no more than a means for getting to the abstract explanation.

This is a mistake. I believe a video with three to five vivid examples of how to apply Bayes’ Theorem, preferably funny or sexy ones, would leave a much more lasting impression on most people.
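As one small concrete example of "evidence only updates a prior", here is a minimal sketch in Python. The function name `bayes_update` and all the numbers are hypothetical, picked only for illustration: a condition with a 1% base rate and a test that flags 90% of true cases and 5% of healthy people.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(hypothesis | evidence) given a prior and the two likelihoods."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator


# Hypothetical numbers: the same positive test, seen by two people with different priors.
rare_case = bayes_update(prior=0.01, p_evidence_if_true=0.90, p_evidence_if_false=0.05)
coin_flip = bayes_update(prior=0.50, p_evidence_if_true=0.90, p_evidence_if_false=0.05)

print(f"prior 1%  -> posterior {rare_case:.3f}")   # prior 1%  -> posterior 0.154
print(f"prior 50% -> posterior {coin_flip:.3f}")   # prior 50% -> posterior 0.947
```

The evidence is identical in both calls; only the prior differs, and the posteriors end up far apart. That is the point the video makes abstractly.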

Our highly demanding style of communication correctly predicts that LessWrongians are, on average, much smarter, much more STEM-educated and much younger than the general population. You have to be that way to even be able to drink the Kool Aid! This makes us homogeneous, which is probably a big part of what makes LW feel tribal, which is emotionally satisfying. But it leaves most of the world with their bad decisions. We need to be Raising the Sanity Waterline and we can’t do that by continuing to communicate largely in abstracts.

The tables above show one way to do better that does the following.

  • It aims low - merely to help people notice the flaws in their thinking. It will not, and does not need to, enable readers to write scientific papers on the subject.
  • It reduces biases into mismatches between Inside View and Outside View. It lists concrete observations from both views and juxtaposes them.
  • These observations are written in a way that is hopefully general enough for most people to find they match their own experiences.
  • It trusts readers to infer from these juxtaposed observations their own understanding of the phenomena. After all, generalizing over particulars is much easier than integrating generalizations and applying them to particulars. The understanding gained this way will be imprecise, but it has the advantage of actually arriving inside the reader’s mind.
  • It is nearly jargon free; it only names the biases for the benefit of that small minority who might want to learn more.

What do you think about this? Should we communicate more concretely? If so, should we do it in this way or what would you do differently?

Would you like to correct these tables? Would you like to propose more analogous observations or other biases?

Thanks to Simon, miniBill and others for helping with the draft of this post.


Excitement vs childishness

January 3, 2020 - 16:47
Published on January 3, 2020 1:47 PM UTC

I've heard Robin Hanson and others make the argument that people will be biased towards the fast takeoff scenario because it's exciting to think about (or "sexy"). On the other hand, there is a bias towards disbelieving the fast-takeoff scenario because it's childish.

I think most of us can agree that both are indeed biases, i.e. should be assigned zero weight, because both are about attributes that don't correlate with what's true. So we have the excitement-bias and the childishness-bias. The question is, how do they compare?

It feels totally obvious to me that the childishness bias is far stronger. I see people signaling maturity all the time, and childishness seems to be extremely low status in the relevant circles. It's so low status that it's not uncommon to see people say things about dangers from AI that are accurately summarized as "even though I don't know anything about this topic, I will evaluate its legitimacy based on how many childish sounding arguments I hear, because obviously they are not the real concern and people who defend them have zero credibility". Even among those who provide other arguments, it seems more common than not that they also assume that the fast takeoff scenario is less likely a priori because it's childish. Conversely, I've never heard anyone imply that fast takeoff must be true, or even likely, because it's exciting. It seems to me that, in order to have a similar effect, the excitement bias would have to do some rather heavy lifting in a purely subconscious way, which doesn't seem very plausible.

Despite this, I've seen far more discussion about the excitement-bias than the childishness-bias. That seems wrong.

Disclaimer: I don't think this is a strong argument that fast takeoff is likely, and I also don't think that bias towards excitement is weak in general – just that it's weak among the relevant class of people.


Has anyone used TAPs to combat BFRBs?

January 3, 2020 - 14:54
Published on January 2, 2020 9:28 PM UTC

BFRB=body focused repetitive behavior such as skin picking or hair pulling. What were your TAPs? Did you experience remission, and how did you handle that?

I managed to stop my BFRB a few months ago using a multi-dimensional strategy, but I’ve had a few close calls with recurrence and was wondering if anyone had experience using TAPs to combat these.


LW Tel Aviv: Global Poverty: An eye level view

January 3, 2020 - 11:01
Published on January 3, 2020 8:01 AM UTC

Ofir Reich, former Data Scientist at Center for Effective Global Action (CEGA), will be speaking on "Global Poverty, an eye level view."

We meet at Google, 12th floor, 98 Yigal Alon Street, Tel Aviv.


What do people living in extreme poverty, who make up somewhere between one sixth and half of the world's population (depending on how you count), look like? What houses do they live in? What do they work in? What do they eat? How do they conduct themselves financially, socially and with their families? Why do they have so many children? Is their situation getting better or worse? And how is it actually that some countries are 100 times poorer than others? We'll have experiences and photos, statistics, economic analysis and modern research.


Predictive coding & depression

January 3, 2020 - 05:38
Published on January 3, 2020 2:38 AM UTC

Epistemic status: wild speculation.

This is a follow-up to the 2017 blog post Slate Star Codex: Toward a Predictive Theory of Depression. I think that post has fundamentally the right idea, but that it had some pieces of the puzzle that didn't quite fit satisfyingly.

Well I personally have no special knowledge about depression (happily!), but I have thought an awful lot about how predictive coding, motivation, and emotions interact in the brain. So I think that I can flesh out Scott's basic story and make it more complete, coherent and plausible. Here goes!

Let's start with my cartoon of predictive coding as I see it:

For lots of details and discussion and caveats, see Predictive coding = RL + SL + Bayes + MPC.

In fact, don't bother reading on here until you've read that.


... done? OK, let's move on.

One more piece of background: In predictive coding, hypotheses carry a "precision" term on different aspects of the model. It basically specifies a confidence interval, outside of which prediction errors get flagged. So let's say you have the hypothesis:

Hypothesis: The thing I am looking at is a TV screen showing static.

This hypothesis...

  • ...assigns low precision on its predictions of visual inputs within the screen area (where there's static),
  • ...assigns high precision on its predictions of visual inputs within the thin black frame of the TV,
  • ...and assigns zero precision on its predictions of sensations coming from my foot.

The sensations from my foot are just not part of this hypothesis. Thus a separate hypothesis about my foot—say, that my foot is tapping the floor and will continue to do so—can be active simultaneously, and these two hypotheses won't conflict.
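The TV example can be written as a toy sketch. Everything here is an assumption made for illustration and not a claim about how the brain does it: the `Prediction` class, the channel names, the numbers, and the rule that an error is flagged when precision times the miss exceeds a threshold, which amounts to a confidence interval of half-width threshold / precision.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    expected: float   # predicted value on this input channel
    precision: float  # 0.0 means the hypothesis says nothing about this channel


def flags_error(pred: Prediction, observed: float, threshold: float = 1.0) -> bool:
    """Flag a prediction error when the precision-weighted miss exceeds the threshold."""
    weighted_miss = pred.precision * abs(observed - pred.expected)
    return weighted_miss > threshold


# Hypothetical "TV showing static" hypothesis with three input channels.
tv_hypothesis = {
    "screen_pixels": Prediction(expected=0.5, precision=0.1),   # low precision: static
    "tv_frame": Prediction(expected=0.0, precision=10.0),       # high precision: black frame
    "foot": Prediction(expected=0.0, precision=0.0),            # zero precision: not my business
}

observed = {"screen_pixels": 0.9, "tv_frame": 0.3, "foot": 5.0}

flagged = {name: flags_error(pred, observed[name]) for name, pred in tv_hypothesis.items()}
print(flagged)  # {'screen_pixels': False, 'tv_frame': True, 'foot': False}
```

With zero precision the foot channel can never raise an error, no matter what is observed, so a separate foot hypothesis is free to handle it.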

The proposed grand unified theory of depression in this post is the same as in Scott's post: Severe depression is when all the brain's hypotheses have anomalously low precision attached to all their predictions (a.k.a. anomalously wide confidence intervals.)

I'm now going to go through the symptoms of depression listed in Scott's article, plus a couple others he left out:


... Depressed people describe the world as gray, washed-out, losing its contrast.

This one is just like Scott says. Normally we understand an apple via a hypothesis like "This thing is a bright red apple". But in severe depression, the hypotheses are less precise: "This thing is probably an apple and it's vaguely reddish".

Psychomotor retardation

Again, this one is just like Scott says. In predictive coding theory, top-down predictions about proprioceptive inputs are exactly the same signals as (semi)low-level motor control commands.[1]

If a hypothesis makes predictions about proprioceptive inputs with zero precision (infinitely wide confidence intervals), then the muscles don't move at all, like the TV static + foot example above. If it makes predictions with the normal high precision, then the muscles move normally. Thus, if a hypothesis makes predictions with nonzero but anomalously low precision, as in depression, well, then the muscles move slowly.

Lack of motivation

Hypotheses make predictions, and assign precisions, about both sensory inputs (vision etc.) and internal inputs (satiety, pain, etc.); I don't think there's any mechanistic difference between those two types of predictions. (By the same token, even though I drew (b) and (c) as separate rows in the picture above, they're actually implemented by the same basic mechanism.)

In depression, the hypotheses' predictions about internal inputs have anomalously low precision, just like all their other predictions. Thus instead of a normal hypothesis like "I will eat and then I will feel full", when depressed you get stuck with the low-precision hypothesis "I will eat and then I will very slightly feel full."

"Hang on a second!" you say. "Why does the low-precision hypothesis say 'I will very slightly feel full', as opposed to 'Maybe I won't feel full, or maybe I will feel especially full!' The latter sounds more like a wide confidence interval, right?"

But I think this is a misunderstanding of how these things are encoded, and thus how the brain interprets these precisions. Remember the TV static + foot example above: If there's a hypothesis that the thing in my visual field is a TV, it assigns precision 0 to sensory inputs to my foot, and we interpret that as: "nothing in this hypothesis has anything to do with sensory inputs to my foot". Likewise, if a hypothesis assigns precision 0 to the satiety input signal, it means "nothing in this hypothesis has anything to do with me feeling full". And thus, nonzero but anomalously low precision on the satiety input basically means you're almost completely not expecting to feel full on account of the things in this hypothesis. (This is actually a lot like the motor control example above.)

Now look at process (e). I claim that process (e) does not change at all in depression: we still have the normal innate drives, and we still favor hypotheses which predict that those drives will be satisfied. But (e) votes much less strongly for the low-precision hypothesis ("I will eat and then I will very slightly feel full") than for the normal hypothesis ("I will eat and then I will feel full"). Now, I think that we universally have a bias towards inaction—don't take an action unless there's a good reason to. So in depression, the bias towards inaction often wins out over the feeble vote from process (e), and thus we don't bother to get up and eat.
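Here is a toy version of that tug-of-war, as a sketch under loudly stated assumptions: the function `acts`, the multiplicative form of the vote (drive strength times precision), and every number below are mine, chosen only to illustrate the shape of the argument.

```python
def acts(drive_strength: float, precision: float, inaction_bias: float) -> bool:
    """Act only if the vote for the drive-satisfying hypothesis beats the bias toward inaction."""
    vote = drive_strength * precision  # assumed form: lower precision gives a feebler vote
    return vote > inaction_bias


# Hypothetical values: the innate drive and the inaction bias are identical in both cases;
# only the precision attached to "I will eat and then I will feel full" differs.
HUNGER_DRIVE = 1.0
INACTION_BIAS = 0.3

normal = acts(HUNGER_DRIVE, precision=1.0, inaction_bias=INACTION_BIAS)
depressed = acts(HUNGER_DRIVE, precision=0.1, inaction_bias=INACTION_BIAS)

print(normal, depressed)  # True False
```

Note that nothing about the drive itself changes between the two calls, which matches the claim that process (e) is intact and only the precision is off.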

Low self-confidence

All the predictive coding machinery operates basically the same way during (i) actual experience, (ii) memory recall, and (iii) imagination. Now, when we try to figure out, "Will I succeed at X?", we imagine/plan doing X, which entails searching through the space of hypotheses for one in which X successfully occurs at the end. In depression, none of the hypotheses will make a high-precision prediction that X will occur, because of course they don't make high-precision predictions of anything at all. We interpret that as "I cannot see any way that I will succeed; thus I will fail".

Why isn't it symmetric? None of the theories are making high-precision predictions of failure either, right? Well, I think that generically when people are planning out how to do something, they search through their hypothesis space for a hypothesis that has a specific indicator of success; they don't search for a hypothesis that lacks a specific indicator of failure. I just don't think you can run the neural hypothesis search algorithm in that opposite way.[2]

There's another consideration that points in the same direction. Everyone always lets their current mood leak into their memories and expectations: when you're happy, it's tricky to remember being sad, etc. I think this is just because memory leaves lots of gaps that we fill in with our current selves. Anyway, I hypothesize that this effect is even stronger when depressed: if you feel sad right now, and if none of your hypotheses contain a high-precision prediction of feeling happiness or any other emotion, well then there's no emotion to be found except sadness in your remembered past, or in your present, or in your imagined future. So, ask such a person whether they'll succeed, and they'll just try to imagine themselves feeling the joy of victory. They can't. Ask whether they'll fail, and they'll try to imagine themselves feeling miserable. Well, they can do that! They are feeling miserable feelings right now, and those feelings can flood into an imagined future scenario.

So, putting these two considerations together, a depressed person should have a very hard time imagining a specific course of events in which they accomplish anything in particular, and should have an equally hard time imagining themselves having proudly accomplished it. But they can easily imagine themselves not getting anything done, and continuing to feel the same misery that they feel right now. I think that's a recipe for low self-confidence.

Feelings of sadness, worthlessness, self-hatred, etc.

Everything about predictive coding happens in the cortex.[3] But when we talk about mood, we need to bring in the amygdala. My hypothesis is that the amygdala is functioning normally (i.e., according to specification) in depression. The problem lies in the signals it receives from the cortex.

Let's step back for a minute and talk about the interesting relationship between the cortex and amygdala. I discussed it a bit in Human instincts, symbol grounding, and the blank-slate neocortex. The amygdala is nominally responsible for emotions, yet figuring out what emotions to feel requires excruciatingly complex calculations that only the cortex is capable of. ("No, YOU were supposed to do the dishes, because remember I went shopping three times but you only vacuumed once and...").

I think the cortex-amygdala relationship for emotions is analogous to the cortex-muscle relationship for motor control: The cortex sends signals to the amygdala, which are "predictions" from the cortex's perspective and "commands" from the amygdala's perspective. Sometimes the commands are not obeyed, and then the cortex needs to learn a better model (and will do so with the help of processes (b-c)). In the motor case, an example would be my prediction that I will walk normally despite wearing clown shoes, but then I trip. In the amygdala case, an example would be my prediction that the shoes will feel comfortable, but actually they are really painful. The pain signal goes straight to the amygdala, and the amygdala responds by emitting a pain emotion, overriding the comfort emotion suggested by the cortex. The cortex then sees that it predicted the wrong emotion, and so process (c) tosses out the offending hypothesis, and the predictions will be better next time.

Back to depression. Again, outgoing signals from the cortex are synonymous with top-down predictions. So when all the top-down predictions have anomalously low precision, the cortex more-or-less stops sending any strong signals into the amygdala whatsoever.

Now what?

In principle, you could imagine two worlds. In one extreme world, the messages from the cortex to the amygdala only contain bad news—I've been insulted, that's disgusting, I'm in trouble, etc. Then, faced with silence from the cortex, the amygdala would flood us with joy. Everything is perfect!

In the opposite extreme world, the messages from the cortex to the amygdala only contain good news—I'm not being insulted right now, I am not disgusted right now, I have lots of friends, etc. Then, faced with silence from the cortex, the amygdala would flood us with misery. Everything must be terrible!

In the real world, the cortex presumably sends both good-news messages and bad-news messages to the amygdala. But unfortunately for those suffering depression, it seems that all things considered, "no news is bad news". Presumably the amygdala is especially expecting the cortex to send various good-news signals that say that our innate drives are being satisfied, that we have high status, that we are eating well, etc. In the absence of these signals, the amygdala responds by emitting all sorts of negative emotions.

Evolutionarily speaking, what is sadness for? In my view, sadness has an effect of causing us to abandon our plans when they're not working out, and it also has a social effect of signaling to others that we are in a bad, unsustainable situation and need help. Well, the feeling that our innate drives are not being satisfied, and will not be satisfied in the foreseeable future ... that sure seems like it ought to precipitate sadness if anything does, right?

Difficulty thinking and concentrating

Precision and attention are closely related. If you want to attend to the taste of the thing you're eating, your brain modifies the active hypothesis to have impossibly high precision on the sensory inputs from your taste buds. The actual sensory input will then trigger a prediction error and bubble up to top-level attention.

When you string together a bunch of little hypotheses into an extended train of thought, attention has to be deftly steered around, to get the right information into the right places in working memory, and manipulate them the right way. It follows that if your brain can't deploy hypotheses with high-precision components, you will probably be unable to fully control your attention and think complex thoughts.


Beats me. I don't understand sleep!


I'm pretty pleased that everything seems to fit together, without too much special pleading! But again, this is wild speculation; in particular, I don't know anything about depression. I'm happy to hear feedback.

PS—In case you're wondering, I don't feel like writing this article gave me any insight into what kinds of things would cause depression, or cure it.

  1. I say "semi-low-level" because the commands get further processed by the cerebellum etc. ↩︎

  2. This has to do with neural algorithm implementation details that I won't get into. ↩︎

  3. As usual, "cortex" here is technically short for "predictive-world-model-building-system involving primarily the cortex, thalamus, and hippocampus." ↩︎


Becoming Unusually Truth-Oriented

January 3, 2020 - 04:27
Published on January 3, 2020 1:27 AM UTC

This is a post on "the basics" -- the simplest moment-to-moment attitudes one can take to orient toward truth, without any special calculations such as Fermi estimates or remembering priors to avoid base-rate neglect. At the same time, it's something almost everyone can fruitfully work on (I suspect), including myself.

Somewhat similar to track-back meditation.

Memory

Tip of the Tongue

The central claim here is that there's a special art associated with what you do when something is "on the tip of your tongue" and you can't quite remember it. Most people have the skill to some extent, but it can be sharpened to a fine point.

Improved memory helps you become truth-oriented in a fact-oriented, detail-oriented sense. It works against inaccuracy. It also works against misspeaking, and thus propagating falsehoods.

Remembering Dreams

I first explicitly noticed the effectiveness of this technique for remembering dreams. When I wake up, I often have only one significant memory from my dreams. However, when I focus on the memory, explicitly naming each detail I can recall, and gently waiting for more, I can often unfold the memory into far, far more than I initially thought I could remember.

  • Each detail you recall can open up more details.
  • There's something special about explicitly naming details. I might have a general sense that there was a portal in the sky that looked a certain way, but explicitly confirming in my head that it looked as if the sky were broken glass, but at the same time the portal was perfectly round, might bring back more memories.
    • Writing things down on paper is probably a good way of making sure you're explicitly confirming each detail, if you want to go that far.
  • It's also very important to sit with memories and give them time to bring something more. Sometimes there will be a rush of memories, with each new item bringing more and more. Other times, you'll be stuck. It's easy to fail at that step, assuming that no more is coming. In my experience, if you sit with the memories, avoid getting distracted, and gently ask for more, more will often come to you fairly soon. You'll surprise yourself with what you can remember.

Sometimes I don't even remember any images from the dream at all, but have a vague sense of the dream (excitement, peace, more complicated emotions). I can still sometimes recall much more if I explicitly describe the left-over feeling to myself in as much detail as possible, and sit with it patiently waiting for more.

Think of it as forming a better relationship with your memory. It's easier to wait patiently when you've had several experiences where it's paid off. Explicitly processing details of what you've remembered lets your memory know you're interested, helping to keep it engaged in searching for more (and, potentially, training it to retain more).

Eventually, if you're better calibrated, you won't have to wait 5 minutes trying fruitlessly if you really don't think you will remember. But in order to be well-calibrated about that, you have to try it sometimes.

Remembering Events

My claim is that this technique generalizes to any memory. Dreams might be a good practice case, especially if you don't have too many cognitively demanding distractions in the morning.

But you can try the same thing with anything. Someone I knew with especially good memory told me that he thought this was most of his skill: he might have started out with slightly above-average memory, but at some point he started taking pride in his reputation for good memory. This prompted him to put effort into it, rehearsing memories much more than he otherwise would. People would then remark on his good memory, further reinforcing the behavior.

Conversations, and interactions with people generally, might make a good practice case. Many people already re-visit conversations mentally over and over (perhaps thinking of things they wish they'd said). You can treat these the same way as dreams, trying to recall as much detail as you can each time you think of them.

Of course, rehearsing certain memories again and again might not be a good thing. Watch whether you're worsening any mental problems such as depression. It may be good to couple this practice with staring into regrets and other emotionally balancing techniques, so that rehearsing memories is useful rather than intensifying emotional damage from those memories.

False Memories

Some studies about memory may give you pause.

  • First of all, there is evidence that people fabricate false memories. So, how can we trust recall? Maybe trying harder to recall something actually generates false memories.
  • Second, there's been some research suggesting that in some sense we "retrieve" memories (take them out of storage), and then "put them back"; and if the process is disrupted before we "put them back", we can be made to forget the memory. This suggests that memories might be altered every time they get touched, which would mean they'd last longer if we didn't think about them.

Unfortunately, forgetting is also a thing, so making memories last longer by avoiding them doesn't seem to be an option. Rehearsal is necessary for sharper memory.

Still, false memories seem like a significant concern. Memories just seem real. If false memories are really common and easy to create, what are we supposed to do about that?

I think the situation isn't really hopeless. I think most false memories are more like mistaken inferences. I might be sure I put my keys in my pants pocket, where I always put them. But then I might eventually recall that I put them somewhere else yesterday. What seemed like a memory was actually an inference.

As long as you're aware of this, I would expect that gently tugging on memories to recall more details would improve things rather than lead to more confabulation. I could be wrong, of course. This is a critical question in how good/important the overall practice is.

Gendlin's Focusing

There's an obvious similarity between what I'm describing and Gendlin's focusing. I similarly gently interact with a "felt sense" and try to name it, and iterate the process to get more detail. However, the "felt sense" is not especially located in my body the way it's described in Gendlin's focusing. It's possible that body sensations are actually involved at a subconscious level.

In any case, you may find the "gentle tugging" kind of stance useful for untangling emotions, not just recalling memories. Also, learning Gendlin's Focusing might help with memory and the other things I'm describing in this post?

Remembering Ideas

I tend to place a high value on remembering ideas. A forgotten idea is like a little death. I generally prefer the conversation norm of pausing if someone has forgotten an idea, possibly for a significant amount of time, so they can try and recover it. Ideas are important.

This habit gave me a lot of practice with tip-of-the-tongue type recollection and the "gentle tugging" technique. Practicing this stuff seems quite important for being able to do it when you need it. So I think giving yourself significant time to try and remember forgotten ideas is quite valuable if only as practice.

I think a similar sort of mental motion is involved in developing ideas, as well. Let's move on from the memory section...

Truth-Oriented Thinking

Developing Ideas

When you have an idea, you start with a kind of "pointer" -- a felt sense which says that there should be a thing in a particular direction. You can unpack the pointer by explicitly naming things about it, checking for "fit" with the felt sense. The more you name, the easier it is to pull more details out.

Sometimes it turns out that the idea really doesn't make any sense at all; the things with the best "fit" don't actually do anything good when you explicitly spell them out. Then the felt sense changes.

To me, it feels like the felt sense traces out natural "pathways" across a "landscape" which you're exploring. An idea might be a pointer which leads to a dead end, but there's still "really a path there" -- you had it, which must mean that it was a natural thought to have in some sense. I take interest not just in what's true, but what the natural development of certain ideas is. This kind of attitude helps you explore alternative pathways.

Gendlin describes his notion of Focusing as involved in scientific research. It's not just about emotions. I think I'm describing the same thing here.

Inner Sim

CFAR teaches a class on "inner sim", the intuitive expectations you have. When you try to balance one object on top of another, you have an intuition about whether it will fall. If someone tells you something, you might have an intuition about whether they're lying. You can't necessarily unpack these intuitions very well. Nor are they perfectly accurate. But they are quite useful.

The surprising thing is that it seems many people don't naturally make use of their inner sims as much as they could. Let's say you're at work, and you come up with a plan for completing a project within a week. The words "planning fallacy" might come to mind, but let's set that aside and ask a different question -- does your inner sim really expect the project to be done in a week? This kind of question can give useful information surprisingly often. And if your inner sim doesn't think the plan will work, you can try and ask yourself questions like why it will fail.

So, once you've developed an idea via the methodology in the previous section, another thing you can do is ask your inner sim about the idea. Is it true? Is it real? Can it work? What do you actually expect?

Using gentle tugging for idea development is just as good for creating fact or fiction, so you have to add this kind of reality check.

Also, communicating with the inner sim can be a lot like communicating with memory. You can gently sit with the question "what do I actually expect?" and see what comes up. And you similarly want to try and explicitly name what comes up; each detail of your expectations which you explicitly name can help pull more out.

Motivated Cognition

Just like we worried about false memories, we might worry about motivated cognition. Does asking your inner sim really provide a truth check? Does following your felt sense create a bias in what ideas you develop?

In my experience, if I'm caught up in motivated cognition, it is literally harder to remember things which go against what I'm saying -- it seems like I just don't remember them. But the same memory techniques which I've mentioned do help. I might not want to say the contrary facts once I recall them, but I can at least consciously decide that.

Similarly, I think the inner-sim checks are indeed useful in combating motivated cognition. Is it true? Is it real? What do I actually expect? What do I actually think? Giving yourself a little pause to sit with these questions can make you change your mind during an argument in a number of seconds (in my experience).

Explaining Things to Others

Just as explicitly naming things within your own head can help you pull detail out, once you think you understand something, explaining it to someone else can help pull a whole lot more detail out. This is probably true for memory, too.

It's not even necessarily about the interaction with the other person. Just trying to write something for someone else (and then never sharing it) can be similarly useful, whether it's a specific audience or a broad one. The need to bridge the inferential gap makes many more details feel relevant, which didn't feel relevant when you were explaining it to yourself.

Naturally, communicating an idea to another person is also great for uncovering problems.

This goes back to the reason why the overall technique I'm discussing works at all. Explicitly naming details of a memory helps to unpack it because what you know you know is different than what you know. You have a kind of mental illusion that you're remembering a whole conversation, but you're not really fitting all those details in short-term memory, which means you're not successfully pulling on all the associations. Similarly, you might think you understand something, but be unable to really explain all the details.

Gears Thinking

Gears-level thinking is like unpacking an idea with exceptionally high standards about whether you really understand it. I mentioned that explaining things to others is helpful because you "pull on" details which you wouldn't ordinarily pull on, since you think you understand them. Gears thinking doesn't literally pull on "everything", but it pulls on a lot more.

I'm afraid that someone will read that and kind of nod along without getting it. I'm not talking about just generally having higher standards. I'm talking about the moment-to-moment experience of thinking. I'm saying there's a mental stance you can take where you "stop being lazy about your thinking" -- you don't re-check really solid things like 1+1=2, but you aren't satisfied with a thought until you've really gotten all the details in a significant sense.

The question you ask isn't whether something is true; the question you ask is exactly why it's true. No matter how confident you are that, say, a theorem you're using holds, you want the proof. You're trying to see all the pieces and how they fit together.

It's like pulling out a moth-eaten map and looking at the holes, trying to fill them in. Maybe you can't fill them in right away; maybe you have to make a voyage across the sea. It's hard. But you want those details; you want the map to be complete, not just "good enough".

Understanding Others

There's a closely related mental stance which I call "ask all the questions". You might think, from the kind of Focusing-like habits I've been describing, that you have to turn within to get the answers. But your focusing object can also be outside of you.

You can orient this toward typical social small-talk. What cognitive habits lead someone to ask questions like "what school did you go to" or "do you have any siblings"? You could have a mental list of standard questions you ask people in social settings. But a different way, which I think is more efficient, is to focus on your "picture" of the person (sort of mentally rehearsing it) and asking questions to fill in the gaps.

Something which surprised me when I tried this attitude on was how self-centred it felt. You're still looking at your map for holes. And, you're kind of dominating the conversation, in terms of steering. But, you can bring in the gentle/patient attitude I keep talking about.

You can do the same for topics other than small talk. Maybe you are trying to understand how someone thinks about X. What many people do is focus mainly on their own picture of X, and let what the other person says kind of land in that map, focusing questions on problems. And that's useful. But you can also focus on your map of their map. (This might start out being a copy of your map, since you might assume that they mostly think about X like you and just have some different details. But the cognitive operation is already different; you bring your attention to the places least likely to be the same as for you.)

Again I want to emphasize that I'm talking about a moment-to-moment stance. Not occasionally thinking "what's my map of their map?" during a conversation. Focusing on it primarily, letting it drive most of your questions.

This can be a good way of absorbing technical subjects from people.


Dominic Cummings: "we’re hiring data scientists, project managers, policy experts, assorted weirdos"

January 3, 2020 - 03:33
Published on January 3, 2020 12:33 AM UTC

Dominic Cummings (discussed previously on LW, most recently here) is a Senior Advisor to the new UK PM, Boris Johnson. He also seems to be essentially a rationalist (at least in terms of what ideas he's paying attention to).

He has posted today that his team is hiring "data scientists, project managers, policy experts, assorted weirdos". Perhaps some LW readers should apply.

Extensive quotes below:

‘This is possibly the single largest design flaw contributing to the bad Nash equilibrium in which … many governments are stuck. Every individual high-functioning competent person knows they can’t make much difference by being one more face in that crowd.’ Eliezer Yudkowsky, AI expert, LessWrong etc.


Now there is a confluence of: a) Brexit requires many large changes in policy and in the structure of decision-making, b) some people in government are prepared to take risks to change things a lot, and c) a new government with a significant majority and little need to worry about short-term unpopularity while trying to make rapid progress with long-term problems.

There is a huge amount of low hanging fruit — trillion dollar bills lying on the street — in the intersection of:

  • the selection, education and training of people for high performance
  • the frontiers of the science of prediction
  • data science, AI and cognitive technologies (e.g Seeing Rooms, ‘authoring tools designed for arguing from evidence’, Tetlock/IARPA prediction tournaments that could easily be extended to consider ‘clusters’ of issues around themes like Brexit to improve policy and project management)
  • communication (e.g Cialdini)
  • decision-making institutions at the apex of government.

We want to hire an unusual set of people with different skills and backgrounds to work in Downing Street with the best officials, some as spads and perhaps some as officials. If you are already an official and you read this blog and think you fit one of these categories, get in touch.

The categories are roughly:

  • Data scientists and software developers
  • Economists
  • Policy experts
  • Project managers
  • Communication experts
  • Junior researchers one of whom will also be my personal assistant
  • Weirdos and misfits with odd skills


A. Unusual mathematicians, physicists, computer scientists, data scientists

You must have exceptional academic qualifications from one of the world’s best universities or have done something that demonstrates equivalent (or greater) talents and skills. You do not need a PhD — as Alan Kay said, we are also interested in graduate students as ‘world-class researchers who don’t have PhDs yet’.


A few examples of papers that you will be considering:

  • [...]
  • The papers on computational rationality below.
  • The work of Judea Pearl, the leading scholar of causation who has transformed the field. 


B. Unusual software developers

We are looking for great software developers who would love to work on these ideas, build tools and work with some great people. You should also look at some of Victor’s technical talks on programming languages and the history of computing.

You will be working with data scientists, designers and others.

C. Unusual economists

We are looking to hire some recent graduates in economics. You should a) have an outstanding record at a great university, b) understand conventional economic theories, c) be interested in arguments on the edge of the field — for example, work by physicists on ‘agent-based models’ or by the hedge fund Bridgewater on the failures/limitations of conventional macro theories/prediction, and d) have very strong maths and be interested in working with mathematicians, physicists, and computer scientists.


The sort of conversation you might have is discussing these two papers in Science (2015): Computational rationality: A converging paradigm for intelligence in brains, minds, and machines, Gershman et al and Economic reasoning and artificial intelligence, Parkes & Wellman

You will see in these papers an intersection of:

  • von Neumann’s foundation of game theory and ‘expected utility’,
  • mainstream economic theories,
  • modern theories about auctions,
  • theoretical computer science (including problems like the complexity of probabilistic inference in Bayesian networks, which is in the NP–hard complexity class),
  • ideas on ‘computational rationality’ and meta-reasoning from AI, cognitive science and so on.

If these sort of things are interesting, then you will find this project interesting.

It’s a bonus if you can code but it isn’t necessary.

D. Great project managers.

If you think you are one of the small group of people in the world who are truly GREAT at project management, then we want to talk to you.


It is extremely interesting that the lessons of Manhattan (1940s), ICBMs (1950s) and Apollo (1960s) remain absolutely cutting edge because it is so hard to apply them and almost nobody has managed to do it. The Pentagon systematically de-programmed itself from more effective approaches to less effective approaches from the mid-1960s, in the name of ‘efficiency’. Is this just another way of saying that people like General Groves and George Mueller are rarer than Fields Medallists?


E. Junior researchers

In many aspects of government, as in the tech world and investing, brains and temperament smash experience and seniority out of the park.

We want to hire some VERY clever young people either straight out of university or recently out with extreme curiosity and capacity for hard work.


F. Communications

In SW1 communication is generally treated as almost synonymous with ‘talking to the lobby’. This is partly why so much punditry is ‘narrative from noise’.

With no election for years and huge changes in the digital world, there is a chance and a need to do things very differently.


G. Policy experts

One of the problems with the civil service is the way in which people are shuffled such that they either do not acquire expertise or they are moved out of areas they really know to do something else. One Friday, X is in charge of special needs education, the next week X is in charge of budgets.


If you want to work in the policy unit or a department and you really know your subject so that you could confidently argue about it with world-class experts, get in touch.


G. Super-talented weirdos

People in SW1 talk a lot about ‘diversity’ but they rarely mean ‘true cognitive diversity’. They are usually babbling about ‘gender identity diversity blah blah’. What SW1 needs is not more drivel about ‘identity’ and ‘diversity’ from Oxbridge humanities graduates but more genuine cognitive diversity.

We need some true wild cards, artists, people who never went to university and fought their way out of an appalling hell hole, weirdos from William Gibson novels like that girl hired by Bigend as a brand ‘diviner’ who feels sick at the sight of Tommy Hilfiger or that Chinese-Cuban free runner from a crime family hired by the KGB. If you want to figure out what characters around Putin might do, or how international criminal gangs might exploit holes in our border security, you don’t want more Oxbridge English graduates who chat about Lacan at dinner parties with TV producers and spread fake news about fake news.

By definition I don’t really know what I’m looking for but I want people around No10 to be on the lookout for such people.

We need to figure out how to use such people better without asking them to conform to the horrors of ‘Human Resources’ (which also obviously need a bonfire).



As Paul Graham and Peter Thiel say, most ideas that seem bad are bad but great ideas also seem at first like bad ideas — otherwise someone would have already done them. Incentives and culture push people in normal government systems away from encouraging ‘ideas that seem bad’. Part of the point of a small, odd No10 team is to find and exploit, without worrying about media noise, what Andy Grove called ‘very high leverage ideas’ and these will almost inevitably seem bad to most.

I will post some random things over the next few weeks and see what bounces back — it is all upside, there’s no downside if you don’t mind a bit of noise and it’s a fast cheap way to find good ideas…

H/T ioannes_shade


Normalization of Deviance

January 3, 2020 - 01:58
Published on January 2, 2020 10:58 PM UTC

An important, ongoing part of the rationalist project is to build richer mental models for understanding the world. To that end I'd like to briefly share part of my model of the world that seems to be outside the rationalist canon in an explicit way, but which I think is known well to most, and talk a bit about how I think it is relevant to you, dear reader. Its name is "normalization of deviance".

If you've worked a job, attended school, driven a car, or even just grown up with a guardian, you've most likely experienced normalization of deviance. It happens when your boss tells you to do one thing but all your coworkers do something else and your boss expects you to do the same as them. It happens when the teacher gives you a deadline but lets everyone turn in the assignment late. It happens when you have to speed to keep up with traffic to avoid causing an accident. And it happens when parents lay down rules but routinely allow exceptions such that the rules might as well not even exist.

It took a much less mundane situation for the idea to crystallize and get a name. Diane Vaughan coined the term as part of her research into the causes of the Challenger explosion, where she described normalization of deviance as what happens when people within an organization become so used to deviant behavior that they don't see the deviance, even if that deviance is actively working against an important goal (in the case of Challenger, safety). From her work the idea has spread to considerations in healthcare, aeronautics, security, and, where I learned about it, software engineering. Along the way the idea has generalized from being specifically about organizations, violations of standard operating procedures, and safety to any situation where norms are so regularly violated that they are replaced by the de facto norms of the violations.

I think normalization of deviance shows up all over the place and is likely quietly happening in your life right now just outside where you are bothering to look. Here are some ways I think this might be relevant to you, and I encourage you to mention more in the comments:

  • If you are trying to establish a new habit, regular violations of the intended habit may result in a deviant, skewed version of the habit being adopted.
  • If you are trying to live up to an ideal (truth telling, vegetarianism, charitable giving, etc.), regularly tolerating violations of that ideal draws you away from it in such a sneaky, subtle way that you may still claim to be upholding the ideal when in fact you are not, and are not even really trying to.
  • If you are trying to establish norms in a community, regularly allowing norm violations will result in different norms than those you intended being adopted.

With those mentioned, my purpose in this post is to be informative, but I know that some of you will read this and make the short leap to treating it as advice that you should aim to allow less normalization of deviance, perhaps by being more scrupulous or less forgiving. Maybe, but before you jump to that, I encourage you to remember the adage about reversing all advice. Sometimes normalized "deviance" isn't so much deviance as an illegible norm that is serving an important purpose, and "fixing" it will actually break things or otherwise make things worse. And not all deviance is normalized deviance: if you don't leave yourself enough slack you'll likely fail from trying too hard. So I encourage you to know about normalization of deviance, to notice it, and to be deliberate about how you choose to respond to it.


Does GPT-2 Understand Anything?

January 2, 2020 - 21:50
Published on January 2, 2020 5:09 PM UTC

Some people have expressed that “GPT-2 doesn’t understand anything about language or reality. It’s just huge statistics.” In at least two senses, this is true.
First, GPT-2 has no sensory organs. So when it talks about how things look or sound or feel and gets it right, it is just because it read something similar on the web somewhere. The best understanding it could have is the kind of understanding one gets from reading, not from direct experiences. Nor does it have the kind of understanding that a person does when reading, where the words bring to mind memories of past direct experiences.
Second, GPT-2 has no qualia. This is related to the previous point, but distinct from it. One could imagine building a robotic body with cameras for eyes and microphones for ears that fed .png and .wav files to something like GPT-2 rather than .html files. Such a system would have what might be called experiences of the world. It would not, however, create a direct internal impression of redness or loudness, the ineffable conscious experience that accompanies sensation.
However, this is too high a bar to rule out understanding. Perhaps we should call the understanding that comes from direct personal experience “real understanding” and the kind that comes solely from reading with no connection to personal experience “abstract understanding.” Although I can’t “really understand” what it was like to fight in the Vietnam War (because I wasn’t there, man) I can still understand it in an abstract sense. With an abstract understanding, here are some things one can do:
• answer questions about it in one’s own words
• define it
• use it appropriately in a sentence
• provide details about it
• summarize it
Professional teachers distinguish between tests of knowledge (which can be handled by mere memorization) and tests of understanding, with the latter being more difficult and useful (see Bloom’s Taxonomy). Understanding requires connecting a new idea to ideas a student is already familiar with.
GPT-2 is able to pass many such tests of understanding. With an appropriate prompt (such as giving examples of what form the answer to a question should take) it is able to answer questions, define terms, use words appropriately in a sentence, provide details, and summarize.
This is understanding for most practical purposes. It shows that when GPT-2 uses a word, that word has the appropriate kinds of connections to other words. The word has been integrated into a large graph-like structure of relationships between what can reasonably be called concepts or ideas. When probabilities for the next token have been generated, it has a certain propensity for using a particular word; but if that word is artificially blocked, other ways of saying the same thing also have been activated and will be used instead. It is reasonable to interpret this as having an “idea” of what it “wants” to “say” and at some point the quotation marks are no longer helpful, and we may as well dispense with them.
Here is an example. I input the following prompt into GPT-2 1.5B, with top-k=10 sampling:
"Indiana Jones ducked as he entered the cave to avoid being decapitated." In this sentence, the word "decapitated" means
Here are the first 10 results (truncated after the first sentence):
• "to be cut down" as well as "to be slain."
• "to chop off".
• "to cut off one of the branches of a tree."
• "The captain of the ship was killed in the cave."
• "to cut off, cut off by decapitation."
• "cut off".
• "cut off."
• to be "sliced off."
• "to be killed," which is the same thing as "to be killed by the sword."
• to fall from high altitude or to be cut down.
• "to have a head chopped off."
The system has a strong notion that “decapitated” means “to cut off” and “to kill” but is less likely to mention that the word has anything to do with a head. So its concept of “decapitation” appears to be approximately (but not completely) right. When prompted to write a sentence using the word “decapitate,” the system usually generates sentences that fit this pattern: they often use the word in a way consistent with killing, but only rarely mention heads. (This has all gotten rather grisly.)
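The top-k sampling used for the prompt above, and the earlier remark that blocking one word lets other ways of saying the same thing through, can be illustrated with a minimal sketch. This is not GPT-2's actual decoding code: the function name `top_k_sample`, the `blocked` parameter, and the toy scores are all assumptions made up for illustration.

```python
import math
import random

def top_k_sample(logits, k, blocked=(), rng=random):
    """Sample one token from the k highest-scoring tokens, skipping blocked ones."""
    # Drop blocked tokens, then keep the k candidates with the largest scores.
    candidates = [(tok, score) for tok, score in logits.items() if tok not in blocked]
    top = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = max(score for _, score in top)
    weights = [math.exp(score - peak) for _, score in top]
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token scores, invented for this example (not real GPT-2 output).
toy_logits = {"cut": 3.0, "chop": 2.5, "slice": 2.0, "kill": 1.5, "banana": -4.0}

rng = random.Random(0)
samples = [top_k_sample(toy_logits, k=3, rng=rng) for _ in range(200)]
blocked_samples = [top_k_sample(toy_logits, k=3, blocked={"cut"}, rng=rng) for _ in range(200)]
```

With k=3, only "cut", "chop", and "slice" can ever be sampled. Once "cut" is blocked, "kill" moves into the top three, so a near-synonym takes the place of the blocked word.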
However, one shouldn't take this too far. GPT-2 uses concepts in a very different way than a person does. In the paper “Evaluating Commonsense in Pre-trained Language Models,” the probability of generating each of a pair of superficially similar sentences is measured. If the system is correctly and consistently applying a concept, then one of the two sentences will have a high probability and the other a low probability of being generated. For example, given the four sentences
1. People need to use their air conditioner on a hot day.
2. People need to use their air conditioner on a lovely day.
3. People don’t need to use their air conditioner on a hot day.
4. People don’t need to use their air conditioner on a lovely day.
Sentences 1 and 4 should have higher probability than sentences 2 and 3. The authors find that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood. The same problem occurred with word vectors, like word2vec. “Black” is the opposite of “white,” but except in the one dimension in which they differ, nearly everything else about them is the same: you can buy a white or black crayon, you can paint a wall white or black, you can use white or black to describe a dog’s fur. Because of this, black and white are semantically close, and tend to get confused with each other.
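The comparison described above can be sketched as follows: score each sentence by summing the log of each token's conditional probability, then check which sentence of a pair scores higher. The per-token probabilities below are invented, and the paper's exact scoring procedure may differ; this is only meant to show why two sentences that differ in one word can end up with nearly the same score.

```python
import math

def sentence_logprob(token_probs):
    """Log-probability of a sentence: sum of log conditional token probabilities."""
    return sum(math.log(p) for p in token_probs)

# Invented probabilities for the tokens the two sentences share.
shared = [0.2, 0.1, 0.3, 0.25, 0.4, 0.5, 0.1, 0.3, 0.6]

# Only the token for "hot" vs. "lovely" differs, and only slightly.
hot = shared + [0.05, 0.7]
lovely = shared + [0.04, 0.7]

gap = sentence_logprob(hot) - sentence_logprob(lovely)
print(round(sentence_logprob(hot), 3), round(sentence_logprob(lovely), 3), round(gap, 3))
```

Because every shared token contributes the same amount to both scores, the gap comes entirely from the one differing position. If the model assigns similar probabilities there, the sensible and the nonsensical sentence are nearly indistinguishable by score.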
The underlying reason for this issue appears to be that GPT-2 has only ever seen sentences that make sense, and is trying to generate sentences that are similar to them. It has never seen sentences that do NOT make sense and makes no effort to avoid them. The paper “Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training” introduces such an “unlikelihood objective” and shows it can help with precisely the kinds of problems mentioned in the previous paper, as well as GPT-2’s tendency to get stuck in endless loops.
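As a rough sketch of the general idea, an unlikelihood-style penalty adds a term that grows as the model puts more probability on tokens it should not produce, alongside the usual term that rewards the correct token. The function names, the toy distribution, and the choice of negative token below are assumptions for illustration; the paper's exact formulation, weighting, and selection of negative candidates are not reproduced here.

```python
import math

def likelihood_loss(probs, target):
    """Standard term: small when the target token has high probability."""
    return -math.log(probs[target])

def unlikelihood_loss(probs, negatives):
    """Penalty term: large when unwanted tokens have high probability."""
    return -sum(math.log(1.0 - probs[tok]) for tok in negatives)

# Hypothetical next-token distribution after "Polar bears are found in the".
probs = {"Arctic": 0.6, "tropics": 0.3, "desert": 0.1}

total = likelihood_loss(probs, "Arctic") + unlikelihood_loss(probs, ["tropics"])
print(round(total, 4))
```

Ordinary likelihood training only ever pushes the probability of observed text up. The second term gives the model an explicit signal about what to push down, which is the ingredient the paragraph above says GPT-2 never received.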
Despite all this, when generating text, GPT-2 is more likely to generate a true sentence than the opposite of a true sentence. “Polar bears are found in the Arctic” is far more likely to be generated than “Polar bears are found in the tropics,” and it is also more likely to be generated than “Polar bears are not found in the Arctic” because “not found” is a less common construction in real writing than “found.”
It appears that what GPT-2 knows is that the concept “polar bear” has a “found in” relation to “Arctic,” but that it is not very particular about the polarity of that relation (“found in” vs. “not found in”). It simply defaults to expressing the more commonly used positive polarity much of the time.
Another odd feature of GPT-2 is that its writing expresses equal confidence in concepts and relationships it knows very well and in those it knows very little about. By looking into the probabilities, we can often determine when GPT-2 is uncertain about something, but this uncertainty is not expressed in the sentences it generates. By the same token, if prompted with text that has a lot of hedge words and uncertainty, it will include those words even if the topic is one it knows a great deal about.
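One common way to read uncertainty off the probabilities is the entropy of the next-token distribution: low when one token dominates, high when several tokens are about equally likely. The sketch below uses invented distributions and a hypothetical helper `entropy_bits`; it is one possible measure, not necessarily the one meant above.

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Invented distributions: one where a single token dominates, one that is split.
confident = {"A": 0.97, "B": 0.02, "C": 0.01}
uncertain = {"A": 0.45, "B": 0.45, "C": 0.10}

print(round(entropy_bits(confident), 3), round(entropy_bits(uncertain), 3))
```

Either distribution can still produce a fluent, assertive sentence once a token is sampled, which is why the uncertainty visible in the numbers does not show up in the generated text.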
Finally, GPT-2 doesn’t make any attempt to keep its beliefs consistent with one another. Given the prompt “The current President of the United States is named,” most of the generated responses will be variations on “Barack Obama.” With other prompts, however, GPT-2 acts as if Donald Trump is the current president. This contradiction was present in the training data, which was created over the course of several years. The token probabilities show that both men’s names have a fairly high likelihood of being generated for any question of this kind. A person discovering that kind of uncertainty about two options in their mind would modify their beliefs so that one was more likely and the other less likely, but GPT-2 doesn't have any mechanism to do this and enforce a kind of consistency on its beliefs.
In summary, it seems that GPT-2 does have something that can reasonably be called “understanding” and holds something very much like “concepts” or “ideas” which it uses to generate sentences. However, there are some profound differences between how a human holds and uses ideas and how GPT-2 does, which are important to keep in mind.