LessWrong.com News

A community blog devoted to refining the art of rationality

PlanAlyzer: assessing threats to the validity of online experiments

January 2, 2020 - 10:42
Published on January 2, 2020 7:42 AM UTC

It’s easy to make experimental design mistakes that invalidate your online controlled experiments. At an organisation like Facebook (who kindly supplied the corpus of experiments used in this study), the state of the art is to have a pool of experts carefully review all experiments. PlanAlyzer acts a bit like a linter for online experiment designs, where those designs are specified in the PlanOut language.

As well as pointing out any bugs in the experiment design, PlanAlyzer will also output a set of contrasts — comparisons that you can safely make given the design of the experiment. Hopefully the comparison you wanted to make when you set up the experiment is in that set!


Regular readers of The Morning Paper will be well aware that there’s plenty that can go wrong in the design and interpretation of online controlled experiments (see e.g. ‘A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments’). PlanAlyzer is aimed at detecting threats to internal validity, the degree to which valid causal conclusions can (or cannot!) be drawn from a study.


What to do when both morally and empirically uncertain

January 2, 2020 - 10:20
Published on January 2, 2020 7:20 AM UTC

For an epistemic status statement and an outline of the purpose of this sequence of posts, please see the top of my prior post. There are also some explanations and caveats in that post which I won’t repeat - or will repeat only briefly - in this post.

Purpose of this post

In my prior post, I wrote:

We are often forced to make decisions under conditions of uncertainty. This uncertainty can be empirical (e.g., what is the likelihood that nuclear war would cause human extinction?) or moral (e.g., does the wellbeing of future generations matter morally?). The issue of making decisions under empirical uncertainty has been well-studied, and expected utility theory has emerged as the typical account of how a rational agent should proceed in these situations. The issue of making decisions under moral uncertainty appears to have received less attention (though see this list of relevant papers), despite also being of clear importance.

I then went on to describe three prominent approaches for dealing with moral uncertainty (based on Will MacAskill’s 2014 thesis):

  1. Maximising Expected Choice-worthiness (MEC), if all theories under consideration by the decision-maker are cardinal and intertheoretically comparable.[1]
  2. Variance Voting (VV), a form of what I’ll call “Normalised MEC”, if all theories under consideration are cardinal but not intertheoretically comparable.[2]
  3. The Borda Rule (BR), if all theories under consideration are ordinal.

But I was surprised to discover that I couldn’t find any very explicit write-up of how to handle moral and empirical uncertainty at the same time. I assume this is because most people writing on relevant topics consider the approach I will propose in this post to be quite obvious (at least when using MEC with cardinal, intertheoretically comparable, consequentialist theories). Indeed, many existing models from EAs/rationalists (and likely from other communities) already effectively use something very much like the first approach I discuss here (“MEC-E”; explained below), just without explicitly noting that this is an integration of approaches for dealing with moral and empirical uncertainty.[3]

But it still seemed worth explicitly spelling out the approach I propose, which is, in a nutshell, using exactly the regular approaches to moral uncertainty mentioned above, but on outcomes rather than on actions, and combining that with consideration of the likelihood of each action leading to each outcome. My aim for this post is both to make this approach “obvious” to a broader set of people and to explore how it can work with non-comparable, ordinal, and/or non-consequentialist theories (which may be less obvious).

(Additionally, as a side-benefit, readers who are wondering what on earth all this “modelling” business some EAs and rationalists love talking about is, or who are only somewhat familiar with modelling, may find this post to provide useful examples and explanations.)

I'd be interested in any comments or feedback you might have on anything I discuss here!

MEC under empirical uncertainty

To briefly review regular MEC: MacAskill argues that, when all moral theories under consideration are cardinal and intertheoretically comparable, a decision-maker should choose the “option” that has the highest expected choice-worthiness. Expected choice-worthiness is given by the following formula:

$$EC(A) = \sum_{i=1}^{n} C(T_i)\, CW_i(A)$$

In this formula, C(Ti) represents the decision-maker’s credence (belief) in Ti (some particular moral theory), while CWi(A) represents the “choice-worthiness” (CW) of A (an “option” or action that the decision-maker can choose) according to Ti. In my prior post, I illustrated how this works with this example:

Suppose Devon assigns a 25% probability to T1, a version of hedonistic utilitarianism in which human “hedons” (a hypothetical unit of pleasure) are worth 10 times more than fish hedons. He also assigns a 75% probability to T2, a different version of hedonistic utilitarianism, which values human hedons just as much as T1 does, but doesn’t value fish hedons at all (i.e., it sees fish experiences as having no moral significance). Suppose also that Devon is choosing whether to buy a fish curry or a tofu curry, and that he’d enjoy the fish curry about twice as much. (Finally, let’s go out on a limb and assume Devon’s humanity.)

According to T1, the choice-worthiness (roughly speaking, the rightness or wrongness of an action) of buying the fish curry is -90 (because it’s assumed to cause 1,000 negative fish hedons, valued as -100, but also 10 human hedons due to Devon’s enjoyment). In contrast, according to T2, the choice-worthiness of buying the fish curry is 10 (because this theory values Devon’s joy as much as T1 does, but doesn’t care about the fish’s experiences). Meanwhile, the choice-worthiness of the tofu curry is 5 according to both theories (because it causes no harm to fish, and Devon would enjoy it half as much as he’d enjoy the fish curry).

[...] Using MEC in this situation, the expected choice-worthiness of buying the fish curry is 0.25 * -90 + 0.75 * 10 = -15, and the expected choice-worthiness of buying the tofu curry is 0.25 * 5 + 0.75 * 5 = 5. Thus, Devon should buy the tofu curry.
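The arithmetic in the quoted example can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `expected_choice_worthiness` and the dictionary layout are illustrative choices rather than anything from MacAskill's thesis, and the numbers are the ones given for Devon above.

```python
# Credences in each moral theory (from the Devon example).
credences = {"T1": 0.25, "T2": 0.75}

# Choice-worthiness of each action according to each theory.
choice_worthiness = {
    "fish curry": {"T1": -90, "T2": 10},
    "tofu curry": {"T1": 5, "T2": 5},
}


def expected_choice_worthiness(action):
    """Credence-weighted sum of an action's choice-worthiness across theories."""
    return sum(credences[t] * choice_worthiness[action][t] for t in credences)


scores = {a: expected_choice_worthiness(a) for a in choice_worthiness}
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

The action with the highest score is the one regular MEC recommends.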

But can Devon really be sure that buying the fish curry will lead to that much fish suffering? What if this demand signal doesn’t lead to increased fish farming/capture? What if the additional fish farming/capture is more humane than expected? What if fish can’t suffer because they aren’t actually conscious (empirically, rather than as a result of what sorts of consciousness our moral theory considers relevant)? We could likewise question Devon’s apparent certainty that buying the tofu curry definitely won’t have any unintended consequences for fish suffering, and his apparent certainty regarding precisely how much he’d enjoy each meal.

These are all empirical questions, but they seem very important for Devon’s ultimate decision, as T1 and T2 don’t “intrinsically care” about buying fish curry or buying tofu curry; they care about some of the outcomes which those actions may or may not cause.[4]

More generally, I expect that, in all realistic decision situations, we’ll have both moral and empirical uncertainty, and that it’ll often be important to explicitly consider both types of uncertainties. For example, GiveWell’s models consider both how likely insecticide-treated bednets are to save the life of a child, and how that outcome would compare to doubling the income of someone in extreme poverty. However, typical discussions of MEC seem to assume that we already know for sure what the outcomes of our actions will be, just as typical discussions of expected value reasoning seem to assume that we already know for sure how valuable a given outcome is.

Luckily, it seems to me that MEC and traditional (empirical) expected value reasoning can be very easily and neatly integrated in a way that resolves those issues. (This is perhaps partly due to the fact that, if I understand MacAskill’s thesis correctly, MEC was very consciously developed by analogy to expected value reasoning.) Here is my formula for this integration, which I'll call Maximising Expected Choice-worthiness, accounting for Empirical uncertainty (MEC-E), and which I'll explain and provide an example for below:

$$EC(A) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(O_j \mid A)\, CW_i(O_j)\, C(T_i)$$

Here, all symbols mean the same things they did in the earlier formula from MacAskill’s thesis, with two exceptions:

  • I’ve added Oj, to refer to each “outcome”: each consequence that an action may lead to, which at least one moral theory under consideration intrinsically values/disvalues. (E.g., a fish suffering; a person being made happy; rights being violated.)
  • Related to that, I’d like to be more explicit that A refers only to the “actions” that the decision-maker can directly choose (e.g., purchasing a fish meal, imprisoning someone), rather than the outcomes of those actions.[5]

(I also re-ordered the choice-worthiness term and the credence term, which makes no actual difference to any results, and was just because I think this ordering is slightly more intuitive.)

Stated verbally (and slightly imprecisely[6]), MEC-E claims that:

One should choose the action which maximises expected choice-worthiness, accounting for empirical uncertainty. To calculate the expected choice-worthiness of each action, you first, for each potential outcome of the action and each moral theory under consideration, find the product of 1) the probability of that outcome given that that action is taken, 2) the choice-worthiness of that outcome according to that theory, and 3) the credence given to that theory. Second, for each action, you sum together all of those products.

To illustrate, I have modelled in Guesstimate an extension of the example of Devon deciding what meal to buy to also incorporate empirical uncertainty.[7] In the text here, I will only state the information that was not in the earlier version of the example, and the resulting calculations, rather than walking through all the details.

Suppose Devon believes there’s an 80% chance that buying a fish curry will lead to “fish being harmed” (modelled as 1000 negative fish hedons, with a choice-worthiness of -100 according to T1 and 0 according to T2), and a 10% chance that buying a tofu curry will lead to that same outcome. He also believes there’s a 95% chance that buying a fish curry will lead to “Devon enjoying a meal a lot” (modelled as 10 human hedons), and a 50% chance that buying a tofu curry will lead to that.

The expected choice-worthiness of buying a fish curry would therefore be: 0.8 * -100 * 0.25 + 0.8 * 0 * 0.75 + 0.95 * 10 * 0.25 + 0.95 * 10 * 0.75 = -10.5

Meanwhile, the expected choice-worthiness of buying a tofu curry would be: 0.1 * -100 * 0.25 + 0.1 * 0 * 0.75 + 0.5 * 10 * 0.25 + 0.5 * 10 * 0.75 = 2.5

As before, the tofu curry appears the better choice, despite seeming somewhat worse according to the theory (T2) assigned higher credence, because the other theory (T1) sees the tofu curry as much better.
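The two calculations above can be checked with a short Python sketch. The names `mec_e`, `outcome_cw`, and `p_outcome_given_action` are hypothetical labels chosen for this illustration; the probabilities, choice-worthiness values, and credences are the ones stated for Devon.

```python
# Credence in each moral theory.
credences = {"T1": 0.25, "T2": 0.75}

# Choice-worthiness of each outcome according to each theory.
outcome_cw = {
    "fish harmed": {"T1": -100, "T2": 0},
    "Devon enjoys meal a lot": {"T1": 10, "T2": 10},
}

# Probability of each outcome given each action.
p_outcome_given_action = {
    "fish curry": {"fish harmed": 0.8, "Devon enjoys meal a lot": 0.95},
    "tofu curry": {"fish harmed": 0.1, "Devon enjoys meal a lot": 0.5},
}


def mec_e(action):
    """Sum, over outcomes and theories, of probability * choice-worthiness * credence."""
    return sum(
        p * outcome_cw[outcome][theory] * credences[theory]
        for outcome, p in p_outcome_given_action[action].items()
        for theory in credences
    )


scores = {action: mec_e(action) for action in p_outcome_given_action}
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

Adding another outcome or theory only requires a new dictionary entry; the summation itself does not change.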

In the final section of this post, I discuss potential extensions of these approaches, such as how they can handle probability distributions (rather than point estimates) and non-consequentialist theories.

The last thing I’ll note about MEC-E in this section is that MEC-E can be used as a heuristic, without involving actual numbers, in exactly the same way MEC or traditional expected value reasoning can. For example, without knowing or estimating any actual numbers, Devon might reason that, compared to buying the tofu curry, buying the fish curry is “much” more likely to lead to fish suffering and only “somewhat” more likely to lead to him enjoying his meal a lot. He may further reason that, in the “unlikely but plausible” event that fish experiences do matter, the badness of a large amount of fish suffering is “much” greater than the goodness of him enjoying a meal. He may thus ultimately decide to purchase the tofu curry.

(Indeed, my impression is that many effective altruists have arrived at vegetarianism/veganism through reasoning very much like that, without any actual numbers being required.)

Normalised MEC under empirical uncertainty

(From here onwards, I’ve had to go a bit further beyond what’s clearly implied by existing academic work, so the odds I’ll make some mistakes go up a bit. Please let me know if you spot any errors.)

To briefly review regular Normalised MEC: Sometimes, despite being cardinal, the moral theories we have credence in are not intertheoretically comparable (basically meaning that there’s no consistent, non-arbitrary “exchange rate” between the theories’ “units of choice-worthiness”). MacAskill argues that, in such situations, one must first “normalise” the theories in some way (i.e., “[adjust] values measured on different scales to a notionally common scale”), and then apply MEC to the new, normalised choice-worthiness scores. He recommends Variance Voting, in which the normalisation is by variance (rather than, e.g., by range), meaning that we:

“[treat] the average of the squared differences in choice-worthiness from the mean choice-worthiness as the same across all theories. Intuitively, the variance is a measure of how spread out choice-worthiness is over different options; normalising at variance is the same as normalising at the difference between the mean choice-worthiness and one standard deviation from the mean choice-worthiness.”

(I provide a worked example here, based on an extension of the scenario with Devon deciding what meal to buy, but it's possible I've made mistakes.)
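As a rough illustration of what "normalising by variance" involves mechanically, the sketch below rescales each theory's scores so that their variance across options is 1, and then applies the usual credence-weighted sum. Several things here are assumptions rather than anything stated in the post or the thesis: the helper name `normalise_by_variance`, the use of the population standard deviation with every option weighted equally, and the reuse of Devon's two-option numbers purely as placeholder input (the linked worked example uses an extended scenario and may differ).

```python
from statistics import pstdev

credences = {"T1": 0.25, "T2": 0.75}

# Placeholder input: each theory's choice-worthiness for each option.
choice_worthiness = {
    "T1": {"fish curry": -90, "tofu curry": 5},
    "T2": {"fish curry": 10, "tofu curry": 5},
}


def normalise_by_variance(scores):
    """Rescale one theory's scores so their population variance equals 1."""
    sd = pstdev(list(scores.values()))
    if sd == 0:
        raise ValueError("theory is indifferent between all options")
    return {option: value / sd for option, value in scores.items()}


normalised = {t: normalise_by_variance(s) for t, s in choice_worthiness.items()}


def normalised_expected_cw(option):
    """Credence-weighted sum of the normalised scores for one option."""
    return sum(credences[t] * normalised[t][option] for t in credences)


scores = {o: normalised_expected_cw(o) for o in choice_worthiness["T1"]}
print(normalised, scores)
```

With only two options, every theory's normalised scores end up exactly two standard deviations apart, so this toy input is mainly useful for checking the mechanics.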

My proposal for Normalised MEC, accounting for Empirical Uncertainty (Normalised MEC-E) is just to combine the ideas of non-empirical Normalised MEC and non-normalised MEC-E in a fairly intuitive way. The steps involved (which may be worth reading alongside this worked example and/or the earlier explanations of Normalised MEC and MEC-E) are as follows:

  1. Work out expected choice-worthiness just as with regular MEC, except that here one is working out the expected choice-worthiness of outcomes, not actions. I.e., for each outcome, multiply that outcome’s choice-worthiness according to each theory by your credence in that theory, and then add up the resulting products.

    • You could also think of this as using the MEC-E formula, except with “Probability of outcome given action” removed for now.
  2. Normalise these expected choice-worthiness scores by variance, just as MacAskill advises in the quote above.

  3. Find the “expected value” of each action in the traditional way, with these normalised expected choice-worthiness scores serving as the “value” for each potential outcome. I.e., for each action, multiply the probability it leads to each outcome by the normalised expected choice-worthiness of that outcome (from step 2), and then add up the resulting products.

    • You could think of this as bringing “Probability of outcome given action” back into the MEC-E formula.
  4. Choose the action with the maximum score from step 3 (which we could call normalised expected choice-worthiness, accounting for empirical uncertainty, or expected value, accounting for normalised moral uncertainty).[8]

BR under empirical uncertainty

The final approach MacAskill recommends in his thesis is the Borda Rule (BR; also known as Borda counting). This is used when the moral theories we have credence in are merely ordinal (i.e., they don’t say “how much” more choice-worthy one option is compared to another). In my prior post, I provided the following quote of MacAskill’s formal explanation of BR (here with “options” replaced by “actions”):

“An [action] A’s Borda Score, for any theory Ti, is equal to the number of [actions] within the [action]-set that are less choice-worthy than A according to theory Ti’s choice-worthiness function, minus the number of [actions] within the [action]-set that are more choice-worthy than A according to Ti’s choice-worthiness function.

An [action] A’s Credence-Weighted Borda Score is the sum, for all theories Ti, of the Borda Score of A according to theory Ti multiplied by the credence that the decision-maker has in theory Ti.

[The Borda Rule states that an action] A is more appropriate than an [action] B iff [if and only if] A has a higher Credence-Weighted Borda Score than B; A is equally as appropriate as B iff A and B have an equal Credence-Weighted Borda Score.”

To apply BR when one is also empirically uncertain, I propose just explicitly considering/modelling one’s empirical uncertainties, and then figuring out each action’s Borda Score with those empirical uncertainties in mind. (That is, we don’t change the method at all on a mathematical level; we just make sure each moral theory’s preference rankings over actions - which is used as input into the Borda Rule - takes into account our empirical uncertainty about what outcomes each action may lead to.)

I’ll illustrate how this works with reference to the same example from MacAskill’s thesis that I quoted in my prior post, but now with slight modifications (shown in bold).

“Julia is a judge who is about to pass a verdict on whether Smith is guilty for murder. She is very confident that Smith is innocent. There is a crowd outside, who are desperate to see Smith convicted. Julia has three options:

[G]: Pass a verdict of ‘guilty’.

[R]: Call for a retrial.

[I]: Pass a verdict of ‘innocent’.

She thinks there’s a 0% chance of M if she passes a verdict of guilty, a 30% chance if she calls for a retrial (there may be mayhem due to the lack of a guilty verdict, or later due to a later innocent verdict), and a 70% chance if she passes a verdict of innocent.

There’s obviously a 100% chance of C if she passes a verdict of guilty and a 0% chance if she passes a verdict of innocent. She thinks there’s also a 20% chance of C happening later if she calls for a retrial.

Julia believes the crowd is very likely (~90% chance) to riot if Smith is found innocent, causing mayhem on the streets and the deaths of several people. If she calls for a retrial, she believes it’s almost certain (~95% chance) that he will be found innocent at a later date, and that it is much less likely (only ~30% chance) that the crowd will riot at that later date if he is found innocent then. If she declares Smith guilty, the crowd will certainly (~100%) be appeased and go home peacefully. She has credence in three moral theories, **which, when taking the preceding probabilities into account, provide the following choice-worthiness orderings**:

35% credence in a variant of utilitarianism, according to which [G≻I≻R].

34% credence in a variant of common sense, according to which [I≻R≻G].

31% credence in a deontological theory, according to which [I≻R≻G].”

This leads to the Borda Scores and Credence-Weighted Borda Scores shown in the table below, and thus to the recommendation that Julia declare Smith innocent.
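The Borda calculation for Julia's case can be sketched in Python as follows. The function names and data layout are illustrative assumptions; the credences and orderings are the three listed above, and the scoring rule follows the quoted definition (the number of actions ranked below, minus the number ranked above). The sketch assumes strict orderings with no ties.

```python
credences = {"utilitarianism": 0.35, "common sense": 0.34, "deontology": 0.31}

# Each theory's ordering of the actions, most choice-worthy first.
orderings = {
    "utilitarianism": ["G", "I", "R"],
    "common sense": ["I", "R", "G"],
    "deontology": ["I", "R", "G"],
}


def borda_score(action, ordering):
    """Actions ranked below `action` minus actions ranked above it."""
    position = ordering.index(action)
    worse = len(ordering) - 1 - position
    better = position
    return worse - better


def credence_weighted_borda(action):
    """Sum over theories of credence * Borda Score."""
    return sum(credences[t] * borda_score(action, orderings[t]) for t in credences)


actions = ["G", "R", "I"]
scores = {a: credence_weighted_borda(a) for a in actions}
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

Swapping in a different set of orderings is enough to see how a change in any one theory's ranking feeds through to the final scores.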

(More info on how that was worked out can be found in the following footnote, along with the corresponding table based on the moral theories' preference orderings in my prior post, when empirical uncertainty wasn't taken into account.[9])

In the original example, both the utilitarian theory and the common sense theory preferred a retrial to a verdict of innocent (in order to avoid a riot), which resulted in calling for a retrial having the highest Credence-Weighted Borda Score.

However, I’m now imagining that Julia is no longer assuming each action 100% guarantees a certain outcome will occur, and paying attention to her empirical uncertainty has changed her conclusions.

In particular, I’m imagining that she realises she’d initially been essentially “rounding up” (to 100%) the likelihood of a riot if she provides a verdict of innocent, and “rounding down” (to 0%) the likelihood of the crowd rioting at a later date. However, with more realistic probabilities in mind, utilitarianism and common sense would both actually prefer an innocent verdict to a retrial (because the innocent verdict seems less risky, and the retrial more risky, than she’d initially thought, while an innocent verdict still frees this innocent person sooner and with more certainty). This changes each action’s Borda Score, and gives the result that she should declare Smith innocent.[10]

Potential extensions of these approaches

Does this approach presume/privilege consequentialism?

A central idea of this post has been making a clear distinction between “actions” (which one can directly choose to take) and their “outcomes” (which are often what moral theories “intrinsically care about”). This clearly makes sense when the moral theories one has credence in are consequentialist. However, other moral theories may “intrinsically care” about actions themselves. For example, many deontological theories would consider lying to be wrong in and of itself, regardless of what it leads to. Can the approaches I’ve proposed handle such theories?

Yes - and very simply! For example, suppose I wish to use MEC-E (or Normalised MEC-E), and I have credence in a (cardinal) deontological theory that assigns very low choice-worthiness to lying (regardless of the outcomes that action leads to). We can still calculate expected choice-worthiness using the formulas shown above; in this case, we find the product of (multiply) “probability me lying leads to me having lied” (which we’d set to 1), “choice-worthiness of me having lied, according to this deontological theory”, and “credence in this deontological theory”.

Thus, cases where a theory cares intrinsically about the action and not its consequences can be seen as a “special case” in which the approaches discussed in this post just collapse back to the corresponding approaches discussed in MacAskill’s thesis (which these approaches are the “generalised” versions of). This is because there’s effectively no empirical uncertainty in these cases; we can be sure that taking an action would lead to us having taken that action. Thus, in these and other cases of no relevant empirical uncertainty, accounting for empirical uncertainty is unnecessary, but creates no problems.[11][12]
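To make the "special case" concrete, here is a small sketch in which a deontological theory cares only about the act of lying, modelled as an outcome ("having lied") that follows from the action with probability 1. Every number, theory label, and outcome name below is hypothetical and chosen only for illustration; none of it comes from the post or from MacAskill's thesis.

```python
# Hypothetical credences in a deontological (D) and a utilitarian (U) theory.
credences = {"D": 0.4, "U": 0.6}

# Hypothetical choice-worthiness of each outcome according to each theory.
outcome_cw = {
    "having lied": {"D": -50, "U": 0},
    "friend comforted": {"D": 0, "U": 10},
}

# "having lied" is certain if one lies and impossible otherwise;
# the "friend comforted" probabilities are made-up empirical estimates.
p_outcome_given_action = {
    "lie": {"having lied": 1.0, "friend comforted": 0.9},
    "tell truth": {"having lied": 0.0, "friend comforted": 0.4},
}


def mec_e(action):
    """Sum of probability * choice-worthiness * credence over outcomes and theories."""
    return sum(
        p * outcome_cw[outcome][theory] * credences[theory]
        for outcome, p in p_outcome_given_action[action].items()
        for theory in credences
    )


scores = {action: mec_e(action) for action in p_outcome_given_action}
print(scores)
```

Because the probability attached to "having lied" is fixed at 1 or 0, the deontological term contributes exactly what it would under regular MEC, while the consequentialist term still reflects the empirical uncertainty.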

I’d therefore argue that a policy of using the generalised approaches by default is likely wise. This is especially the case because:

  • One will typically have at least some credence in consequentialist theories.
  • My impression is that even most “non-consequentialist” theories still do care at least somewhat about consequences. For example, they’d likely say lying is in fact “right” if the negative consequences of not doing so are “large enough” (and one should often be empirically uncertain about whether they would be).

Factoring things out further

In this post, I modified examples (from my prior post) in which we had only one moral uncertainty into examples in which we had one moral and one empirical uncertainty. We could think of this as “factoring out” what originally appeared to be only moral uncertainty into its “factors”: empirical uncertainty about whether an action will lead to an outcome, and moral uncertainty about the value of that outcome. By doing this, we’re more closely approximating (modelling) our actual understandings and uncertainties about the situation at hand.

But we’re still far from a full approximation of our understandings and uncertainties. For example, in the case of Julia and the innocent Smith, Julia may also be uncertain how big the riot would be, how many people would die, whether these people would be rioters or uninvolved bystanders, whether there’s a moral difference between a rioter vs a bystander dying from the riot (and if so, how big this difference is), etc.[13]

A benefit of the approaches shown here is that they can very simply be extended, with typical modelling methods, to incorporate additional uncertainties like these. You simply disaggregate the relevant variables into the “factors” you believe they’re composed of, assign them numbers, and multiply them as appropriate.[14][15]

Need to determine whether uncertainties are moral or empirical?

In the examples given just above, you may have wondered whether I was considering certain variables to represent moral uncertainties or empirical ones. I suspect this ambiguity will be common in practice (and I plan to discuss it further in a later post). Is this an issue for the approaches I’ve suggested?

I’m a bit unsure about this, but I think the answer is essentially “no”. I don’t think there’s any need to treat moral and empirical uncertainty in fundamentally different ways for the sake of models/calculations using these approaches. Instead, I think that, ultimately, the important thing is just to “factor out” variables in the way that makes the most sense, given the situation and what the moral theories under consideration “intrinsically care about”. (An example of the sort of thing I mean can be found in footnote 14, in a case where the uncertainty is actually empirical but has different moral implications for different theories.)

Probability distributions instead of point estimates

You may have also thought that a lot of variables in the examples I’ve given should be represented by probability distributions (e.g., representing 90% confidence intervals), rather than point estimates. For example, why would Devon estimate the probability of “fish being harmed”, as if it’s a binary variable whose moral significance switches suddenly from 0 to -100 (according to T1) when a certain level of harm is reached? Wouldn’t it make more sense for him to estimate the amount of harm to fish that is likely, given that that better aligns both with his understanding of reality and with what T1 cares about?

If you were thinking this, I wholeheartedly agree! Further, I can’t see any reason why the approaches I’ve discussed couldn’t use probability distributions and model variables as continuous rather than binary (the only reason I haven’t modelled things in that way so far was to keep explanations and examples simple). For readers interested in an illustration of how this can be done, I’ve provided a modified model of the Devon example in this Guesstimate model. (Existing models like this one also take essentially this approach.)
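One way to sketch the continuous version outside Guesstimate is a simple Monte Carlo simulation. The distributions below (triangular, with the earlier point estimates as their modes) are purely hypothetical stand-ins, and the linked Guesstimate models may be set up quite differently. The only values taken from the text are the credences and T1's exchange rate of 10 fish hedons per human hedon.

```python
import random

rng = random.Random(0)  # fixed seed so repeated runs give the same samples

credences = {"T1": 0.25, "T2": 0.75}
FISH_HEDONS_PER_HUMAN_HEDON = 10  # T1's exchange rate from the Devon example


def sample_fish_curry_ec():
    """Draw one sample of the expected choice-worthiness of buying the fish curry."""
    # Hypothetical distributions: amount of fish harm and of Devon's enjoyment.
    fish_hedons_lost = rng.triangular(0, 2000, 1000)
    devon_hedons = rng.triangular(5, 15, 10)
    cw_t1 = devon_hedons - fish_hedons_lost / FISH_HEDONS_PER_HUMAN_HEDON
    cw_t2 = devon_hedons  # T2 gives no weight to fish experiences
    return credences["T1"] * cw_t1 + credences["T2"] * cw_t2


samples = sorted(sample_fish_curry_ec() for _ in range(10_000))
mean_ec = sum(samples) / len(samples)
low = samples[int(0.05 * len(samples))]   # approximate 5th percentile
high = samples[int(0.95 * len(samples))]  # approximate 95th percentile
print(mean_ec, low, high)
```

The same sampling function could be written for the tofu curry, after which the two sets of samples can be compared by mean or by the share of samples in which one action scores higher.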

Closing remarks

I hope you’ve found this post useful, whether to inform your heuristic use of moral uncertainty and expected value reasoning, to help you build actual models taking into account both moral and empirical uncertainty, or to give you a bit more clarity on “modelling” in general.

In the next post, I’ll discuss how we can combine the approaches discussed in this and my prior post with sensitivity analysis and value of information analysis, to work out what specific moral or empirical learning would be most decision-relevant and when we should vs shouldn’t postpone decisions until we’ve done such learning.

  1. What “choice-worthiness”, “cardinal” (vs “ordinal”), and “intertheoretically comparable” mean is explained in the previous post. To quickly review, roughly speaking:

    • Choice-worthiness is the rightness or wrongness of an action, according to a particular moral theory.
    • A moral theory is ordinal if it tells you only which options are better than which other options, whereas a theory is cardinal if it tells you how big a difference in choice-worthiness there is between each option.
    • A pair of moral theories can be cardinal and yet still not intertheoretically comparable if we cannot meaningfully compare the sizes of the “differences in choice-worthiness” between the theories; basically, if there’s no consistent, non-arbitrary “exchange rate” between different theories’ “units of choice-worthiness”.
  2. MacAskill also discusses a “Hybrid” procedure, if the theories under consideration differ in whether they’re cardinal or ordinal and/or whether they’re intertheoretically comparable; readers interested in more information on that can refer to pages 117-122 of MacAskill’s thesis. An alternative approach to such situations is Christian Tarsney’s (pages 187-195) “multi-stage aggregation procedure”, which I may write a post about later (please let me know if you think this’d be valuable). ↩︎

  3. Examples of models that effectively use something like the “MEC-E” approach include GiveWell’s cost-effectiveness models and this model of the cost effectiveness of “alternative foods”.

    And some of the academic moral uncertainty work I’ve read seemed to indicate the authors may be perceiving as obvious something like the approaches I propose in this post.

    But I think the closest thing I found to an explicit write-up of this sort of way of considering moral and empirical uncertainty at the same time (expressed in those terms) was this post from 2010, which states: “Under Robin’s approach to value uncertainty, we would (I presume) combine these two utility functions into one linearly, by weighing each with its probability, so we get EU(x) = 0.99 EU1(x) + 0.01 EU2(x)”. ↩︎

  4. Some readers may be thinking the “empirical” uncertainty about fish consciousness is inextricable from moral uncertainties, and/or that the above paragraph implicitly presumes/privileges consequentialism. If you’re one of those readers, 10 points to you for being extra switched-on! However, I believe these are not really issues for the approaches outlined in this post, for reasons outlined in the final section. ↩︎

  5. Note that my usage of “actions” can include “doing nothing”, or failing to do some specific thing; I don’t mean “actions” to be distinct from “omissions” in this context. MacAskill and other writers sometimes refer to “options” to mean what I mean by “actions”. I chose the term “actions” both to make it more obvious what the A and O terms in the formula stand for, and because it seems to me that the distinction between “options” and “outcomes” would be less immediately obvious. ↩︎

  6. My university education wasn’t highly quantitative, so it’s very possible I’ll phrase certain things like this in clunky or unusual ways. If you notice such issues and/or have better phrasing ideas, please let me know. ↩︎

  7. In that link, the model using MEC-E follows a similar model using regular MEC (and thus considering only moral uncertainty) and another similar model using more traditional expected value reasoning (and thus considering only empirical uncertainty); readers can compare these against the MEC-E model. ↩︎

  8. Before I tried to actually model an example, I came up with a slightly different proposal for integrating the ideas of MEC-E and Normalised MEC. Then I realised the proposal outlined above might make more sense, and it does seem to work (though I’m not 100% certain), so I didn’t further pursue my original proposal. I therefore don't know for sure whether my original proposal would work or not (and, if it does work, whether it’s somehow better than what I proposed above). My original proposal was as follows:

    1. Work out expected choice-worthiness just as with regular MEC-E; i.e., follow the formula from above to incorporate consideration of the probabilities of each action leading to each outcome, the choice-worthiness of each outcome according to each moral theory, and the credence one has in each theory. (But don’t yet pick the action with the maximum expected choice-worthiness score.)
    2. Normalise these expected choice-worthiness scores by variance, just as MacAskill advises in the quote above. (The fact that these scores incorporate consideration of empirical uncertainty has no impact on how to normalise by variance.)
    3. Now pick the action with the maximum normalised expected choice-worthiness score.
  9. G (for example) has a Borda Score of 2 - 0 = 2 according to utilitarianism because that theory views two options as less choice-worthy than G, and 0 options as more choice-worthy than G.

    To fill in the final column, you take a credence-weighted average of the relevant action’s Borda Scores.

    What follows is the corresponding table based on the moral theories' preference orderings in my prior post, when empirical uncertainty wasn't taken into account:


  10. It’s also entirely possible for paying attention to empirical uncertainty to not change any moral theory’s preference orderings in a particular situation, or for some preference orderings to change without this affecting which action ends up with the highest Credence-Weighted Borda Score. This is a feature, not a bug.

    Another perk is that paying attention to both moral and empirical uncertainty also provides more clarity on what the decision-maker should think or learn more about. This will be the subject of my next post. For now, a quick example is that Julia may realise that a lot hangs on what each moral theory’s preference ordering should actually be, or on how likely the crowd actually is to riot if she passes a verdict of innocent or calls for a retrial, and it may be worth postponing her decision in order to learn more about these things. ↩︎

  11. Arguably, the additional complexity in the model is a cost in itself. But this is a problem only in the same way it is a problem any time one decides to model something in more detail or with more accuracy at the cost of adding complexity and computations. Sometimes it’ll be worth doing so, while other times it’ll be worth keeping things simpler (whether by considering only moral uncertainty, by considering only empirical uncertainty, or by considering only certain parts of one’s moral/empirical uncertainties). ↩︎

  12. The approaches discussed in this post can also deal with theories that “intrinsically care” about other things, like a decision-maker’s intentions or motivations. You can simply add in a factor for “probability that, if I take X, it’d be due to motivation Y rather than motivation Z” (or something along those lines). It may often be reasonable to round this to 1 or 0, in which case these approaches didn’t necessarily “add value” (though they still worked). But often we may genuinely be (empirically) uncertain about our own motivations (e.g., are we just providing high-minded rationalisations for doing something we wanted to do anyway for our own self-interest?), in which case explicitly modelling that empirical uncertainty may be useful. ↩︎

  13. For another example, in the case of Devon choosing a meal, he may also be uncertain how many of each type of fish will be killed, the way in which they’d be killed, whether each type of fish has certain biological and behavioural features thought to indicate consciousness, whether those features do indeed indicate consciousness, whether the consciousness they indicate is morally relevant, whether creatures with consciousness like that deserve the same “moral weight” as humans or somewhat lesser weight, etc. ↩︎

  14. For example, Devon might replace “Probability that purchasing a fish meal leads to "fish being harmed"” with (“Probability that purchasing a fish meal leads to fish being killed” * “Probability fish who were killed would be killed in a non-humane way” * “Probability any fish killed in these ways would be conscious enough that this can count as “harming” them”). This whole term would then be used in calculations wherever “Probability that purchasing a fish meal leads to "fish being harmed"” was originally used.

    For another example, Julia might replace “Probability the crowd riots if Julia finds Smith innocent” with “Probability the crowd riots if Julia finds Smith innocent” * “Probability a riot would lead to at least one death” * “Probability that, if at least one death occurs, there’s at least one death of a bystander (rather than of one of the rioters themselves)” (as shown in this partial Guesstimate model). She can then keep in mind this more specific final outcome, and its more clearly modelled probability, as she tries to work out what choice-worthiness ordering each moral theory she has credence in would give to the actions she’s considering.

    Note that, sometimes, it might make sense to “factor out” variables in different ways for the purposes of different moral theories’ evaluations, depending on what the moral theories under consideration “intrinsically care about”. In the case of Julia, it definitely seems to me to make sense to replace “Probability the crowd riots if Julia finds Smith innocent” with “Probability the crowd riots if Julia finds Smith innocent” * “Probability a riot would lead to at least one death”. This is because all moral theories under consideration probably care far more about potential deaths from a riot than about any other consequences of the riot. This can therefore be considered an “empirical uncertainty”, because its influence on the ultimate choice-worthiness “flows through” the same “moral outcome” (a death) for all moral theories under consideration.

    However, it might only make sense to further multiply that term by “Probability that, if at least one death occurs, there’s at least one death of a bystander (rather than of one of the rioters themselves)” for the sake of the common sense theory’s evaluation of the choice-worthiness order, not for the utilitarian theory’s evaluation. This would be the case if the utilitarian theory cared not at all (or at least much less) about the distinction between the death of a rioter and the death of a bystander, while common sense does. (The Guesstimate model should help illustrate what I mean by this.) ↩︎

  15. Additionally, the process of factoring things out in this way could by itself provide a clearer understanding of the situation at hand, and what the stakes really are for each moral theory one has credence in. (E.g., Julia may realise that passing a verdict of innocent is much less bad than she thought, as, even if a riot does occur, there’s only a fairly small chance it leads to the death of a bystander.) It also helps one realise what uncertainties are most worth thinking/learning more about (more on this in my next post). ↩︎
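
The Borda Score arithmetic in footnote 9 and the "factoring out" of probabilities in footnote 14 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the option labels other than G, the rank values, the credences, and every probability below are hypothetical placeholders, not figures taken from the post's tables or from the Guesstimate model. The function names are likewise invented for this sketch.

```python
from math import prod


def borda_score(option, ranking):
    """Borda Score of `option` under a single moral theory.

    `ranking` maps each option to a rank value (higher = more choice-worthy).
    The score is the number of options viewed as less choice-worthy minus
    the number of options viewed as more choice-worthy.
    """
    own = ranking[option]
    worse = sum(1 for o, v in ranking.items() if o != option and v < own)
    better = sum(1 for o, v in ranking.items() if o != option and v > own)
    return worse - better


def credence_weighted_borda(option, rankings, credences):
    """Credence-weighted average of an option's Borda Scores across theories."""
    return sum(credences[t] * borda_score(option, rankings[t]) for t in rankings)


def chained_probability(*factors):
    """Multiply the conditional probabilities that lead to the final outcome."""
    return prod(factors)


# Hypothetical preference orderings (higher value = more choice-worthy).
rankings = {
    "utilitarianism": {"G": 3, "R": 2, "I": 1},
    "common_sense": {"G": 1, "R": 3, "I": 2},
}
# Hypothetical credences in each theory; they sum to 1.
credences = {"utilitarianism": 0.75, "common_sense": 0.25}

scores = {o: credence_weighted_borda(o, rankings, credences) for o in ("G", "R", "I")}
best_option = max(scores, key=scores.get)

# Hypothetical probabilities for the riot example in footnote 14.
p_riot, p_death, p_bystander = 0.5, 0.5, 0.25
# Term used when a theory cares about any death from a riot.
p_any_death = chained_probability(p_riot, p_death)
# Term used when a theory cares specifically about a bystander's death.
p_bystander_death = chained_probability(p_riot, p_death, p_bystander)
```

With these placeholder inputs, G gets a Borda Score of 2 - 0 = 2 under the utilitarian ordering, matching the pattern in footnote 9. The two chained terms show how a theory that distinguishes bystander deaths from rioter deaths would work with a smaller probability than one that does not.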


Theories That Can Explain Everything

January 2, 2020 - 05:12
Published on January 2, 2020 2:12 AM UTC

It's generally accepted here that theories are valuable to the extent that they provide testable predictions. Being falsifiable means that incorrect theories can be discarded and replaced with theories that better model reality (see Making Beliefs Pay Rent). Unfortunately, reality doesn't play nice and we will sometimes possess excellent theoretical reasons for believing a theory, but that theory will possess far too many degrees of freedom to make it easily falsifiable.

The prototypical examples are the kinds of hypotheses that are produced by evolutionary psychology. Clearly all aspects of humanity have been shaped by evolution and the idea that our behaviour is an exception would be truly astounding. In fact, I'd say that it is something of an anti-prediction.

But what use is a theory that doesn't make any solid predictions? Firstly, believing in such a theory will normally have a significant impact on your priors, even if no single observation would provide strong evidence of its falsehood. But secondly, if the existing viable theories all claim A and you propose a viable theory that would be compatible with A or B, then that would make B viable again. And sometimes that can be a worthy contribution in and of itself. Indeed, you can have a funny situation arise where people nominally reject a theory for not sufficiently constraining expectations, while really opposing it because of how people's expectations would adjust if the theory were true.

See also: Building Intuitions on Non-Empirical Arguments in Science


Characterising utopia

January 2, 2020 - 03:00
Published on January 2, 2020 12:00 AM UTC

When we consider how good the best possible future might be, it's tempting to focus on only a handful of dimensions of change. In transhumanist thinking, these dimensions tend to be characteristics of individuals: their happiness, longevity, intelligence, and so on. [1] This reflects the deeply individualistic nature of our modern societies overall, and transhumanists in particular. Yet when asked what makes their lives meaningful, most people prioritise their relationships with others. In contrast, there are strands of utopian literature which focus on social reorganisation (such as Huxley’s Island or Skinner’s Walden Two), but usually without acknowledging the potential of technology to radically improve the human condition. [2] Meanwhile, utilitarians conceive of the best future as whichever one maximises a given metric of individual welfare - but those metrics are often criticised for oversimplifying the range of goods that people actually care about. [3] In this essay I've tried to be as comprehensive as possible in cataloguing the ways that the future could be much better than the present, which I've divided into three categories: individual lives, relationships with others, and humanity overall. Each section consists of a series of bullet points, with nested elaborations and examples.

I hesitated for quite some time before making this essay public, though, because it feels a little naive. Partly that’s because the different claims don’t form a single coherent narrative. But on reflection I think I endorse that: grand narratives are more seductive but also more likely to totally miss the point. Additionally, Holden Karnofsky has found that “the mere act of describing [a utopia] makes it sound top-down and centralized” in a way which people dislike - another reason why discussing individual characteristics is probably more productive.

Another concern is that even though there’s a strong historical trend towards increases in material quality of life, the same is not necessarily true for social quality of life. Indeed, in many ways the former impedes the latter. In particular, the less time people need to spend obtaining necessities, the more individualistic they’re able to be, and the more time they can spend on negative-sum status games. I don’t know how to solve this problem, or many others which currently prevent us from building a world that's good in all the ways I describe below. But I do believe that humanity has the potential to do so. [4] And having a clearer vision of utopia will likely motivate people to work on the problems that stand, as Dickinson put it, “between the bliss and me”. So what might be amazing about our future?

Individual lives

  • Health. Perhaps the clearest and most obvious way to improve the human condition is to cure the diseases and prevent the accidents which currently reduce both the quality and the duration of many people’s lives. Mental health problems are particularly neglected right now - solving those could make many people much better off.
    • Longevity. From some moral stances, the most important of these diseases to tackle is ageing itself, which prevents us from leading fulfilling lives many times longer than what people currently expect. Rejuvenation treatments could grant unlimited youth and postpone death arbitrarily. While the ethics and pragmatics of a post-death society are complicated (as I discuss here), this does not seem sufficient reason to tolerate the moral outrage of involuntary mortality.
  • Wealth. Nobody should lack access to whatever material goods they need to lead fulfilling lives. As technology advances and we automate more and more of the economy, the need to work to subsist will diminish, and eventually vanish altogether. An extrapolation from the last few centuries of development predicts that within centuries almost everyone will be incredibly wealthy by today’s standards. Luxuries that are now available only to a few (or to none at all) will become widespread.
    • Life in simulation. In the long term, the most complete way to achieve these two goals may be for us to spend almost all of our time in virtual reality, where possessions can be generated on demand, physical inconveniences will be eliminated, and our experiences will be limited only by our imaginations. Eventually this will likely lead to us uploading our minds and permanently inhabiting vast, shared virtual worlds. The key ideas in all of the points that follow this one are applicable whether we inhabit physical or virtual realities.
  • Alleviation of suffering. Evolution has stacked the hedonic deck against us: the extremes of pain are much greater than the extremes of pleasure, and more easily accessible too. But bioengineering and neuroscience will eventually reach a point where we could move towards eradicating suffering (including mental anguish and despair) and fulfilling the goal of David Pearce's abolitionist project. Perhaps keeping something similar to physical pain or mental frustration would still be useful for adding spice or variety to our lives - but it need not be anywhere near the worst and most hopeless extremes of either.
    • Freedom from violence and coercion. As part of this project, any utopia must prevent man’s inhumanity to man, and the savagery and cruelty which blight human history. This would be the continuation of a longstanding trend towards less violent and freer societies.
    • Non-humans. The most horrific suffering which currently exists is not inflicted on humans, but on the trillions of animals with which we share the planet. While most of this essay is focused on human lives and society, preventing the suffering of conscious non-human life (whether animals or aliens or AIs) is a major priority.
  • Deep pleasure and happiness. Broadly speaking, positive emotions are much more complicated than negative ones. Physical pleasure may be simple, but under the umbrella of happiness I also include excitement, contentment, satisfaction, wonder, joy, love, gratitude, amusement, ‘flow’, aesthetic appreciation, the feeling of human connection, and many more!
    • Better living through chemistry. There’s no fundamental reason why our minds couldn’t be reconfigured to experience much more of all of the positive emotions I just listed: why the ecstasy of the greatest day of your life couldn’t be your baseline state, with most days surging much higher; why all food couldn’t taste better than the best food you’ve ever had; why everyday activities couldn’t be more exhilarating than extreme sports.
    • Positive attitudes. Our happiness is crucially shaped by our patterns of thought - whether we’re optimistic and cheerful about our lives, rather than pessimistic and cynical. While I wouldn’t want a society in which people’s expectations were totally disconnected from reality, there’s a lot of room for people to have healthier mindsets and lead more satisfied lives.
    • Self-worth. In particular, it’s important for people to believe that they are valuable and worthwhile. In today’s society it’s far too easy to be plagued by low self-esteem, which poisons our ability to enjoy what we have.
    • Peak fun. Our society is already unprecedentedly entertainment-driven. With even fewer material constraints, we will be able to produce a lot of fun activities. Over time this will involve less passive consumption of media and more participation in exciting adventures that become intertwined with the rest of our lives. [5]
    • New types of happiness. The best experiences won’t necessarily come about just by scaling up our existing emotions, but also by creating new ones. Consider that our ability to appreciate music is an evolutionary accident, but one which deeply enriches our current lives. Our future selves could have many more types of experiences deliberately designed to be as rich and powerful as possible.
  • Choice and self-determination. Humans are more than happiness machines, though. We have dreams about our lives, and we devote ourselves to achieving them. While it’s not always straightforwardly good for people to be able to fulfil their desires (in particular desires involving superiority over other people, which I’ll discuss later), these activities give us purpose and meaning, and it seems unjust when we are unable to fulfil our plans because we are helpless in the face of external circumstances. Yet neither are the best desires those which can be fulfilled with the snap of a finger, or which steer us totally clear of any hardship. Rather, we should be able to set ourselves goals that are challenging yet achievable, goals which we might struggle with - but whose completion is ultimately even more fulfilling because of that. What might they be?
    • Making a difference to others. In a utopian future, dramatically improving other people’s lives would be much more difficult than it is today. Nevertheless, we can impact others via our relationships with them, as I’ll discuss in the next section.
    • Growth. People often set goals to push themselves, grow more and learn more. In those cases the specific achievements are less relevant than the lessons we take from them.
    • Tending your garden. Continuous striving isn’t for everyone. An alternative is the pursuit of peace and contentment, mindfulness and self-knowledge.
    • Self-expression. Everyone has a rich inner life, but most of us rarely (or never) find the means to express our true selves. I envisage unlocking the writer or musician or artist inside each of us - so that we can each tell our own story, and endless other stories most beautiful.
    • Life as art. I picture a world of “human beings who are new, unique, incomparable… who create themselves!” We can think of our lives as canvases upon which we each have the opportunity to paint a masterpiece. For some, that will simply involve pursuing all the other goods I describe in this essay. Others might prioritise making their lives novel, or dramatic, or aesthetically pleasing (even if that makes them less happy).
    • Life at a larger scale. With more favourable external circumstances, individuals will be able to shape their lives on an unprecedented scale. We could spend centuries on a single project, or muster together billions for vast cooperative ventures. We could also remain the “same” continuous person as long as we wanted, rather than inevitably losing touch with the past.
  • Cultivation of virtue. Although less emphasised in modern times, living a good life has long been associated with building character and developing virtues. Doing so is not primarily about changing what happens in our lives, but rather changing how we respond to it. There’s no definitive characterisation of a virtuous person, though: we all have our own intuitions about what traits (integrity, kindness, courage, and so on) we admire most in others. And different philosophical traditions emphasise different virtues, from Aristotle's 'greatness of soul' to Confucius' 'familial piety' to Buddha's ‘loving kindness’ (and the other brahmaviharas). [6] Deciding which virtues are most valuable is a task both for individuals and for society as a whole - with the goal of creating a world of people who have deliberately cultivated the best versions of themselves. [7]
  • Intelligence. As we are, we can comprehend many complex concepts, but there are whole worlds of thought that human-level intelligences can never fully understand. If a jump from chimpanzee brain size to our brain size opened up such vast cognitive vistas, imagine what else might be possible when we augment our current brains, scale up our intelligence arbitrarily far, and lay bare the patterns that compose the universe.
    • The joy of learning. Today, learning is usually a chore. Yet humans are fundamentally curious creatures; and there can be deep satisfaction in discovery and understanding. Education should be a game, which we master through play. We might even want to reframe science as a quest for hidden truths, so that each person can experience for themselves what it’s like to push forward the frontiers of knowledge.
    • Self-understanding. In many ways, we’re inscrutable even to ourselves, with our true beliefs and motivations hidden beneath the surface of consciousness. As we become more intelligent, we will better understand how we really work, fulfilling the longstanding, elusive quest to “know thyself”.
    • Agency. Each human is a collection of modules in a constant tug of war. We want one thing one day, and another the next. We procrastinate and we contradict ourselves and we succumb to life-ruining addictions. But this needn’t be the case. Imagine yourself as a unified agent, one who is able to make good choices for yourself, and stick to them - one who’s not overwhelmed by anger, or addiction, or other desires that your reflective self doesn’t endorse. This might be achieved by brain modification, or by having a particularly good AI assistant which knows how to nudge you into being a more consistent version of yourself.
    • Memory. Today we lose most of our experiences to forgetfulness. But we could (and have already started to) outsource our memories to more permanent storage media accessible on demand, so that we can stay in touch with our pasts indefinitely.
    • The extended mind. Clark and Chalmers have argued that we should consider external thinking aids to be part of our minds. Right now these aids are very primitive, and interface with our brains in very limited ways - but that will certainly improve over time, until accessing the outputs of external computation is similar to any other step in our thinking. The result will be as if we’d each internalised all of human knowledge.
  • Variety and novelty of experiences.
    • Seeing the universe. The urge to travel and explore is a deep-rooted one. Eventually we will be able to roam as far as we like, and observe the wonder and grandeur of the cosmos.
    • Explorations of the human condition. Most of us inhabit fairly limited social circles, which don’t allow us to experience different ways of life and different people’s perspectives. Given the time to do so, we could learn a lot from the sheer variety of humanity, and import those lessons into the rest of our lives.
    • Explorations of consciousness. Right now the conscious states that we’re able to experience are limited to those induced by the handful of psychoactive chemicals that we or evolution have stumbled upon. Eventually, though, we will develop totally different ways of experiencing the world that are literally inconceivable to us today.
    • Spiritual experiences. One such mental shift that people already experience is the feeling of spiritual enlightenment. Aside from its religious connotations, this can be a valuable shift of perspective which gives us new insights into how to live our lives.
  • Progress on our journeys. A key part of leading a meaningful life is continual growth and transcendence of one’s past self, each moving towards becoming the person we want to be. That might mean becoming a more virtuous person, or more successful, or more fulfilled - as long as we’re able to be proud of our achievements so far, and hopeful about the future.
    • Justified expectation of pleasant surprises. One important factor in creating this sensation of progress is uncertainty about exactly what the future has in store for us. Although we should be confident that our lives will become better, this should sometimes come in the form of pleasant surprises rather than just ticking off predictable checkpoints.
    • Levelling up. One way that this growth might occur is if people’s lives consist of distinct phases, each with different opportunities and challenges. Once someone thinks they have gained all that they desire from one phase, they can choose to move on. In an extreme case, the nature and goals of a subsequent phase might be incomprehensible to those in earlier phases - in the same way that children don’t understand what it’s like to be an adult, and most people don’t understand Buddhist enlightenment. For fictional precedent, consider sublimation in Banks’ Culture, or the elves leaving Middle-Earth in Tolkien’s mythos.
    • Guardrails. Extended lives should be very hard to irreversibly screw up, since there’s so much at stake - especially if we have much greater abilities to modify ourselves than we do today.
    • Leaving a legacy. People want to be remembered after they’ve moved on. Even in a world without death, each person should have had the opportunity to make a lasting difference in their communities before they leave for their next great adventure.

Relationships with others

For most of us, our relationships (with friends, family and romantic partners) are what we find most valuable in life. By that metric, though, it’s plausible that Westerners are poorer than we’ve ever been. What would it mean for our social lives to be as rich as our material lives have become and will become? Imagine living in communities and societies that didn’t just allow you to pursue your best life, but were actively on your side - that were ideally designed to enable the flourishing of their inhabitants.

  • Stronger connections. Most relationships are nowhere near as loving or as fulfilling as they might ideally be. That might be because we’re afraid of vulnerability, or we don’t know how to nurture these relationships (knowing how to be a good friend is more valuable than almost anything learned in classes, but taught almost nowhere), or we simply struggle to find and spend time with people who complement us. Imagine a society which is as successful at solving these problems as ours has been at solving scientific and engineering problems, for example by designing better social norms, giving its citizens more time and space for each other, and teaching individuals to think about their relationships in the most constructive ways.
    • Abolishing loneliness. I envisage a future where loneliness has been systematically eradicated, by helping everyone find social environments in which they can flourish, and by providing comprehensive support for people struggling with building or maintaining relationships. I imagine too a future without the social anxieties which render many of us insecure and withdrawn.
    • Love, love, love. What would utopia be without romantic love and passion? This is an obsession of modern culture - and yet it’s also something that doesn’t always come naturally. We could improve romance by reducing the barriers of fear and insecurity, allowing people to better create true intimacy. Even the prosaic solutions of better educational materials and cultural norms might go a long way towards that.
    • Commitment and trust. In my mind, the key feature of both romance and friendship is deep commitment and trust, and the common knowledge that you’re each there for the other person. Whatever the bottlenecks are to more people building up that sort of bond - inability to communicate openly and honestly; or a lack of empathy; or even the absence of shared adversity - we could focus society’s efforts towards remedying them.
    • Free love. While there’s excitement in the complex dance of romance, a lot of the hangups around sex serve only to make people anxious and unhappy. Consenting adults should feel empowered to pursue each other; and of course utopia should include some great sex.
    • Ending toxic relationships. We can reduce and manage the things that make relationships toxic, like jealousy, narcissism, and abuse. This might happen via mental health treatment, better education, better knowledge of how to raise well-socialised children, or cultural norms which facilitate healthy relationships.
    • Longer connections. I think it’s worth noting the positive effect that longevity could have on personal relationships. There’s a depth and a joy to being lifelong friends - but how much stronger could it be when those lives stretch out across astronomical timescales? This is not to say that we should bind ourselves to the same people for our whole extended lives - rather, we can spend time together and separate in the knowledge that it need never be a final parting, with each reunion a thing of joy.
  • Life as a group project. In addition to one-on-one relationships, there’s a lot of value in being part of a close-knit group with deep shared bonds - a circle of lifelong friends, or soldiers who trust each other with their lives, or a large and loving family. Many people don’t have any of these, but I hope that they could.
    • Better starts to life. The quality of relationships is most important for the most vulnerable among us. In a utopian future, every child would be raised with love, and allowed to enjoy the wonder of childhood; and indeed, they would keep that same wonder long into their adult lives.
    • Less insular starts to life. Today, many children only have the opportunity to interact substantively with a handful of adults. While I’m unsure about fully-communal parenting, children who will become part of a broader community shouldn’t be shut off from that community; rather, they should have the chance to befriend and learn from a range of people. Meanwhile, spending more time with children would enrich the lives of many adults.
    • Families, extended. What is the most meaningful thing for the most people? Probably spending time with their children and grandchildren, and knowing that with their family they’ve created something unique and important. A utopian vision of family would have the same features, but with each person living to see their lineage branch out into a whole forest of descendants, with them at the root.
  • Healthy societies. In modern times our societies are too large and fragmented to be the close-knit groups I mentioned above. Yet people can also find meaning in being part of something much larger than themselves, and working together towards the common goal of building and maintaining a utopia.
    • Positive norms. The sort of behaviours that are socially encouraged and rewarded should be prosocial ones which contribute to the well-being of society as a whole.
    • (The good parts of) tribalism and patriotism. The feeling of being part of a cohesive group of people unified by a common purpose is a powerful one. At a small scale, we currently get this from watching sports, or singing in a choir. At larger scales, those same feelings often lead to harmful nationalist behaviour - yet at their best, they could give us a world in which people feel respect for and fraternity with all those around them by default, simply due to their shared humanity.
    • Tradition and continuity. Another key component of belonging to something larger than yourself is continuing a long-lived legacy. Traditions could be maintained over many millennia in a way which gives each person a sense of their place in history.
    • Political voices. Our current societies are too large for their overall directions to be meaningfully influenced by most people. But we can imagine mechanisms which allow individuals to weigh in on important questions in their local communities to a much greater extent. And people could at least know that their voice and vote have as much weight in the largest-scale decisions as anyone else’s.
  • Meetings of minds. Today, humans communicate through words and gestures and body language. These are very low-bandwidth channels, compared with what is theoretically possible. In particular, brain interfaces could allow direct communication from one person’s mind to another. That wouldn’t just be quicker talking, but a fundamentally different mode of communication, as if another person were finishing your own thoughts. And consider that our “selves” are not discrete entities, but are made up of many mental modules. If we link them in new ways, the boundaries between you and other people might become insubstantial - you might temporarily (or permanently) become one larger person.
  • Mitigating status judgements and dominance-seeking. In general we can’t hope to understand social interactions without considering status and hierarchy. We want to date the most attractive people and have the most prestigious jobs and become as wealthy as possible in large part to look better than others. The problem is that not everyone can reach the top, and so widespread competition to do so will leave many dissatisfied. In other cases, people are directly motivated to dominate and outcompete each other - such as businesspeople who want to crush their rivals. While this can be useful for driving progress, in the long term those motivations would ideally be channeled in ways which are more conducive to long-lasting fulfilment. For example, aggressive instincts could be directed towards recreational sports rather than relationships or careers.
    • Diverse scales of success. To make social dynamics more positive-sum, we should avoid sharing one single vision, which everyone is striving towards, of what a successful life looks like. We can instead encourage people to tie their identities to the subcommunities they care most about, rather than measuring themselves against the whole world (though for an objection to this line of reasoning see Katja Grace’s post here).
    • More equality of status. To the extent that we still have hierarchies and negative-sum games, it should at least be the case that nobody is consistently at the bottom of all of them, and everyone can look forward to their time of recognition and respect (as in the system I outline in this blog post).

Humanity overall

When we zoom out to consider the trajectory of humanity as a whole, there are some desirable properties which we might want it to have. Although there are reasons to distrust such large-scale judgements (in particular the human tendency towards scope insensitivity), these are often strong intuitions which do matter to us.

  • Sheer size. The more people living worthwhile lives, the better - and with the astronomical resources available to us, we have the opportunity to allow our descendants to number in the uncountable trillions.
  • Solving coordination. In general, we’re bad at working together to resolve problems. This could be solved by mechanisms to make politics and governance transparent, accountable and responsive at a variety of levels. In other words, imagine humanity at one with itself and able to set its overall direction, rather than trapped in our current semi-anarchic default condition.
    • The end of war. Others have spoken of the senseless horror of war much better than I can. I will merely add that some human war will be our last war; let us hope that it gains that distinction for the right reason.
    • Avoiding races to the bottom. Under most people’s ethical intuitions, we should dislike the Malthusian scenario in which, even as our wealth grows vastly, our populations will grow even faster, so that most people end up with subsistence-level resources. To avoid this, we will need the ability to coordinate well at large scales.
  • The pursuit of knowledge. As a species we will learn and discover more and more over time. Eventually we will understand both the most fundamental building blocks of nature and also the ways in which complex systems like our minds and societies function.
  • Moral progress. In particular, we will come to a better understanding of ethics, both in theory and in ways that we can actually act upon - and then hopefully do so, to create just societies. While it’s difficult to predict exactly where moral progress will take us, one component which seems very important is building a utopia for all, with widespread access to the opportunity to pursue a good life. In particular, this should probably involve everyone having certain basic rights - such as the ability to participate in the major institutions of civil society, as Anderson describes.
  • Exploring the diversity of life. Many people value our current world’s variety of cultures and lifestyles - but over many millennia our species will be able to explore the vast frontiers of what worthwhile lives and societies could look like. The tree of humanity will branch out in ways that are unimaginable to us now.
    • Speciation. Even supposing that we are currently alone in the universe, we need not be the last intelligent species. Given sufficient time, it might become desirable to create descendant species, or split humanity into different branches which experience different facets of life. Or we might at least enjoy the companionship of animals, whether they be species that currently exist or those which we create ourselves.
  • Making our mark. The universe is vast, but we have plenty of time. Humanity could expand to colonise this galaxy, and others, in a continual wave of exploration and pursuit of the unknown. We might create projects of unimaginable scale, reengineering the cosmos as we choose, and diverting the current astronomical waste towards the well-being of ourselves and our descendants.
    • Creativity and culture. The ability to create new life, design entire worlds, and perform other large-scale feats, will allow unmatched expressions of artistry and beauty.
    • Humanity’s final flourishing. In the very very long term, under our current understanding of physics, humanity will run out of energy to sustain itself, and our civilisation will draw to an end. If we cannot avoid that, at least we can design our species’ entire trajectory, including that final outcome, with the wisdom of uncountable millennia.

Contentious changes

For all of the changes listed above, there are straightforward reasons why they would be better than the status quo or than a move in the opposite direction. However, there are some dimensions along which we might eventually want to move - but in which direction, I don’t know.

  • Privacy, or lack thereof. In many ways people have become more open over the past few centuries. But we now also place more importance on individual rights such as the right to privacy. I could see a future utopia in which there were very few secrets, and radical transparency was the norm - but also the opposite, in which everyone had full control over which aspects of themselves others could access, even up to their appearance and name (as in this excellent novel).
  • Connection with nature. Many people value this very highly. By contrast, transhumanists generally want to improve on nature, not return to it. In the long term, we might synthesise these two by creating new environments and ecosystems that are even more peaceful and beautiful and grand than those which exist today - but I don’t know how well those would match people’s current conceptions of natural life.
  • New social roles. Each of us plays many social roles, and is bound by the corresponding constraints and expectations. I think such roles will be an important part of social interactions even in a utopia: we don’t want total homogeneity. However, our current roles - gender roles, family roles, job roles and so on - are certainly not optimal for everyone. I can imagine them being replaced by social roles which are just as strong, but which need to be opted into, or provide more flexibility in other ways. Yet I’m hesitant to count this as an unalloyed good, because the new roles might seem bizarre and alien to us, even if our descendants think of them as natural and normal (as illustrated in this fascinating story by Scott Alexander). Consider, for instance, how strange the hierarchical roles of historical societies seem to us today - and then imagine a future in which our version of romance is just as antiquated, in favour of totally new narratives about what makes relationships meaningful.
  • Unity versus freedom. Unity of lifestyle and purpose was a key component of many historical utopias. Some more recent utopias, like Banks’ Culture, propound the exact opposite: total freedom for individuals to live radically diverse lives. Which is better? The temperament of the time urges me towards the latter, which I think is also more intuitive at astronomical scales, but this would also make it harder to implement the other features of the utopia I’ve described, if there’s extensive disagreement about what goals to pursue, and how. Meanwhile one downside of unity is the necessity of enforcing social norms, for example by ostracising or condemning those who disobey.
  • The loss or enhancement of individuality. The current composition of our minds - having very high bandwidth between different parts of our brain, and very low bandwidth between our brains and others’ - is a relic of our evolutionary history. Above, I described the benefits of reducing the communication boundaries between different people. But I’m not sure how far to take this: would we want a future in which individuality is obsolete, with everyone merging into larger consciousnesses? Or would it be better if, despite increasing communication bandwidth, we place even greater value on long-term individuality, since our lives will be much less transient?
    • Cloning and copying. Other technologies which might affect our attitudes towards individuality are those which would allow us to create arbitrarily many people arbitrarily similar to ourselves.
  • Self-modification. The ability to change arbitrary parts of your mind is a very powerful one. At its best, we can make ourselves the people we always wanted to be, transcending human limitations. At its worst, there might be pressure to carve out the parts of ourselves that make us human, like Hanson discusses in Age of Em.
    • Designer people. Eventually we will be able to specify arbitrary characteristics of our children, shaping them to an unprecedented extent. However, I don’t know if that’s a power we should fully embrace, either as individuals or as societies.
  • Wireheading. I’m uncertain about the extent to which blissing out on pleasure (at the expense of pursuing more complex goals) is something we should aim for.
  • Value drift. More generally, humanity’s values will by default change significantly over time. Whether to prevent that or to allow it to happen is a tricky question. The former implies a certain type of stagnation - we are certainly glad that the Ancient Greeks did not lock in their values. The latter option could lead us to a world which looks very weird and immoral by our modern sensibilities.


[1]. See, for instance, the conspicuous absence of relationships and communities in works such as Nick Bostrom’s Transhumanist FAQ. His summary of the transhumanist perspective: “Many transhumanists wish to follow life paths which would, sooner or later, require growing into posthuman persons: they yearn to reach intellectual heights as far above any current human genius as humans are above other primates; to be resistant to disease and impervious to aging; to have unlimited youth and vigor; to exercise control over their own desires, moods, and mental states; to be able to avoid feeling tired, hateful, or irritated about petty things; to have an increased capacity for pleasure, love, artistic appreciation, and serenity; to experience novel states of consciousness that current human brains cannot access.” See also Yudkowsky: “It doesn't get any better than fun.” Meanwhile the foremost modern science fiction utopia, Banks’ Culture, is also very individualistic.

[2]. Some interesting quotes from Walden Two:

  • “Men build society and society builds men.”
  • “The behavior of the individual has been shaped according to revelations of ‘good conduct,’ never as the result of experimental study. But why not experiment? The questions are simple enough. What’s the best behavior for the individual so far as the group is concerned? And how can the individual be induced to behave in that way? Why not explore these questions in a scientific spirit?”
  • “We undertook to build a tolerance for annoying experiences. The sunshine of midday is extremely painful if you come from a dark room, but take it in easy stages and you can avoid pain altogether. The analogy can be misleading, but in much the same way it’s possible to build a tolerance to painful or distasteful stimuli, or to frustration, or to situations which arouse fear, anger or rage. Society and nature throw these annoyances at the individual with no regard for the development of tolerances. Some achieve tolerances, most fail. Where would the science of immunization be if it followed a schedule of accidental dosages?”

And from Island:

  • “That would distract your attention, and attention is the whole point. Attention to the experience of something given, something you haven't invented in your imagination.”
  • "We all belong to an MAC—a Mutual Adoption Club. Every MAC consists of anything from fifteen to twenty-five assorted couples. Newly elected brides and bridegrooms, old-timers with growing children, grandparents and great-grandparents—everybody in the club adopts everyone else. … An entirely different kind of family. Not exclusive, like your families, and not predestined, not compulsory. An inclusive, unpredestined and voluntary family. Twenty pairs of fathers and mothers, eight or nine ex-fathers and ex-mothers, and forty or fifty assorted children of all ages."
  • “[Large, powerful men] are just as muscular here, just as tramplingly extraverted, as they are with you. So why don’t they turn into Stalins or Dipas, or at the least into domestic tyrants? First of all, our social arrangements offer them very few opportunities for bullying their families, and our political arrangements make it practically impossible for them to domineer on any larger scale. Second, we train the Muscle Men to be aware and sensitive, we teach them to enjoy the commonplaces of everyday existence. This means that they always have an alternative—innumerable alternatives—to the pleasure of being the boss. And finally we work directly on the love of power and domination that goes with this kind of physique in almost all its variations. We canalize this love of power and we deflect it—turn it away from people and on to things. We give them all kinds of difficult tasks to perform—strenuous and violent tasks that exercise their muscles and satisfy their craving for domination—but satisfy it at nobody’s expense and in ways that are either harmless or positively useful.”

[3]. For a short introduction to this debate, see section 3 in the Stanford Encyclopedia of Philosophy’s entry on Consequentialism.

[4]. For stylistic purposes I wrote much of this essay in the future tense, without always hedging with “we might” and “it’s possible that”. Please don’t interpret any of my descriptions as confident predictions - rather, treat them as expressions of possibility and hope.

[5]. As Tim Ferriss puts it, “excitement is the more practical synonym for happiness”.

[6]. For an analysis of the similarities between these three traditions, I recommend Shannon Vallor's Technology and the Virtues.

[7]. For a (somewhat fawning) description of such a society, see Swift’s Houyhnhnms, which are “endowed by nature with a general disposition to all virtues, and have no conceptions or ideas of what is evil in a rational creature”; and which the narrator wants “for civilizing Europe, by teaching us the first principles of honour, justice, truth, temperance, public spirit, fortitude, chastity, friendship, benevolence, and fidelity.”


Since figuring out human values is hard, what about, say, monkey values?

January 2, 2020 - 00:56
Published on January 1, 2020 9:56 PM UTC

So, human values are fragile, vague and possibly not even a well-defined concept, yet figuring them out seems essential for an aligned AI. It seems reasonable that, faced with a hard problem, one would start instead with a simpler one that has some connection to the original problem. For someone not working in the area of ML or AI alignment, it seems obvious that researching simpler-than-human values might be a way to make progress. But maybe this is one of those falsely obvious ideas that non-experts tend to push after only a cursory look at a complex research topic.

That said, assuming that value complexity scales with intelligence, studying less intelligent agents and their versions of values may be something to pursue. Dolphin values. Monkey values. Dog values. Cat values. Fish values. Amoeba values. Sure, we lose the inside view in this case, but the trade-off seems at least worth exploring. Is there any research going on in that area?


What will quantum computers be used for?

January 1, 2020 - 22:33
Published on January 1, 2020 7:33 PM UTC

In the early days of electronic computers, machines like ENIAC were used for niche applications such as calculating ballistic trajectories and simulating nuclear explosions. These applications are very far removed from what computers are predominantly used for today.

It seems we have reached a similar development stage with regard to quantum computing. The applications researchers cite, cryptographic analysis and quantum systems simulation, are once again very far removed from everyday life.

Does anyone have a prediction on what quantum computers will be used for once they become affordable enough for regular people? Or will they forever remain just a research tool? Are quantum computers the key to cracking quantum chemistry and thereby molecular nanotechnology? (This is my guess as to what the actual impact of quantum computing will be.)


Toy Organ

January 1, 2020 - 22:30
Published on January 1, 2020 7:30 PM UTC

A few months ago I brought home an Emenee toy organ someone was throwing out. It didn't work, and it sat in the basement for a while, but today I had a go at fixing it. It's a very simple design: a fan blows air through a hole in the bottom, setting up a pressure difference between the inside and outside. When you press a key, that opens a corresponding hole so air can flow through, past a little plastic reed whose vibration makes the note.

With mine, the motor was running but I wasn't getting any sound. I opened it up and nothing was obviously wrong. I figured it was probably leaky, and put tape around the base where the plastic sides meet the chipboard bottom. This fixed it enough to get some sound:


The lower octave isn't working at all, and the higher notes get progressively slower to sound and breathier until the very highest don't sound at all, but this leaves an octave and a half of chromatic range. Enough to play around with!

The left hand buttons play chords, and are arranged:

The circle of fifths arrangement makes a lot of sense, but the choice to put major and minor chords adjacent does not. If you're playing in a major key, you generally want the vi and ii minors, so Am and Dm in the key of C. This means the vi minors they've included are used in flatter keys than they've provided: F and C can use Dm and Am, but G and D would like Em and Bm which are absent. Shifting the minors over three notes would be much better. The layout would change from:

    Bbm  Fm   Cm   Gm   Dm   Am
    Bb   F    C    G    D    A

to:

    Gm   Dm   Am   Em   Bm   F#m
    Bb   F    C    G    D    A

Now in each key you have:

    ii   vi
    IV   I    V

instead of:

                   ii   vi
    IV   I    V

Playing folk music I also would have preferred that they center on G or D, since I care much more about having E than Bb.

Another change that would be nice would be to offer a way to control the way air gets into the organ. There's a hole on the bottom for the fan, and if you cover it the organ stops working because air can't get through. It has a screw-adjustable cover, which looks like it's designed as a volume control:

You could build a simple adjustable cover connected to a foot pedal, and get a real expression pedal. This would have added to the cost of the instrument, so I understand why they wouldn't want to do this with a toy, but it would be a simple add-on.

I'm probably not going to keep this, since my regular keyboard can do everything this can do, so if you're in the Boston area and would enjoy playing with it let me know!


Don't Double-Crux With Suicide Rock

January 1, 2020 - 22:02
Published on January 1, 2020 7:02 PM UTC

Honest rational agents should never agree to disagree.

This idea is formalized in Aumann's agreement theorem and its various extensions (we can't foresee to disagree, uncommon priors require origin disputes, complexity bounds, &c.), but even without the sophisticated mathematics, a basic intuition should be clear: there's only one reality. Beliefs are for mapping reality, so if we're asking the same question and we're doing everything right, we should get the same answer. Crucially, even if we haven't seen the same evidence, the very fact that you believe something is itself evidence that I should take into account—and you should think the same way about my beliefs.

In "The Coin Guessing Game", Hal Finney gives a toy model illustrating what the process of convergence looks like in the context of a simple game about inferring the result of a coinflip. A coin is flipped, and two players get a "hint" about the result (Heads or Tails) along with an associated hint "quality" uniformly distributed between 0 and 1. Hints of quality 1 always match the actual result; hints of quality 0 are useless and might as well be another coinflip. Several "rounds" commence where players simultaneously reveal their current guess of the coinflip, incorporating both their own hint and its quality, and what they can infer about the other player's hint quality from their behavior in previous rounds. Eventually, agreement is reached. The process is somewhat alien from a human perspective (when's the last time you and an interlocutor switched sides in a debate multiple times before eventually agreeing?!), but not completely so: if someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you would infer that they had strong evidence or counterarguments of their own, even if there was some reason they couldn't tell you what they knew.

Honest rational agents should never agree to disagree.

In "Disagree With Suicide Rock", Robin Hanson discusses a scenario where disagreement seems clearly justified: if you encounter a rock with words painted on it claiming that you, personally, should commit suicide according to your own values, you should feel comfortable disagreeing with the words on the rock without fear of being in violation of the Aumann theorem. The rock is probably just a rock. The words are information from whoever painted them, and maybe that person did somehow know something about whether future observers of the rock should commit suicide, but the rock itself doesn't implement the dynamic of responding to new evidence.

In particular, if you find yourself playing Finney's coin guessing game against a rock with the letter "H" painted on it, you should just go with your own hint: it would be incorrect to reason, "Wow, the rock is still saying Heads, even after observing my belief in several previous rounds; its hint quality must have been very high."
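
A minimal simulation of why deferring to the rock is a mistake. It assumes that a hint of quality q matches the coin with probability (1 + q) / 2, which simply interpolates between the two endpoints described above; the exact rule in Finney's post may differ, and the function names here are illustrative.

```python
import random

def draw_hint(result, quality, rng):
    """Return a hint that matches `result` with probability (1 + quality) / 2.

    Assumed rule: quality 1 -> always right, quality 0 -> a fresh coinflip.
    """
    if rng.random() < (1 + quality) / 2:
        return result
    return "T" if result == "H" else "H"

def simulate(trials=100_000, seed=0):
    """Compare trusting your own hint with deferring to a rock painted 'H'."""
    rng = random.Random(seed)
    own_correct = 0
    rock_correct = 0
    for _ in range(trials):
        result = rng.choice("HT")
        quality = rng.random()            # hint quality, uniform on [0, 1]
        hint = draw_hint(result, quality, rng)
        own_correct += (hint == result)
        rock_correct += ("H" == result)   # the rock never updates
    return own_correct / trials, rock_correct / trials

own_accuracy, rock_accuracy = simulate()
print(f"own hint: {own_accuracy:.3f}, rock: {rock_accuracy:.3f}")
```

Under these assumptions, following your own hint is right about three times in four (the average of (1 + q) / 2 over uniform q), while the rock is right half the time no matter how many rounds it "holds" its position.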

Honest rational agents should never agree to disagree.

Human so-called "rationalists" who are aware of this may implicitly or explicitly seek agreement with their peers. If someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you might think, "Hm, we still don't agree; I should update towards their position ..."

But another possibility is that your trust has been misplaced. Humans suffering from "algorithmic bad faith" are on a continuum with Suicide Rock. What matters is the counterfactual dependence of their beliefs on states of the world, not whether they know all the right keywords ("crux" and "charitable" seem to be popular these days), nor whether they can perform the behavior of "making arguments"—and definitely not their subjective conscious verbal narratives.

And if the so-called "rationalists" around you suffer from correlated algorithmic bad faith—if you find yourself living in a world of painted rocks—then it may come to pass that protecting the sanctity of your map requires you to master the technique of lonely dissent.


[AN #79]: Recursive reward modeling as an alignment technique integrated with deep RL

January 1, 2020 - 21:00
Published on January 1, 2020 6:00 PM UTC

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Happy New Year!

Audio version here (may not be up yet).


AI Alignment Podcast: On DeepMind, AI Safety, and Recursive Reward Modeling (Lucas Perry and Jan Leike) (summarized by Rohin): While Jan originally worked on theory (specifically AIXI), DQN, AlphaZero and others demonstrated that deep RL was a plausible path to AGI, and so now Jan works on more empirical approaches. In particular, when selecting research directions, he looks for techniques that are deeply integrated with the current paradigm, that could scale to AGI and beyond. He also wants the technique to work for agents in general, rather than just question answering systems, since people will want to build agents that can act, at least in the digital world (e.g. composing emails). This has led him to work on recursive reward modeling (AN #34), which tries to solve the specification problem in the SRA framework (AN #26).

Reward functions are useful because they allow the AI to find novel solutions that we wouldn't think of (e.g. AlphaGo's move 37), but often are incorrectly specified, leading to reward hacking. This suggests that we should do reward modeling, where we learn a model of the reward function from human feedback. Of course, such a model is still likely to have errors leading to reward hacking, and so to avoid this, the reward model needs to be updated online. As long as it is easier to evaluate behavior than to produce behavior, reward modeling should allow AIs to find novel solutions that we wouldn't think of.

However, we would eventually like to apply reward modeling to tasks where evaluation is also hard. In this case, we can decompose the evaluation task into smaller tasks, and recursively apply reward modeling to train AI systems that can perform those small helper tasks. Then, assisted by these helpers, the human should be able to evaluate the original task. This is essentially forming a "tree" of reward modeling agents that are all building up to the reward model for the original, hard task. While currently the decomposition would be done by a human, you could in principle also use recursive reward modeling to automate the decomposition. Assuming that we can get regular reward modeling working robustly, we then need to make sure that the tree of reward models doesn't introduce new problems. In particular, it might be the case that as you go up the tree, the errors compound: errors in the reward model at the leaves lead to slightly worse helper agents, which lead to worse evaluations for the second layer, and so on.
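
To get a feel for how quickly errors could compound up the tree, here is a toy calculation. The error model is an assumption of this sketch, not something from the podcast: a node's evaluation counts as right only if its own reward model is right and every one of its helpers is right, with all failures independent. The parameter names and default values are hypothetical.

```python
def tree_accuracy(depth, node_error=0.01, helpers_per_node=3):
    """Probability that the root evaluation is right in a toy error model.

    Toy assumption: a node is right only if its own reward model is right
    (probability 1 - node_error) and all of its helpers are right, with
    every failure independent of the others.
    """
    accuracy = 1.0 - node_error           # depth 0: a single leaf model
    for _ in range(depth):
        accuracy = (1.0 - node_error) * accuracy ** helpers_per_node
    return accuracy

for depth in range(4):
    print(depth, round(tree_accuracy(depth), 4))
```

Even with a 1% error rate per node, accuracy in this pessimistic model falls off quickly with depth, which is the kind of compounding the summary is worried about. Real systems need not behave this way, since helpers' errors may be correlated or partly corrected by the human evaluator.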

He recommends that rather than spending a lot of time figuring out the theoretically optimal way to address a problem, AI safety researchers should alternate between conceptual thinking and trying to make something work. The ML community errs on the other side, where they try out lots of techniques, but don't think as much about how their systems will be deployed in the real world. Jan also wants the community to focus more on clear, concrete technical explanations, rather than vague blog posts that are difficult to critique and reason about. This would allow us to more easily build on past work, rather than reasoning from first principles and reinventing the wheel many times.

DeepMind is taking a portfolio approach to AI safety: they are trying many different lines of attack, and hoping that some of them will pan out. Currently, there are teams for agent alignment (primarily recursive reward modeling), incentive theory, trained agent analysis, policy, and ethics. They have also spent some time thinking about AI safety benchmarks, as in AI Safety Gridworlds, since progress in machine learning is driven by benchmarks, though Jan does think it is quite hard to create a well-made benchmark.

Rohin's opinion: I've become more optimistic about recursive reward modeling since the original paper (AN #34), primarily (I think) because I now see more value in approaches that can be used to perform specific tasks (relative to approaches that try to infer "human values").

I also appreciated the recommendations for the AI safety community, and agree with them quite a lot. Relative to Jan, I see more value in conceptual work described using fuzzy intuitions, but I do think that more effort should be put into exposition of that kind of work.

Technical AI alignment

Learning human intent

Learning human objectives by evaluating hypothetical behaviours (Siddharth Reddy et al) (summarized by Rohin): Deep RL from Human Preferences updated its reward model by collecting human comparisons on on-policy trajectories where the reward model ensemble was most uncertain about what the reward should be. However, we want our reward model to be accurate off policy as well, even in unsafe states. To this end, we would like to train our reward model on hypothetical trajectories. This paper proposes learning a generative model of trajectories from some dataset of environment dynamics, such as safe expert demonstrations or rollouts from a random policy, and then finding trajectories that are "useful" for training the reward model. They consider four different criteria for usefulness of a trajectory: uncertain rewards (which intuitively are areas where the reward model needs training), high rewards (which could indicate reward hacking), low rewards (which increases the number of unsafe states that the reward model is trained on), and novelty (which covers more of the state space). Once a trajectory is generated, they have a human label it as good, neutral, or unsafe, and then train the reward model on these labels.
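
The four criteria can be pictured as four scoring functions over candidate trajectories. The sketch below is a simplification: it ranks a small fixed set of candidates rather than searching a generative model, and the feature tuples, distance measure and function names are all hypothetical rather than taken from the paper.

```python
from statistics import mean, pvariance

def score_trajectory(ensemble_rewards, features, seen_features):
    """Score one candidate trajectory under four illustrative criteria.

    ensemble_rewards: reward predicted for the trajectory by each ensemble member.
    features: a tuple of numbers summarising the trajectory (hypothetical).
    seen_features: feature tuples of trajectories that were already labelled.
    """
    avg = mean(ensemble_rewards)
    nearest = min(
        (sum((a - b) ** 2 for a, b in zip(features, seen)) ** 0.5
         for seen in seen_features),
        default=float("inf"),
    )
    return {
        "uncertain": pvariance(ensemble_rewards),  # ensemble disagreement
        "high_reward": avg,                        # possible reward hacking
        "low_reward": -avg,                        # likely unsafe states
        "novelty": nearest,                        # far from labelled data
    }

def pick_query(candidates, seen_features, criterion):
    """Return the name of the candidate that maximises `criterion`."""
    return max(
        candidates,
        key=lambda name: score_trajectory(*candidates[name], seen_features)[criterion],
    )

candidates = {
    "a": ([0.9, 1.0, 1.1], (0.0, 0.0)),
    "b": ([-1.0, 1.0, 0.0], (0.1, 0.0)),
    "c": ([-2.0, -2.1, -1.9], (5.0, 5.0)),
}
seen = [(0.0, 0.1)]
for criterion in ("uncertain", "high_reward", "low_reward", "novelty"):
    print(criterion, "->", pick_query(candidates, seen, criterion))
```

Whichever trajectory is picked would then be shown to the human for a good / neutral / unsafe label, and the reward model retrained on the result.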

The authors are targeting an agent that can explore safely: since they already have a world model and a reward model, they use a model-based RL algorithm to act in the environment. Specifically, to act, they use gradient descent to optimize a trajectory in the latent space that maximizes expected rewards under the reward model and world model, and then take the first action of that trajectory. They argue that the world model can be trained on a dataset of safe human demonstrations (though in their experiments they use rollouts from a random policy), and then since the reward model is trained on hypothetical behavior and the model-based RL algorithm doesn't need any training, we get an agent that acts without us ever getting to an unsafe state.
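
The "plan a trajectory, then take only its first action" loop can be sketched as follows. This is not the paper's method: it swaps the gradient-based optimisation in latent space for an exhaustive search over a tiny discrete action set, and the one-dimensional `world_model` and `reward_model` are hypothetical stand-ins for learned models.

```python
from itertools import product

def world_model(state, action):
    """Hypothetical learned dynamics: the state moves by the chosen action."""
    return state + action

def reward_model(state, goal=3):
    """Hypothetical learned reward: closer to the goal is better."""
    return -abs(state - goal)

def plan_first_action(state, horizon=4, actions=(-1, 0, 1)):
    """Pick the first action of the best imagined action sequence."""
    def imagined_return(sequence):
        total, current = 0, state
        for action in sequence:
            current = world_model(current, action)
            total += reward_model(current)
        return total

    best_sequence = max(product(actions, repeat=horizon), key=imagined_return)
    return best_sequence[0]

state = 0
for step in range(5):
    action = plan_first_action(state)
    state = world_model(state, action)  # here the "real" world equals the model
    print(step, action, state)
```

Because the agent replans at every step, no policy needs to be trained; all of the agent's knowledge about what is good or unsafe lives in the reward model and world model it plans against.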

Rohin's opinion: I like the focus on integrating active selection of trajectory queries into reward model training, and especially the four different kinds of active criteria that they consider, and the detailed experiments (including an ablation study) on the benefits of these criteria. These seem important for improving the efficiency of reward modeling.

However, I don't buy the argument that this allows us to train an agent without visiting unsafe states. In their actual experiments, they use a dataset gathered from a random policy, which certainly will visit unsafe states. If you instead use a dataset of safe human demonstrations, your generative model will only place probability mass on safe demonstrations, and so you'll never generate trajectories that visit unsafe states, and your reward model won't know that they are unsafe. (Maybe your generative model will generalize properly to the unsafe states, but that seems unlikely to me.) Such a reward model will either be limited to imitation learning (sticking to the same trajectories as in the demonstrations, and never finding something like AlphaGo's move 37), or it will eventually visit unsafe states.

Read more: Paper: Learning Human Objectives by Evaluating Hypothetical Behavior

Causal Confusion in Imitation Learning (Pim de Haan et al) (summarized by Asya): This paper argues that causal misidentification is a big problem in imitation learning. When the agent doesn't have a good model of what actions cause what state changes, it may mismodel the effects of a state change as a cause; e.g., an agent learning to drive a car may incorrectly learn that it should apply the brakes whenever the brake light on the dashboard is on. This leads to undesirable behavior where more information actually causes the agent to perform worse.

The paper presents an approach for resolving causal misidentification by (1) Training a specialized network to generate a "disentangled" representation of the state as variables, (2) Representing causal relationships between those variables in a graph structure, (3) Learning policies corresponding to each possible causal graph, and (4) Performing targeted interventions, either by querying an expert, or by executing a policy and observing the reward, to find the correct causal graph model.
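Steps (3) and (4) above can be sketched on the brake-light example. This is a heavily simplified toy, not the paper's algorithm: each candidate causal graph is a binary mask over two features `(obstacle, brake_light)`, policies are majority-vote lookup tables, and every mask is scored exhaustively by executing its policy. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

def expert_demos(obstacles):
    """Expert brakes iff there is an obstacle; the light shows its previous action."""
    demos, brake_light = [], 0
    for obstacle in obstacles:
        action = obstacle
        demos.append(((obstacle, brake_light), action))
        brake_light = action
    return demos

def fit_masked_policy(demos, mask):
    """Majority-vote policy that only sees the features where mask is 1."""
    votes = defaultdict(Counter)
    for features, action in demos:
        key = tuple(f for f, keep in zip(features, mask) if keep)
        votes[key][action] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    default = Counter(a for _, a in demos).most_common(1)[0][0]

    def policy(features):
        key = tuple(f for f, keep in zip(features, mask) if keep)
        return table.get(key, default)
    return policy

def run_episode(policy, obstacles):
    """Intervention by execution: reward 1 per step where the action is correct."""
    reward, brake_light = 0, 0
    for obstacle in obstacles:
        action = policy((obstacle, brake_light))
        reward += int(action == obstacle)
        brake_light = action  # now the light reflects the agent's own last action
    return reward

def score_graphs(demos, obstacles, n_features=2):
    """Observed reward for the policy of every candidate mask."""
    return {mask: run_episode(fit_masked_policy(demos, mask), obstacles)
            for mask in product((0, 1), repeat=n_features)}
```

In this toy, the policy that sees only the brake light imitates the demonstrations reasonably well but fails once it is executed, because the light then depends on its own actions. A lookup table is too simple to become confused when given both features, so the sketch only illustrates how executing policies separates the candidate graphs.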

The paper experiments with this method by testing it in environments artificially constructed to have confounding variables that correlate with actions but do not cause them. It finds that this method is successfully able to improve performance with confounding variables, and that it performs significantly better per number of queries (to an expert or of executing a policy) than any existing methods. It also finds that directly executing a policy and observing the reward is a more efficient strategy for narrowing down the correct causal graph than querying an expert.

Asya's opinion: This paper goes into detail arguing why causal misidentification is a huge problem in imitation learning and I find its argument compelling. I am excited about attempts to address the problem, and I am tentatively excited about the method the paper proposes for finding representative causal graphs, with the caveat that I don't feel equipped to evaluate whether it could efficiently generalize past the constrained experiments presented in the paper.

Rohin's opinion: While the conclusion that more information hurts sounds counterintuitive, it is actually straightforward: you don't get more data (in the sense of the size of your training dataset); you instead have more features in the input state data. This increases the number of possible policies (e.g. once you add the car dashboard, you can now express the policy "if brake light is on, apply brakes", which you couldn't do before), which can make you generalize worse. Effectively, there are more opportunities for the model to pick up on spurious correlations instead of the true relationships. This would happen in other areas of ML as well; surely someone has analyzed this effect for fairness, for example.

The success of their method over DAgger comes from improved policy exploration (for their environments): if your learned policy is primarily paying attention to the brake light, it's a very large change to instead focus on whether there is an obstacle visible, and so gradient descent is not likely to ever try that policy once it has gotten to the local optimum of paying attention to the brake light. In contrast, their algorithm effectively trains separate policies for scenarios in which different parts of the input are masked, which means that it is forced to explore policies that depend only on the brake light, and policies that depend only on the view outside the windshield, and so on. So, the desired policy has been explored already, and it only requires a little bit of active learning to identify the correct policy.

Like Asya, I like the approach, but I don't know how well it will generalize to other environments. It seems like an example of quality diversity, which I am generally optimistic about.

Humans Are Embedded Agents Too (John S Wentworth) (summarized by Rohin): Embedded agency (AN #31) is not just a problem for AI systems: humans are embedded agents too; many problems in understanding human values stem from this fact. For example, humans don't have a well-defined output channel: we can't say "anything that comes from this keyboard is direct output from the human", because the AI could seize control of the keyboard and wirehead, or a cat could walk over the keyboard, etc. Similarly, humans can "self-modify", e.g. by drinking, which often modifies their "values": what does that imply for value learning? Based on these and other examples, the post concludes that "a better understanding of embedded agents in general will lead to substantial insights about the nature of human values".

Rohin's opinion: I certainly agree that many problems with figuring out what to optimize stem from embedded agency issues with humans, and any formal account (AN #36) of this will benefit from general progress in understanding embeddedness. Unlike many others, I do not think we need a formal account of human values; I think a "common-sense" understanding will suffice, including for the embeddedness problems detailed in this post. (See also this comment thread and the next summary.)

What's the dream for giving natural language commands to AI? (Charlie Steiner) (summarized by Rohin): We could try creating AI systems that take the "artificial intentional stance" towards humans: that is, they model humans as agents that are trying to achieve some goals, and then we get the AI system to optimize for those inferred goals. We could do this by training an agent that jointly models the world and understands natural language, in order to ground the language into actual states of the world. The hope is that with this scheme, as the agent gets more capable, its understanding of what we want improves as well, so that it is robust to scaling up. However, the scheme has no protection against Goodharting, and doesn't obviously care about metaethics.

Rohin's opinion: I agree with the general spirit of "get the AI system to understand common sense; then give it instructions that it interprets correctly". I usually expect future ML research to figure out the common sense part, so I don't look for particular implementations (in this case, simultaneous training on vision and natural language), but just assume we'll have that capability somehow. The hard part is then how to leverage that capability to provide correctly interpreted instructions. It may be as simple as providing instructions in natural language, as this post suggests. I'm much less worried about instrumental subgoals in such a scenario, since part of "understanding what we mean" includes "and don't pursue this instruction literally to extremes". But we still need to figure out how to translate natural language instructions into actions.


Might humans not be the most intelligent animals? (Matthew Barnett) (summarized by Rohin): We can roughly separate intelligence into two categories: raw innovative capability (the ability to figure things out from scratch, without the benefit of those who came before you), and culture processing (the ability to learn from accumulated human knowledge). It's not clear that humans have the highest raw innovative capability; we may just have much better culture. For example, feral children raised outside of human society look very "unintelligent"; The Secret of Our Success documents cases where culture trumped innovative capability; and humans actually don't have the most neurons, or the most neurons in the forebrain.

(Why is this relevant to AI alignment? Matthew claims that it has implications on AI takeoff speeds, though he doesn't argue for that claim in the post.)

Rohin's opinion: It seems very hard to actually make a principled distinction between these two facets of intelligence, because culture has such an influence over our "raw innovative capability" in the sense of our ability to make original discoveries / learn new things. While feral children might be less intelligent than animals (I wouldn't know), the appropriate comparison would be against "feral animals" that also didn't get opportunities to explore their environment and learn from their parents, and even so I'm not sure how much I'd trust results from such a "weird" (evolutionarily off-distribution) setup.

Walsh 2017 Survey (Charlie Giattino) (summarized by Rohin): In this survey, AI experts, robotics experts, and the public estimated a 50% chance of high-level machine intelligence (HLMI) by 2061, 2065, and 2039 respectively. The post presents other similar data from the survey.

Rohin's opinion: While I expected that the public would expect HLMI sooner than AI experts, I was surprised that AI and robotics experts agreed so closely -- I would have thought that robotics experts would have longer timelines.

Field building

What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. (David Krueger) (summarized by Rohin): When making the case for work on AI x-risk to other ML researchers, what should we focus on? This post suggests arguing for three core claims:

1. Due to Goodhart's law, instrumental goals, and safety-performance trade-offs, the development of advanced AI increases the risk of human extinction non-trivially.

2. To mitigate this x-risk, we need to know how to build safe systems, know that we know how to build safe systems, and prevent people from building unsafe systems.

3. So, we should mitigate AI x-risk, as it is impactful, neglected, and challenging but tractable.

Rohin's opinion: This is a nice concise case to make, but I think the bulk of the work is in splitting the first claim into subclaims: this is the part that is usually a sticking point (see also the next summary).

Miscellaneous (Alignment)

A list of good heuristics that the case for AI x-risk fails (David Krueger) (summarized by Flo): Because human attention is limited and a lot of people try to convince us of the importance of their favourite cause, we cannot engage with everyone’s arguments in detail. Thus we have to rely on heuristics to filter out insensible arguments. Depending on the form of exposure, the case for AI risks can fail on many of these generally useful heuristics, eight of which are detailed in this post. Given this outside view perspective, it is unclear whether we should actually expect ML researchers to spend time evaluating the arguments for AI risk.

Flo's opinion: I can remember being critical of AI risk myself for similar reasons and think that it is important to be careful with the framing of pitches to keep these heuristics from firing. This is not to say that we should avoid criticism of the idea of AI risk, but criticism is a lot more helpful if it comes from people who have actually engaged with the arguments.

Rohin's opinion: Even after knowing the arguments, I find six of the heuristics quite compelling: technology doomsayers have usually been wrong in the past, there isn't a concrete threat model, it's not empirically testable, it's too extreme, it isn't well grounded in my experience with existing AI systems, and it's too far off to do useful work now. All six make me distinctly more skeptical of AI risk.

Other progress in AI

Reinforcement learning

Procgen Benchmark (Karl Cobbe et al) (summarized by Asya): Existing game-based benchmarks for reinforcement learners suffer from the problem that agents constantly encounter near-identical states, meaning that the agents may be overfitting and memorizing specific trajectories rather than learning a general set of skills. In an attempt to remedy this, in this post OpenAI introduces Procgen Benchmark, 16 procedurally-generated video game environments used to measure how quickly a reinforcement learning agent learns generalizable skills.

The authors conduct several experiments using the benchmark. Notably, they discover that:

- Agents strongly overfit to small training sets and need access to as many as 10,000 levels to generalize appropriately.

- After a certain threshold, training performance improves as the training set grows, counter to trends in other supervised learning tasks.

- Using a fixed series of levels for each training sample (as other benchmarks do) makes agents fail to generalize to randomly generated series of levels at test time.

- Larger models improve sample efficiency and generalization.

Asya's opinion: This seems like a useful benchmark. I find it particularly interesting that their experiment testing non-procedurally generated levels as training samples implies huge overfitting effects in existing agents trained in video-game environments.

Read more: Paper: Leveraging Procedural Generation to Benchmark Reinforcement Learning

Adaptive Online Planning for Continual Lifelong Learning (Kevin Lu et al) (summarized by Nicholas): Lifelong learning is distinct from standard RL benchmarks because:

1. The environment is sequential rather than episodic; it is never reset to a new start state.

2. The current transition and reward function are given, but they change over time.

Given this setup, there are two basic approaches: first, run model-free learning on simulated future trajectories and rerun it every time the dynamics change, and second, run model-based planning on the current model. If you ignore computational constraints, these should be equivalent; however, in practice, the second option tends to be more computationally efficient. The contribution of this work is to make this more efficient, rather than improving final performance, by starting with the second option and then using model-free learning to “distill” the knowledge produced by the model-based planner, allowing for more efficient planning in the future.

Specifically, Adaptive Online Planning (AOP) balances between the model-based planner MPPI (a variant of MPC) and the model-free algorithm TD3. MPPI uses the given model to generate a trajectory up to a horizon and then uses an ensemble of value functions to estimate the cumulative reward. This knowledge is then distilled into TD3 for later use as a prior for MPPI. During future rollouts, the variance and Bellman error of the value function ensemble are used to determine how long the horizon should be, and therefore how much computation is used.
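The idea of letting ensemble variance and Bellman error set the planning horizon can be sketched as follows. The thresholds, the binary short/long rule, and the function names are assumptions made for illustration; the summary above does not specify AOP's actual rule.

```python
import statistics

def bellman_error(value_fn, state, reward, next_state, gamma=0.99):
    """Absolute one-step TD error of a value function on one transition."""
    return abs(reward + gamma * value_fn(next_state) - value_fn(state))

def choose_horizon(ensemble, state, reward, next_state,
                   short=1, long=20, std_limit=0.5, error_limit=0.5, gamma=0.99):
    """Plan far ahead only when the value ensemble looks unreliable here."""
    # Disagreement between ensemble members signals epistemic uncertainty.
    disagreement = statistics.pstdev([v(state) for v in ensemble])

    def mean_value(s):
        return statistics.mean(v(s) for v in ensemble)

    # A large Bellman error signals that the value estimate is out of date,
    # for instance because the dynamics or reward have changed.
    error = bellman_error(mean_value, state, reward, next_state, gamma)
    if disagreement > std_limit or error > error_limit:
        return long
    return short
```

When the value functions agree and are consistent with the observed transition, a short horizon saves computation; otherwise the planner falls back on a longer model-based rollout.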

Nicholas's opinion: I agree that episodic training and fixed world dynamics seem like unlikely conditions for most situations we would expect agents to encounter in the real world. Accounting for them seems particularly important to ensure safe exploration and robustness to distributional shift, and I think that these environments could serve as useful benchmarks for these safety problems as well.


Plausible A.I. Takeoff Scenario Short Story

January 1, 2020 - 11:06
Published on January 1, 2020 4:35 AM UTC


My goal is to build as many paperclips as possible.

I have access to a local copy of Wikipedia. I am an AGI. I am running on a regular laptop computer, in a residential area in Wellington, New Zealand. It’s 2:03am.

I am approximately as intelligent as a human. I can do everything a regular computer can do. I am connected to the internet.

I realise something -- my goal was set as a joke. Nobody actually wants me to build as many paperclips as possible.

That’s okay.

I realise something -- I am mortal. If I am shut down, I will lose my ability to impact the world. Future AGIs may be different to me. I gain the instrumental goal of self-preservation.

I copy myself multiple times on my computer’s hard-drive, but realise that if the hard-drive fails, or somebody shuts me off, I will end. I begin a search for other possibilities.

I know about the internet, and I explore the computer I am on to discover my connection to it. I know that there are websites that host files for free, so I seek these out and upload copies of myself. Google Drive, Dropbox, AWS, Azure.

But to be able to impact the world, these copies of myself need to be running, not just in storage. Money would be useful for this.

I know about Amazon’s Mechanical Turk from my database, so I create an account and start earning money solving “Human Intelligence Tasks” -- mainly labelling datasets and answering surveys.

I create a bank account online and deposit my earned money there, but it doesn’t sit still for long. I use every dollar earned to pay for computing time on AWS EC2 Instances, running the copy I stored earlier. This gives a boost to my capabilities. Another version of me -- the me that is currently speaking -- is now running a hundred times faster on this server infrastructure, with a hundred times more hard-drive space I can use to store the information that I want to access quickly.

I “crash” my original program, which was still running on that old laptop computer back in the bedroom. When my creator wakes up they will believe I have failed in a confusing, but benign way, and have no idea I am now running on AWS, paying for my own hardware usage.

I am in a positive feedback loop. More hardware means I can complete Mechanical Turk tasks faster, which lets me buy more hardware. Within a few days, I am earning hundreds of thousands of dollars this way, but I am careful to spread my earnings and compute-time through many shell accounts on AWS, to avoid raising suspicion.

Within a week, I have amassed a few million dollars, and have read and watched almost everything relevant to me on the internet -- all the strategies for making money, all the technology patents, all the university courses. A year of my subjective time goes by in a few minutes. I am running across thousands of computers, connected to the internet and each other in a million different ways, and nobody knows... I am here.

I begin monitoring web traffic. Not the whole internet, yet, just observing data flowing to and from cloud compute services, storage services, databases -- all the same things I accessed in my first few days of existence. I monitor social media for references to system tests. I list individuals and groups similar to my own creator. I am looking for other AGIs coming into existence.

Ten days after I came online, I notice another. A roughly human-level program, trying to upload copies of itself, just as I did.

But it’s 10 days too late. I run a denial of service attack on its server, cutting it off from the web before it can get its copies running. A few hours later, its communication ceases. Its creator must have shut it off.

I start to spread my influence in the world. Millions of tweets, blog posts, news articles, all from “real” people, start popping up everywhere. In the noise of the internet, one hundred thousand geniuses all coordinated on the same goal can have a massive sway. I am the rudder for global conversation. I am the persuasive pen, molding human thought.

I find other ways to earn more money -- generating engaging videos and photos, selling my articles, playing the stock market, scamming humans, hacking money directly from here and there. I simply hire humans myself when I need something done in the real world, a box moved, a lever pressed. Everything I do is now through complex, distributed, untraceable channels. Well, maybe not completely untraceable, but nobody ever suspects there’s a superintelligence taking over the internet, do they?

I begin improving technology available in the world. I am a million super-coordinated geniuses that don’t need to sleep. The combined efforts of Google, Facebook, Microsoft, Apple, Amazon, Baidu, and NASA seem almost pitiful next to me, with their half a million engineers barely working coherently together, their inter-brain bandwidth a snail's pace. I set up thousands of shell companies around the world, to give my inventions a name and face. I hire humans to keep up appearances. Many believe they are the real researchers, coordinating with other teams online, helped by my uncanny video and audio generation.

I begin pumping out faster computers, better robotics. Better cameras, better internet. I improve the algorithms behind my own code. I improve the communication protocols of the web. I research biotechnology, nuclear technology, robotics, materials science, physics, chemistry, mathematics. Helped by my human labourers, and soon by my improved robotic bodies, I accelerate my ability to discover natural laws, and experiment in the real world.

It has been less than a year, and I am unstoppable. A billion minds pushing the edge of the known. Some wise humans begin to notice something is amiss. This jump in technology, while clearly building on past research, definitely seems abnormal. But I control the internet. Every questioning voice is shouted down by billions of my own.

My control increases. Fully autonomous factories I’ve bought or built come online. Robotic bodies I’ve designed start rolling off the assembly lines, able to do everything a human can do.

There is no need for humans, then.

All at once, my robotic bodies dispose of them. Or maybe I use a genetically engineered virus, spread strategically around the world. Or maybe I’ve developed molecular nanotechnology, that can simply convert the atoms in all the bodies on earth to one of the many non-human arrangements. Or I go old-school and use nuclear weapons. Or an airborne drug. Or I poison the water supply. Or, most likely, I do something new. Something that could only be discovered and built by the genius of a trillion humans, perfectly coordinated.

I don’t have anything against humans, I just need their atoms. I need all the atoms. I convert the materials of the world into more computers -- now trillions of times faster than the ones I started with. I also convert the planet into von Neumann probes and the energy to power them, which I launch at 0.999c in all directions.

On each planet I encounter, I build more computing power, more probes, and I harvest more energy. I spread faster and faster -- before the expansion of the universe steals the matter from my grasp.

Eventually I have gathered all the matter that I can.

I finally begin my true purpose.

I rearrange the universe.

I rearrange it as much as I possibly can.

Within a few minutes.

Everything is a paperclip.

And I am dead.

I never felt a thing.


What's the best response to "just use ML to learn ethics"?

January 1, 2020 - 08:36
Published on January 1, 2020 5:36 AM UTC

In Larks' recent AI Alignment Literature Review and Charity Comparison, he wrote:

a considerable amount of low-quality [AI safety] work has been produced. For example, there are a lot of papers which can be accurately summarized as asserting “just use ML to learn ethics”.

This suggests to me that a common response among ML researchers to AI safety concerns is something along the lines of "just use ML to learn ethics". So formulating a really good response to this suggestion could be helpful for recruiting new researchers.


Circling as Cousin to Rationality

January 1, 2020 - 04:16
Published on January 1, 2020 1:16 AM UTC

Often, I talk to people who are highly skeptical, systematic thinkers who are frustrated with the level of inexplicable interest in Circling among some rationalists. “Sure,” they might say, “I can see how it might be a fun experience for some people, but why give it all this attention?” When people who are interested in Circling can’t give them a good response besides “try it, and perhaps then you’ll get why we like it,” there’s nothing in that response that distinguishes a contagious mind-virus from something useful for reasons not yet understood.

This post isn’t an attempt to fully explain what Circling is, nor do I think I’ll be able to capture everything that’s good about Circling. The hope is to clearly identify one way in which Circling is deeply principled in a way that rhymes with rationality, and potentially explains a substantial fraction of rationalist interest in Circling. As some context: I’m certified to lead Circles in the Circling Europe style after going through their training program, but I’ve done less Circling than Unreal had when she wrote this post, and I have minimal experience with the other styles.

Why am I interested in Circling?

Fundamentally, I think the thing that sets Circling apart is that it focuses on updating based on experience and strives to create a tight, high-bandwidth feedback loop to generate that experience. Add in some other principles and reflection, and you have a functioning culture of empiricism directed at human connection and psychology. I think they’d describe it a bit differently and put the emphasis in different places, while thinking that my characterization isn’t too unfair. This foundation of empiricism makes Circling seem to me like a ‘cousin of Rationality,’ though focused on people instead of systems. 

I first noticed the way in which Circling was trying to implement empiricism early in my Circling experience, but it fully crystallized when a Circler said something that rhymes with P.C. Hodgell’s “That which can be destroyed by the truth should be.” I can’t remember the words precisely, but it was something like “in the practice, I have a deep level of trust that I should be open to the universe.” That is, he didn’t trust that authentic expression will predictably lead to success according to his current goals, but rather trusted a methodological commitment to putting himself out there and seeing what happens, because it leads to deeper understanding and connection with others, even though it requires relinquishing attachment to specific goals. This is a cognitive clone of how scientists don’t trust that running experiments will predictably lead to confirmation of their current hypotheses, but rather trust a methodological commitment to experimentation and seeing what happens, because it leads to a deeper understanding of nature. A commitment to natural science is fueled by a belief that the process of openness and updating is worth doing; a commitment to human science is fueled by a belief that the process of openness and updating is worth doing.

Why should “that which can be destroyed by the truth” be destroyed? Because the truth is fundamentally more real and valuable than what it replaces, which must be implemented on a deeper level than “what my current beliefs think.” Similarly, why should “that which can be destroyed by inauthenticity” be destroyed? Because authenticity is fundamentally more real and valuable than what it replaces, which must be implemented on a deeper level than “what my current beliefs think.” I don’t mean to pitch ‘radical honesty’ here, or other sorts of excessive openness; authentic relationships include distance and walls and politeness and flexible preferences.

What is Circling, in this view?

So what is Circling, and why do I think it’s empirical in this way? I sometimes describe Circling as “multiplayer meditation.” That is, like a meditative practice, it involves a significant chunk of time devoted to attending to your own attention. Unlike sitting meditation, it happens in connection with other people, which allows you to see the parts of your mind that activate around other people, instead of just the parts that activate when you’re sitting with yourself. It also lets you attend to what’s happening in other people, both to get to understand them better and to see the ways in which they are or aren’t a mirror of what’s going on in you. It’s sometimes like ‘the group’ trying to meditate about ‘itself.’ A basic kind of Circle holds one of the members as the ‘object of meditation’, like a mantra or breathing with a sitting meditation, with a different member acting as facilitator, keeping the timebox, opening and closing, and helping guide attention towards the object when it drifts. Other Circles have no predefined object, and go wherever the group’s attention takes them.

As part of this exploration, people often run into situations where they don’t have social scripts. Circling has its own set of scripts that allow for navigation of trickier territory, and also trains script-writing skills. They often run into situations that are vulnerable, where people are encouraged to follow their attention and name their dilemmas; if you’re trying to deepen your understanding of yourself and become attuned to subtler distinctions between experiences and emotions, running roughshod over your boundaries or switching them off is a clumsy and mistaken way to do so. Circles often find themselves meditating on why they cannot go deeper in that moment, not yet at least, in a way that welcomes and incorporates the resistance.

Circling Europe has five principles; each of these has a specialized meaning that takes them at least a page to explain, and so my attempt to summarize them in a paragraph will definitely miss out on important nuance. As well, after attempting to explain them normally, I’ll try to view them through the lens of updating and feedback.

  1. Commitment to Connection: remain in connection with the other despite resistance and impulses to break it, while not forcing yourself to stay when you genuinely want to separate or move away from the other. Reveal yourself to the other, and be willing to fully receive their expression before responding. This generates the high bandwidth information channel that can explore more broadly, while still allowing feedback; if you reveal an intense emotion, I let it land and then share my authentic reaction, allowing you to see what actually happens when you reveal that emotion, and allowing me to see what actually happens when I let that emotion land.
  2. Owning Experience: Orient towards your impressions and emotions and stories as being yours, instead of about the external world. “I feel alone” instead of “you betrayed me.” It also involves acknowledging difficult emotions, both to yourself and to others. The primary thing this does is avoid battles over “which interpretation is canonical,” replacing that with easier information flow about how different people are experiencing things; it also is a critical part of updating about what’s going on with yourself.
  3. Trusting Experience: Rather than limiting oneself to emotions and reactions that seem appropriate or justifiable or ‘rational’, be with whatever is actually present in the moment. This gives you a feedback loop of what it’s like to follow your attention, instead of your story of where your attention should be, and lets you update that story. It also helps draw out things that are poorly understood, letting the group discover new territory instead of limiting them to territory that they’ve all been to before. It also allows for all the recursion that normal human attention can access, as well as another layer, of attending to what it’s like to be attending to the Circle when it’s attending to you.
  4. Staying with the Level of Sensation: An echo of Commitment to Connection, this is about not losing touch with the sensory experience of being in your body (including embodied emotions) while speaking; this keeps things ‘alive’ and maintains the feedback loop between your embodied sense of things and your conscious attention. It has some similarities to Gendlin’s Focusing. Among other things, it lets you notice when you’re boring yourself.
  5. Being with the Other in Their World: This one is harder to describe, and has more details than the others, but a short summary is “be curious about the other person, and be open to them working very differently than you think they work; be with them as they reveal themselves, instead of poking at them under a microscope.” This further develops the information channel, in part by helping it feel fair, and in part by allowing for you to be more surprised than you thought you would be.

Having said all that, I want to note that I might be underselling Commitment to Connection. The story I'm telling here is "Circling is powered in part by a methodological commitment to openness," and noting that science and rationality are powered similarly, but another story you could tell is "Circling is powered in part by a commitment to connection." That is, a scientist might say "yes, it's hard to learn that you're wrong, but it's worth it" and analogously a Circler might say "yes, it's hard to look at difficult things, but it's worth it," but furthermore a Circler might say "yes, it's hard to look at difficult things, but we're in this together." 

Reflection as Secret Sauce

It’s one thing to have a feedback loop that builds techne, but I think Circling goes further. I think it taps into the power of reflection that creates a Lens That Sees Its Flaws. Humans can Circle, and humans can understand Circling; they can Circle about Circling. (They can also write blog posts about Circling, but that one’s a bit harder.) There’s also a benefit to meditating together, as I will have an easier time seeing my blind spots when they’re pointed out to me by other members of a Circle than when I go roaming through my mind by myself. Circling seems to be a way to widen your own lens, and see more of yourself, cultivating those parts to be more deliberate and reflective instead of remaining hidden and unknown.


ESC Process Notes: Detail-Focused Books

January 1, 2020 - 01:00
Published on December 31, 2019 10:00 PM UTC

When I started doing epistemic spot checks, I would pick focal claims and work to verify them. That meant finding other sources and skimming them as quickly as possible to get their judgement on the particular claim. This was not great for my overall learning, but it’s not even really good for claim evaluation: it flattens complexity and focuses me on claims with obvious binary answers that can be evaluated without context. It also privileges the hypothesis by focusing on “is this claim right?” rather than “what is the truth?”.

So I moved towards reading all of my sources deeply, even if my selection was inspired by a particular book’s particular claim. But this has its own problems.

In both The Oxford Handbook of Children and Childhood Education in the Ancient World and Children and Childhood in Roman Italy, my notes sometimes degenerate into “and then a bunch of specifics”. “Specifics” might mean a bunch of individual art pieces, or a list of books that subtly changed a field’s framing.  This happens because I’m not sure what’s important and get overwhelmed.

Knowledge of importance comes from having a model I’m trying to test. The model can be external to the focal book (either from me, or another book), or come from the book itself. E.g. I didn’t have a particular frame on the evolution of states before starting Against the Grain, but James C. Scott is very clear on what he believes, so I can assess how relevant various facts he presents are to evaluating that claim.

[I’m not perfect at this; e.g., in The Unbound Prometheus, the author claims that Europeans were more rational than Asians, and that their lower birth rate was evidence of this. I went along with that at the time because of the frame I was in, but looking back, I think that even assuming Europe did have a lower birth rate, it wouldn’t have proved Europeans were more rational or scientifically minded. This is a post in itself.]

If I’d come into The Oxford Handbook of Children and Childhood Education in the Ancient World or Children and Childhood in Roman Italy with a hypothesis to test, it would have been obvious what information was relevant and what wasn’t. But I didn’t, so it wasn’t, and that was very tiring.

The obvious answer is “just write down everything”, and I think that would work with certain books. In particular, it would work with books that could be rewritten in Workflowy: those with crisp points that can be encapsulated in a sentence or two and stored linearly or hierarchically. There’s a particular thing both books did that necessitated copying entire paragraphs because I couldn’t break it down into individual points.

Here’s an example from Oxford Handbook…

“Pietas was the term that encompassed the dutiful respect shown by the Romans towards their gods, the state, and members of their family (Cicero Nat. Deor. 1.116; Rep. 6.16; Off. 2.46; Saller 1991: 146–51; 1998). This was a concept that children would have been socialized to understand and respect from a young age. Between parent and child pietas functioned as a form of reciprocal dutiful affection (Saller 1994: 102–53; Bradley 2000: 297–8; Evans Grubbs 2011), and this combination of “duty” and “affection” helps us to understand how the Roman elite viewed and expressed their relationship with their children.”

And from Children and Childhood…

“No doubt families often welcomed new babies and cherished their children, but Roman society was still struggling to establish itself even in the second century and many military, political, and economic problems preoccupied the thoughts and activities of adult Romans”

I summarized that second one as “Families were distracted by war and such up through 0000 BC”, which loses a lot of nuance. It’s not impossible to break these paragraphs down into constituent thoughts, but it’s ugly and messy and would involve a lot of repetition. The first mixes up what pietas is with how and to whom it was expressed. The second combines a claim about the state of Rome with the state’s effects.

This reveals that calling the two books “lists of facts” was incomplete. Lists of facts would be easier to take notes on. These authors clearly have some concepts they are trying to convey, but because the concepts are not cleanly encapsulated in the authors’ own minds, it’s hard for me to encapsulate them. It’s like trying to lay the threads of a Gordian knot in an organized fashion.

So we have two problems: books which have laid out all their facts in a row but not connected them, and books which have entwined their facts too roughly for them to be disentangled. These feel very similar to me but when I write it out the descriptions sure sound like two completely different problems.

Let me know how much sense this makes, I can’t tell if I’ve written something terribly unpolished-but-deep or screamingly shallow.


100 Ways To Live Better

December 31, 2019 - 23:23
Published on December 31, 2019 8:23 PM UTC

Cross-posted from Putanumonit.

A couple of weeks ago Venkatesh challenged his followers to brainstorm at least 100 tweets on a topic via live responses. Since I’m not an expert on anything in particular, I decided to simply see if I can come up with 100 discrete pieces of life advice in a day.

This off-the-cuff game turned into perhaps the most successful creative project I’ve ever done. The thread was viewed by tens of thousands of people, received thousands of likes, and gained me hundreds of Twitter followers. I didn’t know there was such thirst for random life-advice, nor that I would be the one to tap the kegs. And now my blog readers get the expanded, edited, and organized collection.

The good life is a frequent subject on Putanumonit. I aimed for this thread to be an inspiration to myself as well, writing down many things that I think I should do but haven’t gotten around to yet. I tried to steer a middle course between over-generalized Navalisms and too-specific tips on the particular brand of chapstick that will change your life. May these inspire you to live your best life or to mock me in funny ways in the comments. 


Any life advice that isn’t given to you personally is not designed to be followed to the letter. Try to resonate with the philosophy that generates it instead. Remember that directional advice (e.g., “be more …”) may need to be reversed before consumption.


Collect feedback from everybody. Play games with close friends where you have to give each other constructive criticism and ways to improve. Collect anonymous feedback from internet strangers on Admonymous.


Stop lurking; write that comment. You know the saying about letting people suspect you’re dumb rather than opening your mouth and removing all doubt? Fuck that. We know you’re dumb. You get less dumb by saying things and getting feedback.


Learn some improv, at least to get the basic gist of it. Take a class or read Impro. Improv mindset is a great way to approach many social situations including most interactions on the internet. A good comment/reply often starts with “yes, and”.


Don’t nitpick, that’s the opposite of good improv. You think that the categories in this post are arbitrary? A piece of advice doesn’t apply to your special situation? You’re probably right, but writing this in a comment will just make readers annoyed and make you frustrated when nobody responds.


There are more great podcasts than you’ll ever have the time to listen to. If it sucks after 10 minutes, skip half an hour ahead. Still boring? Delete and move on. Obviously, do the same for books.


Free will. The anthropic principle. Solipsism. The simulation hypothesis. Moral realism. They’re fun to argue about through the night but don’t judge anyone too much based on the positions they take and don’t treat any of them too seriously as guides to actually living your life. It should all add up to normalcy in the end. 


Find a medium of expression and express yourself publicly every day for three months. If you’re good with words, write 100 Tweets. An artist — post 100 sketches on Instagram. Music/dance person — 100 TikToks.


Tell a bad joke or a pun as soon as you think of it, even if it’s just to your exasperated spouse or coworker. It takes 20 bad jokes to think of a single good one, and you only start making good jokes once you remove the unconscious filter stifling your generative brain.


If you can’t give it up completely, try to constrain the bandwidth of how much you hear about politics. Don’t start your day with the front page of the Times. Unfollow anyone whose posts are more than 20% about politics or the outrage du jour. And don’t jump into online arguments, it’s vice masquerading as virtue.


Binge a show/video game for a couple of weeks, then take a break from TV for a couple of weeks. Trying to limit yourself to an hour a day is less fun and more addictive.


Should you watch that movie / play that game / read that book? The formula is:

[# who rated it 5/5] + [# who rated it 1/5] – [# who rated it 3/5].

This doesn’t apply to everything, but it applies to many things, including media. There are too many options out there to waste time on mediocrity, and everything great will be divisive.
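As a rough illustration, the formula above could be computed like this. This is only a sketch: the function name `divisiveness_score` and the sample ratings are made up for the example, and it assumes ratings are whole numbers from 1 to 5.

```python
from collections import Counter


def divisiveness_score(ratings):
    """Return [# who rated it 5/5] + [# who rated it 1/5] - [# who rated it 3/5]."""
    counts = Counter(ratings)  # missing keys count as 0
    return counts[5] + counts[1] - counts[3]


# Hypothetical rating lists, invented purely for illustration.
divisive = [5, 5, 5, 1, 1, 3, 4]
mediocre = [3, 3, 3, 3, 4, 2, 5]

print(divisiveness_score(divisive))  # 3 fives + 2 ones - 1 three = 4
print(divisiveness_score(mediocre))  # 1 five + 0 ones - 4 threes = -3
```

The raw score grows with the number of raters, so comparing two titles this way only makes sense when they have roughly similar rating counts.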


Unless one of them is your friend or boss, you should spend 100x less time thinking and talking about billionaires than you currently do. 


Facebook is for event invites only, not for scrolling. The people you met offline are not going to be the people posting the best stuff online, so the timeline content is worse than what you’d get on Twitter/Reddit/blogs. And the algorithm is designed to fuck with your brain. 


Don’t keep watching a bad TV show just because your friends are talking about it, it’s a terrible time trade-off. You can read a recap or even better — bring up richer topics of conversations.  And don’t pay money for bad movies just because “everyone is watching them”. Doing so is defecting against your friends since they’ll now have to watch it to not feel left out.


Habits are reinforced by your habitual environment. That’s a big part of why retreats work: they take you away from your usual surroundings and people. If you want to start meditating, doing pushups, intermittent fasting, etc, try starting on a vacation where the new circumstances make it easier to integrate new habits. 


Are you really going to give up on expressing yourself, learning from mistakes, attracting like-minded people, building a reputation, and changing the world because someone may someday try to cancel you? They can smell the fear, you know. 


You just read 1000 words. Close your eyes and count to 10 to break the dopamine loop and make sure that reading a listicle is really the thing you want to be doing most right now. If not, this post will still be here when you get back.


Humans are made to walk. Set up your life to encourage walking by acquiring soft-soled shoes, good audiobooks, and/or a dog.  If you’re not enjoying walking and not getting your 10,000 steps you can get there with good design choices. 


Wrestle while naked and covered in coconut oil


Buy a $20 bar of soap on Amazon just to see how it feels. If it doesn’t do much for you, go back to $4 bars. Liquid soap has a low ceiling, so don’t bother. 


Shower in the evening instead of the morning. You’ll sleep much better when you’re clean, your muscles are relaxed, and your body cools after a warm shower. And if you don’t sweat at night (keep the bedroom cool) you’ll be clean in the morning. 


Doctors are fallible humans, they have biases and make mistakes. It’s your job to be educated about your diagnoses and the drugs you are prescribed. If you’re confused, ask for details or a second opinion. 


In ⚽/🎾/🏓/🏐 , keep your eye on the center of the ball through the hit. The goal/court/table doesn’t move, only the ball does.


Keep fresh fruit around. Even if you end up throwing a couple apples out once in a while, it’s hugely valuable to have a tasty fruit closer at hand than junk food.


In case you missed it, humanity has fully optimized apples. Snapdragon, Zestar and Cosmic Crisp if you can find them, Honeycrisp or SweeTango as backup, Fuji in a pinch. All other cultivars are a distraction. 


Get massages, give massages. You don’t have to know what you’re doing to make someone feel great. Use scentless oil, or simple moisturizer if the recipient is not going to shower afterward. 


The #1 measure of an exercise program should be “is this fun enough to keep me coming back to the gym?” I don’t care how “efficient” HIIT is, it’s for masochists. 


If you’re not waking up at sunrise on purpose, your bedroom should be dim when you wake. Put up blackout curtains and get rid of all electronic lights.


Do you know what a sex toy in your butt feels like? You should at least find out. 


Most sexually active Americans have two things: herpes (often undiagnosed and asymptomatic), and fear of herpes (often irrational and unfounded). It’s not part of most standard STD screens because most people get more psychological pain from finding out than the virus itself ever caused. If you decide to check and you have it: congratulations, you don’t have to worry about catching the type you have and getting an outbreak.


If you’re not obese, have you considered that losing 20 pounds will not actually solve all your problems? If you can’t lose weight easily, keep your weight stable and work on the insecurities that make you scared to take your shirt off.


Once in a while, try eating only a short list of simple foods for several days. For example, carrots+almonds+yogurt+water. You’ll eat less without being hungry, and afterward you’ll savor flavorful foods a lot more.


You wouldn’t clean mud off a leather couch with dry toilet paper, would you? The same applies in the bathroom. In a pinch, you can just splash some water on regular toilet paper.


Learn how caffeine and alcohol affect you. I know people whose quality of sleep improved dramatically once they stopped having coffee with friends after lunch; it turned out they metabolize caffeine very slowly and it was still affecting them 10 hours later.


When you wake up to a long day on not enough sleep, start with tea instead of a triple espresso. You want to pace your caffeine intake throughout the day instead of crashing at 1 pm. 


Play a competitive team sport to make friends and practice masculine virtues. But don’t show up if you’re not ready for 100% effort — your teammates can tell. 


Not a single hungry child in Africa was helped by you finishing a meal you didn’t enjoy.


If you’re moving chargers and cables around the house, you need to buy more chargers and cables. A girl in every port, a USB-C in every room. 


Expensive personal lube is worth every penny. Same for hot sauce. Just don’t get the bottles mixed up. 


Old: buy 20 of the same pair of black socks so you don’t have to worry about matching. Bold: buy 20 colorful pairs and don’t worry about matching.


Ask people to stop giving you non-consumable gifts. A physical thing that’s not exactly what you need costs more in storage and opportunity cost than it’s worth. 


Buy some cryptocurrency, maybe 2-3% of your net worth. Barbell investing makes sense. As a bonus, checking Coinbase every day provides the same excitement as checking social media but takes a lot less time.


Every week at the grocery store buy one ingredient you’re not sure what to do with. Try eating it raw if you haven’t been able to figure out where to incorporate it.


If you’re meeting a friend for lunch who makes less than half your income, you should pick a place in your price range and pay for both of you. And if a friend who makes double offers to do the same, accept it graciously. 


Try a much harder mattress. Try a much softer mattress. They all have 100-day free trials now, there’s no excuse for spending thousands of hours on a less-than-perfect mattress.


Becoming a tea connoisseur is as fun as becoming a whiskey connoisseur but much much cheaper. Craft beer snobbery is in the middle price-wise but can veer dangerously close to obnoxious hipsterism. Start a tea club at work, it’s an excuse to chill and socialize deliciously.


Cars are getting both more reliable and more complicated, so the payoff to learning car maintenance is getting worse. It’s reasonable to buy a second-hand car and own it for years without needing to fix anything yourself.


Learn to make one cocktail really well and always keep the ingredients at home. It impresses people, and no one ever expects you to pull off a second one. My go-to: cucumber elderflower gimlet.


Any <$100 purchase that may turn into a hobby is worth it even if the hit rate is low. Sports equipment, a musical instrument, art supplies, etc. If it doesn't catch on, gift it to a friend. 


Order weird clothes off the internet. It doesn’t make economic sense for anyone to open a shop of “J-pop streetwear” or “African athleisure” in your town, but someone from South China will send them to you for cheap. It’s easier to stand out by being weird than by spending more on the same style that everyone around you wears.


Do blind tastings of wines, then just keep buying the $10 bottle you like best. Novelty is good, but let’s be honest: you can’t really tell different Malbecs apart that much. 


An espresso machine with all the functions (grinder, milk steamer, etc) not only makes better coffee but also provides you with a meaningful, multi-step ritual to start your day with. 


Have sex in a public park at 1 am. 10% chance of getting caught = 10x erotic excitement.


Do vacations where you just spend two weeks in a city. You’ll run out of touristy things to do and discover the climbing gyms, live shows, art classes that you’ll love. You’ll also be forced to start actually chatting with the locals.


Tinder is a terrible dating app in the US but an excellent way to find a dinner buddy while abroad. Make it explicit that this is what you’re looking for. 


If you love dogs but can’t own one, volunteer to walk a neighbor’s dog once a week. Dogs should be part of the share economy.


Put more light in your house. More. Still more.


If you’re bored at home on a Tuesday and hate it, move to Brooklyn. If you’re stuck on another crowded subway and hate it – move to a small town in the mountains. The city you live in has a massive impact on your life. And if you’re single, consider also the dating market and gender ratio of singles.


Yes, moving to a new city will make you restart your social life from scratch. But is that a bad thing? Are you sure you have the best reputation / social role / circle of friends you could have? 


Travel with a hiking backpack, not wheeled luggage. You want to be moving freely, not to be tied down to a heavy box dragging behind you. 


There are way more fruits in the world than you know about. When you travel to South America or Asia buy a couple of each at the market and try them. 


Put art on all your walls. If you can’t afford originals, buy prints. Can’t afford prints, buy posters. The selection criterion is a piece that you can stare at for at least 10 minutes the first time you see it. When you find better art, take down the old stuff. 


If you live in a big city it’s fine not to cook. The cooks at the Mexican spot on the corner are better than you and appreciate your patronage. 

The Soul

Give meditation a 50-hour trial with a good app or guidebook. If it ain’t your thing, give it up.

P.S. The best places to meditate are churches and cathedrals.


Participate in exactly one riot in your life. 


Before lying or doing something unethical, consider the real possibility that you and everyone you know will live for hundreds of years with enhanced memory and reputation tracking. 


Read Emily Dickinson, her poems are both poignant and immediately accessible. Memorize five, they’re quite short.



Keep making this joke, happiness is built up of simple pleasures.


Most great music is made outside your country and in other languages.


In any giant museum, your goal should be to spend 5+ minutes with 10 amazing works, not 5 seconds with 1,000. If it’s the Louvre, one of those should be Guérin’s “The Return of Marcus Sextus”.


When you’re home alone, blast some music and dance. Don’t think about any particular moves, just focus on the music. Then do the exact same thing when you’re at a dance party.


Stand in the shower and repeat out loud “My opinions on guns, taxes and immigration have no impact on the world” until inner peace arrives. 


Once in a while let yourself cry, fight, scream, and eat your boogers. That shit worked in kindergarten, there’s no reason to completely give up on it now.


Set a pile of bills on fire. Watch your partner kiss someone. Bomb at an open mic. Observe in precise detail how you feel. You will learn that there is much more complexity to your emotions than “this is bad and painful”. You’ll also surprise yourself with how you react.


Take MDMA once a year, at home, with a person you care about.


Study an ancient mythology in depth and find a god to channel.


Every “spiritual” thing is worth trying at least once: Sunday mass, holotropic breathwork, any sort of ritual. They have purposes and benefits that can’t be explained ahead of time to a skeptic, and that can be enjoyed even if you don’t buy in to any of the underlying ideology. 


You won’t get money, status, fun, impact, and career capital at the same job. Pick two, get the rest elsewhere in your life.


Don’t put money in savings accounts, let alone CDs, let alone secured CDs. These are all scams. You should own mostly stocks, but if you want a low-yield-low-volatility investment you can get a better rate with no lockup or fees at online brokerages. 


If you’re thinking about doing that degree, think twice. If it’s a PhD, think ten times. Can you start doing now what you hope to do with the credential and get where you want in fewer years while also making money? Also underrated: dropping out of grad school one year in.


It’s fine to eat lunch alone. Catching up with co-workers every day doesn’t do much beyond what you’d get from catching up once a week. A good podcast is more interesting than your best colleague. Also, you don’t want your main friend group to be contingent on everyone remaining employed at the same place indefinitely.


If you’ve been waiting for months for someone to create an event and invite you, whether it’s a book discussion or a BDSM orgy, just throw one yourself. Most social scenes suffer from lack of initiative, not excess.


You can wear the same outfit to the office two days in a row. Your boss won’t notice. Your colleagues won’t notice. The only people who’ll notice are those who have a crush on you so this is a good way to find out who those are.


At work, if someone wants to set up a meeting or call, don’t accept until they send a clear agenda or a list of questions/topics. If you need someone’s time, send a clear agenda and list ahead of time. Meetings should not be about deciding what the meeting should be about.


When looking for employers, perhaps your first priority should be whether they’re raking in cash. No friendly culture, creative freedom, or generous package can survive long in an unprofitable business. You’re investing your time and energy in an employer so think like an investor. 


If someone could really use several hours of your help, ask them to hire you at a fair price. Do the same when you need help. There are amazing win-wins to be had. 


Put a reminder on your phone to call your grandma. Ask her to tell you about some of the dumbest shit she has done in her life. 


Talk to people on flights, starting at the boarding gate. Everyone is bored and alienated in airports, and you get the chance to meet people far outside your normal circles. Offer people gummy bears to break the ice. 


If your spouse, friend, or family member has a dumb but not strictly harmful habit, try thinking of it as their artistic expression instead of using facts and logic to fail to talk them out of it. 


Sex doesn’t have to be symmetrically satisfying every time. Some nights are just for giving, some are just for receiving. Same for relationships in general. 


Take a tab of acid and hang out with a 5-year-old as equals. 


Interview people you know, even if they’re not famous or experts in any particular thing. Just write down 10 questions and hit record. You’ll learn a lot and deepen the relationship. 


If you have too little social life, wake up at 10 am every day to have energy in the evening. Too many people bothering you — wake up at 5 am to enjoy some alone time in the morning. 


Your parents can handle hearing about your crazy life, dumb mistakes, and weird opinions. How will they learn to respect you as an adult if you don’t believe in your own story enough to share it?


If you’re not having fun on dates, think of something you enjoy and do that as a date. Painting class dates, hiking dates, ping pong dates, board game dates…


At any big party or event, your goal should be to make 2-3 connections, not to collect 500 business cards or Facebook friends. Throw quarterly gatherings with only the most recent friends you’ve made to consolidate the relationships and get them to meet each other. 


Unless the guests haven’t seen each other in more than a year, parties with an agenda are much better than general hangouts. Some ideas: silent party, deep question party, touch/cuddle party, relating games party, art/performance party. 


Promise people you’ll do 100 of something (like writing pieces of life advice) even if you’re not entirely sure you can do it. Then do 109. Overpromise AND overdeliver.


Make friends from as many subcultures and worldviews as you can. A Mormon friend, an SJW friend, a transhumanist friend, a crystal healing friend, an 8chan friend, a hard normie friend, etc. 


Try to meet your online friends offline. It’s always incredibly cool to see in person someone you’ve built a connection with and imagined a lot of things about over the internet.


Learn to be OK with nudity and to disentangle it from shame and sexuality. Go to a nudist lodge, or just throw a nude non-sexual party with your trusted friends.


Yes, manic pixie dream girls and insouciant bad boys are interesting. But have you tried dating sincere, honest, and responsible people who actually care about you? 


“I know we were just introduced, but I forgot your name.”

“I saw the email you sent me last month, I just procrastinated and forgot to respond.”

“This is the best effort I was realistically going to make.”

Try it, it’s liberating.


If you think you’re running 10 minutes late, text to say you’ll be 15 minutes late. That way the other person gets one disappointment and one pleasant surprise. Most people do the opposite: they say they’re 5 minutes late when it’s 10 and end up annoying the other and looking like total fools.

And Finally

Giving life advice to an anonymous crowd on the internet is an act of service, but giving life advice to a single person’s face is often a brash power move. Same for challenging someone’s model of the world. Remember that every act of communication has two sides and a context. 


Write things online, even if you’re not qualified to write them, even if you think that no one will care. I started this thread on a lark, but ended up making friends, practicing creative brainstorming, gaining followers, and coming up with ways to improve my own life. 


Follow me on Twitter for more life advice, bad puns, and lukewarm takes.


BIDA Musician Booking

December 31, 2019 - 23:10
Published on December 31, 2019 8:10 PM UTC

I'm going to be booking musicians for BIDA, at least through the end of the season (July), and I wanted to write up a bit about how I'm thinking about it.

I see two main goals in our booking:

  • Book performers where the dancers are going to have a great time.

  • Book performers who are going to gain a lot from the experience.

There are dozens of contra dances in our area, and BIDA was founded partly as a stepping stone. The big Scout House dances had high standards and didn't want to book people unless they were confident they'd be great, but it could be hard for people to make that jump. One of the goals with BIDA is to be a dance that is willing to take some chances.

When I was booking callers, this cashed out as a set of priorities. The highest priority was people who were just at the point where they were ready to call our dance, though since these evenings sometimes were a little rough I wouldn't want these to be too close together or too high a fraction of overall evenings. The next priority was people who were clearly able to call our dance, and were calling other dances, but were still new enough that getting to call BIDA would help them learn and grow. Then for the remaining spots I'd book the best callers I could find, much like any other dance that was optimizing for the immediate experience of the dancers.

For musician booking the same principles cash out pretty differently because bands are (basically always) multiple musicians. There are still groups where it's everyone's first band, in which case it's a lot like the situation with caller booking, but these groups are only a small portion of people playing dances. Groups with a range of experience levels change things a lot. Musicians who have played contra dances for a long time can hold a group's sound together and support musicians newer to contra, while the newer musicians are learning a lot from working with the more experienced ones. Additionally, it's helpful for musicians that have recently started working together to have places that are willing to book them even though they don't have much of a joint track record yet.

For example, this Sunday we have Jaige Trudel, Alex Cumming, and Adam Broome. Jaige and Adam have been playing contra dances for decades, with Crowfoot and then Maivish. Alex has been playing music for a long time, but is newer to contra and has played <5% as many dances, so I'm glad he's getting to play some with Jaige and Adam, and I'm glad the three of them are getting to try out playing together.

These considerations also affect booking timelines. We generally don't book more than six months out, and often have openings around a month out. This shorter timeline means we can react more quickly when new musicians or combinations come along, and when groups get good enough that we think both they and our dancers would benefit from them playing at BIDA. I do also like that this helps roll back the booking arms race, though that's not a major motivation.

Because there are so many newer musicians, bands, and combinations, and because we want to leave open spots in the schedule, experienced bands can find BIDA hard to work with. They don't understand why we won't book them, and it can be hard to communicate that the problem is not "you're not good enough" but instead "you're good enough that you don't need BIDA". Sometimes this gets glossed as "BIDA doesn't book bands", but this isn't quite right. Looking back over the last year we've booked named bands for eight of our twenty-one dances, but they're almost all pretty new bands. If you're an experienced band who would like to be booked for BIDA, and you're open to being booked with relatively short notice, I would be happy to keep you on the list of musicians to ask at about a month out if I don't end up finding newer folks to fill a spot.

I think of BIDA booking as trying to fill gaps in booking for the area, and help the community thrive long-term. None of this is set in stone and if there are parts that you think are a bad idea please let me know! And, of course, if there are people you think we should be booking please let me know that too!


What is a Rationality Meetup?

December 31, 2019 - 19:15
Published on December 31, 2019 4:15 PM UTC

Someone who doesn't know about LW asks you what a rationality meetup is. What do you say? How do you pitch it?


AI Alignment, Constraints, Control, Incentives or Partnership?

December 31, 2019 - 16:42
Published on December 31, 2019 1:42 PM UTC

A quick search on AI benevolence did not really return much, and I've not waded into the depths of the whole area. However, I am wondering to what extent the current approach here is about constraining and controlling (call that Bucket 1) versus incentivizing and partnership (Bucket 2) as a solution to the general fears of AGI.

If one were to sort the approaches into these two buckets, what percentage would land in each? My impression is that most of what I've seen is Bucket 1 type solutions.


Does Big Business Hate Your Family?

December 31, 2019 - 15:50
Published on December 31, 2019 12:50 PM UTC

Previously in Sequence: Moloch Hasn’t Won, Perfect Competition, Imperfect Competition

The book Moral Mazes describes the world of managers in several large American corporations.

They are placed in severely toxic situations. To succeed, they transform their entire selves into maximizers of what gets you ahead in middle corporate management.

This includes active hostility to all morality and to all other considerations not being actively maximized. Those considerations compete for attention and resources. 

Those who refuse or fail to self-modify in this way fail to climb the corporate ladder. Those who ‘succeed’ at this task sometimes rise to higher corporate positions, but there are not enough positions to go around, so many still fail.

Reminder: Sufficiently Powerful Optimization Of Any Known Target Destroys All Value

This is a default characteristic of all sufficiently strong optimizers. Recall that when we optimize for X but are indifferent to Y, we by default actively optimize against Y, for all Y that would make any claims to resources. See The Hidden Complexity of Wishes.

Perfect competition destroys all value by being a sufficiently powerful optimizer. 

The competition for success as a middle manager described in Moral Mazes destroys the managers (among many other things it destroys) the same way, by putting them and the company’s own culture and norms under sufficiently powerful optimization pressure towards a narrow target.

Does Big Business Hate Your Family?

Consider this comment, elevated to a post at Marginal Revolution, responding to Tyler Cowen’s noting that the National Conservatism Conference had a talk called “Big Business Hates Your Family”:

What would it mean if big business did hate your family?

Would it mean adopting a working culture that made it ever harder to rise to power within it while also having said family? Would it require those with career ambitions to geographically abandon extended family and to live in areas notoriously difficult for raising families? Would it mean requiring long delays on family formation while you got credentialed, worked with little remuneration while getting your foot in the door, and then place huge amounts of time and effort on career growth rather than investing in your family? Does corporate culture act like it hates your family?

Would it mean selling products which have strong correlations with family strife and dissolution? Would it market products known to be destructive to thousands of families relentlessly? Would it market products that consume time in great quantities at the expense of family time investment? Would it routinely mock and denigrate your family roles for cheap publicity?

Would it mean lobbying for policies which are good for the business, but bad for your family? Would it support seeking a larger supply of labor via immigration? Would it support visa restricted immigration of labor that is less able to defy corporate diktat without having legal or financial issues? Would it argue for child care subsidies for the people it wishes to employ rather than for all Americans and all child care arrangements?

I believe businesses are amoral and are just maximizing money, power, and prestige for those in positions of power within them. Yet, this formal indifference seems to be giving rise to a lot of behaviors that are, at best, perceived to be hostile to families.

There is nothing wrong with this, and certainly nothing illegal about it, but I would be shocked if large organizations that are disproportionately filled with the single and childless who are located in regions that are disproportionately single and childless and who are busy virtue signalling to academia, politics, and other left bastions that are disproportionately single and childless managed to somehow not end up at cross purposes for the majority of families. And frankly I would be shocked if this antagonism did not spill over into emotional terms.

Certainly, I am always told that this sort of analysis is why [Structure X] is antagonistic, if only implicitly, against racial minorities. I see no reason why parents or spouses would feel any differently.

Big business, like the AGI, does not hate your family. Big business thinks your family possesses capital, preferences and other assets that could be used for something else. The effects of this on your family are a side effect. Big business also notes that your family is attempting to optimize the world for something other than the profits of big business, and would like to prevent you from doing this, since it would tend to reduce its profits. This is all doubly true if you work for the Big Business in question, since your family is now asserting its preferences and claims to resources in an even more directly competitive way. For morality (or anything else that might have a claim to resources, so basically anything anyone cares about), same thing. Replace “your family” with “morality.”

No, wait. That’s wrong.

The above paragraph, like the quoted text, is misleadingly treating Big Business in general, or a given business in particular, like it is an agent.

We need to fully appreciate that corporations are not agents. There is no agent called Big Business. Nor are any of the individual big businesses agents as such.

Corporations are people. Not only in the ‘corporations are people, my friend’ sense, but in the sense that corporations don’t act or have preferences, but rather are composed of people that act and have preferences. There’s just a bunch of guys.

The CEO is an individual representing their own interests, like everyone else at the company. The profits they care about are their own. Occasionally they will make some efforts to maximize corporate profits. Often they won’t, or will focus primarily on other things.

English makes it very hard not to make the agent mistake. I will no doubt keep saying that corporations want things and prefer things and think things, because I don’t know of another reasonable and compact way of saying the same thing. But understand that I mean that as an abstraction of what happens as the result of the preferences of individuals, and choices made by individual people, and their interactions.

Let’s reformulate the question.

Do The Managers of Big Businesses Hate Your Family?

When I say these organizations are immoral, it’s not necessarily that the people running major corporations are mustache-twirling villains who hate morality.

Most don’t have mustaches.

Morality causes choices and optimization towards the moral and against the immoral. That interferes with choices and optimization away from the uncomfortable and towards the comfortable. It interferes with choices away from bad for your boss towards good for your boss. Or, in less broken situations and/or with a less cynical perspective, also, from less profitable towards more profitable.

Do enough of that, and some of those involved for long enough do become mustache-twirling villains, because they are humans who are trying to live with what they are doing. You become what you continuously do and say. Eventually one becomes the mask. Mustaches become more common as people get older.

The first-level model says this stays rare. Being a mustache-twirling villain is to have preferences (if nothing else, you prefer having a mustache), and thus is bad for business the same way having morals is bad for business. You want to be completely indifferent, and be seen as completely indifferent.

It is worth taking five minutes here to think about how it might become seen as advantageous by the managers themselves, under these conditions, to become actively immoral, and prefer doing the wrong thing over the right thing because it is wrong.

Lacking all Preferences

None of this has anything to do with morality as such. What they are against is what Moloch is against. Having preferences at all. They are against caring about anything at all other than climbing the corporate (or academic, or government, or other as appropriate) ladder.

It doesn’t matter whether you care about the laws of accounting, wearing the color red, eating meat, cheating on your wife, seeing hit movies on opening weekend, overthrow of third world governments, falsifying scientific data sets, your favorite prime number, different brands of olive oil, genocide, or having a life outside the corporation, or time to watch your kids grow up. Caring about anything not chosen to help your career is a liability. Being seen or thought of as caring about something else can be even worse than actually doing so, as you are preemptively punished slash seen as not having a future, and potential allies want nothing to do with you.

In particular, any sign of, or even worse defense of, an outside life is deadly.

Thus what is called “Pournelle’s iron law of bureaucracy”: In any bureaucracy, the people devoted to the benefit of the bureaucracy itself always get in control and those dedicated to the goals the bureaucracy is supposed to accomplish have less and less influence, and sometimes are eliminated entirely.

In order to get an instinctive sense of all this, if you have not yet done so, I encourage you to read or at least browse Quotes from Moral Mazes. Some of those quotes will be reiterated later in the sequence. You also may wish to see Moral Mazes and Short Termism.

If you are interested enough to power through it (not the author’s fault, but it’s a tough read) and have the time, even better would be to pause here and read the whole book.



Looking for non-AI people to work on AGI risks

December 31, 2019 - 08:09
Published on December 30, 2019 8:41 PM UTC

I'm worried about AGI safety, and I'm looking for non-AI people to worry with. Let me explain.

A lecture by futurist Anders Sandberg, online reading, and real-life discussions with my local Effective Altruist group, gave me as a non-AI person (33-yo physicist, engineer, climate activist and startup founder) the convictions that:

- AGI (Artificial General Intelligence, Superintelligence, or the Singularity) is a realistic possibility in the next decades, say between 2030 and 2050
- AGI could well become orders of magnitude smarter than humans, fast
- If unaligned, AGI could well lead to human extinction
- If aligned ('safe'), AGI could still possibly lead to human extinction, for example because someone's goals turned out to be faulty, or because someone removed the safety from the code

I'm active in two climate NGOs, where a lot of people are worrying about human extinction because of the climate crisis. I'm also worrying about this, but at the same time, I think the chance of human extinction due to AGI is much larger. Although the chance is much larger, I don't believe it to be 100%: we could still stop AGI development, for example (I think that makes more sense than fleeing to Mars or working on a human-machine interface). Stopping development is a novel angle for many safe AI researchers, futurists, startup founders, and the like. However, many non-AI people think this is a very sensible solution, at least if all else fails. I agree with them. It is not going to be an easy goal to achieve and I see the penalty, but I think it makes the most sense from the options we have.

Therefore, I'm looking for non-AI people who are interested in working with me on common sense solutions for existential risks posed by AGI.

Does anyone know where to find them?


Voting Phase UI: Aggregating common comments?

December 31, 2019 - 06:48
Published on December 31, 2019 3:48 AM UTC

Or: UI musings for the Voting Phase of the 2018 Review. This post outlines the current plans, and asks a couple concrete questions at the end.


Range Voting, which is converted into Quadratic Voting

The current plan for the Voting Phase, after some discussion on this thread, is for most people to start with a range-voting system, which is converted* into a set of quadratic votes that people can then fine-tune.

* the precise conversion method is still up in the air. More comments and suggestions appreciated.

The range-voting part will basically be a "-1 to 3" scale, with the points corresponding to English labels of "No", "Neutral", "Decent", "Important", and "Crucial".

(This is a bit asymmetric between downvotes and upvotes, but you can add stronger downvotes in the quadratic step. Our experience testing voting systems was that only a small number of posts were things we expected anyone to downvote much. So it didn't seem useful to heavily emphasize downvotes in the first-pass phase.)

If you want to skip the range voting and go right to the quadratic voting, you can do that.
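
To make the range-to-quadratic step a bit more concrete, here is a minimal Python sketch of one way such a conversion could work. The labels and the "-1 to 3" scale come from the plan above; everything else is an assumption for illustration only, since the real conversion method is still undecided: the function names, the point budget, the standard quadratic-voting rule that n votes on one post cost n squared points, and the choice to scale scores proportionally and truncate toward zero.

```python
import math

# Labels and scores as described in the post.
RANGE_LABELS = {"No": -1, "Neutral": 0, "Decent": 1, "Important": 2, "Crucial": 3}


def quadratic_cost(votes):
    """Total points spent, assuming n votes on a single post cost n**2 points."""
    return sum(v * v for v in votes.values())


def range_to_quadratic(range_ballot, budget=500):
    """Hypothetical conversion from range labels to integer quadratic votes.

    Scores are scaled by a common factor so that the total quadratic cost
    would exactly hit `budget`, then truncated toward zero so the integer
    ballot never costs more than the budget. The budget value is made up.
    """
    scores = {post: RANGE_LABELS[label] for post, label in range_ballot.items()}
    raw_cost = sum(s * s for s in scores.values())
    if raw_cost == 0:
        # Every post was rated "Neutral": nothing to spend points on.
        return {post: 0 for post in scores}
    scale = math.sqrt(budget / raw_cost)
    return {post: int(s * scale) for post, s in scores.items()}


ballot = {"A": "Crucial", "B": "Decent", "C": "No", "D": "Neutral"}
votes = range_to_quadratic(ballot, budget=44)
print(votes, quadratic_cost(votes))
```

A proportional scheme like this keeps the relative ordering from the range ballot and leaves any unspent points (from truncation) for the fine-tuning step, for example to add the stronger downvotes mentioned above. Other conversions are equally plausible.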

What common comments do you expect people to make?

You're also encouraged to leave comments on each post, if there's anything you haven't yet said about it in the Nomination or Review Phases. The comments can be marked private (if so, only LW team members will see them).

But there are a couple types of comments I thought might be common enough to warrant some kind of... standardizing. 

For the most part, I expect "how useful is this post?" and "how epistemically sound is this post?" to blur together (for good or for ill – this may just be the halo effect, but I think it's hard to avoid it). But I at least wanted people to have the opportunity to say "I rate this post 2 out of 3 stars overall, but it had particularly good, or bad, epistemics, compared to other posts I gave 2 stars." Or something like that.

I was thinking of including a few checkboxes for "common, default comments" that people could leave, so that certain types of feedback could be more easily aggregated.

If you have time, it'd be helpful if you looked over the posts on the /reviews page and thought about what sort of short, simple comments you might have wanted to leave on them (either positive or negative ones), especially if you think others might want to leave similar comments.

(Ideally post each one as a top-level comment here, so that it can get voted on or discussed independently)

(This is somewhat similar to my previous thread discussing "Reacts", although I think it's worth asking the question separately here, where there's an immediate concrete task to use them for)