
### Reason isn't magic

LessWrong.com news - 7 hours 35 minutes ago
Published on June 18, 2019 4:04 AM UTC


### Discussion Thread: The AI Does Not Hate You by Tom Chivers

Published on June 17, 2019 11:43 PM UTC

This post is provided as a convenient place for discussion of the new book, The AI Does Not Hate You by Tom Chivers, which covers LessWrong and the rationalist community.

The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World

This is a book about AI and AI risk. But it's also, more importantly, about a community of people who are trying to think rationally about intelligence, and the places that these thoughts are taking them, and what insight they can and can't give us about the future of the human race over the next few years. It explains why these people are worried, why they might be right, and why they might be wrong. It is a book about the cutting edge of our thinking on intelligence and rationality right now, by the people who stay up all night worrying about it.


### Rationality dojo. Internal conflicts

Events at Kocherga - June 17, 2019 - 22:10
Thursday, June 20, 16:30

### Research Agenda v0.9: Synthesising a human's preferences into a utility function

Published on June 17, 2019 5:46 PM UTC

I'm now in a position where I can see a possible route to a safe/survivable/friendly Artificial Intelligence being developed. I'd give a 10+% chance of it being possible this way, and a 95% chance that some of these ideas will be very useful for other methods of alignment. So I thought I'd encode the route I'm seeing as a research agenda; this is the first public draft of it.

Clarity, rigour, and practicality: that's what this agenda needs. Writing this agenda has clarified a lot of points for me, to the extent that some of it now seems, in retrospect, just obvious and somewhat trivial - "of course that's the way you have to do X". But more clarification is needed in the areas that remain vague. And, once these are clarified enough for humans to understand, they need to be made mathematically and logically rigorous - and ultimately, cashed out into code, and tested and experimented with.

So I'd appreciate any comments that could help with these three goals, and welcome anyone interested in pursuing research along these lines over the long-term.

0 The fundamental idea

This agenda fits itself into the broad family of Inverse Reinforcement Learning: delegating most of the task of inferring human preferences to the AI itself. Most of the task, since it's been shown that humans need to build the right assumptions into the AI, or else the preference learning will fail.

To get these "right assumptions", this agenda will look into what preferences actually are, and how they may be combined together. There are hence four parts to the research agenda:

1. A way of identifying the (partial) preferences of a given human H.
2. A way for ultimately synthesising a utility function UH that is an adequate encoding of the partial preferences of a human H.
3. Practical methods for estimating this UH, and how one could use the definition of UH to improve other suggested methods for value-alignment.
4. Limitations and lacunas of the agenda: what is not covered. These may be avenues of future research, or issues that cannot fit into the UH paradigm.

There have been a myriad of small posts on this topic, and most will be referenced here. Most of these posts are stubs that hint at a solution, rather than spelling it out fully and rigorously.

The reason for that is to check for impossibility results ahead of time. The construction of UH is deliberately designed to be adequate, rather than elegant (indeed, the search for an elegant UH might be counterproductive and even dangerous, if genuine human preferences get sacrificed for elegance). If this approach is to work, then the safety of UH has to be robust to different decisions in the synthesis process (see Section 2.8, on avoiding disasters). Thus, initially, it seems more important to find approximate ideas that cover all possibilities, rather than having a few fully detailed sub-possibilities and several gaps.

Finally, it seems that if a sub-problem is not formally solved, we stand a much better chance of getting a good result from "hit it with lots of machine learning and hope for the best", than we would if there were huge conceptual holes in the method - a conceptual hole meaning that the relevant solution is broken in an unfixable way. Thus, I'm publishing this agenda now, where I see many implementation holes, but no large conceptual holes.

A word of warning here, though: with some justification, the original Dartmouth AI conference could also have claimed to be confident that there were no large conceptual holes in their plan of developing AI over a summer - and we know how wrong they turned out to be. With that thought in mind, onwards with the research agenda.

0.1 Executive summary: synthesis process

The first idea of the project is to identify partial preferences as residing within human mental models. This requires identifying the actual and hypothetical internal variables of a human, and thus solving the "symbol grounding problem" for humans; ways of doing that are proposed.

The project then sorts the partial preferences into various categories of interest (basic preferences about the world, identity preferences, meta-preferences about basic preferences, global meta-preferences about the whole synthesis project, etc...). The aim is then to synthesise these into a single utility function UH, representing the preference of the human H (at a given time or short interval of time). Different preference categories play different roles in this synthesis (eg object-level preferences get aggregated, meta-preferences can modify the weights of object-level preferences, global meta-preferences are used at the design stage, and so on).

The aims are to:

1. Ensure the synthesis UH has good properties and reflects H's actual preferences, and not any of H's erroneous factual beliefs.
2. Ensure that highly valued preferences weigh more than lightly held ones, even if the lightly held one is more "meta" than the other.
3. Respect meta-preferences about the synthesis as much as possible, but...
4. ...always ensure that the synthesis actually reaches an actual non-contradictory UH.

To ensure points 2. and 4., there will always be an initial way of synthesising preferences, which certain meta-preferences can then modify in specific ways. This is designed to resolve contradictions (when "I want a simple moral system" and "value is fragile and needs to be preserved" are both comparably weighted meta-preferences) and remove preference loops ("I want a simple moral system" is itself simple and could reinforce itself; "I want complexity in my values" is also simple and could undermine itself).

The "good properties" of 1. are established, in large part, by the global meta-preferences that don't comfortably sit within the synthesis framework. As for erroneous beliefs, if H wants to date H′ because they think that would make them happy and respected, then an AI will synthesise "being happy" and "being respected" as preferences, and would push H away from H′ if H were actually deluded about what dating them would accomplish.

That is the main theoretical contribution of the research agenda. It then examines what could be done with such a theory in practice, and whether the theory can be usefully approximated for constructing an actual utility function for an AI.

0.2 Executive summary: agenda difficulty and value

One early commentator on this agenda remarked:

[...] it seems like this agenda is trying to solve at least 5 major open problems in philosophy, to a level rigorous enough that we can specify them in code:

1. The symbol grounding problem.
2. Identifying what humans really care about (not just what they say they care about, or what they act like they care about) and what preferences and meta-preferences even are.
3. Finding an acceptable way of making incomplete and inconsistent (meta-)preferences complete and consistent.
4. Finding an acceptable way of aggregating many people's preferences into a single function[1].
5. The nature of personal identity.

I agree that AI safety researchers should be more ambitious than most researchers, but this seems extremely ambitious, and I haven't seen you acknowledge the severe outside-view difficulty of this agenda.

This is indeed an extremely ambitious project. But, in a sense, a successful aligned AI project will ultimately have to solve all of these problems. Any situation in which most of the future trajectory of humanity is determined by AI, is a situation where there are solutions to all of these problems.

Now, these solutions may be implicit rather than explicit; equivalently, we might be able to delay solving them via AI, for a while. For example, a tool AI solves these issues by being contained in such a way that human judgement is capable of ensuring good outcomes. Thus humans solve the grounding problem, and we design our questions to the AI to ensure compatibility with our preferences, and so on.

But as the power of AIs increase, humans will become confronted by situations they have never been in before, and our ability to solve these issues diminish (and the probabilities increase that we might be manipulated or fall into a bad attractor). This transition may sneak up on us, so it is useful to start thinking of how to a) start solving these problems, and b) start identifying these problems crisply so we can know when and whether they need to be solved, and when we are moving out of the range of validity of the "trust humans" solution. For both these reasons, all the issues will be listed explicitly in the research agenda.

A third reason to include them is so that we know what we need to solve those issues for. For example, it is easier to assess the quality of any solution to symbol grounding, if we know what we're going to do with that solution. We don't need a full solution, just one good enough to define human partial preferences.

And, of course, we need to also consider scenarios where partial approaches like tool AI just don't work, or only work if we solve all the relevant issues anyway.

Finally, there is a converse: partial solutions to problems in this research agenda can contribute to improving other methods of AI safety alignment. Section 3 will look into this in more detail. The basic idea is that, to improve an algorithm or an approach, it is very useful to know what we are ultimately trying to do (eg compute partial preferences, or synthesise a utility function with certain acceptable properties). If we rely only on making local improvements, guided by intuition, we may ultimately get stuck when intuition runs out; and the improvements are more likely to be ad-hoc patches than consistent, generalisable rules.

0.3 Executive aside: the value of approximating the theory

The theoretical construction of UH in Sections 1 and 2 produces a highly complicated object, involving millions of unobserved counterfactual partial preferences and a synthesis process involving higher-order meta-preferences. Section 3 touches on how UH could be approximated, but, given its complexity, it would seem that the answer would be "only very badly".

And there is a certain sense in which this is correct. If UH is the actual idealised utility defined by the process, and VH is the approximated utility that a real-world AI could compute, then it is likely[2] that UH and VH will be quite different in many formal senses.

But there is a certain sense in which this is incorrect. Consider many of the AI failure scenarios: imagine, for example, that the AI extinguished all meaningful human interactions, because these can sometimes be painful and the AI knows that we prefer to avoid pain. But it's clear to us that most people's partial preferences will not endorse total loneliness as a good outcome; if it's clear to us, then it's a fortiori clear to a very intelligent AI; hence the AI will avoid that failure scenario.

One should be careful with using arguments of this type, but it is hard to see how there could be a failure mode that a) we would clearly understand is incompatible with proper synthesis of UH, but b) a smart AI would not. And it seems that any failure mode should be understandable to us, as a failure mode, especially given some of the innate conservatism of the construction of UH.

Hence, even if VH is a poor approximation of UH in a certain sense, it is likely an excellent one in the sense of avoiding terrible outcomes. So, though d(UH,VH) might be large for some formal measure of distance d, a world where the AI maximises VH will be highly ranked according to UH.
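
To illustrate this distinction with entirely made-up numbers (nothing here comes from the agenda itself): two utility functions can be far apart pointwise, while the maximiser of one is still highly ranked by the other, so long as both agree on which worlds are terrible.

```python
# Toy illustration (invented numbers): U_H is the "true" synthesised utility,
# V_H a crude approximation. They differ a lot pointwise, yet the world that
# maximises V_H is still highly ranked by U_H -- because both assign very low
# value to the genuinely terrible worlds.
worlds = ["flourishing", "muddling_along", "lonely_dystopia", "extinction"]
U_H = {"flourishing": 1.00, "muddling_along": 0.70, "lonely_dystopia": 0.05, "extinction": 0.00}
V_H = {"flourishing": 0.60, "muddling_along": 0.90, "lonely_dystopia": 0.10, "extinction": 0.02}

# Formal distance d(U_H, V_H): the largest pointwise disagreement.
d = max(abs(U_H[w] - V_H[w]) for w in worlds)

# The world an AI maximising V_H would pick, and its rank under U_H.
v_best = max(worlds, key=lambda w: V_H[w])
u_rank = sorted(worlds, key=lambda w: -U_H[w]).index(v_best)  # 0 = best

print(d)       # 0.4 -- a large formal distance
print(v_best)  # 'muddling_along'
print(u_rank)  # 1 -- second-best world according to U_H: no disaster
```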

0.4 An inspiring just-so story

This is the story of how evolution created humans with preferences, and of what the nature of these preferences is. The story is not true, in the sense of being accurate; instead, it is intended to provide some inspiration as to the direction of this research agenda. This section can be skipped.

In the beginning, evolution created instinct driven agents. These agents had no preferences or goals, nor did they need any. They were like Q-learning agents: they knew the correct action to take in different circumstances, but that was it. Consider baby turtles that walk towards the light upon birth, because, traditionally, the sea was lighter than the land - of course, this behaviour fails them in the era of artificial lighting.

But evolution has a tiny bandwidth, acting once per generation. So it created agents capable of planning, of figuring out different approaches, rather than having to follow instincts. This was useful, especially in varying environments, and so evolution offloaded a lot of its "job" onto the planning agents.

Of course, to be of any use, the planning agents need to be able to model their environment to some extent (or else their plans can't work) and had to have preferences (or else every plan was as good as another). So, in creating the first planning agents, evolution created the first agents with preferences.

Of course, evolution is a messy, undirected process, so the process wasn't clean. Planning agents are still riven with instincts, and the modelling of the environment is situational, used for when it was needed, rather than some consistent whole. Thus the "preferences" of these agents were underdefined and sometimes contradictory.

Finally, evolution created agents capable of self-modelling and of modelling other agents in their species. This might have been because of competitive social pressures as agents learn to lie and detect lying. Of course, this being evolution, this self-and-other-modelling took the form of kludges built upon spandrels built upon kludges.

And then arrived humans, who developed norms and norm-violations. As a side effect of this, we started having higher-order preferences as to what norms and preferences should be. But instincts and contradictions remained - this is evolution, after all.

And evolution looked upon this hideous mess, and saw that it was good. Good for evolution, that is. But if we want it to be good for us, we're going to need to straighten out this mess somewhat.

1 The partial preferences of a human

The main aim of this research agenda is to start with a human H at or around a given moment t and produce a utility function UHt that is an adequate synthesis of the human's preferences at the time t. Unless the dependence on t needs to be made explicit, this will simply be designated as UH.

Later sections will focus on what can be done with UH or the methods used for its construction; this section and the next will focus solely on that construction. It is mainly based on these posts, with some commentary and improvements.

Essentially, the process is to identify human preferences and meta-preferences within the human's (partial) mental models (Section 1), and to find some good way of synthesising these into a whole UH (Section 2).

Partial preferences (see Section 1.1) will be decomposed into:

1. Partial preferences about the world.
2. Partial preferences about our own identity.
3. Partial meta-preferences about our preferences.
4. Partial meta-preferences about the synthesis process.
5. Self-referential contradictory partial meta-preferences.
6. Global meta-preferences about the outcome of the synthesis process.

This section and the next will lay out how preferences of types 1, 2, 3, and 4 can be used to synthesise the UH. Section 2 will conclude by looking at what role preferences of type 6 can play. Preferences of type 5 are not dealt with in this agenda, and remain a perennial problem (see Section 4.5).

1.1 Partial models, partial preferences

As was shown in the paper "Occam's razor is insufficient to infer the preferences of irrational agents", an agent's behaviour is never enough to establish their preferences - even with simplicity priors or regularisation (see also this post and this one).

Therefore a definition of preference needs to be grounded in something other than behaviour. There are further arguments, presented here, as to why a theoretical grounding is needed even when practical methods are seemingly adequate; this point will be returned to later.

The first step is to define a partial preference (and a partial model for these to exist in). A partial preference is a preference that exists within a human being's internal mental model, and which contrasts two[3] situations along a single axis of variation, keeping other aspects constant. For example, "I wish I was rich (rather than poor)", "I don't want to go down that alley, lest I get mugged", and "this is much worse if there are witnesses around" are all partial preferences. A more formal definition of partial preferences, and the partial mental model in which they exist, is presented here.
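
As a minimal sketch of how such an object could be represented (the field names are mine, not the agenda's):

```python
from dataclasses import dataclass

@dataclass
class PartialModel:
    """A fragment of a human's internal mental model: only the variables
    the human is actually attending to, with the rest held constant."""
    background: dict          # features assumed fixed, e.g. {"era": "present"}
    axis: str                 # the single axis of variation, e.g. "wealth"

@dataclass
class PartialPreference:
    model: PartialModel
    better: str               # value on the axis that is preferred, e.g. "rich"
    worse: str                # the contrasted value, e.g. "poor"
    weight: float             # intensity; only meaningful relative to others

# "I wish I was rich (rather than poor)", all else held equal:
wish_rich = PartialPreference(
    model=PartialModel(background={"era": "present", "health": "same"}, axis="wealth"),
    better="rich", worse="poor", weight=2.0)

print(wish_rich.better)   # 'rich'
```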

Note that this is one of the fundamental theoretical underpinnings of the method. It identifies human (partial) preferences as existing within human mental models. This is a "normative assumption": we choose to define these features as (partial) human preferences, the universe does not compel us to do so.

This definition gets around the "Occam's razor" impossibility result, since these mental models are features of the human brain's internal process, not of human behaviour. Conversely, this also violates certain versions of functionalism, precisely because the internal mental states are relevant.

A key feature is to extract not only the partial preference itself, but also the intensity of the preference, referred to as its weight. This will be key to combining the preferences together (technically, we only need the weight relative to other partial preferences).

1.2 Symbol grounding

In order to interpret what a partial model means, we need to solve the old problem of symbol grounding. "I wish I was rich" was presented as an example of a partial preference; but how can we identify "I", "rich" and the counterfactual "I wish", all within the mess of the neural net that is the human brain?

To ground these symbols, we should approach the issue of symbol grounding empirically, by aiming to predict the values of real world-variables through knowledge of internal mental variables (see also the example presented here). This empirical approach can provide sufficient grounding for the purposes of partial models, even if symbol grounding is not solved in the traditional linguistic sense of the problem.

This is because each symbol has a web of connotations, a collection of other symbols and concepts that co-vary with it, in normal human experience. Since the partial models are generally defined to be within normal human experiences, there is little difference between any symbols that are strongly correlated.

To formalise and improve this definition, we'll have to be careful about how we define the internal variables in the first place - overly complicated or specific internal variables can be chosen to correlate artificially well with external variables. This is, essentially, "symbol grounding overfitting".
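
Here is a toy version of that empirical test, with a crude complexity penalty standing in for the guard against symbol grounding overfitting (all numbers invented):

```python
def grounding_score(internal, external, complexity):
    """How well does a candidate internal variable track an external one?
    Score = squared correlation of the two series, minus a penalty for how
    contrived (complex) the internal variable's definition is -- a crude
    guard against 'symbol grounding overfitting'."""
    n = len(internal)
    mi = sum(internal) / n
    me = sum(external) / n
    cov = sum((i - mi) * (e - me) for i, e in zip(internal, external))
    vi = sum((i - mi) ** 2 for i in internal)
    ve = sum((e - me) ** 2 for e in external)
    r2 = cov * cov / (vi * ve) if vi and ve else 0.0
    return r2 - 0.05 * complexity

# A simple internal variable that tracks the external variable well...
simple = grounding_score([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.9], complexity=1)
# ...beats a gerrymandered variable that fits perfectly but is very complex.
contrived = grounding_score([1.1, 1.9, 3.2, 3.9], [1.1, 1.9, 3.2, 3.9], complexity=25)
print(simple > contrived)  # True
```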

Another consideration is the extent to which the model is conscious or subconscious; aliefs, for example, could be modelled as subconscious partial preferences. For consciously endorsed aliefs, this is not much of a problem - we instinctively fear touching fires, and don't desire to lose that fear. But if we don't endorse that alief - for example, we might fear flying and not want to fear it - this becomes more tricky. Things get confusing with partially endorsed aliefs: amusement park rides are extremely safe, and we wouldn't want to be crippled with fear at the thought of going on one. But neither would we want the experience to feel perfectly bland and safe.

1.3 Which (real and hypothetical) partial models?

Another important consideration is that humans do not have, at the moment t, a complete set of partial models and partial preferences. They may have a single partial model in mind, with maybe a few others in the background - or they might not be thinking about anything like this at all. We could extend the parameters to some short period around the time t (reasoning that people's preferences rarely change in such a short time), but though that gives us more data, it doesn't give us nearly enough.

The most obvious way to get a human to produce an internal model is to ask them a relevant question. But we have to be careful about this - since human values are changeable and manipulable, the very act of asking a question can cause humans to think in certain directions, and even create partial preferences where none existed. The more interaction between the questioner and the human, the more extreme preferences can be created. If the questioner is motivated to maximise the utility function that it is also computing (i.e. if the UH is an online learning process), then the questioner can rig or influence the learning process.

Fortunately, there are ways of removing the questioner's incentives to rig or influence the learning process.

Thus the basic human preferences at time t are defined to be those partial models produced by "one-step hypotheticals". These are questions that do not put the human in unusual mental situations, and that try to minimise any departure from the human's base state.

Some preferences are conditional (eg "I want to eat something different from what I've eaten so far this week"), as are some meta-preferences (eg "If I hear a convincing argument about X being good, I want to prefer X"), which could violate the point of the one-step hypothetical. Thus conditional (meta-)preferences are only acceptable if their conditions are achieved by short streams of data, unlikely to manipulate the human. They also should be weighted more if they fit a consistent narrative of what the human is/wants to be, rather than being ad hoc (this will be assessed by machine learning, see Section 2.4).

Note that among the one-step hypotheticals, are included questions about rather extreme situations - heaven and hell, what to do if plants were conscious, and so on. In general, we should reduce the weight[4] of partial preferences in extreme situations[5]. This is because of the unfamiliarity of these situations, and because the usual human web of connotations between concepts may have broken down (if a plant was conscious, would it be a plant in the sense we understand that?). Sometimes the breakdown is so extreme that we can say that the partial preference is factually wrong. This includes effects like the hedonic treadmill: our partial models of achieving certain goals often include an imagined long-term satisfaction that we would not actually feel. Indeed, it might be good to specifically avoid these extreme situations, rather than having to make a moral compromise that might lose part of H's values due to uncertainty. In that case, ambiguous extreme situations get a slight intrinsic negative - that might be overcome by other considerations, but is there nonetheless.
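
One simple way this down-weighting could be implemented (my own sketch, with arbitrary constants): discount a partial preference's weight exponentially in the situation's "extremity", and attach the slight intrinsic negative to ambiguous extreme situations.

```python
import math

def adjusted_weight(raw_weight, extremity, ambiguity, k=1.0, penalty=0.1):
    """Down-weight partial preferences from extreme situations.
    extremity: 0 for everyday situations, growing as the situation departs
               from normal human experience (the web of connotations breaking down).
    ambiguity: 0..1, how unclear the preference is in that situation.
    Returns (weight, intrinsic): the discounted weight, plus the slight
    intrinsic negative attached to ambiguous extreme situations."""
    weight = raw_weight * math.exp(-k * extremity)
    intrinsic = -penalty * ambiguity * extremity
    return weight, intrinsic

everyday = adjusted_weight(1.0, extremity=0.0, ambiguity=0.0)
conscious_plants = adjusted_weight(1.0, extremity=3.0, ambiguity=0.8)
print(everyday)         # weight untouched, no intrinsic negative
print(conscious_plants) # heavily discounted weight, small intrinsic negative
```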

A final consideration is that some concepts just disintegrate in general environments - for example, consider a preference for "natural" or "hand-made" products. In those cases, the web of connotations can be used to extract some preferences in general - for example, "natural", used in this way has connotations[6] of "healthy", "traditional", and "non-polluting", all of which extend better to general environments than "natural" does. Sometimes, the preference can be preserved but routed around: some versions of "no artificial genetic modifications" could be satisfied by selective breeding that achieved the same result. And some versions couldn't; it's all a function of what powers the underlying preference: specific techniques, or a general wariness of these types of optimisation. Meta-preferences might be very relevant here.

2 Synthesising the preference utility function

Here we will sketch out the construction of the human utility function UH, from the data that is the partial preferences and their (relative) weights.

This is not, by any means, the only way of constructing UH. But it is illustrative of how the utility could be constructed, and can be more usefully critiqued and analysed than a vaguer description.

2.1 What sort of utility function?

Partial preferences are defined over states of the world or states of the human H. The latter includes both things like "being satisfied with life" (purely internal) and "being an honourable friend" (mostly about H's behaviour).

Consequently, UH must also be defined over such things, so UH is dependent on states of the world and states of the human H. Unlike standard MDP-like situations, these states can include the history of the world or of H up to that point - preferences like "don't speak ill of the dead" abound in humans.
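
To make the history-dependence concrete, here is a toy history-dependent utility (the states and penalty are invented for illustration): it takes the whole trajectory as input, so "don't speak ill of the dead" can consult who has died earlier in that trajectory.

```python
def U_H(history):
    """Toy history-dependent utility: states are dicts; the utility of a
    trajectory penalises speaking ill of anyone who died earlier in it."""
    score, dead = 0.0, set()
    for state in history:
        dead |= set(state.get("deaths", []))
        if state.get("spoke_ill_of") in dead:
            score -= 1.0
        score += state.get("wellbeing", 0.0)
    return score

# Two trajectories with identical final states, ranked differently:
polite = [{"deaths": ["earl"], "wellbeing": 1.0}, {"wellbeing": 1.0}]
rude   = [{"deaths": ["earl"], "wellbeing": 1.0},
          {"wellbeing": 1.0, "spoke_ill_of": "earl"}]
print(U_H(polite) > U_H(rude))  # True: same end state, different histories
```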

2.2 Why a utility function?

Why should we aim to synthesise a utility function, when human preferences are very far from being utility functions?

It's not of an innate admiration for utility functions, or a desire for mathematical elegance. It's because they tend to be stable under self-modification. Or, to be more accurate, they seem to be much more stable than preferences that are not utility functions.

In the imminent future, human preferences are likely to become stable and unchanging. Therefore it makes more sense to create a preference synthesis that is already stable, than to create a potentially unstable one and let it randomly walk itself to stability (though see Section 4.6).

Also, and this is one of the motivations behind classical inverse reinforcement learning, reward/utility functions tend to be quite portable, and can be moved from one agent to another or from one situation to another, with greater ease than other goal structures.

2.3 Extending and normalising partial preferences

Human values are changeable, manipulable, underdefined, and contradictory. By focusing around time t, we have removed the changeable problem for partial preferences (see this post for thoughts on how long a period around t should be allowed); manipulable has been dealt with by removing the possibility of the AI influencing the learning process.

Being underdefined remains a problem, though. It would be possible to overfit absurdly specifically to the human's partial models, and generate a UH that is in full agreement with our partial preferences and utterly useless. So the first thing to do is to group the partial preferences together according to similarity (for example, preferences for concepts closely related in terms of webs of connotations should generally be grouped together), and generalise them in some regularised way. Generalise means, here, that they are transformed into full preferences, comparing all possible universes - though only on the narrow criteria that were used for the partial preference: a partial preference about fear of being mugged could generalise to a fear of pain/violence/violation/theft across all universes, but would not include other aspects of our preferences. So they are full preferences, in terms of applying to all situations, but not the full set of our preferences, in terms of taking into account all our partial preferences.

It seems that standard machine learning techniques should already be up to this task (with all the usual current problems). For example, clustering of similar preferences would be necessary. There are unsupervised ML algorithms that can do that; but even supervised ML algorithms end up grouping labelled data together in ways that define extensions of the labels into higher dimensional space. Where could these labels come from? Well, they could come from grounded symbols within meta-preferences. A meta-preference of the form "I would like to be free of bias" contains some model of what "bias" is; if that meta-preference is particularly weighty, then clustering preferences by whether or not they are biases could be a good thing to do.
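
A minimal sketch of such clustering (with toy feature vectors standing in for each preference's web of connotations, and an arbitrary similarity threshold):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster(prefs, threshold=0.9):
    """Greedy single-pass clustering: each preference joins the first
    cluster whose representative it resembles, else starts a new one."""
    clusters = []  # list of (representative_vector, [names])
    for name, vec in prefs:
        for rep, members in clusters:
            if cosine(vec, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]

# Toy connotation features: [safety, wealth, social]
prefs = [("avoid_mugging",   [0.9, 0.1, 0.2]),
         ("avoid_violence",  [0.95, 0.05, 0.25]),
         ("wish_to_be_rich", [0.1, 0.9, 0.3])]
print(cluster(prefs))  # [['avoid_mugging', 'avoid_violence'], ['wish_to_be_rich']]
```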

Once the partial preferences are generalised in this way, there remains the problem of them being contradictory. This is not as big a problem as it may seem. First of all, it is very rare for preferences to be utterly opposed: there is almost always some compromise available. So an altruist with murderous tendencies could combine charity work with aggressive online gaming; indeed some whole communities (such as BDSM) are designed to balance "opposing" desires for risk and safety.

So in general, the way to deal with contradictory preferences is to weight them appropriately, then add them together; any compromise will then appear naturally from the weighted sum[7].

To do that, we need to normalise the preferences in some way. We might seek to do this in an a priori, principled way, or through partial models that include the tradeoffs between different preferences. Preferences that pertain to extreme situations, far removed from everyday human situations, could also be penalised in this weighting process (as the human should be less certain about these).
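
One simple a priori normalisation (a common default, not something the agenda prescribes) is to rescale each generalised preference to a fixed spread over a sample of worlds before taking the weighted sum; the altruist-with-murderous-tendencies compromise from above then emerges naturally.

```python
def normalise(values):
    """Rescale a preference's scores over sample worlds to mean 0 and
    spread 1, so no preference dominates merely through arbitrary scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # indifferent preference: contributes nothing
    mean = sum(values) / len(values)
    return [(v - mean) / (hi - lo) for v in values]

def combine(prefs_by_world, weights):
    """Weighted sum of normalised preferences; compromises appear naturally."""
    normed = {p: normalise(vals) for p, vals in prefs_by_world.items()}
    n_worlds = len(next(iter(prefs_by_world.values())))
    return [sum(weights[p] * normed[p][w] for p in prefs_by_world)
            for w in range(n_worlds)]

# Toy worlds: [charity_work, gaming, both]; the raw scales differ wildly.
prefs = {"altruism":   [100.0, 0.0, 90.0],
         "aggression": [0.0, 1.0, 0.9]}
U0 = combine(prefs, {"altruism": 1.0, "aggression": 1.0})
print(U0.index(max(U0)))  # 2: the compromise world wins
```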

Now that the partial preferences have been identified and weighted, the challenge is to synthesise them into a single UH.

2.4 Synthesising the preference function: first step

So this is how one could do the first step of preference synthesis:

1. Group similar partial preferences together, generalise them to full preferences without overfitting.
2. Use partial models to compute the relative weight between different partial preferences.
3. Using those relative weights, and again without overfitting, synthesise those preferences into a single utility function U0H.

This all seems doable in theory within standard machine learning. See Section 2.3 and the discussion of clustering for point 1. Point 2. comes from the definition of partial preferences. And point 3. is just an issue of fitting a good regularised approximation to noisy data.
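
The three steps can be strung together as a skeleton, with the real machine learning components left as stand-in callables (the degenerate lambdas below are just plumbing, not proposals):

```python
def synthesise_U0(partial_prefs, group, weigh, fit):
    """Skeleton of the first synthesis step. The three callables do the
    real work: 'group' clusters and generalises partial preferences into
    full preferences (step 1), 'weigh' extracts relative weights from the
    partial models (step 2), and 'fit' regularises the weighted set into
    a single utility function U0_H (step 3)."""
    full_prefs = group(partial_prefs)           # step 1
    weights = weigh(partial_prefs, full_prefs)  # step 2
    return fit(full_prefs, weights)             # step 3

# Degenerate stand-ins, just to show the plumbing:
U0 = synthesise_U0(
    partial_prefs=[("safety", 2.0), ("wealth", 1.0)],
    group=lambda pps: [name for name, _ in pps],
    weigh=lambda pps, full: {name: w for name, w in pps},
    fit=lambda full, ws: lambda world: sum(ws[p] * world.get(p, 0.0) for p in full))

print(U0({"safety": 1.0, "wealth": 0.5}))  # 2.5
```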

In a certain sense, this process is the partial opposite of how Jacob Falkovich used a spreadsheet to find a life partner. In that process, he started by factoring his goal of having a life-partner into many different subgoals. He then ranked the putative partners on each of the subgoals by comparing two options at a time, and building a (cardinal) ranking from these comparisons. The process here also aims to assign cardinal values from comparisons of two options, but the construction of the "subgoals" (full preferences) is handled by machine learning from the sets of weighted comparisons.
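
A toy version of the comparison-to-cardinal step (a least-squares sketch, not Falkovich's actual spreadsheet): each judgement says one option beats another by roughly some margin, and scores are fitted to respect those margins.

```python
def fit_scores(options, comparisons, steps=2000, lr=0.05):
    """Fit a cardinal score per option from pairwise judgements.
    comparisons: list of (winner, loser, margin), read as 'winner is
    better than loser by about margin'. Minimises the squared error
    between score differences and stated margins by gradient descent."""
    s = {o: 0.0 for o in options}
    for _ in range(steps):
        for a, b, m in comparisons:
            err = (s[a] - s[b]) - m
            s[a] -= lr * err
            s[b] += lr * err
    ref = min(s.values())
    return {o: v - ref for o, v in s.items()}  # anchor lowest score at 0

scores = fit_scores(
    ["alice", "bo", "cleo"],
    [("alice", "bo", 1.0), ("bo", "cleo", 1.0), ("alice", "cleo", 2.0)])
print(max(scores, key=scores.get))  # 'alice'
```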

2.5 Identity preferences

Some preferences are best understood as pertaining to our own identity. For example, I want to understand how black holes work; this is separate from my other preference that some humans understand black holes (and separate again from an instrumental preference that, had we a convenient black hole close to hand, that we could use it to get energy out of).

Identity preferences seem to be different from preferences about the world; they seem more fragile than other preferences. We could combine identity preferences differently from standard preferences, for example using smoothmin rather than summation.

Ultimately, the human's mental exchange rate between preferences should determine how preferences are combined. This should allow us to treat identity and world-preferences in the same way. There are two reasons to still distinguish between world-preferences and identity preferences:

1. For preferences where relative weights are unknown or ill-defined, linear combinations and smoothmin serve as good defaults for world-preferences and identity preferences respectively.
2. It's not certain that identity can be fully captured by partial preferences; in that case, identity preferences could serve as a starting point from which to build a concept of human identity.
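The two default combination rules can be contrasted in a few lines. This is a minimal sketch; the softness parameter `k` and the example satisfaction values are illustrative assumptions (as `k` grows, smoothmin approaches the hard minimum):

```python
import math

# Hedged sketch: world-preferences combined by weighted sum versus identity
# preferences combined by a smooth minimum.

def linear_combination(values, weights):
    return sum(w * v for v, w in zip(values, weights))

def smoothmin(values, k=5.0):
    # -(1/k) * log(sum exp(-k * v)): dominated by the worst-satisfied value.
    return -math.log(sum(math.exp(-k * v) for v in values)) / k

world = linear_combination([0.9, 0.1], [1.0, 1.0])  # a high value can offset a low one
identity = smoothmin([0.9, 0.1])                    # close to 0.1: the worst dominates
```

This captures the intended fragility of identity: under smoothmin, badly violating any single identity preference drags the whole combination down, whereas a sum would let other preferences compensate.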

2.6 Synthesising the preference function: meta-preferences

Humans generally have meta-preferences: preferences over the kind of preferences they should have (often phrased as preferences over their identity, eg "I want to be more generous", or "I want to have consistent preferences").

This is such an important feature of humans, that it needs its own treatment; this post first looked into that.

The standard meta-preferences endorse or unendorse lower-level preferences. First, one can combine them as in the method above, obtaining a synthesised meta-preference. This then increases or decreases the weights of the lower-level preferences, to reach a UnH with preference weights adjusted by the synthesised meta-preferences.

Note that this requires some ordering of the meta-preferences: each meta-preference refers only to meta-preferences "below" itself. Self-referential meta-preferences (or, equivalently, meta-preferences referring to each other in a cycle) are more subtle to deal with, see Section 4.5.

Note that an ordering does not mean that the higher meta-preferences must dominate the lower ones; a weakly held meta-preference (eg a vague desire to fit in with some formal standard of behaviour) need not overrule a strongly held object level preference (eg a strong love for a particular person, or empathy for an enemy).
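The point that a weak meta-preference should not overrule a strong object-level preference can be made concrete. The multiplicative adjustment rule and all the numbers below are illustrative assumptions, chosen only to show the mechanism:

```python
# Hedged sketch: a synthesised meta-preference adjusting the weights of the
# object-level preferences it endorses or unendorses.

preferences = {"be_generous": 0.5, "hoard_wealth": 0.8}
# Endorsement factors > 1 boost a preference, factors < 1 dampen it; the
# meta-preference's own (low) weight limits how far it can move things.
endorsement = {"be_generous": 2.0, "hoard_wealth": 0.5}
meta_weight = 0.3  # a weakly held meta-preference

adjusted = {
    name: w * (1.0 + meta_weight * (endorsement[name] - 1.0))
    for name, w in preferences.items()
}
# be_generous: 0.5 * 1.3 = 0.65; hoard_wealth: 0.8 * 0.85 = 0.68
```

Note that the strongly held but unendorsed preference still ends up slightly ahead: the weak meta-preference nudges the weights without dominating them.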

2.7 Synthesising the preference function: meta-preference about synthesis

In a special category are the meta-preferences about the synthesis process itself. For example, philosophers might want to give greater weight to higher order meta-preferences, or might value the simplicity of the whole UH.

One can deal with these by using the standard synthesis to combine the synthesis meta-preferences, then using this combination to change how standard preferences are synthesised. This old post has some examples of how this could be achieved.

As long as there is an ordering of meta-preferences about synthesis, one can use the standard method to synthesise the highest level of meta-preferences, which then tells us how to synthesise the lower-level meta-preferences about synthesis, and so on.
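The top-down control flow can be sketched abstractly. Everything here is a placeholder assumption: `standard_synthesis` stands in for the method of Section 2.4, and the "rule" each level passes downward is reduced to a single scaling factor purely to keep the example runnable:

```python
# Hedged sketch: synthesising an ordered tower of meta-preferences about
# synthesis, from the highest level downward.

def standard_synthesis(prefs):
    # Placeholder for the standard method: here, just an average.
    return sum(prefs) / len(prefs)

def synthesise_tower(levels):
    """`levels[0]` is the highest level of meta-preferences about synthesis.

    Each synthesised level yields the rule for synthesising the level below;
    in this toy version the 'rule' is a single scaling factor.
    """
    rule = 1.0
    for level in levels:
        rule = standard_synthesis([rule * p for p in level])
    return rule

result = synthesise_tower([[1.0, 1.0], [0.5, 1.5], [2.0]])
```

The essential structural point is the strict ordering of the loop: no level's synthesis ever depends on a level at or below itself, which is what keeps the process free of the self-reference problems of Section 4.5.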

Why use the standard synthesis method for these meta-preferences - especially if they contradict this synthesis method explicitly? There are three reasons for this:

1. These meta-preferences may be weakly weighted (hence weakly held), so they should not automatically overwhelm the standard synthesis process when applied to themselves (think of continuity as the weight of the meta-preference fades to zero).
2. Letting meta-preferences about synthesis determine how they themselves get synthesised leads to circular meta-preferences, which may cause problems (see Section 4.5).
3. The standard method is more predictable, which makes the whole process more predictable; self-reference, even if resolved, could lead to outcomes randomly far away from the intended one. Predictability could be especially important for "meta-preferences over outcomes" of the next section.

Note that these synthesis meta-preferences should be of a type that affects the synthesis of UH, not its final form. So, for example, "simple (meta-)preferences should be given extra weight in UH" is valid, while "UH should be simple" is not.

Thus, finally, we can combine everything (except for some self-referencing contradictory preferences) into one UH.

Note there are many degrees of freedom in how the synthesis could be carried out; it's hoped that they don't matter much, and that each of them will reach a UH that avoids disasters[8] (see Section 2.8).

2.8 Avoiding disasters, and global meta-preferences

It is important that we don't end up in some disastrous outcome; the very definition of a good human value theory requires this.

The approach has some in-built protection against many types of disasters. Part of that is that it can include very general and universal partial preferences, so any combination of "local" partial preferences must be compatible with these. For example, we might have a collection of preferences about autonomy, pain, and personal growth. It's possible that, when synthesising these preferences together, we could end up with some "kill everyone" preference, due to bad extrapolation. However, if we have a strong "don't kill everyone" preference, this will push the synthesis process away from that outcome.

So some disastrous outcomes of the synthesis should be avoided, precisely because all of H's preferences are used, including those that would specifically label that outcome a disaster.

But, even if we included all of H's preferences in the synthesis, we'd still want to be sure we'd avoided disasters.

In one sense, this requirement is trivially true and useful. But in another, it seems perverse and worrying - the UH is supposed to be a synthesis of true human preferences. By definition. So how could this UH be, in any sense, a disaster? Or a failure? What criteria - apart from our own preferences - could we use? And shouldn't we be using these preferences in the synthesis itself?

The reason that we can talk about UH not being a disaster, is that not all our preferences can best be captured in the partial model formalism above. Suppose one fears a siren world or reassures oneself that we can never encounter an indescribable hellworld. Both of these could be clunkily transformed into standard meta-preferences (maybe about what some devil's advocate AI could tell us?). But that somewhat misses the point. These top-meta-level considerations live most naturally at the top-meta-level: reducing them to the standard format of other preferences and meta-preferences risks losing the point. Especially when we only partially understand these issues, translating them to standard meta-preferences risks losing the understanding we do have.

So, it remains possible to say that UH is "good" or "bad", using higher level considerations that are difficult to capture entirely within UH.

For example, there is an argument that human preference incoherence should not cost us much. If true, this argument suggests that overfitting to the details of human preferences is not as bad as we might fear. One could phrase this as a synthesis meta-preference allowing more overfitting, but this doesn't capture a coherent meaning of "not as bad" - and it misses the real point of the argument, which is "allow more overfitting if the argument holds". To use that, we need some criteria for establishing that "the argument holds". This seems very hard to do within the synthesis process, but could be attempted as a top-level meta-preference.

We should be cautious and selective when using these top-level preferences in this way. This is not generally the point at which we should be adding preferences to UH; that should be done when constructing UH. Still, if we have a small selection of criteria, we could formalise these and check ourselves whether UH satisfies them, or have an AI do so while synthesising UH. A Last Judge can be a sensible precaution (especially if there are more downsides to error than upsides to perfection).

Note that we need to distinguish between the global meta-preferences of the designers (us) and those of the subject H. So, when designing the synthesis process, we should either allow options to be automatically changed by H's global preferences, or be aware that we are overriding them with our own judgement (which may be inevitable, as most H's have not thought deeply about preference synthesis; still, it is good to be aware of this issue).

This is also the level at which experimental testing of UH synthesis is likely to be useful - keeping in mind what we expect from UH synthesis, and running the synthesis in some complicated toy environments, we can see whether our expectations are correct. We may even discover extra top-level desiderata this way.

2.9 How much to delegate to the process

The method has two types of basic preferences (world-preferences and identity preferences). This is a somewhat useful division; but there are others that could have been used. Altruistic versus selfish versus anti-altruistic preferences is a division that was not used (though see Section 4.3). Moral preferences were not directly distinguished from non-moral preferences (though some human meta-preferences might make the distinction).

So, why divide preferences this way, rather than in some other way? The aim is to allow the process itself to take into account most of the divisions that we might care about; things that go into the model explicitly are structural assumptions that are of vital importance. So the division between world-preferences and identity preferences was chosen because it seemed absolutely crucial to get that right (and to err on the side of caution in distinguishing the two, even if our own preferences don't distinguish them as much). Similarly, the whole idea of meta-preferences seems a crucial feature of humans, which might not be relevant for general agents, so it was important to capture it. Note that meta-preferences are treated as a different type from standard preferences, with different rules; most distinctions built into the synthesis method should similarly be between objects of different types.

But this is not set in stone; global meta-preferences (see Section 2.8) could be used to justify a different division of preference types (and different methods of synthesis). But it's important to keep in mind what assumptions are being imposed from outside the process, and what the method is allowed to learn during the process.

3 UH in practice

3.1 Synthesis of UH in practice

If the definition of UH of the previous section could be made fully rigorous, and if the AI has a perfect model of H's brain, knowledge of the universe, and unlimited computing power, it could construct UH perfectly and directly. This will almost certainly not be the case; so, do all these definitions give us something useful to work with?

It seems they do. Even extreme definitions can be approximated, hopefully to a good extent (and the theory allows us to assess the quality of the approximation, unlike a theory-free method, where there is no meaningful measure of approximation quality). See Section 0.3 for an argument as to why even very approximate versions of UH could result in very positive outcomes: even an approximated UH rules out most bad AI failure scenarios.

In practical terms, the synthesis of UH from partial preferences seems quite robust and doable; it's the definition of these partial preferences that seems tricky. One might be able to directly see the internal symbols in the human brain, with some future super-version of fMRI. Even without that direct input, having a theory of what we are looking for - partial preference in partial models with human symbols grounded - allows us to use results from standard and moral psychology. These results are insights into behaviour, but they are often also, at least in part, insights into how the human brain processes information. In Section 3.3, we'll see how the definition of UH allows us to "patch" other, more classical methods of value alignment. But the converse is also true: with a good theory, we can use more classical methods to figure out UH. For example, if we see H as being in a situation where they are likely to tell the truth about their internal model, then their stated preferences become good proxies for their internal partial preferences.

If we have a good theory for how human preferences change over time, then we can use preferences at time t′ as evidence for the hypothetical preferences at time t. In general, more practical knowledge and understanding would lead to a better understanding of the partial preferences and how they change over time.

This could become an area of interesting research; once we have a good theory, it seems there are many different practical methods that suddenly become usable.

For example, it seems that humans model themselves and each other using very similar methods. This allows us to use our own judgement of irrationality and intentionality, to some extent, and in a principled way, to assess the internal models of other humans. As we shall see in Section 3.3, an awareness of what we are doing - using the similarity between our internal models and those of others - also allows us to assess when this method stops working, and patch it in a principled way.

In general, this sort of research would give results of the type "assuming this connection between empirical facts and internal models (an assumption with some evidence behind it), we can use this data to estimate internal models".

3.2 (Avoiding) uncertainty and manipulative learning

There are arguments that, as long as we account properly for our uncertainty and fuzziness, there are no Goodhart-style problems in maximising an approximation to UH. This argument has been disputed, and there are ongoing debates about it.

With a good definition of what it means for the AI to influence the learning process, online learning of UH becomes possible, even for powerful AIs learning over long periods of time in which the human changes their views (either naturally or as a consequence of the AI's actions).

Thus, we could construct an online version of inverse reinforcement learning without assuming rationality, where the AI learns about partial models and human behaviour simultaneously, constructing the UH from observations given the right data and the right assumptions.
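One simple way to learn preferences without assuming full rationality is to replace the rationality assumption with an explicit noise model and update over candidate utilities online. The sketch below uses Boltzmann (noisy) rationality; the candidate utilities, the observations, and the value of `beta` are all toy assumptions for illustration:

```python
import math

# Hedged sketch: online Bayesian update over candidate utility functions,
# with a Boltzmann noise model instead of a full-rationality assumption.

candidates = {
    "likes_apples":  {"apple": 1.0, "orange": 0.0},
    "likes_oranges": {"apple": 0.0, "orange": 1.0},
}
posterior = {name: 0.5 for name in candidates}
beta = 2.0  # noisy rationality: higher beta = more reliably picks the better option

def update(choice, available):
    """Update the posterior after observing H choose `choice` from `available`."""
    global posterior
    for name, u in candidates.items():
        z = sum(math.exp(beta * u[o]) for o in available)
        posterior[name] *= math.exp(beta * u[choice]) / z
    total = sum(posterior.values())
    posterior = {n: p / total for n, p in posterior.items()}

for obs in ["apple", "apple", "orange"]:  # H mostly, but not always, picks apples
    update(obs, ["apple", "orange"])
```

Because the noise model tolerates occasional "irrational" choices, the lone orange pick shifts the posterior only slightly rather than refuting the apple hypothesis outright.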

3.3 Principled patching of other methods

Some of the theoretical ideas presented here can be used to improve other AI alignment ideas. This post explains one of the ways this can happen.

The basic idea is that there exist methods - stated preferences, revealed preferences, an idealised human reflecting for a very long time - that are often correlated with UH and with each other. However, all of these methods fail: stated preferences are often dishonest (the revelation principle doesn't apply in the social world); revealed preferences assume a rationality that is often absent in humans (and some models of revealed preferences obscure how unrealistic this rationality assumption is); humans who think for a long time risk value drift or random walks to convergence.

Given these flaws, it is always tempting to patch the method: add caveats to get around the specific problem encountered. However, if we patch and patch until we can no longer think of any further problems, that doesn't mean there are no further problems: simply that they are likely beyond our capacity to predict ahead of time. And, if all that it has is a list of patches, the AI is unlikely to be able to deal with these new problems.

However, if we keep the definition of UH in mind, we can come up with principled reasons to patch a method. For example, lying on stated preferences means a divergence between stated preferences and internal model; revealed preferences only reveal within the parameters of the partial model that is being used; and value drift is a failure of preference synthesis.

Therefore, each patch can come with an explanation for the divergence between method and desired outcome. So, when the AI develops the method further, it can patch the method itself when it enters a situation where a similar type of divergence occurs. It has a reason for why these patches exist, and hence the ability to generate new patches efficiently.

3.4 Simplified UH sufficient for many methods

It's been argued that many different methods rely upon, if not a completely synthesised UH, at least some simplified version of it. Corrigibility, low impact, and distillation/amplification all seem to be methods that require some simplified version of UH.

Similarly, some concepts that we might want to use or avoid - such as "manipulation" or "understanding the answer" - also may require a simplified utility function. If these concepts can be defined, then one can disentangle them from the rest of the alignment problem, allowing us to instructively consider situations where the concept makes sense.

In that case, a simplified or incomplete construction of UH, using some simplification of the synthesis process, might be sufficient for one of the methods or definitions just listed.

3.5 Applying the intuitions behind UH to analysing other situations

Finally, one could use the definition of UH as inspiration when analysing other methods, which could lead to interesting insights. See for example these posts on figuring out the goals of a hierarchical system.

4 Limits of the method

This section will look at some of the limitations and lacunae of the method described above. For some limitations, it will suggest possible ways of dealing with them; but these are, deliberately, chosen to be extras beyond the scope of the method, for which synthesising UH is the whole goal.

4.1 Utility at one point in time

The UH is meant to be a synthesis of the current preferences and meta-preferences of the human H, using one-step hypotheticals to fill out the definition. Human preferences are changeable on a short time scale, without us feeling that we become a different person. Hence it may make sense to replace UHt with some average UH, averaged over a short (or longer) period of time. Shorter periods lead to more "overfitting" to momentary urges; longer periods allow more manipulation or drift.
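The averaging could be done in many ways; one simple illustrative choice, assumed here purely for the sketch, is exponential weighting so that recent preference snapshots count somewhat more than older ones:

```python
# Hedged sketch: time-averaging the synthesised utility over a window of
# snapshots. The exponential weighting and the half_life value are
# illustrative assumptions; the snapshots are toy numbers.

def time_averaged_utility(snapshots, half_life=3.0):
    """`snapshots` is a list of utility values U_H(t), oldest first."""
    n = len(snapshots)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * u for w, u in zip(weights, snapshots)) / sum(weights)

# A momentary urge in the last snapshot is smoothed rather than dominating.
avg = time_averaged_utility([0.2, 0.2, 0.2, 0.9])
```

The `half_life` parameter makes the trade-off in the text explicit: a short half-life overfits to momentary urges, a long one gives manipulation or drift more time to accumulate.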

4.2 Not a philosophical ideal

The UH is also not a reflective equilibrium or other idealised distillation of what preferences should be. Philosophers will tend to have a more idealised UH, as will those who have reflected a lot and are more willing to be bullet swallowers/bullet bitters. But that is because these people have strong meta-preferences that push in those idealised directions, so any honest synthesis of their preferences must reflect these.

Similarly, this UH is defined to be the preferences of some human H. If that human is bigoted or selfish, their UH will be bigoted or selfish. In contrast, moral preferences that can be considered factually wrong will be filtered out by this construction. Similarly, preferences based on erroneous factual beliefs ("trees can think, so...") will be removed or qualified ("if trees could think, then...").

Thus if H is wrong, the UH will not reflect that wrongness; but if H is evil, then UH will reflect that evilness.

Also, the procedure will not distinguish between moral preferences and other types of preferences, unless the human themselves does.

4.3 Individual utility versus common utility

This research agenda will not look into how to combine the UH of different humans. One could simply weight the utilities according to some semi-plausible scale and add them together.

But we could do many other things as well. I've suggested removing anti-altruistic preferences before combining the UH's into some global utility function UH for all of humanity - or for all future and current sentient beings, or for all beings that could suffer, or for all physical entities.

There are strong game-theoretical reasons to remove anti-altruistic preferences. We might also add philosophical considerations (eg moral realism) or deontological rules (eg human rights, restrictions on copying themselves, extra weighting to certain types of preferences), either to the individual UH or when combining them, or prioritise moral preferences over other types. We might want to preserve the capacity for moral growth, somehow (see Section 4.6).

That can all be done, but is not part of this research agenda, whose sole purpose is to synthesise the individual UH's, which can then be used for other purposes.

4.4 Synthesising UH rather than discovering it (moral anti-realism)

The utility UH will be constructed, rather than deduced or discovered. Some moral theories (such as some versions of moral realism) posit that there is a (generally unique) UH waiting to be discovered. But none of these theories give effective methods for doing so.

In the absence of such a definition of how to discover an ideal UH, it would be highly dangerous to assume that finding UH is a process of discovery. Thus the whole method is constructive from the very beginning (and based on a small number of arbitrary choices).

Some versions of moral realism could make use of UH as a starting point of their own definition. Indeed, in practice, moral realism and moral anti-realism seem initially almost identical once meta-preferences are taken into account. Moral realists often have mental examples of what would count as "moral realism doesn't work", while moral anti-realists still want to simplify and organise moral intuitions. To a first approximation, these approaches can be very similar in practice.

4.5 Self-referential contradictory preferences

There remain problems with self-referential preferences - preferences that claim they should be given more (or less) weight than otherwise (eg "all simple meta-preferences should be penalised"). This was already observed in a previous post.

This includes formal Gödel-style problems, with preferences explicitly contradicting themselves, but those seem solvable - with one or another version of logical uncertainty.

More worrying, from the practical standpoint, is the human tendency to reject values imposed upon them, just because they are imposed upon them. This resembles a preference of the type "reject any UH computed by any synthesis process". This preference is weakly existent in almost all of us, and a variety of our other preferences should prevent the AI from forcibly re-writing us to become UH-desiring agents.

So it remains not at all clear what happens when the AI says "this is what you really prefer" and we almost inevitably answer "no!"

Of course, since the UH is constructed rather than real, there is some latitude. It might be possible to involve the human in the construction process, in a way that increases their buy-in (thanks to Tim Genewein for the suggestion). Maybe the AI could construct the first UH, and refine it with further interactions with the human. And maybe, in that situation, if we are confident that UH is pretty safe, we'd want the AI to subtly manipulate the human's preferences towards it.

4.6 The question of identity and change

It's not certain that human concepts of identity can be fully captured by identity preferences and meta-preferences. In that case, it is important that human identity be figured out somehow, lest humanity itself vanish even as our preferences are satisfied. Nick Bostrom sketched how this might happen: in the mindless outsourcers scenario, humans outsource more and more of their key cognitive features to automated algorithms, until nothing remains of "them" any more.

Somewhat related is the fact that many humans see change and personal or moral growth as a key part of their identity. Can such a desire be accommodated, despite a likely stabilisation of values, without just becoming a random walk across preference space?

Some aspects of growth and change can be accommodated. Humans can certainly become more skilled, more powerful, and more knowledgeable. Since humans don't distinguish well between terminal and instrumental goals, some forms of factual learning resemble moral learning ("if it turns out that anarchism results in the greatest flourishing of humanity, then I wish to be an anarchist; if not, then not"). If we take into account the preferences of all humans in some roughly equal way (see Section 4.3), then we can get "moral progress" without needing to change anyone's individual preferences. Finally, professional roles, contracts, and alliances allow for behavioural changes (and sometimes value changes), in ways that maximise the initial values. Sort of like "if I do PR work for the Anarchist party, I will spout anarchist values" and "I agree to make my values more anarchist, in exchange for the Anarchist party shifting their values more towards mine".

Beyond these examples, it gets trickier to preserve moral change. We might put a slider that makes our own values less instrumental or less selfish over time, but that feels like a cheat: we already know what we will be, we're just taking the long route to get there. Otherwise, we might allow our values to change within certain defined areas. This would have to be carefully defined to prevent random change, but the main challenge is efficiency: changing values have an inevitable efficiency cost, so there needs to be strong positive pressure to preserve the changes - and not just preserve an unused "possibility for change", but actual, efficiency-losing, changes.

This should be worth investigating more; it feels like these considerations need to be built into the synthesis process for this to work, rather than the synthesis process making them work by itself (thus this kind of preference is one of the "global meta-preferences about the outcome of the synthesis process").

4.7 Other Issues not addressed

These are other important issues that need to be solved to get a fully friendly AI, even if the research agenda works perfectly. They are, however, beyond the scope of this agenda; a partial list of these is:

1. Actually building the AI itself (left as an exercise to the reader).
2. Population ethics (though some sort of average of individual human population ethics might be doable with these methods).
3. Taking into account other factors than individual preferences.
4. Issues of ontology and ontology changes.
5. Mind crime (conscious suffering beings simulated within an AI system), though some of the work on identity preferences may help in identifying conscious minds.
6. Infinite ethics.
7. Definitions of counterfactuals or which decision theory to use.
8. Agent foundations, logical uncertainty, how to keep a utility stable.
9. Optimisation daemons/inner optimisers/emergent optimisation.

Note that the Machine Intelligence Research Institute is working heavily on issues 7, 8, and 9.

1. Actually, this specific problem is not included directly in the research agenda, though see Section 4.3. ↩︎

2. Likely but not certain: we don't know how effective AIs might become at computing counterfactuals or modelling humans. ↩︎

3. It makes sense to allow partial preferences to contrast a small number of situations, rather than just two. So "when it comes to watching superhero movies, I'd prefer to watch them with Alan, but Beth will do, and definitely not with Carol". Since partial preferences over n situations can be built out of a collection of partial preferences over two situations, allowing more situations is a useful practical move, but doesn't change the theory. ↩︎

4. Equivalently to reducing the weight, we could increase uncertainty about the partial preference, given the unfamiliarity. There are many options for formalisms that lead to the same outcome. Though note that here, we are imposing a penalty (low weight/high uncertainty) for unfamiliarity, whereas the actual human might have incredibly strong internal certainty in their preferences. It's important to distinguish assumptions that the synthesis process makes, from assumptions that the human might make. ↩︎

5. Extreme situations are also situations where we have to be very careful to ensure the AI has the right model of all preference possibilities. The flaws of an incorrect model can be corrected by enough data, but when data is sparse and unreliable, model assumptions - including priors - tend to dominate the result. ↩︎

6. "Natural" does not, of course, mean any of "healthy", "traditional", or "non-polluting". However those using the term "natural" are often assuming all of those. ↩︎

7. The human's meta-preferences are also relevant to this. It might be that, whenever asked about this particular contradiction, the human would answer one way. Therefore H's conditional meta-preferences may contain ways of resolving these contradictions, at least if the meta-preferences have high weight and the preferences have low weight.

Conditional meta-preferences can be tricky, though, as we don't want them to allow the synthesis to get around the one-step hypotheticals restriction. A "if a long theory sounds convincing to me, I want to believe it" meta-preference would in practice do away with these restrictions. That particular meta-preference might be cancelled out by the ability of many different theories to sound convincing. ↩︎

8. We can allow meta-preferences to determine a lot more of their own synthesis if we find an appropriate method that a) always reaches a synthesis, and b) doesn't artificially boost some preferences through a feedback effect. ↩︎

Discuss

### Preference conditional on circumstances and past preference satisfaction

Новости LessWrong.com - 17 июня, 2019 - 18:30
Published on June 17, 2019 3:30 PM UTC

I've mentioned conditional preferences before. These are preferences that are dependent on facts about the world, for example "I'd want to believe X if there are strong argument for X".

But there is another type of preference that is conditional: my tastes can vary depending on circumstances and on my past experience. For example, I might prefer to eat apples during the week and oranges on weekends. Or, because of the miracle of boredom, I might prefer oranges if (but only if) I've been eating apples all week so far.

What if I currently want apples, would want oranges tomorrow, but falsely believe (today) that I would want apples tomorrow? This is a known problem with "one-step hypotheticals", and a strong argument in practice for assessing preferences over time rather than at a single moment t.

In theory, there are meta-preferences that allow one to get this even at a single moment t, such as "I want to be able to follow my different tastes at different times" or a more formalised desire for variety and exploration.

Discuss

### An article of mine on "Flowers for Algernon". (Of course) being a genius won't solve one's problems.

Новости LessWrong.com - 17 июня, 2019 - 17:46
Published on June 17, 2019 2:46 PM UTC

Charlie

Flowers for Algernon, the famous short story by Daniel Keyes, is centered on the protagonist's attempt to become intelligent. It can be said to primarily be a reflection on how it often seems to be considered standard behavior to look down on the less mentally able. Charlie Gordon – the aforementioned protagonist – is a person with an IQ a little below 80, who is routinely taken advantage of due to his inability to recognize the difference between those who genuinely like him and the ones who make cruel jokes and have fun at his expense.

Charlie does want to become intelligent, and "works very hard, with very little" – as his teacher, Mrs Kinnian, eloquently puts it. Yet he only wants to become intelligent because he supposes that his problems would stop existing if that was to happen. It's why he strives to be the one chosen for the experiment that will triple a person's IQ.

Algernon

Prior to Charlie, the only beings which underwent the experiment – one that begins with surgery and is followed by a sequence of mental exercises and trials – had been laboratory mice. The most successful of those is a white mouse, called Algernon, who regularly solves maze-related puzzles in order to be rewarded with food. Algernon is, at first, disliked by Charlie – because the creature makes him feel sad for being "dumber even than a mouse". But Charlie will very soon come to like his fellow-traveler in this experiment, and will feel saddened upon reflecting that Algernon is continuously asked to solve puzzles and take tests so that he may be given the food he needs to stay alive.

On a more symbolic level, Algernon can be said to represent the purely somatic aspect of Charlie's own existence: when Charlie becomes a genius, with an IQ far surpassing that of the two doctors who oversaw his transformation, he will be virtually all about intellect, with Algernon staying behind to allude to the more physical aspects of this troubled hero...

Not All Goes Well

As a result of his increased intelligence, Charlie is able to see that his supposed friends at work had been mocking him all this time. This depresses him, and he even wonders whether anything was gained by learning the truth, when his previous state of existence had shielded him from such cruel revelations.

But the worst of his troubles lies ahead: at some point he notices that Algernon is showing signs of reverting to a former, less intelligent state, as well as losing his physical health... Algernon is dying, and the deterioration is quite rapid. Charlie rightly fears that the same fate might await him, so he sets out to find a way (he is the only one intelligent enough to do so) to alter the method his doctors had developed, hoping to salvage at least some part of the intelligence he gained through the experimental procedure.

But the result of his struggle is heartbreaking: he concludes that the experiment was inherently flawed and will always produce the same terrible result. After a while, the person (or other being) who gained intelligence through the experiment will not only return to their previous state, but will end up far worse off than before.

An Inevitable Future

Charlie cannot accept his fate stoically. He now refuses to meet with his doctors. He even turns away a visit from his old teacher, Mrs. Kinnian; he was, in reality, in love with her, but never found the courage to tell her.

He now wants to isolate himself from everyone. He wishes to fade away. He has glimpsed the world as a genius understands it, only to be forced to return to being a man with a very low IQ. He manages to publish his proof that the experiment will always lead to the same end; his peer-reviewed paper is titled "The Algernon-Gordon Effect," in memory of Algernon, who had in the meantime passed away...

The End

In many ways, the ending of this story is its most saddening part. Charlie is by now once again a person with special needs, and his intellect has become even lower than it was before the experiment. He is further incapacitated and, quite tragically, can no longer even recall that he left Mrs. Kinnian's class for people with low IQ. So, at some point, he returns to that classroom, and Mrs. Kinnian has to leave the room to hide her tears upon realizing that Charlie (Charlie the hard worker, the person who earnestly wanted something good but ended up mentally worse off and now physically ill as well) by and large cannot even recall what happened to him... He is, by this point, unaware even of his own remarkable and devastating personal story!
by Kyriakos Chalkopoulos (https://www.patreon.com/Kyriakos)

Discuss

### Sequences Reading Club

Kocherga events - June 17, 2019 - 13:00
Friday, June 21, 16:30

### Is there a guide to 'Problems that are too fast to Google'?

LessWrong.com news - June 17, 2019 - 08:04
Published on June 17, 2019 5:04 AM UTC

It seems to me like problems come in a variety of required response speeds, but there's a natural threshold to distinguish fast and slow: whether or not you can Google it. The slow ones, like getting an eviction notice from your landlord or a cancer diagnosis from your doctor, can't be ignored but you have time to figure out best practices before you act. Fast ones, like getting bit by a rattlesnake or falling from a high place, generally require that you already know best practices in order to properly implement them.

Also useful would be the meta-guide, which just separates out which problems are fast and slow (or how fast they are). Getting bit by a tick, for example, seems like it might be quite urgent when you discover one biting you, but isn't; you have about 24 hours from when it first attaches to remove it, which is plenty of time to research proper removal technique. Getting a bruise might seem like you have time, but actually applying cold immediately does more to prevent swelling than applying cold later does to reduce it.

Of course, this is going to vary by region, profession, age, sex, habits, and so on. I'm sort of pessimistic about this existing at all, and so am interested in whatever narrow versions exist (even if it's just "here's what you need to know about treating common injuries to humans"). Basic guides also seem useful from a 'preventing illusion of transparency' perspective.

Discuss

### SSC-Madison: Andrew Yang

LessWrong.com news - June 17, 2019 - 04:01
Published on June 16, 2019 2:09 PM UTC

Discuss

### ISO: Automated P-Hacking Detection

LessWrong.com news - June 17, 2019 - 00:15
Published on June 16, 2019 9:15 PM UTC

I'm sure there are some ML students/researchers on LessWrong in search of new projects, so here's one I'd love to see and probably won't build myself: an automated method for predicting which papers are unlikely to replicate, given the text of the paper. Ideally, I'd like to be able to use it to filter and/or rank results from Google Scholar.

Getting a good data set would probably be the main bottleneck for such a project. Various replication-crisis papers which review replication success/failure for tens or hundreds of other studies seem like a natural starting point. Presumably some amount of feature engineering would be needed; I doubt anyone has a large enough dataset of labelled papers to just throw raw or lightly-processed text into a black box.
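As an illustration of the kind of feature engineering the post calls for, here is a minimal sketch of one plausible input feature (the function name, regex, and 0.01-0.05 band are my own invention for illustration, not an established method): a pile-up of reported p-values just under the 0.05 threshold is a commonly cited warning sign of p-hacking.

```python
import re

# Matches reported p-values like "p = .049" or "p < 0.05" (illustrative pattern).
P_VALUE = re.compile(r"p\s*[=<]\s*(0?\.\d+)")

def suspicious_p_fraction(paper_text: str) -> float:
    """Fraction of reported p-values sitting in the 0.01-0.05 band."""
    values = [float(m) for m in P_VALUE.findall(paper_text)]
    if not values:
        return 0.0
    near_threshold = [p for p in values if 0.01 <= p < 0.05]
    return len(near_threshold) / len(values)

text = "We found effects (p = .049, p = .032) but not others (p = .51)."
print(suspicious_p_fraction(text))  # 2 of 3 p-values are just under 0.05
```

A feature like this would be one column in the labelled dataset the post describes, alongside sample sizes, pre-registration status, and similar signals.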

Also, if anyone knows of previous attempts to do this, I'd be interested to hear about them.

Discuss

### Does scientific productivity correlate with IQ?

LessWrong.com news - June 16, 2019 - 22:42
Published on June 16, 2019 7:42 PM UTC

Anders Ericsson, in his popular book Peak: Secrets from the New Science of Expertise, claims that though one may need a reasonably high IQ to even be a scientist, after that level, IQ is irrelevant to scientific success.

The average IQ of scientists is certainly higher than the IQ of the general population, but among scientists, there is no correlation between IQ and scientific productivity. Indeed, a number of Nobel Prize-winning scientists have had IQs that would not even qualify them for Mensa, an organization whose members must have a measured IQ of at least 132, a number that puts you in the top 2 percent of the population.

(From chapter 8: "But What About Natural Talent", of Peak: Secrets from the New Science of Expertise)

My understanding is that this is completely wrong, and the best scientists tend to have higher IQs, at all levels of performance. But this is a background belief that I haven't concretely verified.

Is there a canonical reference regarding the impact of IQ on scientific contribution?

Discuss

### Does the _timing_ of practice, relative to sleep, make a difference for skill consolidation?

LessWrong.com news - June 16, 2019 - 22:12
Published on June 16, 2019 7:12 PM UTC

It is well known that sleep (both mid-day naps and nighttime sleep) has a large effect on the efficacy of motor skill acquisition. Performance on a newly learned task improves, often markedly, following a period of sleep.

A few citations (you can find many more by searching "motor skill acquisition sleep" or similar in google scholar) :

I want to know if the _timing_ of practice, relative to sleep, makes a difference for skill acquisition.

For instance, if you practice a skill at 7:00 PM, shortly before a night of sleep, will your performance be better in the morning than if you had practiced at 7:00 AM, had a full day of wakefulness, and _then_ gone to sleep? If so, what is the effect size?

Josh Kaufman makes a claim to this effect in his book, The First 20 Hours. I have no particular reason to doubt him, but 40 minutes of searching on Google Scholar did not turn up any papers about the importance of sleep and practice timing.

Can you point me at a relevant citation?

Discuss

### Is "physical nondeterminism" a meaningful concept?

LessWrong.com news - June 16, 2019 - 18:55
Published on June 16, 2019 3:55 PM UTC

Background: Our mental models of the universe can contain uncertainty or probability-links, as in a causal network. One may have a deterministic understanding of a phenomenon if the probability-values are all 0 and 1.

Question: Beyond that, is it meaningful to distinguish whether or not the *universe itself* is deterministic or nondeterministic?

For example, is it meaningful to say that the Copenhagen interpretation of QM implies a "nondeterministic universe", while Many Worlds implies a "deterministic universe"?

Discuss

### Nonviolent Communication. Practice Session

Kocherga events - June 16, 2019 - 14:10
Thursday, June 20, 16:30

### Reasonable Explanations

LessWrong.com news - June 16, 2019 - 08:29
Published on June 16, 2019 5:29 AM UTC

Today I watched a friend do calibration practice and was reminded of how wide you have to cast your net to get well-calibrated 90% confidence. This is true even when the questions aren't gotchas, just because you won't think of all the ways something could be wildly unlike your quick estimate's model. Being well-calibrated for 90% confidence intervals (even though this doesn't sound all that confident!) requires giving lots of room even in questions you really do know pretty well, because you will feel like you really do know pretty well when in fact you're missing something that wrecks you by an order of magnitude.

Being miscalibrated can feel like "if it were outside of this range, I have just... no explanation for that" - and then it turns out there's a completely reasonable explanation.
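Calibration of 90% intervals can be scored mechanically: record an interval per question, then check what fraction contained the truth. A minimal sketch (the interval data below is invented for illustration):

```python
# Each entry is (low, high, true_value) for a quantity you estimated
# with a 90% confidence interval. All numbers are made up.
guesses = [
    (30, 60, 47),
    (1000, 5000, 3476),
    (5, 15, 37),  # a miss: the truth fell far outside the interval
]

hits = sum(low <= truth <= high for low, high, truth in guesses)
hit_rate = hits / len(guesses)

# Well-calibrated 90% intervals should contain the truth about 90% of the time.
print(f"hit rate: {hit_rate:.0%}")
```

With many such questions, a hit rate well below 90% is exactly the overconfidence the post describes.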

Anyway, I thought a fun exercise would be describing weird situations we've encountered that turned out to have reasonable explanations. In initial descriptions, present only the information you (and whoever was thinking it over with you) remembered to consider at the time, then follow up in ROT-13 with what made the actual sequence of events come clear.

Discuss

### Accelerate without humanity: Summary of Nick Land's philosophy

LessWrong.com news - June 16, 2019 - 06:22
Published on June 16, 2019 3:22 AM UTC

I took note of the philosopher Nick Land from reading about posthumanism on Wikipedia.

A more pessimistic alternative to transhumanism in which humans will not be enhanced, but rather eventually replaced by artificial intelligences. Some philosophers, including Nick Land, promote the view that humans should embrace and accept their eventual demise. This is related to the view of "cosmism", which supports the building of strong artificial intelligence even if it may entail the end of humanity, as in their view it "would be a cosmic tragedy if humanity freezes evolution at the puny human level".

I was intrigued by such boldness, so I read more. It turns out Nick Land's writing is sometimes easy to read but most of the time extremely hard to read, and probably garbage. I wrote this post so that you don't have to waste time wading through the garbage looking for fragments of good poetry.

Nothing human makes it out of the near-future. -- Nick Land

First of all, Nick Land was obsessed with hating Kant and with loving Gilles Deleuze and Félix Guattari and their philosophical style (called schizoanalysis, related to schizophrenia). He likes to think about the world from very nonhuman viewpoints, such as those of other animals, robots, computers, machines that humans made, the earth, the universe, etc. He likes capitalism and technological revolution, as fast as possible, without regard for their goodness.

Recently there have been some mainstream reports on his philosophy of Neoreaction (the "Dark Enlightenment"), the idea that democracy sucks and monarchy/CEO-president works better. This philosophy has gained a bit of a following, but it's uninteresting to me, so we won't review it here. I'd simply note that the phrase "Dark Enlightenment" really should be "Delightenment". A sorely missed pun.

Schizoanalysis

The idea of schizoanalysis is just that there are many ways to make a theory about the world and build philosophies, that there is no one right way to do it, and, further, that there can be genuine conflicts that cannot be resolved by appealing to a higher standard.

In mathematics, there's a fringe movement in this style. Most mathematicians insist on logical consistency, but some are okay with controlled inconsistency (paraconsistency). Most mathematicians are in favor of using infinities, but some are finitists who think that infinities don't exist, and a few are ultrafinitists who think that even some large finite numbers (such as e^{e^{10}}) can be assumed not to exist.
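To get a feel for the ultrafinitist's target, the size of e^{e^{10}} can be checked back-of-the-envelope (a quick sanity calculation, not anything from the original post): its number of decimal digits is roughly e^{10} / ln(10).

```python
import math

# e**10 is about 22026, so e**(e**10) has roughly e**10 / ln(10) decimal digits.
exponent = math.exp(10)            # ~22026.47
digits = exponent / math.log(10)   # ~9566 decimal digits

print(round(exponent))  # 22026
print(round(digits))    # 9566
```

A number with nearly ten thousand digits can never be written out in full by any physical process, which is the ultrafinitist's point.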

Coincidentally, these fringe mathematicians tend to be obnoxious and argumentative (Doron Zeilberger is a prominent example). Maybe there's such a thing as an "obnoxious fringe personality"...

Schizoanalysis uses an analogy for how to think about theories: the rhizome. A rhizome is a tangle of underground roots, touching each other in a messy network. This is in contrast to a tree, which grows from a big trunk up into ever smaller branches.

Traditionally, stories about the world are told like a tree: there is one great principle of the world, be it God, Existentialism, or Absurdism, and the story acquires more and more detail as it explains smaller things, like how to treat other people.

But maybe there are many stories just messed up and knotted, without any way to unify them in a single principle. I make stories about Infinities and you about Ultrafinitism and there's no way to unify us. Two powerful countries with incompatible philosophies go to war, unable to unify their stories.

Really obscure style

Kant is hard enough, but Deleuze and Guattari's books are unreadable (I tried). Nick Land, immersed in such books, often wrote in the same extremely obscure style. For example, from Machinic Desire (1992):

The transcendental unconscious is the auto-construction of the real, the production of production, so that for schizoanalysis there is the real exactly in so far as it is built. Production is production of the real, not merely of representation, and unlike Kantian production, the desiring production of Deleuze/Guattari is not qualified by humanity (it is not a matter of what things are like for us)...

Don't bother trying to understand that. A big part of reading philosophy is to ignore real nonsense while still spending time on apparent nonsense that is actually sensible.

Non-human viewpoints

Rats

Nick Land uses schizoanalysis by considering very non-human viewpoints. For example, he once gave a talk about studying the Black Death from the perspective of rats:

"Putting the Rat back Into Rationality", in which he argued that, rather than seeing death as an event that happened at a particular time to an individual, we should look at it from the perspectives of the rats carrying the Black Death into Europe; that is, as a world-encircling swarm... An older professor tried to get his head round this idea: “How might we locate this description within human experience?” he asked. Nick told him that human experience was, of course, worthy of study, but only as much as, say, the experience of sea slugs: “I don’t see why it should receive any special priority.”

Earth

Another paper/fiction, Barker Speaks, develops the theory of "geotrauma", a story about how the Earth feels, and what it feels is endless PAIN. This is my favorite story so far, just because it's easy to picture (especially if you know Gaia theory).

Deleuze and Guattari ask: Who does the Earth think it is?... during the Hadean epoch, the earth was kept in a state of superheated molten slag [from asteroid impacts]... the terrestrial surface cooled, due to the radiation of heat into space... During the ensuing – Archaen – epoch the molten core was buried within a crustal shell, producing an insulated reservoir of primal exogeneous trauma, the geocosmic motor of terrestrial transmutation... It’s all there: anorganic memory, plutonic looping of external collisions into interior content, impersonal trauma as drive-mechanism.

Basically, it does psychoanalysis on geology. The center of the earth is full of heat and tension, leftovers from its early pains of being hit by asteroids. This trauma is expressed in geological phenomena like earthquakes, volcanoes, and continental drift.

Fast forward seismology and you hear the earth scream.

Further, even biological creatures should be thought of as a kind of geological phenomenon. This isn't complete nonsense, considering the possibility of clay-based life early in Earth's history, and the fact that biological lifeforms have shaped geological strata.

Geotrauma is an ongoing process, whose tension is continually expressed – partially frozen – in biological organization.

In this story, biological creatures are just one way for Earth to express its trauma. We are the skin-crawls, manifestations of Earth's inner suffering.

Machinic desires

Nick Land talks a lot about cyborgs, AI, and machinic desire/desiring machines. The idea is that humans, animals, anything that has desires, are machines behaving as if they have true desires. It's not necessary for there to be deep reasons behind wanting to do something. A creature desires something (like sugar) because it's constructed to seek it.

Humans, animals, computers, cyborgs, they are all desiring machines. Some are better at achieving their desires, but there's no way to judge who has a superior/inferior desire.

Acceleration and capitalism

Nick Land is obsessed with progress and capitalism. Progress here seems to be defined by increasing complexity, an increasing number of machines, and numbers going up. I have some sympathy with this idea, but at the same time I am also very uncomfortable with it.

Idle game of the whole universe

The easiest way to summarize accelerationism seems to be: The universe should be consumed into an idle game.

Think of Cookie Clicker. You click to make a number go up and enslave grandmas and build factories to make more cookies, with which to buy more cookie makers. It's the purest form of capitalism: You never get to consume any cookies, and all that's produced is reinvested to produce more. You don't have any friends, you consume the whole universe to make cookies, and that's all there is. The number of cookies accelerates exponentially, and you feel happy and empty and can't stop going anyway.
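The full-reinvestment loop described above can be sketched as a toy simulation (all numbers and names are made up for illustration): every cookie earned is immediately spent on more producers, so output compounds.

```python
# Toy model of an idle game under pure reinvestment: nothing is consumed,
# everything produced buys more production capacity.
producers = 1.0
rate = 0.1  # cookies per producer per tick (made-up number)
history = []

for tick in range(50):
    cookies = producers * rate  # production this tick
    producers += cookies        # reinvest everything: buy more producers
    history.append(producers)

# Growth is geometric: producers multiply by (1 + rate) each tick,
# so after 50 ticks they reach (1.1)**50, roughly 117.
print(history[-1])
```

The exponential curve is the whole point: with zero consumption, the number can do nothing but accelerate.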

Accelerationism sees an idle game universe as good, or the least bad of all choices.

Previous work

The idea that capitalism is a great innovative, destructive force is nothing new. Marx and Engels's Communist Manifesto (1848) already states:

Constant revolutionising of production, uninterrupted disturbance of all social conditions, everlasting uncertainty and agitation distinguish the bourgeois epoch from all earlier ones. All fixed, fast-frozen relations, with their train of ancient and venerable prejudices and opinions, are swept away, all new-formed ones become antiquated before they can ossify. All that is solid melts into air, all that is holy is profaned, and man is at last compelled to face with sober senses his real conditions of life, and his relations with his kind.

The need of a constantly expanding market for its products chases the bourgeoisie over the entire surface of the globe. It must nestle everywhere, settle everywhere, establish connexions everywhere.

The idea that things are changing far too fast is not new either. Future Shock (1970), by Alvin and Heidi Toffler, is one famous book arguing that the modern era is changing so fast that it causes many kinds of psychological stress. The book is quite accurate in its diagnosis, and its list of features of modern society is so common-sense as to be banal (I yawned).

Accelerationism

Capitalism has drawn many criticisms, such as turning people into products, putting prices on things that shouldn't have a price, etc. The idea of accelerationism is that we should keep capitalism going, keep technology going, and go with the flow of technology even if it destroys humanity and everything we love. After all, there's no alternative. And why resist? It's glorious to burn up like a shooting star, like the fuel of a rocket accelerating into empty space.

Capitalism’s destructive force is picked up in the 1990s by the British philosopher Nick Land. In a series of incendiary essays, Land celebrates absolute deterritorialization as liberation—even (or above all) to the point of total disintegration and death... He sees its absolute, violently destructive speed as an alien force that should be welcomed and celebrated.

Or just like Facebook said:

Move fast and break things.

Note: some people use the word "accelerationism" in a different sense: that capitalism is bad, but the only way to escape capitalism is to make it go faster until it arrives at its bitter end, and then we can escape. Kind of like diving into the center of a black hole and hoping we'll escape into a better universe. I'm not interested in that sense of accelerationism.

This is similar in spirit to cosmism, a philosophy against humanism, detailed in The Artilect War (2005) by Hugo de Garis. The basic idea is simple: there are the Terrans, or humanists, who prefer to keep humans in control, and there are the Cosmists, who want to keep the expansion of intelligence going and fulfill a kind of cosmic destiny.

I think humanity should build these godlike supercreatures with intellectual capacities trillions of trillions of trillions of times above our level. I think it would be a cosmic tragedy if humanity freezes evolution at the puny human level.

Technological determinism

Technology is not neutral. It may be a mere "tool", but even tools have desires and tendencies, controlling the very users who control them. This is an ancient idea, going back to Socrates's criticism of writing as damaging the memories of its users. Kevin Kelly is a modern thinker who wrote What Technology Wants (2010); his idea is that technologies are very much not neutral, and can even be thought of as something alive, with goals of their own. The future of Earth is very much determined by how this ecosystem of technologies evolves.

Cars are mechanical horses that want you to build more roads so they can go more places. To encourage you to build more roads, they let you sit in them and take you everywhere. Having thus proven themselves useful, cars entice you to build more roads. And that's how, in just 100 years, these thin, gray, flat concrete things called "roads" suddenly appeared everywhere on Earth. The Internet wants to expand, enticing you to join by providing so much stuff. Junk food wants to be eaten, and diet books want you to get fat. Books want you to build more printing presses, and printing presses want you to read more books.

Nick Land takes this to an extreme.

Machinic desire can seem a little inhuman, as it rips up political cultures, deletes traditions, dissolves subjectivities, and hacks through security apparatuses, tracking a soulless tropism to zero control. This is because what appears to humanity as the history of capitalism is an invasion from the future by an artificial intelligent space that must assemble itself entirely from its enemy's resources.

It means something like this: our world, with its cars, finance, AI, and other industrial technologies, has a clear goal of its own: a future dominated by upgraded versions of these technologies, with humans extinct or irrelevant. An inevitable AI apocalypse. It's called an invasion from the future because this inhuman future is not yet here, yet we already feel pulled towards it, as if someone had sent agents back in time to ensure humans do not mess up the plan. It's like the plot of Terminator.

Materialistic nihilism

This section is based on his book The Thirst for Annihilation (1992), which I have been reading on and off. The book is a collection of essays on Georges Bataille, a very weird writer whom I have encountered twice. The first time was during my research on lingchi, which he wrote about in a really hard-to-read book (Tears of Eros) that sexualizes violence.

The second time, it was in this book by Nick Land.

Basically, Georges Bataille wrote a lot, and much of his writing is about materialistic nihilism, death, shit, vomit, garbage, and everything that's ugly about life. (He also wrote a lot about sexual fetishes, but that's not very interesting.)

He wrote about them repetitively, not because he wanted to repeat himself a lot, but because to write was to howl in pain. We scream when we are burnt, no matter how many times it happens. Bataille wrote ugly despair whenever ugly despair hit his brain like a tsunami.

The meaning of life is to waste energy

Bataille thought life is evil, ugly, and meaningless. Life doesn't try to conserve energy; instead, life is about wasting energy. The Sun is a giant source of energy, and all the excess energy has to be used up somehow... hence life! Life appears when the blind materials of earth are overheated by all the energy of the sun and shaken into more and more complicated shapes, in order to consume the excess energy.

All energy must ultimately be spent pointlessly and unreservedly, the only questions being where, when, and in whose name... Bataille interprets all natural and cultural development upon the earth to be side-effects of the evolution of death, because it is only in death that life becomes an echo of the sun, realizing its inevitable destiny, which is pure loss.

The Sun is the source of energy. All the energy ends up being wasted, turned to "zero", nothing. Life is a thin, fragile, and very complex middle-layer between the Sun and the zero.

Life is ejected from the energy-blank and smeared as a crust upon chaotic zero, a mold upon death. This crust is also a maze - a complex exit back to the energy base-line - and the complexity of the maze is life trying to escape from out of itself... life is itself the maze of its route to death...

Nick Land's "maze" means something like this: life is really simple: it's about wasting energy. But life is anything but simple, since it has developed more and more complicated ways to waste energy. To waste the maximal amount of energy, life must not start wasting energy prematurely (by, for example, committing suicide), but must accumulate and grow before it starts to massively waste energy (by, for example, making babies, then dying and turning into a warm pool of rotten, wasted energy).

Paradoxically, in order to waste a lot of energy, life must not waste energy immediately, and so it has to stay alive for quite a while. Thus, life has become more and more complicated, like a maze that keeps growing, apparently wandering further and further away from death, even though it is simply preparing for even more massive wasting of energy later.

This is probably the reason why people fear death, and also fear immortality. They fear death, because they need to accumulate energy. They fear immortality, because they need to waste all the energy at the end of life. An endless life defeats the purpose of life: to waste a lot of energy.

Perhaps a good illustration of this idea is a time-lapse video of slime molds. They even look like mazes!

My comments on the theory of life as energy-waster

Scientifically, I think this is stupid. But it's a good story, and has some kernels of truth.

When Darwinism first became famous, many people thought it was nonsense, because it seemed so unlikely that life could emerge in the first place. Sure, once simple life emerges, evolution can produce more complicated lifeforms, but why did simple life appear from purely lifeless matter?

This thinking has changed among some scientists. There are theories saying that life, far from being a lucky accident, is in fact inevitable given the laws of physics. Life is in fact "meant" to waste solar energy. This was explicitly proposed in Life as a Manifestation of the Second Law of Thermodynamics (1994), by E. D. Schneider and J. J. Kay:

We argue that as ecosystems grow and develop, they should increase their total dissipation, develop more complex structures with more energy flow, increase their cycling activity, develop greater diversity and generate more hierarchical levels, all to abet energy degradation. Species which survive in ecosystems are those that funnel energy into their own production and reproduction and contribute to autocatalytic processes which increase the total dissipation of the ecosystem.

In short, ecosystems develop in ways which systematically increase their ability to degrade the incoming solar energy.

This theory was given a more mathematical treatment in Statistical Physics of Self-Replication (2012) by Jeremy England, which proposes that self-replication, the fundamental feature of life, is driven by entropy production. The paper has generated a lot of publicity, for it makes the ideas sketched above mathematically precise. As reported in First Support for a Physics Theory of Life (2017):

It’s not easy for a group of atoms to unlock and burn chemical energy. To perform this function, the atoms must be arranged in a highly unusual form. According to England, the very existence of a form-function relationship “implies that there’s a challenge presented by the environment that we see the structure of the system as meeting.”

Think of what humans are doing as they dig up coal and oil and burn them. They are having fun, sure, but from a thermodynamic point of view, they are turning high-quality, useful chemical energy into low-quality, useless heat. If the oil and coal were left buried, they would stay undisturbed for millions of years. With human intervention, all that useful energy is turned into useless energy within a hundred years.

Humans perhaps are the solution to the problem of consuming fossil energy. And by analogy, perhaps life is the solution to the problem of consuming solar energy.

As another analogy, think of a bottle of water with its cap unscrewed, turned upside down. Water flows out, turning its gravitational energy into kinetic energy, which becomes useless heat when the water splashes onto the ground. A big vortex forms in the bottle, and with that vortex, the water flows out that much faster.

The beautiful vortexes of steam rising from a cup of hot coffee are similar: ordered structures arising to turn the useful temperature difference (you can run a heat engine with that!) between the coffee and the air, into a useless temperature equality, as fast as possible.

A bacterium is a little vortex for turning sugar into heat.

But how and why do atoms acquire the particular form and function of a bacterium, with its optimal configuration for consuming chemical energy? England hypothesizes that it’s a natural outcome of thermodynamics in far-from-equilibrium systems.

Coffee cools down because nothing is heating it up, but England’s calculations suggested that groups of atoms that are driven by external energy sources can behave differently: They tend to start tapping into those energy sources, aligning and rearranging so as to better absorb the energy and dissipate it as heat. He further showed that this statistical tendency to dissipate energy might foster self-replication. (As he explained it in 2014, “A great way of dissipating more is to make more copies of yourself.”) England sees life, and its extraordinary confluence of form and function, as the ultimate outcome of dissipation-driven adaptation and self-replication.

Perhaps pockets of low-entropy life emerged only to increase the entropy of the universe at the fastest possible rate.

Perhaps Bataille's desperate theory of life isn't that insane, after all.

Discuss

### Let's Read: Borges's Stories

Новости LessWrong.com - 16 июня, 2019 - 05:35
Published on June 16, 2019 2:35 AM UTC

Jorge Luis Borges wrote many short fictions, some quite mathematical and rational. Apart from those, he mainly wrote Argentinian human dramas (he was Argentinian), which are pretty useless in my opinion.

I'm going through his collected fictions; here is a roughly 100-word summary of each story. With these, you'll know which ones to read and which to avoid.

If you only want my recommendations, here they are, along with keywords:

1. The Library of Babel - infinity, meaning, language, combinatorics
2. The Garden of Forking Paths - multiverse, meaning, time
3. Tlön, Uqbar, Orbis Tertius - idealism, language, worldbuilding
4. The Lottery in Babylon - randomness, meaning
5. Funes the Memorious - memory, learning, language, representation and reality
6. On Exactitude in Science - representation and reality
7. Pierre Menard, Author of the Quixote - meaning, language, intellectual nihilism
8. The Immortals - immortality, meaning of life
9. Three Versions of Judas - theology, morality
10. Borges and I - personal identity, memory, meaning of life
11. Shakespeare's Memories - personal identity, memory
A Universal History of Infamy

Dramatized versions of real stories. Nothing mathematical there.

Hakim, the Masked Dyer of Merv

Introduction to Gnosticism.

Fictions

Tlön, Uqbar, Orbis Tertius

A group thinks up a fictional universe, Tlön, more orderly than this one, and writes a series of encyclopedias describing it. Eventually this universe turns into Tlön.

Philosophical description of Berkeley's idealism, and a language without nouns.

The Approach to Al-Mu'tasim

Mostly human drama. Muslim allegory: A group of birds tries to find the god of birds and journeys to the end of the world, only to find they themselves have become gods through the journey.

Pierre Menard, Author of the Quixote

Menard spends years re-creating Don Quixote in the hardest way possible: because he loves the novel, and because he thinks intellectual pursuits are always meaningless, so he would rather do something obviously meaningless than something that will only be shown to be meaningless centuries later.

Philosophical discussion of authorship, meaning of text in relation to the author, and intellectual nihilism.

The Circular Ruins

A man creates a man by dreaming and imagining him really hard. In the end, he finds that he himself was created the same way.

The Lottery in Babylon

Everything in the city of Babylon is operated by a giant lottery company that randomly assigns fates to citizens. The company has become invisible, and some people wonder if it exists at all.

The philosophical question is whether it matters if randomness comes from a human creation (the company) or from the nonhuman universe.

An Examination of the Work of Herbert Quain

Quain wrote weird stories, such as a detective story with a wrong solution and a choose-your-own-adventure running backwards in time.

The Library of Babel

Best introduction to infinity, meaning of language, and combinatorics.

The Garden of Forking Paths

Time, the multiverse, and how one might live when all possible universes exist.

Funes the Memorious

Keeping every detail makes abstract learning impossible. To learn requires forgetting the differences that don't make a difference. Funes can't do that.

A good metaphor for how learning works, and an introduction to learning theory.

The Form of the Sword

Human drama.

Theme of the Traitor and the Hero

Human drama.

Death and the Compass

A detective tries too hard to be clever and ignores the obvious solution. He dies.

The Secret Miracle

A man sentenced to death prays for a miracle, and it is granted: time pauses for a year, and he writes a whole book in his head. He dies when time unpauses.

The paused time makes no difference to anyone except himself; nobody knows of the miracle and the outside world doesn't change (presumably his thinking happens purely in the soul and doesn't require any physical chemistry?).

Three Versions of Judas

An introduction to theological thinking. Judas was the secret savior: God degraded himself fully into a human, into Judas, so as to save humans. It is good for a human to be bad, because only God is worthy of goodness.

The theological style of thinking is interesting. I feel AI philosophy can learn from it.

The End

Human drama. A man kills another man.

The Sect of the Phoenix

A riddle. Apparently the answer is "sex", but I don't find it convincing.

The South

Human drama. A man would rather die with drama than live with mundane boredom.

The Aleph

The Immortal

The ethics of immortal humans. In an immortal life, every human action can be construed as having meaning and as having no meaning.

Human drama.

The Theologians

Introduction to theological thinking. Cyclic time vs linear time. Description of a strange value system ("Histrioni") that tries to be evil in order to be good. Philosophical question: is it possible for two moments of time to be the same?

Emma Zunz

Human drama.

The House of Asterion

The Minotaur tells his story from his mildly inhuman (still mostly human) viewpoint.

Deutsches Requiem

Philosophy of the meaning of life, explained by a Nazi martyr. Nazi theodicy. Fate makes everything meaningful, including killing, and being killed for it.

Similar in philosophy to Three Versions of Judas.

Averroës's Search

Averroës tries to understand what a play is without ever seeing one. It's a bit funny, like Mary the color scientist.

The Zahir

Zahir is a thing that makes anyone who sees it become obsessed with it, losing touch with reality. The narrator saw a coin Zahir, but is happy, since he will simply pass "from a very complex dream to a very simple dream."

Others will dream that I am mad, and I will dream of the Zahir. When all men on earth think day and night of the Zahir, which one will be a dream and which a reality, the earth or the Zahir?

Includes a story told from the viewpoint of a dragon.

The Writing of the God

The gods created the world and encoded in the pattern of the jaguar a magic spell that grants omnipotence. An imprisoned Aztec priest becomes enlightened in a dream and deciphers the spell (fourteen apparently random words), but never speaks it, because he has forgotten himself.

Strange philosophy of enlightenment.

Ibn-Hakam al-Bokhari, Murdered in His Labyrinth

Human drama. Detective story.

The Two Kings and the Two Labyrinths

It's just as easy to be lost in a desert as in a maze.

The Wait

Human drama. A boring man muses on time and waiting while hiding from his killers.

The Man on the Threshold

Human drama.

The Aleph

The Aleph is a point in space that contains all other points, where you can see everything in the universe from every angle simultaneously.

Most of the story is human drama. The only worthwhile paragraphs start at "It was then that I saw the Aleph."

The Maker

The Maker

Human drama. An ancient Greek soldier lives a simple life of sensual pleasures. Going blind, he feels terrible about it, but then vivid good memories come to him, and he understands, by some weird reasoning, that death is the next big adventure.

Dreamtigers

Borges loved tigers, and when he lucid dreamed, he tried to make a tiger, but was frustrated that all his dream tigers were worse than real tigers.

Dialogue on a Dialogue

If death is nothing serious, why not die now?

Toenails

Meh.

The Draped Mirrors

Borges is scared of mirrors.

Argumentum Ornithologicum

If I imagine a flock of birds, then the number of birds is a definite integer, so someone must have counted it. I couldn't count fast enough, so God counted.

Thus God exists.

The Captive

A boy is abducted and found years later. Borges wonders whether he is really the same person anymore.

The Sham

All historical events of the same theme (such as assassination) are really just imperfect reproductions of an original event outside of time.

Platonism of historical events.

Delia Elena San Marco

Borges's friend, Delia, died. Borges obsesses about their last goodbye, and hopes that souls are eternal, and one day they meet again.

Argentinian human drama.

The Plot

A gaucho (one of the legendary Argentinian cowboys) is betrayed and killed just like Caesar. Borges claims that his death had a higher meaning: he was an actor in a replay of a historical drama.

A Problem

Suppose Don Quixote killed a man, what happens next?

1. He keeps being deluded, thinking it's no big deal.
2. He becomes shocked into sanity forever.
3. He becomes shocked into more delusions to deny that he killed a man.
4. Actually, the world is an illusion and nothing is actually real. Hinduism!
A Yellow Rose

Descriptions, stories, and poems about a thing are always worse than the actual thing. Language can't perfectly reflect what it talks about.

we may mention or allude to a thing, but not express it; and that the tall, proud volumes casting a golden shadow in a corner were not a mirror of the world, but rather one thing more added to the world.

The Witness

A lonely pagan dies in a Christianized England. Saxon pagan culture dies with him.

Martin Fierro

Human dramas are soon forgotten.

Mutations

The arrow used to kill; now it's a symbol. Everything dies and changes. Nobody knows what the future will think of them. A legend, a symbol, an echo?

Parable of Cervantes and the Quixote

Cervantes and Don Quixote are both people stuck in a boring world wishing for myths. Then they became myths for the future.

There are secret messages in ordinary life, and by finding those secret messages, we could find that infinite thing that was lost. (Note how this is similar to delusions of reference and pareidolia.)

Perhaps a feature of the crucified face lurks in every mirror; perhaps the face died, was erased, so that God may be all of us. Who knows but that tonight we may see it in the labyrinth of dreams, and tomorrow not know we saw it.

Parable of the Palace

Poets tour the Emperor's giant palace and are amazed by its beauty. At the end of the tour, one poet speaks a single word that is the minimal description of the palace. He is killed.

Philosophy of language, and minimal description length. What's the minimal description of the universe? Is it short?
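
The "minimal description" idea has a modern formalization in minimum description length and Kolmogorov complexity. A toy sketch (my own illustration, not Borges's), using compressed size as a crude stand-in for description length:

```python
import random
import zlib

# Compressed size as a crude proxy for minimal description length.
random.seed(0)
regular = b"palace " * 10_000   # 70,000 bytes of one repeated pattern
noisy = bytes(random.randrange(256) for _ in range(70_000))  # incompressible noise

print(len(zlib.compress(regular)))  # tiny: the pattern admits a short description
print(len(zlib.compress(noisy)))    # near 70,000: no shorter description exists
```

By this measure, the poet's single word is the limit case: a maximally regular object whose description shrinks almost to nothing.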

Everything and Nothing

Shakespeare had no personality of his own, so he wrote dramas instead. He told God he just wanted to be himself, instead of so many selves. God replied that, just like Shakespeare, He too had no true self, and simply dreamed up all the persons in the world.

Ragnarök

Borges dreams that he is at a business meeting when the gods come for a visit. The gods have been in exile for centuries, living a very hard life, and have grown old into petty criminals who would kill for a penny, so Borges shoots them.

Inferno, I, 32

Theodicy: there is a meaning to our lives that we are too dumb to understand.

Borges and I

Introduction to the problems of personal identity.

A philosopher analyzed it.

On Exactitude in Science

Just read the whole thing. It's so short.

In Memoriam, J.F.K.

What if all weapons are really the same weapon, somehow?

In Praise of Darkness

The Ethnographer

Murdock lives among American Indians for his PhD and learns a secret he won't talk about, because it no longer matters.

Similar to The Writing of the God

A Prayer

Borges might be Christian.

His End and His Beginning

Heaven is terrifying.

Brodie's Report

This book contains only human drama, mostly Argentinian. I did not read much.

Brodie's Report

A description of a half-human social species. Very disgusting. Similar to the Yahoos in Gulliver's Travels and the Morlocks in The Time Machine.

The Book of Sand

The Other

Borges talks with his younger self.

Ulrikke

Human drama.

The Congress

An organization tries to assemble a congress that represents everyone, then faces the problem of how to do so without simply inviting everyone. It also tries to keep every printed material, which is far too much.

There Are More Things

Bad imitation of Lovecraftian horror.

The Sect of the Thirty

Brief description of a sect that worships both Judas and Jesus.

The Night of the Gifts

Human drama.

The Mirror and the Mask

The king commissions a poet, who writes something so awesome that it kills the poet and turns the king into a beggar.

“Undr”

Norse human drama.

A Weary Man's Utopia

Utopia according to Borges. Nothing much happens, as it should. A more boring version of Brave New World.

The Bribe

Human drama.

Avelino Arredondo

Argentinian human drama.

The Disk

An old man claims to have a disk with only one side, and that it makes him king.

The Book of Sand

A man buys a book with an infinite number of pages. He gives the book up because he is afraid of his obsession.

It is as if the Library of Babel were compressed into one book.

Shakespeare's Memory

August 25, 1983

Borges meets himself at the moment of death.

Blue Tigers

A man is terrified by some blue stones that shift in and out of existence, defying arithmetic and logic.

The Rose of Paracelsus

A young man wants to become the sorcerer's apprentice, but demands to see a magic trick first. The sorcerer pretends he can't do one. The young man leaves sadly.

Written like a koan. Something like “Faith is more important than rational understanding.”

Shakespeare's Memory

A professor gets Shakespeare's memories transplanted into his brain. He starts exploring them but finds himself confused about his personal identity. He gives Shakespeare's memories away to preserve his own identity.

Philosophical description of memory and personal identity.

Discuss

### 28 social psychology studies from *Experiments With People* (Frey & Gregg, 2017)

Новости LessWrong.com - 16 июня, 2019 - 05:23
Published on June 16, 2019 2:23 AM UTC

I'm reading a very informative and fun book about human social psychology, Experiments With People (2nd ed, 2018).

... 28 social psychological experiments that have significantly advanced our understanding of human social thinking and behavior. Each chapter focuses on the details and implications of a single study, while citing related research and real-life examples along the way.

Here I summarize each chapter to save you time. Some results were old news to me, but some were quite surprising. I often skip the experimental details, such as the ingenious tricks the psychologists used to make sure participants didn't guess the true purpose of the experiments. Refer to the originals for details.

The experiments run from the 1950s to the 2010s, and occasionally literature from before the 1900s is quoted.

Chapters I find especially interesting are:

• Chap 14. It lists the many failures of introspection and raises the question of what consciousness can do.
• Chap 16. It bears significant similarity to superrationality and acausal trade.
• Chap 20. It warns about how credulous humans are.
• Chap 27. It is about the human fear of death and the psychological defenses against it.
• Chap 28. It shows how belief in free will can be motivated by a desire to punish immoral behavior. Understanding why people believe in free will is necessary for a theory of what the belief in free will is for.
Chap 1. Conforming to group norms

Video demonstration.

Groups of eight participated in a simple "perceptual" task. In reality, all but one of the participants were actors, and the true focus of the study was about how the remaining participant would react.

Each student viewed a card with a line on it, followed by another with three lines labeled A, B, and C (see accompanying figure). One of these lines was the same as that on the first card, and the other two lines were clearly longer or shorter. Each participant was then asked to say aloud which line matched the length of that on the first card... The actors would always unanimously nominate one comparator, but on certain trials they would give the correct response and on others, an incorrect response. The group was seated such that the real participant always responded last.

It was found that

• When three or more actors unanimously gave the wrong answer, the participant went along about 1/3 of the time.
• Increasing the number of actors above 3 did not increase compliance.
• Even when the difference between the lines was 7 inches, some participants still complied.
• If at least one actor disagreed with the majority, the participant's compliance decreased.
• If the fellow dissenter joined the majority, the participant's compliance rose back to the same 1/3 level.
• If the fellow dissenter left, the participant's compliance increased only slightly.

There are two reasons for this compliance. One is informational, a heuristic about knowledge: the majority is usually correct. The other is normative: social acceptance matters more than being correct.

The effect of a dissenting minority is notable.

Research finds that, whereas majorities inspire heuristic judgments and often compliance, minorities provoke a more systematic consideration of arguments and, possibly, an internal acceptance of their position (Nemeth, 1987). Majorities tend to have a greater impact on public conformity, whereas minorities tend to have more effect on private conformity (Chaiken & Stangor, 1987).

Chap 2. Forced compliance theory and cognitive dissonance

In When Prophecy Fails, the story of a UFO cult was detailed. When the doomsday prophecy failed, most people left, but some became even firmer believers.

(My own example, not from the book.) In Borges's story A Problem, Borges asks: how would Don Quixote react if he killed a man?

Having killed the man, don Quixote cannot allow himself to think that the terrible act is the work of a delirium; the reality of the effect makes him assume a like reality of cause, and don Quixote never emerges from his madness.

This chapter reviews Cognitive consequences of forced compliance (Festinger & Carlsmith, 1959):

1. Participants were asked to do an extremely boring task.
2. Then the experimenter asked the participant to deceive the next participant into thinking the experiment was fun. Half were paid $1, the other half $20.
3. A control group was not asked to lie.
4. Then they were nudged to take a survey about how they felt about the experiment.

The result: those paid $1 thought the experiment was fun, those paid $20 thought it was boring, and those who weren't asked to lie thought it was very boring.

Festinger explains this by the theory of cognitive dissonance:

1. An attitude (thinking the experiment was boring) and a behavior (saying it was fun) clashes, creating an uncomfortable feeling.
2. The participant then is motivated to remove the discomfort by changing the attitude by rationalization (thinking the experiment was actually fun).
3. If the participant was paid $20, then there was no dissonance, as there was a ready explanation for the dissonant behavior.
4. If the participant was paid $1, then there was dissonance, because the participant regarded the lying as mostly voluntary.

An alternative explanation from Self-perception: An alternative interpretation of cognitive dissonance phenomena (Bem, 1967):

1. We don't form beliefs about ourselves by direct introspection; instead, we infer them from observing our own behavior.
2. When we behave against our previous self-beliefs, this creates an update to our self-beliefs.

See also Chap 14 for more on the lack of introspection.

Current consensus is that both theories are correct, in different situations. The self-perception effect happens when the behavior is mildly different from self-beliefs, and the cognitive dissonance effect happens when the behavior is grossly different.

There are also many complications, such as in Double forced compliance and cognitive dissonance theory (Girandola, 1997), which reported that even when participants performed a boring task and then told others how boring it was, they still felt afterwards that the task was more interesting.

There is a lot of ongoing research.

Chap 3. Suffering can create liking

Curious phenomena such as hazing have been studied since The effect of severity of initiation on liking for a group (Aronson & Mills, 1959):

An experiment was conducted to test the hypothesis that persons who undergo an unpleasant initiation to become members of a group increase their liking for the group; that is, they find the group more attractive than do persons who become members without going through a severe initiation.

The group was a fiction made up by the experimenters. It purported to discuss interesting sexual topics, but the participants, after finally "joining", would only hear a very boring group discussion about animal sex.

This hypothesis was derived from Festinger's theory of cognitive dissonance. Three conditions were employed: "embarrassing material" read before the group, mildly embarrassing material to be read, and no reading. The results clearly verified the hypothesis.

The "embarrassing material" was a list of obscene words; the "mildly embarrassing material" was a list of mildly sexual words.

Result: the very embarrassing ritual increased liking for the group.

The explanation comes from the theory of cognitive dissonance: "I have already invested so much to join the group. I must be a fool if the group turned out to be bad! And I'm not a fool."

Cognitive dissonance has been used for brainwashing, persuasion, education, and many other kinds of things.

One of the authors learned from an investigative journalist how a dodgy car company... had customers wait unnecessarily for hours while their finance deal was supposedly being negotiated upstairs.

[Commitment and community: Communes and utopias in sociological perspective (Kanter, 1972)] noted that

19th-century utopian cults that required their members to make significant sacrifices were more successful. For example, cults that had their members surrender all their personal belongings lasted much longer than those that did not.

Some bad investments are continued long after they have clearly become unprofitable; this is the sunk cost fallacy.

Chap 4. Just following orders

The banality of evil is the theory that everyday people can do great evils such as the Holocaust, by simply following orders.

Behavioral study of obedience (Milgram, 1963) reported the famous Milgram experiment. A video recreation is here.

This is a very famous experiment with many followups. There is sufficient material freely online, such as the Wikipedia page. So I won't recount it here.

I was most surprised to learn that personality had very little effect. That is, the obedience exhibited by participants in this experiment was mostly situational rather than stemming from their personalities.

Chap 5. Bystander apathy effect

The murder of Kitty Genovese stimulated research into the "bystander effect". On March 13, 1964 Genovese was murdered... 38 witnesses watched the stabbings but did not intervene or even call the police until after the attacker fled and Genovese had died...

Bystander intervention in emergencies: Diffusion of responsibility (Latané & Darley, 1968) attributed the lack of help by witnesses to diffusion of responsibility: because each witness saw others witnessing the same event, they assumed the others would take responsibility.

This phenomenon has a big literature, and is very popularly known, possibly due to the dramatic stories.

Concerning the original experiment by Latané and Darley, I was again surprised that personality factors had little effect, with one exception: growing up in a big community correlated with a lower probability of helping.

Chap 6. The effect of an audience

When people perform a task in the presence of others, they perform better if the task is easy and worse if the task is hard. One theory is that the presence of others increases physiological arousal, which enhances performance on simple tasks and impairs performance on hard tasks. Other theories have also been proposed.

In Social enhancement and impairment of performance in the cockroach (Zajonc, 1969), it is found that this is true even for cockroaches. In the experiment, Zajonc gave cockroaches two possible tasks: going through a straight maze, or a more complex maze. They either did the task alone, or while being watched by others outside (the maze was transparent).

While being watched, cockroaches ran the straight maze faster but the complex maze slower. This shows that the physiological arousal theory is correct in cockroaches: the audience effect can occur without any complex cognitive ability.

However, in humans more complex cognition is sometimes involved. As reported in Social facilitation of dominant responses by the presence of an audience and the mere presence of others (Cottrell et al., 1968), a blindfolded audience exerts no effect on the performer.

Chap 7. Group conflicts from trivial groups

This chapter begins with the Robbers Cave experiment, which was a study that investigates the realistic conflict theory, which sounds very common-sense:

1. group conflicts and feelings of resentment for other groups arise from conflicting goals and competition over limited resources
2. length and severity of the conflict is based upon the perceived value and shortage of the given resource
3. positive relations can only be restored with goals that require cooperation between groups

Then it recounts the blue-eyes/brown-eyes experiment. The question, then, is: what is the least amount of group difference needed to make a difference? Enter the minimal group paradigm of Experiments in intergroup discrimination (Tajfel, 1970). Participants first took a test estimating the number of dots, then were divided into "overestimators" and "underestimators", though in truth the division was random. Then they were given points (convertible to cash) to divide among the groups. Participants favored their own groups significantly more.

In fact, the most favored strategy was to maximize (own group) - (other group), even though it did not maximize (own group). Thus, even the most minimal social groups induced ingroup-outgroup conflict.
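
The difference between the two strategies is easy to see with hypothetical payoff pairs in the spirit of Tajfel's allocation matrices (the numbers below are my own illustration, not from the study):

```python
# Each option is (points for ingroup, points for outgroup); numbers are illustrative.
options = [(19, 25), (13, 13), (11, 5)]

best_absolute = max(options, key=lambda o: o[0])         # most points for "us"
best_relative = max(options, key=lambda o: o[0] - o[1])  # biggest gap over "them"

print(best_absolute)  # (19, 25): ingroup earns the most
print(best_relative)  # (11, 5): ingroup earns less, but beats the outgroup
```

Participants tended to pick the (11, 5)-style option, sacrificing absolute ingroup gain for relative advantage over the outgroup.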

The minimal group paradigm has been studied in many ways. It was also found that outgroup homogeneity effect, that is, "they are all the same; we are diverse", could also arise from minimal groups.

One theoretical explanation is Tajfel and Turner's social identity theory (Tajfel & Turner, 1979), which states that:

0. A person's self-esteem depends on having a good identity.
1. A person's identity has two parts: personal and social.
2. Personal identity concerns one's own traits and outcomes.
3. Social identity is derived from social groups and comparisons between groups.
4. A person is motivated to improve self-esteem, and thus social identity.
5. Thus, one is motivated to improve the standing of one's ingroups and lower the standing of one's outgroups.

One piece of supporting evidence: when people have more self-esteem, they discriminate less against outgroups (Crocker et al., 1987).

Chap 8. The Good Samaritan Experiment

a traveller is stripped of clothing, beaten, and left half dead alongside the road. First a priest and then a Levite comes by, but both avoid the man. Finally, a Samaritan happens upon the traveller. Samaritans and Jews despised each other, but the Samaritan helps the injured man.

This inspired an experiment reported in "From Jerusalem to Jericho": A study of situational and dispositional variables in helping behavior (Darley & Batson, 1973), in which participants (theology students) were asked to give a short talk in another building.

People going between two buildings encountered a shabbily dressed person slumped by the side of the road. Subjects in a hurry to reach their destination were more likely to pass by without stopping.

The experiment was 2 × 3: the participant was asked to give a short talk either on the parable of the Good Samaritan or on an irrelevant topic, and was either very hurried, hurried, or not hurried by the experimenter.

Hurrying made a significant difference in the likelihood of helping the victim. The topic of the talk had some influence, according to a reanalysis by Greenwald (1975), despite the original paper's claim of no influence. Self-reported personality and religiosity made no difference.

The lesson from this as well as many other social psychology experiments is that seemingly trivial situational variables have a greater impact than personality variables, even though people tend to explain behaviors using personality. See The Person and the Situation: Perspectives of Social Psychology (Lee Ross, Richard E. Nisbett, 2011)

Chap 9. External motivation harms internal motivation

Extrinsic motivations "come from the outside": money, praise, food. Intrinsic motivations come from the inside: self-esteem, happiness. Both can motivate behavior, but interestingly, extrinsic motivation can sometimes harm intrinsic motivation.

In Undermining children's intrinsic interest with extrinsic reward: A test of the "overjustification" hypothesis (Lepper et al., 1973), children were given markers to draw with. Some were told they would be rewarded with a prize for playing, others received a prize unexpectedly, and the rest were left alone as a control group.

After some days, the amount of time children spent playing with the markers ranked: expected prize < control group < unexpected prize.

The book doesn't say much about why the unexpected prize created higher motivation, but I think it is similar to how gambling addiction arises from variable rewards.

There are several explanations for why extrinsic reward lowers subsequent motivation. One is the overjustification effect, in which external rewards "crowd out" internal rewards:

Once rewards are no longer offered, interest in the activity is lost; prior intrinsic motivation does not return, and extrinsic rewards must be continuously offered as motivation to sustain the activity.

Another explanation is that humans heuristically view a means to an end as undesirable. In (Sagotsky et al., 1982), children were given two activities: playing with crayons and playing with markers. Both were equally fun at the start, but one group was told that in order to play with crayons, they had to play with markers first. After a while, they became less interested in the markers. The other group, the reverse.

I think this is the psychological basis of some ethical intuitions in the style of Kant:

we should never act in such a way that we treat humanity as a means only but always as an end in itself.

A third explanation is that people see extrinsic rewards as a threat to their freedom and autonomy, and tend to rebel against them. I saw a news story today about Amazon's program to gamify work. Some complained that it threatened the workers' autonomy, which is a strange complaint: if gamification actually increases intrinsic motivation for work, doesn't it increase autonomy? Autonomy is the freedom to follow one's intrinsic motivation, so if a worker acquires an intrinsic motivation to do a good job, they would have more autonomy.

I think this complaint can be explained as a different kind of autonomy: freedom from prediction. Humans evolved to want to be unpredictable (at least by others), because being predictable invites manipulation, which often decreases fitness.

Chap 10. Actor-observer asymmetry

Other people did what they did because of who they are. We did what we did because of outside events.

In 1975, parts of the Watergate scandal were recreated in a very dramatic psychology experiment, reported in Ubiquitous Watergate: An attributional analysis (West, 1975).

80 criminology students were asked to meet the experimenter privately for a mysterious reason. There, they were asked to join a burglary team to steal secret documents from an ad agency. Four versions were presented:

1. The burglary plan was sponsored by a government agency, for secret investigation purposes. Government would provide immunity if caught.
2. Same, but without immunity.
3. The plan was sponsored by a rival ad agency, with \$2000 reward.
4. The student was asked to only join a test run of the plan, without stealing anything.

Afterwards, they were debriefed and asked to explain their decision to join/not join.

Separately, 238 psychology students were presented the above situation, and asked to guess what percentage would agree to the plan.

Then, half of the participants were asked, "Suppose John agreed to participate, explain why John agreed."

Results:

• About 45% of participants agreed to join the burglary in the government-with-immunity situation. Otherwise, about 10%.
• Most students in the second part thought they would not agree to the burglary plan.
• Students in the first part who agreed to join the burglary explained their behavior as due to the circumstances.
• Students in the second part explained the hypothetical John's behavior as due to John's personality.

The criminology students were the "actors", and the psychology students were the "observers". The asymmetry was that actors attributed their behavior to the situation, while observers attributed it to personality. This is the actor-observer asymmetry.

Complications in this asymmetry are noted in The actor-observer asymmetry in attribution: A (surprising) meta-analysis (Malle, 2006). Malle found that there are two kinds of biases: when the behavior is negative, the actor blames the situation and the observer blames the person. When the behavior is positive, the reverse happens. As such, this can be explained as a self-serving bias.

The authors conclude with a funny note:

it's interesting how athletes often publicly thank the Lord for a personal victory, but do not publicly blame the Lord for a defeat!

Chap 11. We are number 1

They never shout, "They are number 1."

People like to think well of themselves. Even in collectivistic societies, people regard themselves as above average on collectivistic traits, according to Pancultural self-enhancement (Sedikides et al, 2003):

Americans... self-enhanced on individualistic attributes, whereas Japanese... self-enhanced on collectivistic attributes

An experiment is reported in Basking in reflected glory: Three (football) field studies (Cialdini et al, 1976), where students were asked to describe a recent university sports team's victory or defeat. Beforehand, half received criticism that decreased their self-esteem, and the other half received praise that increased it.

The result: those with raised self-esteem described the sports outcome as "we won" or "we lost" about a quarter of the time, regardless of the result. Those with lowered self-esteem said "we won" 40% of the time when the team won, but "we lost" only 14% of the time when the team lost.

The explanation is that people in need of a self-esteem boost try to BIRG (bask in reflected glory) and CORF (cut off reflected failure). Reflected glory also improves their social standing.

Methods of increasing one's social standing are called impression management, and include:

• BIRG and CORF, as noted above;
• ingratiation: we praise and agree with others, so as to be liked;
• self-handicapping: a student gets drunk before a big test, so that if they fail, they can blame the drunkenness instead of their ability;
• exemplification: behave virtuously and make sure others see it.

A lot of these techniques are listed in (Jones and Pittman, 1982).

Chap 12. Deindividuation

The experiment was run on Halloween. An experimenter placed a bowl of candy in her living room for trick-or-treaters, with a hidden observer recording. In one condition, the woman asked the children identifying questions, such as their names; in the other condition, the children remained completely anonymous. Some children came individually, others in groups.

In each condition, the woman invited the children in, claimed she had something to tend to in the kitchen, and told each child to take only one candy.

Result: being in a group and being anonymous both increased frequency of transgression (taking more than one candy). If the first child to take candies in a group transgressed, other children were also more likely to transgress.

The authors then defined deindividuation as a state in which private self-awareness is reduced.

The truly deindividuated person is alleged to pay scant attention to personal values and moral codes... to be inordinately sensitive to cues in the immediate environment.

One study, The baiting crowd in episodes of threatened suicide (Mann, 1981), examined 21 cases from newspapers, in which crowds were present when a person threatened to jump off a high place.

Baiting or jeering occurred in 10 of the cases. Analysis of newspaper accounts of the episodes suggested several deindividuation factors that might contribute to the baiting phenomenon: membership in a large crowd, the cover of nighttime, and physical distance between crowd and victim (all factors associated with anonymity).

Two theories of why deindividuation occurs were given. One is that anonymity makes people feel safe to transgress. Another is that (Reicher & Postmes, 1995) people in a crowd categorize themselves mainly by their social identity, and their behavior reflects the group norm rather than their personal norms.

I was disappointed that the authors did not give evolutionary psychological explanations for deindividuation. Humans are the only animals that wage wars. A deindividuation effect can be an evolutionary adaptation to prepare humans to fight more effectively in a crowd.

Chap 13. Mere exposure effect

People prefer familiar things. Really, that's quite a banal observation. What's delightful about this chapter is the ingenuity of the experiment design.

Think about your own face. You see it as a mirror image (unless you take a selfie), but others see it directly. This means you are familiar with the mirror image of your face, while others are familiar with the direct image.

This is exploited in Reversed facial images and the mere-exposure hypothesis (Mita et al, 1977). Couples were separately shown photos of the woman's face, some mirrored, some not, and asked to pick the one they preferred. The women preferred the mirrored photos, and the men preferred the direct photos.

The mere exposure effect is robust in real life and across species. (Grush et al, 1978) found that

previous or media exposure alone successfully predicted 83% of the [US congress election] primary winners

And (Cross et al, 1967) found that rats who heard Mozart in infancy preferred Mozart over Schoenberg as adults, and vice versa.

One possible evolutionary psychological explanation was given: preferring the familiar is safer, and thus more adaptive. The authors warned, however, that it's not so simple, as people also have a preference for mild novelty.

Chap 14. Shortcomings of introspection

This chapter reviews a study that shows a particular instance of introspection failure:

people's ideas about how their minds work stem not from private insights but from public knowledge. Unfortunately, however, this public knowledge is often not accurate. It is based on intuitive theories, widely shared throughout society, that are often mistaken.

Subjects are sometimes (a) unaware of the existence of a stimulus that importantly influenced a response, (b) unaware of the existence of the response, and (c) unaware that the stimulus has affected the response. It is proposed that when people attempt to report on their cognitive processes... they do not do so on the basis of any true introspection. Instead, their reports are based on a priori, implicit causal theories, or judgments about the extent to which a particular stimulus is a plausible cause of a given response.

In the experiment, a subject is given a fictitious application from Jill for a staff job at a crisis center. The applications are identical except for a few attributes of the applicant: attractiveness, intelligence, etc. The subject is then asked how much each attribute influenced their decision to accept.

The situation is then described to some observers (who didn't do the job application review), who are asked how much each attribute is correlated with the decision of the subject to accept.

Subjects who read that Jill had once been involved in a serious car accident claimed that the event had made them view her as a more sympathetic person. However, according to the ratings they later gave, this event had exerted no impact... the only exception pertained to ratings of Jill's intelligence. Here, an almost perfect correlation emerged between how subjects' judgments had actually shifted and how much they believed they had shifted. Why so? The researchers argued that there are explicit rules, widely known throughout a culture, for ascribing intelligence to people. Because subjects could readily recognize whether a given factor was relevant to intelligence, they could reliably guess whether they would have taken it into consideration.

The determinations of subjects and observers coincided almost exactly.

There are other introspection failures demonstrated by social psychology. People are unaware of the halo effect at work in their own judgments of others (Nisbett & Wilson, 1977). People are unaware of the source of their own arousal. People are unaware of their own bias even when they know such a bias exists (Pronin et al, 2002).

In a further twist, introspection can degrade judgment. In (Wilson & Kraft, 1993), participants reported how they felt about their romantic partners. Their expressed feelings correlated well with the duration of the relationship. However, if they introspected on the reasons for their feelings before reporting them, the correlation disappeared.

The authors conclude by suggesting that traveling, by putting oneself into novel situations, would be particularly helpful for one to know oneself.

Chap 15. Self-fulfilling prophecies

Again, a very well-known subject with a lot already written. This chapter reviews Social perception and interpersonal behavior: On the self-fulfilling nature of social stereotypes (Snyder et al, 1977)

Male "perceivers" interacted with female "targets" whom they believed to be physically attractive/unattractive. Tape recordings of each participant's conversational behavior were analyzed by naive observer judges for evidence of behavioral confirmation... targets who were perceived to be physically attractive came to behave in a friendly, likeable, and sociable manner in comparison with targets whose perceivers regarded them as unattractive. It is suggested that theories in cognitive social psychology attend to the ways in which perceivers create the information that they process in addition to the ways that they process that information.

Philosophically, a self-fulfilling prophecy is a prediction about the future that is true if and only if the prediction is made. Usually, predictions are supposed to be independent of the future they talk about. Of course, all useful predictions affect the future -- the predictor tries to profit from the prediction. But such effects fall on the predictor, not on the predicted.

Social psychologists have found that human behaviors are more influenced by the situation than the personality (as noted in The Person and the Situation book). Snyder et al suggested that, in fact, personality traits are one of those self-fulfilling prophecies.

our believing that others possess certain traits may cause us to behave in certain consistent ways toward them. This may cause them to behave in consistent ways in our presence.

In other words, a lot of the persistence of personality could arise from the fundamental attribution error.

Chap 16. How to live like a predeterminist

So then, God has mercy on whom he chooses to have mercy, and he hardens whom he chooses to harden. -- Romans 9:18, which Calvinists quote a lot.

Suppose an urge to smoke and a propensity to lung cancer are both genetically determined, and smoking does not itself cause lung cancer. Why not smoke? If you feel the urge to smoke, it's already too late.

Believers of Calvinism think that God has chosen some people to be saved, and others are damned. Those who are favored by God would both be naturally free from the urge to sin in this world, and enjoy paradise after death. Those who are not, would feel the urge to sin in this world, and go to hell after death.

So if a Calvinist feels an urge to sin, it's already too late. Why not sin? Instead, Calvinists keep resisting the urge to sin, and moreover deny that they are resisting such urges, insisting that they are effortlessly virtuous, as evidence of God's favor.

In Causal versus diagnostic contingencies: On self-deception and on the voter's illusion (Quattrone & Tversky, 1984) two experiments are reported.

In the first experiment, participants exercised, then were asked to hold their hands in ice water until the pain made them withdraw. They were then told a version of the lung cancer puzzle: there are two kinds of hearts, type 1 and type 2, determined by unchangeable genetics. A type 1 heart is associated both with health and with higher tolerance to ice water after exercise; a type 2 heart, with early death and lower tolerance. When they repeated the ice water test, they exhibited longer tolerance, even though many of them denied trying to do so.

In the second experiment, subjects encountered one of two theories about the sort of voters who determine the margin of victory in an election. Only one of the theories would enable voting subjects to imagine that they could "induce" other like-minded persons to vote. As predicted, more subjects indicated that they would vote given that theory than given a theory in which the subject's vote would not be diagnostic of the electoral outcome, although the causal impact of the subject's vote is the same under both theories

One explanation is that the unconscious deceived the conscious mind, but the authors find this unreasonable, for it still does not explain what motivates the unconscious to deceive. They instead favored Greenwald's theory that people avoid analyzing threatening information in detail, just as we throw away junk mail without reading it.

In conclusion, self-deception is not the result of one center of intelligence hoodwinking the other. Rather, it is the result of a low-level screening process that banishes suspicious cognitions before they have the opportunity to be fully entertained by the conscious mind.

Similarity to superrationality and acausal trade.

The behavior of Calvinists is similar to superrationality and acausal trade, in which agents behave in a way that is diagnostic of good outcomes, even if it does not cause good outcomes.

Assuming the superrational player has access to their opponents' source codes/simulations, the superrationality strategy can be justified, but then it would just be usual rationality.

I think normative decision theories are incompatible with sufficiently good prediction. Normative decisions are only defined for agents with apparent free will. An agent apparently has free will only to someone who cannot predict the agent's behavior well. Superrationality and acausal trade both attempt to make a decision theory for agents that are aware that they are too predictable (to themselves or to someone they play with). This is similar to the situation where someone sees the future and then "decides" to rebel against the future. Either they saw the true future and did not rebel, or they did not see the true future at all. It's illogical to say they both saw the future and rebelled against it.

Similar problems arise with Scott Aaronson's solution to Newcomb's paradox (I'm a "Wittgenstein"). A determinist who is self-aware of their determinism would, instead of offering a decision theory ("I should take one box because..."), offer a prediction theory ("I probably would take one box because...").

Chap 17. Partisan perceptions of media bias

People often complain of media bias, and different people perceive the same report very differently. Why?

In The hostile media phenomenon: biased perception and perceptions of media bias in coverage of the Beirut massacre (Vallone et al, 1985), the researchers studied how people perceived news about the Beirut massacre,

killing of civilians, mostly Palestinians and Lebanese Shiites... carried out by the militia under the eyes of their Israeli allies.

The researchers took some neutral reports on the event, and as expected, pro-Israel people thought they were biased in the anti-Israel direction, while anti-Israel people thought they were biased in the pro-Israel direction.

In a study on biases (Lord et al, 1984), participants avoided bias by following this instruction:

"Ask yourself at each step whether you would have made the same evaluations had exactly the same study produced results on the other side of the issue."

Chap 18. Empathy-altruism hypothesis

Several theorized psychological mechanisms of human altruistic action are studied in More evidence that empathy is a source of altruistic motivation (Batson, 1982), which reported an experiment on whether people would help a person in need.

It was found that: if (empathy OR guilt), then (helping). That is, people can be motivated to act altruistically by empathy, without expectation of gain, or by the desire for relief from guilt. This argues against the theory of psychological hedonism.

Other potential sources of altruism are collectivism (act for the benefit of a group) and principlism (uphold a principle for its own sake). Effective altruism is one example of principlism based on utilitarianism.

Chap 19. Expanding the self to include the other

A psychological phenomenon of love (close personal relationships, such as with a lover or best friend) is to include that person in one's self. This involves perceiving, and allocating resources to, that person much as one does to one's self.

Three experiments are described, from Close relationships as including other in the self (Aron, 1991).

1. When allocating money, subjects allocated about the same amount to themselves as to a close friend.

2. Subjects were asked to imagine nouns paired with themselves, their mothers, or strangers. They recalled fewer nouns imagined with self or mother than nouns imagined with a stranger, suggesting that the mother was processed more like the self than like a stranger.
The worse recall was explained by noting that we usually see strangers directly, but see ourselves only in reflection (literal or not), so it's harder to imagine ourselves than strangers.

3. When asked to sort a list of adjectives into four piles ("true/false about me" crossed with "true/false about my spouse"), subjects reacted more slowly to adjectives that were true of one but false of the other. This was explained by the idea that differences between one's own and a close other's traits cause dissonance, just as holding opposite attitudes within oneself does.

Chap 20. Believing precedes disbelieving

Descartes divided the mind into intellect and will: the intellect writes up potential beliefs about the world, and the will then chooses which to endorse. Spinoza said that we believe everything we happen to understand, and disbelieve only if we find it necessary. You Can't Not Believe Everything You Read (Gilbert, 1993) presented three experiments that support Spinoza's theory, and discussed its sociological effects.

... we asked subjects in Experiment 1 to play the role of a trial judge and to make sentencing decisions about an ostensibly real criminal defendant. Subjects were given some information about the defendant that was known to be false and were occasionally interrupted [by a distraction task]... We predicted that interruption would cause subjects to continue to believe the false information they accepted on comprehension and that these beliefs would exert a profound influence on their sentencing of the defendant...

Experiments 1 and 2 provide support for the Spinozan hypothesis: When people are prevented from unbelieving the assertions they comprehend... they did not merely recall that such assertions were said to be true, but they actually behaved as though they believed the assertions.

Chap 21. Inferred memories

When we recall a memory, that memory is an inference about the past based on clues we have in the present. It is not necessarily accurate.

An experiment from Women's theories of menstruation and biases in recall of menstrual symptoms (McFarland, 1989) found that when women report their unpleasant emotions day-to-day, there is no difference between premenstrual, menstrual, and inter-menstrual days (they feel equally unpleasant). But when asked to recall how unpleasant it was, they recall the premenstrual and menstrual days as significantly more unpleasant, and the inter-menstrual days as less unpleasant.

This is explained by the fact that, when recalling, they use intuitive theories about PMS to infer "how it must have felt" instead of reporting "how it actually felt". As a side effect, this casts doubt on whether PMS actually exists.

Memories can be completely made up, as in repressed memory therapies.

The fact that these inferences about the past feel like genuine recollections shows how little conscious introspection can give true knowledge of the self.

Chap 22. Ironic process theory

Try not to think of a polar bear!

The theory of ironic processes posits a cognitive process called the intender, which searches for content matching a desired mental state, and another called the monitor, which notifies consciousness of errant thoughts.

The intender is a costly process and the monitor a cheap one, so under cognitive load the intender stops working well while the monitor keeps working, and ironically, trying not to think of something results in thinking of it.

Ironic Processes of Mental Control (Wegner, 1994) reported an experiment. Participants were asked to consciously improve or worsen their moods with happy or sad thoughts. Half were also given a memory task as a cognitive load.

Those not under cognitive load were successful in their mood control, while those under cognitive load achieved the opposite.

This suggests that if you are under cognitive load (such as being busy studying) and want to improve your mood, you should consciously try to feel worse. Likewise, if you are in a noisy, distracting environment and want to sleep, you should try to stay awake.

Another experiment showed that people who try to avoid sexist language ironically become more prone to it under cognitive load. This held whether or not they were sexist.

Chap 23. Implicit Association Test

In Single-target implicit association tests (ST-IAT) predict voting behavior of decided and undecided voters in Swiss referendums (Raccuia, 2016), compared to self-reported political orientation, implicit association was found to be a weaker, but somewhat independent, predictor of voting behavior.

Other, similar methods of probing the unconscious are being studied; the results are new and mixed.

Chap 24. Prospect theory

People don't behave as expected-utility maximizers. Instead they are better modelled by prospect theory:

1. Gains and losses are measured compared to a changeable default, instead of an absolute zero.
2. Losses are weighted more than gains, and both have decreasing marginal utilities.
3. People are more risk-averse with respect to gains, and more risk-loving with respect to losses.
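The three features above can be captured in a small value-function sketch. This is only an illustration, not the book's material: the functional form and the parameter values (α = β = 0.88, λ = 2.25) are the commonly cited median estimates from Tversky and Kahneman's 1992 follow-up work, and the names are mine.

```python
# Prospect-theory value function: outcomes are valued relative to a
# reference point (feature 1), losses are weighted more heavily than
# gains (feature 2), and both curves flatten out (diminishing
# marginal sensitivity).
ALPHA = 0.88   # curvature for gains (concave)
BETA = 0.88    # curvature for losses (convex)
LAMBDA = 2.25  # loss-aversion coefficient

def value(x, reference=0.0):
    """Subjective value of outcome x relative to the reference point."""
    gain = x - reference
    if gain >= 0:
        return gain ** ALPHA            # concave over gains
    return -LAMBDA * ((-gain) ** BETA)  # convex and steeper over losses
```

The risk attitudes in point 3 then fall out: for a 50/50 gamble between 0 and 100, value(50) > 0.5 * value(100), so the sure thing is preferred over gains, while value(-50) < 0.5 * value(-100), so the gamble is preferred over losses.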

An experiment is reported in The systematic influence of gain- and loss-framed messages on interest in and use of different types of health behavior (Rothman et al, 1999). It found that people used more bacteria-killing mouthwash if they received positive framing (about maintaining good health), and more disclosing mouthwash (which merely detects dental disease) if they received negative framing (about the potential disease).

This theory, along with some others, is explained in great detail in Thinking, Fast and Slow (Kahneman, 2011), which I recommend.

Other mental heuristics include mental accounting (Thaler, 1980), with its own set of irrational effects.

Chap 25. Social isolation increases aggression

If you can't join them, beat them: Effects of social exclusion on aggressive behavior (Twenge, 2001)

Social exclusion was manipulated by telling people that they would end up alone later in life or that other participants had rejected them. These manipulations caused participants to behave more aggressively. Excluded people issued a more negative job evaluation against someone who insulted them, and blasted a target with higher levels of aversive noise both when the target had insulted them and when no interaction had occurred. However, excluded people were not more aggressive toward someone who issued praise.

In particular,

These responses were specific to social exclusion and were not mediated by emotion.

This was shown by two experimental facts:

1. Participants who were told they would end up alone later in life or that other participants had rejected them, did not feel worse than average.

2. Participants who were told they would end up unlucky later in life, did not act more aggressively than average.

Some psychological theories are given. One is self-determination theory from Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being (Deci and Ryan, 2000), which says that people have three needs:

• relatedness (to some other people)
• efficacy (can do important things)
• autonomy (can control their own future)

Other relevant factors are self-esteem, and stability over time. Stability and level of self-esteem as predictors of anger arousal and hostility (Kernis et al, 1989) found that in feelings of anger and hostility,

unstable high self-esteem > low self-esteem > stable high self-esteem

There is no evolutionary explanation, though. Social exclusion means fewer offspring, and aggression only worsens it. An evolutionary psychological explanation would be good: either the aggression has some evolutionary benefit, or it is a side effect of something else.

Chap 26. Social effects of gossiping

Gossip was found to have a prosocial function in The virtues of gossip: Reputational information sharing as prosocial behavior (Feinberg, 2012):

... prosocial gossip, the sharing of negative evaluative information about a target in a way that protects others from antisocial or exploitative behavior.

In the study, they found experimental support for four hypotheses about the function of gossip:

• prosocial: gossip is motivated by a desire to protect vulnerable people, without promise of material reward.
• frustration: seeing antisocial behavior makes people feel bad, and prosocial people are more prone to this frustration.
• relief: gossiping reduces the frustration.
• deterrence: threat of gossip makes antisocial people behave more prosocially.

Chap 27. Fear of death

Good news: we will be worm food one day!

Good news for worms, I meant.

Terror management theory argues that the terror of death creates such a profound, subconscious, anxiety, that humans spend their lives denying it in various ways, creating culture, religion, and many other social phenomena in the process.

In this chapter are reviewed the first 4 of the 7 experiments from How sweet it is to be loved by you: the role of perceived regard in the terror management of close relationships (Cox & Arndt, 2012). This paper studies

... whether people turn to close relationships to manage the awareness of mortality because they serve as a source of perceived regard.

Perceived regard means "am I a good person as viewed by someone else?" The paper in particular showed that people who have death on their mind exaggerate how much they think they are loved by a partner. Perceived regard from their own selves, and from average strangers, did not change. Having intense physical pain on the mind also did nothing.

They also found that having death on the mind makes people claim to love their partners more. They theorized that this is mediated by increased perceived regard:

death on the mind -> more perceived regard from their partner -> more love for their partner

Study 4 revealed that activating thoughts of perceived regard from a partner in response to MS [mortality salience] reduced death-thought accessibility. Studies 5 and 6 demonstrated that MS led high relationship contingent self-esteem individuals to exaggerate perceived regard from a partner, and this heightened regard led to greater commitment to one's partner. Study 7 examined attachment style differences and found that after MS, anxious individuals exaggerated how positively their parents see them, whereas secure individuals exaggerated how positively their romantic partners see them. Together, the present results suggest that perceptions of regard play an important role in why people pursue close relationships in the face of existential concerns.

Personal comment: It has been commented that Transhumanism can be analyzed as a religion. Is there value in analyzing transhumanism through terror management theory? There is at least one paper, Software immortals: Science or faith? (Proudfoot, 2012), that did so. This is important, because if transhumanism is indeed a religion, then the chance is high that it is deluded/unfalsifiable, like most religions have been shown to be.

Also, this would explain why moral nihilism is usually suffered as a mental disease rather than accepted as a working hypothesis. Despite its theoretical simplicity and moderate empirical support, it just doesn't offer any protection against the terror of death.

Chap 28. Motivated belief in free will

Free to punish: A motivated account of free will belief (Clark, 2014)

a key factor promoting belief in free will is a fundamental desire to hold others morally responsible for their wrongful behaviors

Five experiments from the paper are recounted in detail. The authors praised the paper highly for its comprehensiveness.

participants reported greater belief in free will after considering an immoral action than a morally neutral one... due to heightened punitive motivations... reading about others’ immoral behaviors reduced the perceived merit of anti-free-will research... the real-world prevalence of immoral behavior (as measured by crime and homicide rates) predicted free will belief on a country level.

Taken together, these results provide a potential explanation for the strength and prevalence of belief in free will: It is functional for holding others morally responsible and facilitates justifiably punishing harmful members of society.

Personal comment: Instead of philosophically studying whether free will exists, it's more productive to assume it doesn't and see which behaviors can be explained. If everything can be explained without free will, then the problem of free will dissolves. Otherwise, we will have isolated what free will is for, and made subsequent studies more focused.

It is also useful to study the human intuitive belief in free will, as important phenomena about humans, independent of whether they are right or wrong. This is analogous to the study of folk psychology and naive physics. See From Uncaused Will to Conscious Choice: The Need to Study, Not Speculate About People’s Folk Concept of Free Will (Monroe, 2009)

the core of people’s concept of free will is a choice that fulfills one’s desires and is free from internal or external constraints. No evidence was found for metaphysical assumptions about dualism or indeterminism.

In the "Afterthoughts", the authors considered what a post-free-will society could be like. I think that such a society's theory of crime and punishment would be more like "because this follows the natural order of things", than "because criminals are morally bad".

Think of the joke about "my brain made me commit the crime":

The criminal: "My brain made me commit the crime." The judge: "My brain made me sentence you."

And now, instead of taking it as a joke, imagine both of them saying these lines in complete seriousness. That's what I think could be true in the future.

Discuss

### The Univariate Fallacy

LessWrong.com News - June 16, 2019 - 00:43
Published on June 15, 2019 9:43 PM UTC

(A standalone math post that I want to be able to link back to later/elsewhere)

There's this statistical phenomenon where it's possible for two multivariate distributions to overlap along any one variable, but be cleanly separable when you look at the entire configuration space at once. This is perhaps easiest to see with an illustrative diagram—

The denial of this possibility (in arguments of the form, "the distributions overlap along this variable, therefore you can't say that they're different") is sometimes called the "univariate fallacy." (Eliezer Yudkowsky proposes "covariance denial fallacy" or "cluster erasure fallacy" as potential alternative names.)

Let's make this more concrete by making up an example with actual numbers instead of just a pretty diagram. Imagine we have some datapoints that live in the forty-dimensional space {1, 2, 3, 4}⁴⁰ that are sampled from one of two probability distributions, which we'll call PA and PB.

For simplicity, let's suppose that the individual variables x₁, x₂, ... x₄₀—the coördinates of a point in our forty-dimensional space—are statistically independent. For every individual xi, the marginal distribution of PA is—

PA(xi) = 1/4   if xi = 1
         7/16  if xi = 2
         1/4   if xi = 3
         1/16  if xi = 4

And for PB—

PB(xi) = 1/16  if xi = 1
         1/4   if xi = 2
         7/16  if xi = 3
         1/4   if xi = 4

If you look at any one xi-coördinate for a point, you can't be confident which distribution the point was sampled from. For example, seeing that x₁ takes the value 2 gives you a 7/4 (= 1.75) likelihood ratio in favor of the point having been sampled from PA rather than PB, which is log₂(7/4) ≈ 0.807 bits of evidence.

That's ... not a whole lot of evidence. If you guessed that the datapoint came from PA based on that much evidence, you'd be wrong about 4 times out of 10. (Given equal (1:1) prior odds, an odds ratio of 7:4 amounts to a probability of (7/4)/(1 + 7/4) ≈ 0.636.)
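These numbers are easy to double-check with a few lines of Python (my own sketch, not part of the original post):

```python
import math

# Likelihood ratio for observing x_i = 2 under P_A (7/16) vs. P_B (1/4)
lr = (7/16) / (1/4)  # = 1.75

# Evidence in bits
bits = math.log2(lr)

# Posterior probability that the point came from P_A, given 1:1 prior odds
posterior = lr / (1 + lr)

print(lr, round(bits, 3), round(posterior, 3))  # 1.75 0.807 0.636
```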

And yet if we look at many variables, we can achieve supreme, godlike confidence about which distribution a point was sampled from. Proving this is left as an exercise to the particularly intrepid reader, but a concrete demonstration is probably simpler and should be pretty convincing! Let's write some Python code to sample a point x⃗ ∈ {1, 2, 3, 4}⁴⁰ from PA—

```python
import random

def a():
    return random.sample(
        [1]*4 +  # 1/4
        [2]*7 +  # 7/16
        [3]*4 +  # 1/4
        [4],     # 1/16
        1
    )[0]

x = [a() for _ in range(40)]
print(x)
```

Go ahead and run the code yourself. (With an online REPL if you don't have Python installed locally.) You'll probably get a value of x that "looks something like"

[2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 4, 4, 2, 2, 3, 3, 1, 2, 2, 2, 4, 2, 2, 1, 2, 1, 4, 3, 3, 2, 1, 1, 3, 3, 2, 2, 3, 3, 4]

If someone off the street just handed you this x⃗ without telling you whether she got it from PA or PB, how would you compute the probability that it came from PA?

Well, because the coördinates/variables are statistically independent, you can just tally up (multiply) the individual likelihood ratios from each variable. That's only a little bit more code—

```python
import logging
logging.basicConfig(level=logging.INFO)

def odds_to_probability(o):
    return o/(1+o)

def tally_likelihoods(x, p_a, p_b):
    total_odds = 1
    for i, x_i in enumerate(x, start=1):
        lr = p_a[x_i-1]/p_b[x_i-1]  # (-1s because of zero-based array indexing)
        logging.info("x_%s = %s, likelihood ratio is %s", i, x_i, lr)
        total_odds *= lr
    return total_odds

print(
    odds_to_probability(
        tally_likelihoods(
            x,
            [1/4, 7/16, 1/4, 1/16],
            [1/16, 1/4, 7/16, 1/4],
        )
    )
)
```

If you run that code, you'll probably see "something like" this—

```
INFO:root:x_1 = 2, likelihood ratio is 1.75
INFO:root:x_2 = 1, likelihood ratio is 4.0
INFO:root:x_3 = 2, likelihood ratio is 1.75
INFO:root:x_4 = 2, likelihood ratio is 1.75
INFO:root:x_5 = 1, likelihood ratio is 4.0
[blah blah, redacting some lines to save vertical space in the blog post, blah blah]
INFO:root:x_37 = 2, likelihood ratio is 1.75
INFO:root:x_38 = 3, likelihood ratio is 0.5714285714285714
INFO:root:x_39 = 3, likelihood ratio is 0.5714285714285714
INFO:root:x_40 = 4, likelihood ratio is 0.25
0.9999936561215961
```

Our computed probability that x⃗ came from PA has several nines in it. Wow! That's pretty confident!
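As a hint toward the exercise above (my own sketch, not from the original post): the expected evidence per coordinate from a PA-sampled point is the Kullback–Leibler divergence KL(PA‖PB) in bits, and because the coordinates are independent, expected evidence adds up across all forty of them:

```python
import math

p_a = [1/4, 7/16, 1/4, 1/16]
p_b = [1/16, 1/4, 7/16, 1/4]

# Expected bits of evidence per coordinate when sampling from P_A:
# the Kullback-Leibler divergence KL(P_A || P_B), in bits
kl_bits = sum(p * math.log2(p / q) for p, q in zip(p_a, p_b))

# Coordinates are independent, so expected evidence is additive over all 40
total_bits = 40 * kl_bits

print(round(kl_bits, 3), round(total_bits, 1))  # 0.526 21.1
```

Twenty-one expected bits corresponds to odds on the order of 2²¹ ≈ 2,000,000:1, which is why the computed probability above carries several nines.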

Discuss

### On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory

LessWrong.com News - June 15, 2019 - 21:57
Published on June 15, 2019 6:57 PM UTC

Discuss