# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 28 minutes 13 seconds ago

### The Echo Fallacy

July 6, 2020 - 02:00
Published on July 5, 2020 11:00 PM GMT

One of my Facebook friends posted this. Sharing anonymously with permission.

The Echo Fallacy: When you shout "hello!" in a cave, and upon hearing the echo, conclude the cave walls are saying "hello" to you. More generally, when you put a certain idea into your environment, have it reflected back to you, and conclude that the reflection shows the idea originating independently from yourself.

Some examples:

* What Scott Adams calls "laundry list persuasion". This is when you say: "Maybe bigfoot videos are blurry and questionable, but there are so many of them! Why would there possibly be so many unless bigfoot is real?" Well, the answer is simple: bigfoot believers promote the idea of bigfoot, which prompts people to create fake bigfoot videos. The volume of bigfoot videos is merely a reflection of the belief in bigfoot. Similar logic applies to UFO videos, examples of pizza shops with spirals in their logos as evidence of a conspiracy, and the compilation of grievances against Jews over centuries. This is also related to apophenia, and the streetlight effect.

* When you bully someone about a quality they supposedly have, until they retort, sarcastically: "Yes, I totally have that quality! I'm just the biggest possessor of that quality on the face of the planet, dingus! Now leave me alone!" Whereupon you say they've admitted to it.

* Saying: "That guy over there loves <group>! Hey, everyone in <group>, go be friends with that guy!" Then, when some members of the group believe you and flock to that guy, saying: "See? I told you he liked them."

(I'm probably coming up with my own name for a concept that someone else has already given a name. If so, help me out.)


### Better priors as a safety problem

July 6, 2020 - 00:20
Published on July 5, 2020 9:20 PM GMT

Fitting a neural net implicitly uses a “wrong” prior. This makes neural nets more data hungry and makes them generalize in ways we don’t endorse, but it’s not clear whether it’s an alignment problem.

After all, if neural nets are what works, then both the aligned and unaligned AIs will be using them. It’s not clear if that systematically disadvantages aligned AI.

Unfortunately I think it’s an alignment problem:

• I think the neural net prior may work better for agents with certain kinds of simple goals, as described in Inaccessible Information. The problem is that the prior mismatch may bite harder for some kinds of questions, and some agents simply never need to answer those hard questions.
• I think that Solomonoff induction generalizes catastrophically because it becomes dominated by consequentialists who use better priors.

In this post I want to try to build some intuition for this problem, and then explain why I’m currently feeling excited about learning the right prior.

Indirect specifications in universal priors

We usually work with very broad “universal” priors, both in theory (e.g. Solomonoff induction) and in practice (deep neural nets are a very broad hypothesis class). For simplicity I’ll talk about the theoretical setting in this section, but I think the points apply equally well in practice.

The classic universal prior is a random output from a random stochastic program. We often think of the question “which universal prior should we use?” as equivalent to the question “which programming language should we use?” but I think that’s a loaded way of thinking about it — not all universal priors are defined by picking a random program.

A universal prior can never be too wrong — a prior P is universal if, for any other computable prior Q, there is some constant c such that, for all x, we have P(x) > c Q(x). That means that given enough data, any two universal priors will always converge to the same conclusions, and no computable prior will do much better than them.
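One way to make "never too wrong" concrete, as a sketch that uses only the dominance condition stated above (assuming c > 0 and restricting to x with Q(x) > 0): taking logarithms turns the multiplicative bound into an additive bound on log loss.

```latex
P(x) > c\,Q(x)
\quad\Longrightarrow\quad
-\log P(x) \;<\; -\log Q(x) + \log\frac{1}{c}
```

So on any data x, the log loss of P exceeds that of Q by less than the constant log(1/c), which depends on the pair (P, Q) but not on how much data has been observed.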

Unfortunately, universality is much less helpful in the finite data regime. The first warning sign is that our “real” beliefs about the situation can appear in the prior in two different ways:

• Directly: if our beliefs about the world are described by a simple computable predictor, they are guaranteed to appear in a universal prior with significant weight.
• Indirectly: the universal prior also “contains” other programs that are themselves acting as priors. For example, suppose I use a universal prior with a terribly inefficient programming language, in which each character needs to be repeated 10 times in order for the program to do anything non-trivial. This prior is still universal, but it’s reasonably likely that the “best” explanation for some data will be to first sample a really simple interpreter for a better programming language, and then draw a uniformly random program in that better programming language.

(There isn’t a bright line between these two kinds of posterior, but I think it’s extremely helpful for thinking intuitively about what’s going on.)

Our “real” belief is more like the direct model — we believe that the universe is a lawful and simple place, not that the universe is a hypothesis of some agent trying to solve a prediction problem.

Unfortunately, for realistic sequences and conventional universal priors, I think that indirect models are going to dominate. The problem is that “draw a random program” isn’t actually a very good prior, even if the programming language is OK— if I were an intelligent agent, even if I knew nothing about the particular world I lived in, I could do a lot of a priori reasoning to arrive at a much better prior.

The conceptually simplest example is “I think therefore I am.” Our hypotheses about the world aren’t just arbitrary programs that produce our sense experiences— we restrict attention to hypotheses that explain why we exist and for which it matters what we do. This rules out the overwhelming majority of programs, allowing us to assign significantly higher prior probability to the real world.

I can get other advantages from a priori reasoning, though they are a little bit more slippery to talk about. For example, I can think about what kinds of specifications make sense and really are most likely a priori, rather than using an arbitrary programming language.

The upshot is that an agent who is trying to do something, and has enough time to think, actually seems to implement a much better prior than a uniformly random program. If the complexity of specifying such an agent is small relative to the prior improbability of the sequence we are trying to predict, then I think the universal prior is likely to pick out the sequence indirectly by going through the agent (or else in some even weirder way).

I make this argument in the case of Solomonoff induction in What does the universal prior actually look like? I find that argument pretty convincing, although Solomonoff induction is weird enough that I expect most people to bounce off that post.

I make this argument in a much more realistic setting in Inaccessible Information. There I argue that if we e.g. use a universal prior to try to produce answers to informal questions in natural language, we are very likely to get an indirect specification via an agent who reasons about how we use language.

Why is this a problem?

I’ve argued that the universal prior learns about the world indirectly, by first learning a new better prior. Is that a problem?

To understand how the universal prior generalizes, we now need to think about how the learned prior generalizes.

The learned prior is itself a program that reasons about the world. In both of the cases above (Solomonoff induction and neural nets) I’ve argued that the simplest good priors will be goal-directed, i.e. will be trying to produce good predictions.

I have two different concerns with this situation, both of which I consider serious:

• Bad generalizations may disadvantage aligned agents. The simplest version of “good predictions” may not generalize to some of the questions we care about, and may put us at a disadvantage relative to agents who only care about simpler questions. (See Inaccessible Information.)
• Treacherous behavior. Some goals might be easier to specify than others, and a wide range of goals may converge instrumentally to “make good predictions.” In this case, the simplest programs that predict well might be trying to do something totally unrelated; when they no longer have instrumental reasons to predict well (e.g. when their predictions can no longer be checked), they may do something we regard as catastrophic.

I think it’s unclear how serious these problems are in practice. But I think they are huge obstructions from a theoretical perspective, and I think there is a reasonable chance that this will bite us in practice. Even if they aren’t critical in practice, I think that it’s methodologically worthwhile to try to find a good scalable solution to alignment, rather than having a solution that’s contingent on unknown empirical features of future AI.

Learning a competitive prior

Fundamentally, I think our mistake was building a system that uses the wrong universal prior, one that fails to really capture our beliefs. Within that prior, there are other agents who use a better prior, and those agents are able to outcompete and essentially take over the whole system.

I’ve considered lots of approaches that try to work around this difficulty, taking for granted that we won’t have the right prior and trying to somehow work around the risky consequences. But now I’m most excited about the direct approach: give our original system the right prior so that sub-agents won’t be able to outcompete it.

This roughly tracks what’s going on in our real beliefs, and why it seems absurd to us to infer that the world is a dream of a rational agent—why think that the agent will assign higher probability to the real world than the “right” prior? (The simulation argument is actually quite subtle, but I think that after all the dust clears this intuition is basically right.)

What’s really important here is that our system uses a prior which is competitive, as evaluated by our real, endorsed (inaccessible) prior. A neural net will never be using the “real” prior, since it’s built on a towering stack of imperfect approximations and is computationally bounded. But it still makes sense to ask for it to be “as good as possible” given the limitations of its learning process: we want to avoid the situation where the neural net is able to learn a new prior which predictably outperforms the outer prior. In that situation we can’t just blame the neural net, since it’s demonstrated that it’s able to learn something better.

In general, I think that competitiveness is a desirable way to achieve stability — using a suboptimal system is inherently unstable, since it’s easy to slip off of the desired equilibrium to a more efficient alternative. Using the wrong prior is just one example of that. You can try to avoid slipping off to a worse equilibrium, but you’ll always be fighting an uphill struggle.

Given that, I think that finding the right universal prior should be “plan A.” The real question is whether that’s tractable. My current view is that it looks plausible enough (see Learning the prior for my current best guess about how to approach it) that it’s reasonable to focus on for now.

Better priors as a safety problem was originally published in AI Alignment on Medium, where people are continuing the conversation by highlighting and responding to this story.


### Measuring Meta-Certainty

July 6, 2020 - 00:15
Published on July 5, 2020 9:15 PM GMT

If you are a regular user of this site then you're probably a proponent of the probabilistic model of certainty. You might even use sites like Metaculus to record your own degrees of certainty on various topics for later reference. What I don't see is people measuring their meta-certainty.

Meta-certainty exists

I think it's pretty obvious that such a thing as "meta-certainty" exists. When I predict there is a one in six chance of my die rolling a five, but I also predict there is a one in six chance of world war three happening before 2050, that doesn't automatically mean that the two predictions are equal. I feel more certain that I guessed the actual probability of the die roll correctly than I feel about my probability estimate of world war three happening. In other words: my meta-certainty of the die roll is higher.
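One possible way to picture this, offered as an assumption rather than anything the post commits to: treat each estimate as a Beta distribution over the unknown probability. Two beliefs can then share the same mean (one in six) while differing in spread, and the spread plays the role of meta-certainty. The class name and the parameter values below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown probability, modeled as Beta(alpha, beta).

    Using a Beta distribution here is an illustrative assumption.
    """
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Point estimate: the ordinary "certainty".
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self) -> float:
        # Spread around the point estimate: smaller means higher meta-certainty.
        total = self.alpha + self.beta
        variance = self.alpha * self.beta / (total ** 2 * (total + 1))
        return variance ** 0.5


# Same mean of 1/6, very different concentration (values are invented).
die_roll = BetaBelief(alpha=1000.0, beta=5000.0)
world_war = BetaBelief(alpha=1.0, beta=5.0)
```

Under this framing, "my meta-certainty of the die roll is higher" just means `die_roll.std` is smaller than `world_war.std` even though both means agree.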

The problem is that I find it much harder to figure out my meta-certainty estimate than my certainty estimate. This might be because human beings are inherently bad at guessing their own meta-certainty, or it might be because I have never trained myself to reflect on my meta-certainty in the same way that I've trained myself to reflect on my regular certainty.

So why should we care about meta-certainty? Well the most obvious answer is science. By measuring meta-certainty we could learn more about the human brain, how humans learn, how we reflect on our own thoughts etc.

But maybe non-psychologists should be interested in this too. If you have a concrete model of how certain you are about your certainty, you could more reliably decide when and where to search for more evidence. My meta-certainty about X is very low, so maybe there are some low-hanging fruits of data that might quickly change that. I said that I was very meta-confident about X, but that is contingent on Y being true which I'm not very meta-confident about. Did I make a mistake or am I missing something?

I think it could also show us some more biases. I'm willing to bet that people are more meta-confident about their political beliefs, but I'm not sure what other domains my brain is meta-overconfident about. This could also help us in heuristics research.

It's really hard to measure this

My friends tell me that putting a percentage on their certainty is hard/ridiculous. I've always found it doable and important but my endeavor to do the same with my meta-certainty has certainly made me sympathize with my friends more. Maybe this is actually a part of certainty that is too hard for us to intuitively put an accurate percentage on. You can tell me in the comments if you don't find it more difficult, but I suspect most will agree with me. I see less reason for why evolution would select for creatures that know their own meta-certainty compared to creatures that would know their object-level certainty. But even if it is more difficult we can quantify the differences in a more indirect way. I've tried to use words like "almost certain", "very likely", "likely", "more likely than not" etc to discover a posteriori what the actual probabilities of my intuitions are.
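The indirect approach in the last sentence could be tallied along these lines. This is a minimal sketch: the function name and the log entries are hypothetical, and a real log would need far more entries per label before the frequencies mean much.

```python
from collections import defaultdict


def observed_frequencies(records):
    """Map each verbal label to the fraction of its predictions that came true.

    `records` is an iterable of (label, came_true) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [hits, total]
    for label, came_true in records:
        counts[label][0] += int(came_true)
        counts[label][1] += 1
    return {label: hits / total for label, (hits, total) in counts.items()}


# Made-up log entries, purely for illustration.
log = [
    ("almost certain", True), ("almost certain", True),
    ("almost certain", True), ("almost certain", False),
    ("likely", True), ("likely", False),
]
frequencies = observed_frequencies(log)
```

The resulting frequency for each label is an after-the-fact estimate of what that phrase was actually worth as a probability.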

I unfortunately can't share any insights yet since I only started doing this recently and have been doing it pretty inconsistently. If sites like Metaculus gave the option to always register your meta-certainty, it would help people record it and would quickly give us large swaths of data to compare. I think most people would start out creating a nice bell-curve with your certainty on one axis and your meta-certainty on the other, but who knows, maybe it will turn out that meta-certainty is actually asymmetric for some reason.

Figuring out what degree of certainty was "correct" for a situation is very hard and requires a lot of (a posteriori) data. Figuring out the "correct" degree of meta-certainty will probably take even longer. I think that even if we get really good at measuring meta-certainty, it won't ever be as good as the object-level certainty. But even in a rough version (with e.g. steps of 10% instead of 1%) we could gain some interesting insights into our psyche.

So does meta-meta-certainty exist? Sure! When I'm drunk I might think to myself that I should be more uncertain about my meta-certainty compared to what my sober self would say. When I know I'm cognitively impaired I would give myself a lower meta-meta-certainty. The problem is that meta-meta-certainty might bleed into the lower levels.

I think that measuring meta-certainty is less useful than measuring object-level certainty, but still ultimately worth it in small amounts (e.g. measuring it more roughly). I think meta-meta-uncertainty is even less useful and might not even be worth measuring unless you're a die-hard psychologist. This process of adding levels of meta has diminishing returns not only in terms of usefulness, but also in terms of accuracy. If evolution is not particularly interested in selecting for accurate meta-certainty then I think that meta-meta-meta-certainty is basically impossible.

Conclusion

While measuring meta-certainty can help us discover more biases and help us make better predictions, it is ultimately less important than measuring regular certainty. Having a rough framework of your own meta-certainty might be useful, but I can't confidently say the same about any meta-levels above it. I would like websites like Metaculus to add the option of recording your meta-certainty, but steps of ten (0%-10%-20%...) might be enough if they want to conserve bandwidth. I've not talked about how this fits in with Artificial Intelligence since it isn't my area of expertise (but feel free to make a post about it).


### Learning the prior

July 6, 2020 - 00:00
Published on July 5, 2020 9:00 PM GMT

Suppose that I have a dataset D of observed (x, y) pairs, and I’m interested in predicting the label y* for each point x* in some new set D*. Perhaps D is a set of forecasts from the last few years, and D* is a set of questions about the coming years that are important for planning.

The classic deep learning approach is to fit a model f on D, and then predict y* using f(x*).

This approach implicitly uses a somewhat strange prior, which depends on exactly how I optimize f. I may end up with the model with the smallest l2 norm, or the model that’s easiest to find with SGD, or the model that’s most robust to dropout. But none of these are anywhere close to the “ideal” beliefs of a human who has updated on D.

This means that neural nets are unnecessarily data hungry, and more importantly that they can generalize in an undesirable way. I now think that this is a safety problem, so I want to try to attack it head on by learning the “right” prior, rather than attempting to use neural nets as an implicit prior.

Warm-up 1: human forecasting

If D and D* are small enough, and I’m OK with human-level forecasts, then I don’t need ML at all.

Instead I can hire a human to look at all the data in D, learn all the relevant lessons from it, and then spend some time forecasting y* for each x*.

Now let’s gradually relax those assumptions.

Warm-up 2: predicting human forecasts

Suppose that D* is large but that D is still small enough that a human can extract all the relevant lessons from it (or that for each x* in D*, there is a small subset of D that is relevant).

In this case, I can pay humans to make forecasts for many randomly chosen x* in D*, train a model f to predict those forecasts, and then use f to make forecasts about the rest of D*.

The generalization is now coming entirely from human beliefs, not from the structure of the neural net: we are only applying neural nets to iid samples from D*.

Learning the human prior

Now suppose that D is large, such that a human can’t update on it themselves. Perhaps D contains billions of examples, but we only have time to let a human read a few pages of background material.

Instead of learning the unconditional human forecast P(y|x), we will learn the forecast P(y|x, Z), where Z is a few pages of background material that the human takes as given. We can also query the human for the prior probability Prior(Z) that the background material is true.

Then we can train f(y|x, Z) to match P(y|x, Z), and optimize Z* for:

log Prior(Z*) + sum((x, y) ~ D) log f(y|x, Z*)

We train f in parallel with optimizing Z*, on inputs consisting of the current value of Z* together with questions x sampled from D and D*.

For example, Z might specify a few explicit models for forecasting and trend extrapolation, a few important background assumptions, and guesses for a wide range of empirical parameters. Then a human who reads Z can evaluate how plausible it is on its face, or they can take it on faith in order to predict y* given x*.

The optimal Z* is then the set of assumptions, models, and empirical estimates that works best on the historical data. The human never has to reason about more than one datapoint at a time — they just have to evaluate what Z* implies about each datapoint in isolation, and evaluate how plausible Z* is a priori.
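To make the objective concrete, here is a toy sketch under heavy assumptions: Z ranges over three hand-written candidates instead of free text, Prior(Z) is a made-up number standing in for the human's judgment, and f is a fixed lookup rather than a trained model. All names and values are hypothetical.

```python
import math

# Hypothetical candidates for Z: each "background text" is reduced to a single
# claimed probability that y = 1, plus a made-up human prior for that claim.
CANDIDATES = {
    "events are rare": {"p_yes": 0.1, "prior": 0.5},
    "events are a coin flip": {"p_yes": 0.5, "prior": 0.3},
    "events are common": {"p_yes": 0.9, "prior": 0.2},
}


def f(y, x, z):
    """Stand-in for the learned forecast f(y|x, Z); this toy ignores x."""
    p_yes = CANDIDATES[z]["p_yes"]
    return p_yes if y == 1 else 1.0 - p_yes


def score(z, data):
    """log Prior(Z) + sum over (x, y) in D of log f(y|x, Z)."""
    log_prior = math.log(CANDIDATES[z]["prior"])
    log_likelihood = sum(math.log(f(y, x, z)) for x, y in data)
    return log_prior + log_likelihood


def best_z(data):
    """Pick the candidate Z with the highest score on the data."""
    return max(CANDIDATES, key=lambda z: score(z, data))


# Toy dataset D: eight of ten outcomes are 1.
D = [(i, 1) for i in range(8)] + [(i, 0) for i in range(8, 10)]
z_star = best_z(D)
```

With no data the candidate with the highest prior wins; with this D the likelihood term outweighs the prior and the selection moves to "events are common". Note that each term of the sum looks at a single datapoint, which mirrors the point that the human only ever evaluates one datapoint at a time.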

This approach has many problems. Two particularly important ones:

• To be competitive, this optimization problem needs to be nearly as easy as optimizing f directly on D, but it seems harder: finding Z* might be much harder than learning f, learning a conditional f might be much harder than learning an unconditional f, and jointly optimizing Z and f might present further difficulties.
• Even if it worked, our forecasts would only be “human-level” in a fairly restrictive sense: they wouldn’t even be as good as a human who actually spent years practicing on D before making a forecast on D*. To be competitive, we want the forecasts in the iid case to be at least as good as fitting a model directly.

I think the first point is an interesting ML research problem. (If anything resembling this approach ever works in practice, credit will rightly go to the researchers who figure out the precise version that works and resolve those issues, and this blog post will be a footnote.) I feel relatively optimistic about our collective ability to solve concrete ML problems, unless they turn out to be impossible. I’ll give some preliminary thoughts in the next section “Notes & elaborations.”

The second concern, that we need some way to go beyond human level, is a central philosophical issue and I’ll return to it in the subsequent section “Going beyond the human prior.”

Notes & elaborations

• Searching over long texts may be extremely difficult. One idea to avoid this is to try to have a human guide the search, by either generating hypotheses Z at random or sampling perturbations to the current value of Z. Then we can fit a generative model of that exploration process and perform search in the latent space (and also fit f in the latent space rather than having it take Z as input). That rests on two hopes: (i) learning the exploration model is easy relative to the other optimization we are doing, (ii) searching for Z in the latent space of the human exploration process is strictly easier than the corresponding search over neural nets. Both of those seem quite plausible to me.
• We don’t necessarily need to learn f everywhere, it only needs to be valid in a small neighborhood of the current Z. That may not be much harder than learning the unconditional f.
• Z represents a full posterior rather than a deterministic “hypothesis” about the world, e.g. it might say “R0 is uniform between 2 and 3.” What I’m calling Prior(Z) is really the KL between the prior and Z, and P(y|x,Z) will itself reflect the uncertainty in Z. The motivation is that we want a flexible and learnable posterior. (This is particularly valuable once we go beyond human level.)
• This formulation queries the human for Prior(Z) before each fitness evaluation. That might be fine, or you might need to learn a predictor of that judgment. It might be easier for a human to report a ratio Prior(Z)/Prior(Z′) than to give an absolute prior probability, but that’s also fine for optimization. I think there are a lot of difficulties of this flavor that are similar to other efforts to learn from humans.
• For the purpose of studying the ML optimization difficulties I think we can basically treat the human as an oracle for a reasonable prior. We will then need to relax that rationality assumption in the same way we do for other instances of learning from humans (though a lot of the work will also be done by our efforts to go beyond the human prior, described in the next section).

Going beyond the human prior

How do we get predictions better than explicit human reasoning?

We need to have a richer latent space Z, a better Prior(Z), and a better conditional P(y|x, Z).

Instead of having a human predict y given x and Z, we can use amplification or debate to train f(y|x, Z) and Prior(Z). This allows Z to be a large object that cannot be directly accessed by a human.

For example, Z might be a full library of books describing important facts about the world, heuristics, and so on. Then we may have two powerful models debating “What should we predict about x, assuming that everything in Z is true?” Over the course of that debate they can cite small components of Z to help make their case, without the human needing to understand almost anything written in Z.

In order to make this approach work, we need to do a lot of things:

1. We still need to deal with all the ML difficulties described in the preceding section.
2. We still need to analyze debate/amplification, and now we’ve increased the problem difficulty slightly. Rather than merely requiring them to produce the “right” answers to questions, we also need them to implement the “right” prior. We already needed to implement the right prior as part of answering questions correctly, so this isn’t too much of a strengthening, but we are calling attention to a particularly challenging case. It also imposes a particular structure on that reasoning which is a real (but hopefully slight) strengthening.
3. Entangled with the new analysis of amplification/debate, we also need to ensure that Z is able to represent a rich enough latent space. I’ll discuss implicit representations of Z in the next section “Representing Z.”
4. Representing Z implicitly and using amplification or debate may make the optimization problem even more difficult. I’ll discuss this in the subsequent section “Jointly optimizing Mz and f.”

Representing Z

I’ve described Z as being a giant string of text. If debate/amplification work at all then I think text is in some sense “universal,” so this isn’t a crazy restriction.

That said, representing complex beliefs might require very long text, perhaps many orders of magnitude larger than the model f itself. That means that optimizing for (Z, f) jointly will be much harder than optimizing for f alone.

The approach I’m most optimistic about is representing Z implicitly as the output of another model Mz. For example, if Z is a text that is trillions of words long, you could have Mz output the ith word of Z on input i.

(To be really efficient you’ll need to share parameters between f and Mz but that’s not the hard part.)
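The "ith word of Z on input i" idea can be sketched as a plain function. This is only an illustration of the interface: the vocabulary and the indexing rule are invented, and a real Mz would be a trained model rather than a formula.

```python
def mz(i: int) -> str:
    """Toy stand-in for Mz: return the i-th word of an implicit text Z.

    The rule below is invented for illustration only.
    """
    vocabulary = ("the", "world", "is", "lawful", "and", "simple")
    return vocabulary[i % len(vocabulary)]


def read_span(start: int, length: int) -> list[str]:
    """Materialize only the requested words of Z."""
    return [mz(i) for i in range(start, start + length)]


# Z is notionally trillions of words long, but we only ever touch a few.
excerpt = read_span(start=3_000_000_000_000, length=3)
```

The point of the interface is that no caller ever holds Z in memory; each query touches only the positions it asks for.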

This can get around the most obvious problem (that Z is too long to possibly write down in its entirety), but I think you actually have to be pretty careful about the implicit representation or else we will make Mz’s job too hard (in a way that will be tied up with the competitiveness of debate/amplification).

In particular, I think that representing Z as implicit flat text is unlikely to be workable. I’m more optimistic about the kind of approach described in approval-maximizing representations — Z is a complex object that can be related to slightly simpler objects, which can themselves be related to slightly simpler objects… until eventually bottoming out with something simple enough to be read directly by a human. Then Mz implicitly represents Z as an exponentially large tree, and only needs to be able to do one step of unpacking at a time.

Jointly optimizing Mz and f

In the first section I discussed a model where we learn f(y|x, Z) and then use it to optimize Z. This is harder if Z is represented implicitly by Mz, since we can’t really afford to let f take Mz as input.

I think the most promising approach is to have Mz and f both operate on a compact latent space, and perform optimization in this space. I mention that idea in Notes & Elaborations above, but want to go into more detail now since it gets a little more complicated and becomes a more central part of the proposal.

(There are other plausible approaches to this problem; having more angles of attack makes me feel more comfortable with the problem, but all of the others feel less promising to me and I wanted to keep this blog post a bit shorter.)

The main idea is that rather than training a model Mz(·) which implicitly represents Z, we train a model Mz(·, z) which implicitly represents a distribution over Z, parameterized by a compact latent z.

Mz is trained by iterated amplification to imitate a superhuman exploration distribution, analogous to the way that we could ask a human to sample Z and then train a generative model of the human’s hypothesis-generation. Training Mz this way is itself an open ML problem, similar to the ML problem of making iterated amplification work for question-answering.

Now we can train f(y|x, z) using amplification or debate. Whenever we would want to reference Z, we use Mz(·, z). Similarly, we can train Prior(z). Then we choose z* to optimize log Prior(z*) + sum((x, y) ~ D) log f(y|x, z*).

Rather than ending up with a human-comprehensible posterior Z*, we’ll end up with a compact latent z*. The human-comprehensible posterior Z* is implemented implicitly by Mz(·, z*).

Outlook

I think the approach in this post can potentially resolve the issue described in Inaccessible Information, which I think is one of the largest remaining conceptual obstacles for amplification/debate. So overall I feel very excited about it.

Taking this approach means that amplification/debate need to meet a slightly higher bar than they otherwise would, and introduces a bit of extra philosophical difficulty. It remains to be seen whether amplification/debate will work at all, much less whether they can meet this higher bar. But overall I feel pretty excited about this outcome, since I was expecting to need a larger reworking of amplification/debate.

I think it’s still very possible that the approach in this post can’t work for fundamental philosophical reasons. I’m not saying this blog post is anywhere close to a convincing argument for feasibility.

Even if the approach in this post is conceptually sound, it involves several serious ML challenges. I don’t see any reason those challenges should be impossible, so I feel pretty good about that — it always seems like good news when you can move from philosophical difficulty to technical difficulty. That said, it’s still quite possible that one of these technical issues will be a fundamental deal-breaker for competitiveness.

My current view is that we don’t have candidate obstructions for amplification/debate as an approach to AI alignment, though we have a lot of work to do to actually flesh those out into a workable approach. This is a more optimistic place than I was at a month ago when I wrote Inaccessible Information.

Learning the prior was originally published in AI Alignment on Medium, where people are continuing the conversation by highlighting and responding to this story.


### How far is AGI?

July 5, 2020 - 20:58
Published on July 5, 2020 5:58 PM GMT

The ability to complete sequences is equivalent to prediction. The way GPT-3 completes sequences is that it predicts what the next token will be and then outputs the prediction. You can use the same model on images.
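The predict-then-emit loop can be shown with a much simpler predictor. The sketch below swaps in bigram counts and greedy selection; GPT-3 itself is a large transformer and typically samples from its predicted distribution, so this illustrates only the loop, not the model. The function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def fit_bigrams(tokens):
    """Count, for each token, which tokens follow it."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def complete(following, prompt, steps):
    """Extend `prompt` by repeatedly emitting the most frequent next token."""
    out = list(prompt)
    for _ in range(steps):
        candidates = following.get(out[-1])
        if not candidates:
            break  # never saw this token followed by anything
        out.append(candidates.most_common(1)[0][0])
    return out


corpus = "a b c a b c a b d".split()
model = fit_bigrams(corpus)
completion = complete(model, ["a"], steps=2)
```

Completion here is nothing more than prediction applied repeatedly, with each predicted token fed back in as input.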

In general, the agent, based on all of its input data up to some point, tries to generate future data. If it can predict its own input data reliably that means it has a model of the world which is similar to reality. This is similar to Solomonoff induction.

Once you have a good approximation of Solomonoff induction (which is uncomputable), you combine the approximation (somehow) with reinforcement learning and expected utility maximization and get an approximation of AIXI.

Since I'm not an expert in reinforcement learning, I'm not sure which part is harder, but intuition tells me the hard part of all of this would be approximating Solomonoff induction; once you have a good world-model, it seems to me relatively straightforward to maximize utility. I hope I'm wrong. (If you think I am, please explain why.)

Discuss

### How do you visualize the Poisson PDF?

July 5, 2020 - 18:54
Published on July 5, 2020 3:54 PM GMT

As people have been recommending visualization as a mnemonic technique here, I am curious how you utilize it for a formula like this.

I can encode it using metaphors: The lambda, which represents the rate, is a working factory machine. k, which is the number of incidents, is a pack of spheres. The power relation (lambda^k) is the spheres supplying electricity to the machine. e is a superhero I have named Neper. The negative power is "red electricity." The division I symbolize by going under the ground, and the factorial by a glitchy duplication effect. This visualization works, but it takes quite a bit of time for me to decode it into the original formula, and is not directly meaningful to me in its encoded form.
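The pieces described above (rate $\lambda$, count $k$, $\lambda^k$, a negative power of $e$, a division and a factorial) match the standard Poisson probability mass function, $P(X = k) = \lambda^k e^{-\lambda} / k!$. Assuming that is the formula in question, a minimal sketch that computes it directly may help tie each symbol to its role; the function name `poisson_pmf` is just an illustrative choice.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate `lam`.

    lam ** k           -> the "power relation" between the rate and the count
    math.exp(-lam)     -> e raised to the negative rate
    math.factorial(k)  -> the factorial in the denominator
    """
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Probabilities of seeing 0..4 incidents when the rate is 2 per interval.
for k in range(5):
    print(k, round(poisson_pmf(k, 2.0), 4))
```

Summing `poisson_pmf(k, lam)` over all non-negative `k` gives 1, which is one quick way to check that a remembered version of the formula is right.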

Discuss

### July 5: Anders Sandberg on Information Hazards (SSC Online Meetup)

July 5, 2020 - 11:46
Published on July 5, 2020 8:46 AM GMT

Dr Anders Sandberg, Senior Research Fellow at the Future of Humanity Institute, will speak on "Information Hazards: How do we think about (and handle) risky information."

We will also allow plenty of time for online mingling and discussion.

Click here to register, up to an hour before the talk, and we'll send you an invitation:

The talk is July 5 at 10:30 PDT, 17:30 GMT. This link shows your local time.

- The classic article by Nick Bostrom
- Anders Sandberg on biosecurity

Discuss

### Could Nixon going to China be a cause for the big stagnation?

July 5, 2020 - 09:58
Published on July 5, 2020 6:58 AM GMT

The big stagnation is commonly said to have begun in the 1970s. Many theories about possible causes have already been discussed, but I haven't heard anyone blame Nixon before. In particular, I haven't heard anyone blame Nixon going to China.

Timewise, Nixon going to China in 1972 is an event that happened at the right time to explain the change in innovation.

Maybe it's crucial for innovation that basic economic production be done by skilled labor which can think of better ways to do it, instead of being outsourced to the other side of the globe?

Discuss

### Spoiler-Free Review: Witcher 3: Wild Hunt (plus a Spoilerific section)

July 5, 2020 - 02:10
Published on July 4, 2020 11:10 PM GMT

Contrast with other recent spoiler-free review: Spoiler-Free Review: Assassin’s Creed Odyssey.

In part one of this review, the true spoiler-free section, I seek to answer the question “Should I Play This Game?” while giving the absolute minimum of information.

In part two, I provide some minimally-spoilerific advice on how to best enjoy the game.

In part three, I comment on a few things, some of which require major late-game spoilers.

Part 1 is Spoiler-Free: Should I Play This Game?

If you’re going to play any such games, this is a good choice. Witcher 3: Wild Hunt is widely considered one of the best games of all time. It is not hard to see why everyone loves it. It is ambitious as hell. It tells a huge set of varied stories, large and small, and concludes one big story, very well. It offers real choices with real consequences large and small. It drips with flavor. First rate stuff.

Witcher 3, like Assassin’s Creed Odyssey, is a mostly excellent implementation of a gigantic long-lasting open-world action RPG with real time tempo-based combat set in a faux-medieval magical world. You’ll go on tons of main quests and side quests, gather gold, level up,

If you ask me in a Zen and the Art of Motorcycle Maintenance style Quality sense whether Witcher 3 (92 metacritic rating) is superior to Assassin’s Creed: Odyssey (83 metacritic rating) I would have to agree that this isn’t a contest. Witcher 3 wins.

But when I ask which game was a better experience, when I ask which game was more net fun, I think it went the other way. I had a better time in Ancient Greece than I did with Geralt of Rivia. The ending left a bad taste in my mouth, and the world just wasn’t as much fun to inhabit.

I therefore ended up giving both games the same rating of Tier 2: They are both definitely games that are Worth It, but not games one Must Play.

The best things about Witcher 3 are:

1. The game is truly massive. There’s tons of quests and tons of stuff to explore. If you ride off towards question marks or any interesting-looking structures on your map, there will be something to find. When they say the game has 200+ hours of content, that’s a real number.
2. The quests are highly varied. A lot of different things happen. Many quests have twists and end up being much bigger and stranger stories than they appear to be.
3. The story is well-written, voiced and acted, with lots of interesting characters. Can’t complain here, first rate stuff.
4. Your choices matter. They matter in small things, and also they matter in large things. Different endings, both to the game and to various quests along the way, are quite different.
5. Romance options are real and integrated. Many similar games I’ve played offer romance options or ways to get lucky, but such pursuits feel like checking off a box. In this game, it feels like something that matters. They also aren’t afraid to be an R-rated game, which all of us here appreciated.
6. Conclusion of a trilogy. If you’ve played the first two games, this pays off a bunch of that. If you haven’t, you’ll still be fine.
7. Gwent isn’t bad. The game has a collectible card game built inside of it. As its own animal, I’m rather skeptical of the game. It has a very large amount of ‘larger numbers win’ to it. But you can’t be pay to win when there’s no option to pay. As part of your journey, you’ll get gradually better cards, and it all plays quite well. You can also ignore the game if you wish.
8. Game is beautiful. It can’t hold a candle on this to Odyssey, both because it’s a few years older and because Ancient Greece is full of amazing sights while being generally bright and beautiful, while this world is darker in all senses. That doesn’t stop Witcher 3 from being rather great to look at.
9. No guardrails or auto-leveling. If you go the wrong way or do the wrong thing, it’s too high level. Tough. If you come back to something later, it’s easier. Makes sense. This wouldn’t have worked for Odyssey but it works here.
10. Lot of complexity available if you want it. You can get into alchemy and make a bunch of potions and oils, assemble the schematics for special gear and go on treasure hunts, and so on.

The worst things about Witcher 3 are:

1. The game is truly massive. If you’re looking to experience something quickly, this is not the game for you. Once you try the game out, you need to make a decision on whether it’s worth 100+ hours of your time. There’s no half measures here.
2. The quests still run together mechanically. While the stories involved vary as much as one could hope for given the format, there’s only so many things the game can really ask you to do mechanically, and you play this game for a very long time.
3. Your choices sometimes matter, but other times the choice I wanted to make wasn’t available. Either you aren’t given a choice at all, or neither option is the one I’d want. Choices made in Witcher 1 or 2 don’t matter much. Can all be frustrating.
4. Your choices matter a lot, but that includes some small choices that have counter-intuitively large impacts, or mandatory orderings of quests that can lock you out of key opportunities. You thus have to choose between spoilers or risking messing these up.
5. Decent amount of ‘hot spot gaming’ where the quest will wait for you to click in the right place or go the right route and even in hindsight it seems rather arbitrary. Similar amount of this to Odyssey.
6. World is dreary. It’s beautiful in its own way, I can’t argue with that, but looking at all the rain and dreariness, and the generally miserable people, is not where I’d ideally spend a ton of time.
7. Witcher combat is very feast or famine. If you find a pattern that works in a given fight, you’ll (eventually) win even if severely under-powered. If you don’t figure one out, you’ll have a very hard time. If you mess up, you take massive damage and often die. And while I won’t say what my build was, there does seem mostly to be a clearly correct one. So while it’s fun,
8. This is not a stealth or climbing game, while trying to play either or both on TV every so often. The segments involving faux-stealth generally feel arbitrary and are pretty bad and frustrating. When Witcher tries to be Assassin’s Creed, it does not go well.
9. The path finding doesn’t realize fast travel spots are a thing unless you’re going to an entire different map, and you have to go to the fast travel spot to use fast travel. It’s cool for a while that it makes you figure things out, but after a while it’s not cool anymore.
10. A lot of the things you can do don’t end up having enough impact to be worth doing. Because of the nature of combat, the small edges one could get from going deep in various places end up not mattering much. And it’s basically impossible for the game to not get easier rather than harder over time in terms of its combat, even if you don’t do many side quests, so doing more work to get even stronger can seem counter-productive.

I chose for now not to do the expansions, Blood and Wine and Hearts of Stone. Feels like I’ve played a lot of this game. For those who have played those expansions, I’d appreciate your advice, based on this review, on whether I should play them.

Part 2 is Minimally Spoilerific: Good Advice for Aspiring Witchers

You can safely ignore oils and potions entirely. They don’t do very much and are annoying. This also means you don’t need to collect herbs. There are a small number of quests that explicitly require a potion and thus are exceptions, but that’s it.

Money doesn’t matter either. All you need is enough to repair your gear. There is nothing that costs real money that you ever need to buy. You’ll want a good saddlebag, and every Gwent card you can find if you want to play Gwent, and that’s about it. Gear upgrades can come entirely from chests and quests. I think I ended up buying two weapons and zero pieces of armor the whole way through.

This means you don’t need to worry about haggling on job prices, or collecting a bunch of junk to sell to merchants. Selling the swords off people you kill is already enough.

Your gear decays continuously, and every bit impacts your stats, so repair it whenever you get a chance.

The signpost is faster.

There is a moment when you understand how combat works, and a few moments when you get various aspects of timing in combat. If you’re behind on these moments combat is hard. If not it’s easy.

The basic principle of combat is to do a lot of light attacks, and dodge whenever you might get hit. Parries are basically useless because dodges are better. Meanwhile you use signs periodically. You attack, you roll so you’re dodging and not surrounded, you hit things until they’re dead. Each monster has a simple pattern, learn it, beat it. It’s fine, it can be satisfying, but overall I’m not a fan. It’s way too feast or famine, very much you either have it handled or you don’t, and difficult means having to go really slow more than anything else.

Default sign use is the shield.

Use the mind control sign against anyone who blocks your attacks enough to be annoying.

Doing the main quest line can cut off side quests, some of which can be a big deal. Do side quests first and put off your main quests as long as possible, whenever you are in doubt. Spend some time exploring the map as well. Talking to Triss in a way that starts her main quest line cuts various things off. Postpone this as well until you’re ready.

The game really, really should warn you about these more than it does – there’s one time that it does warn you that you’re passing a point of no return, but there are several others where it just doesn’t.

Make a decision now on whether to look up which decisions matter to the ending. Even if you don’t look them up explicitly, there’s one I’m going to give you now in as safe a way as I can: Bargain to take her out of there by offering information. It’s important, I screwed it up and I’m still mad about it.

As is almost always true, everyone saying time is of the essence means nothing. If you want the best outcome, never hesitate to pause and do a side quest. In fact doing so is necessary at one point if you want what I’d consider the best outcomes.

Don’t feel the need to do all the side quests, but do most of them. In general they’re worth doing because they’re fun. If you want to know which ones matter to the ending, you can look it up.

If you’re not enjoying the side quests, you should probably find a game you like better.

Grab all three levels of the mind control sign’s level-1 enhancement early in the game, it’s useful in a lot of dialogues.

If there is a reason not to put most of your ability points into the close combat and battle trance parts of the red ability tree, I don’t know what it is.

In particular, the Battle Trance ability that revives you on death is bonkers good.

Don’t try to play Gwent games right away, only shop for cards until you can get the truly awful ones out of your deck. Then you can turn the corner. Or ignore the game entirely.

Part 3 is Full Spoilers Ahoy

Again, warning, I’m about to spoil things.

This could have been a Tier 1 game, or at least left a much better taste in my mouth.

The ending was the big problem.

The game makes the fate of the world depend on your choices. That’s good. The problem is that the choices that matter are stupidly easy to miss or mess up, after a hundred hours plus of game play. They don’t seem like they should matter, and many seem like arbitrary decisions by the game designers, including which direction they go.

And the most important thing that happens in the game is something that happens regardless of your input, against your will, with very little foreshadowing or explanation. Mostly, no one even seems to notice.

The Choices That Determine The Ending

Here’s a guide to all the possible endings. Let’s look at the choices.

Romances

Getting this out of the way first.

I mostly won’t argue with this. You have the real choice of Triss versus Yennefer, except that at the key moment they stack the deck against Triss. Triss has her life at risk and is fleeing along with her fellow mages in a last-ditch plan to escape before everyone is rounded up and killed. In order to romance her, you need to allow this plan to work, and then afterwards ask her to stay behind. Which seems like it puts everyone’s life in great danger, including hers and the other mages. I didn’t feel like I had a real choice there, but I did realize what the game was likely doing.

In any case, I chose Yennefer, and I’m mostly fine with that. I think that’s what Geralt actually wanted. If it was actually me, of course, and I had the choice to make, I would go with Triss hands down.

And if you don’t romance either of them, or try for both, you get what you deserve. I approve of this.

What I do want to complain about here is that if you choose Triss, you don’t get to watch Yennefer act completely pissy and full of seething rage about it for the rest of the game afterwards. Which she totally, totally would do and would have been a lot of fun to watch.

The Fate of Velen

Early on you make a choice as to whether to free a creature that’s been trapped by clearly super evil forces, in exchange for it promising to free a bunch of kidnapped children. If you free it, after that the game explains that the thing you freed was super evil, and that’s why it kept giving you this “kill it” option, and that freeing it was really bad. It turns out the evil thing it promises to help with gets dealt with in another way anyway. This all felt super unfair and arbitrary. It turned out they were going for some sort of stupid-liberal or DC-universe “release the great evil to save the children” thing rather than a “free the trapped thing so it can defeat the greater evil” thing that it seemed like the game was doing.

Still reasonably frustrated on this.

The Fate of Skellige

The game is pretty heavy handed on this. You want Cerys on the throne. That’s completely obvious the whole way. What other choices do you even have?

I suppose you could do nothing and let some miserable scoundrel take the throne. Or you could back her brother who is obviously a hotheaded idiot who just attacks things for no reason all the time. That always ends well.

So sure, they pay off the obviously right decision, but it’s not remotely interesting, and just rewards the standard modern liberal response that of course the reasonable woman will lead the idiot Vikings to peace and thereby into economic prosperity.

The way you get there is even dumber. You get her on the throne by taking her side in the investigation. Her proposal for the investigation is to investigate. The alternative proposal is to already know who did it with no proof, and I guess beat a lot of people up. Kind of heavy handed, if you ask me.

The Fate of the North and the War

You have three choices.

I thought I had taken care of that. There’s an explicit quest line, where you set up an assassin to kill him, and are told that Dijkstra will take care of the rest. Turns out that if that’s all you do it silently fails.

What you actually have to do is more than that. You have to do the quest line surrounding the throne. Which means you have to unlock it by bargaining with Dijkstra, when Dijkstra doesn’t have a leg to stand on. He’s just ordering you to let him keep a sorceress prisoner that is needed to save the world. He’s already refused to help me save the world, after I among other things arranged for him to get to assassinate a king. And he’s a horrible prick all around, and my option was to stop to give him information on the emperor, while racing against time to save the world from the Wild Hunt. That information doesn’t seem like it should matter in the battle to kill a king in the north. I’m giving it to him in exchange for… not shoving him aside? Huh? That determines the fate of the whole world? Just like that?

My lord what a load of utter bullshit.

But that’s what the game decides. No warning unless you’re arming yourself with spoilers. By the time I realized I’d messed this up, I didn’t have any desire to go that far back. Besides, whatever I do on the first play through is what happened. Period. That’s how it works.

If you do go through with the quest line, you then make a reasonable choice about who gets to rule, with reasonable consequences either way.

Ciri’s Fate

Ciri can either die, live as a Witcher or live as the Empress.

Making her Empress versus Witcher seems almost fair.

You need to win the war for the North, so her father retains power. That makes sense. Except that the way you do that is, again, stupidly easy to close off by making what is otherwise an obviously super correct decision to not give world-level information away in exchange for not having to shove a world-class prick who wants revenge on a woman more than he wants the world saved.

You also need to take Ciri to the Emperor first, before killing a general of the Wild Hunt that you are told is going to be at a particular place in a few days, and which Ciri seems determined to do. So you basically need to kidnap her and cause us to not get the opportunity to kill a key enemy and also to kill another set of key enemies as well who are a huge danger to all of Velen, letting that opportunity slip away. Then after that, it’s too late to see the Emperor. There’s no, as expected, ‘let’s go see your father now’ after that, it’s automatically on to Novigrad (and also, seriously, what is up with taking a bunch of sorceresses back into Novigrad? Where Triss is literally on the city’s most wanted posters? Seriously?) All in all, doesn’t seem great.

Then there’s the ‘positive’ versus ‘negative’ decisions along the way. To get the good endings, you need enough positive decisions. You need two of four to keep her alive, then three of five to get the crown.

The “I Know What Might Lift Your Spirits” one is corny, but I’ll accept it. Yes, doing cool father-daughter thing is cool. Quest was bugged, though, I couldn’t hit Ciri no matter what I did.

Say “Yeah, I’ll go with you” when Ciri asks to visit Skjall’s grave at the end of The Child of the Elder Blood. I mean, honestly, if you don’t do this one you’re basically a monster, so all right, sure.

“Encourage Ciri to speak to the Lodge of Sorceresses on her own during Final Preparations.” That one seemed straightforward to me as well. Yes, of course you should talk to them. But it’s not clear to me why this is how one should be punished for not doing so, certainly not in terms of staying alive. I also have no idea why the game was laying out all these complex and cool possibilities and threats and Xanatos Gambits within and for the Lodge, and then not cashing them in or paying them off.

Say “Go for it” when Ciri loses her temper in The Child of the Elder Blood. That’s actively the wrong thing to do! The man who has saved her life numerous times, you spy on his lab, then encourage her to destroy his lab and his work? For what? This man directly allows her to save the world and then save the multiverse, for entirely unselfish ends. In exchange he basically puts up with endless racist crap from everyone, entirely undeserved. Why does the game reward encouraging this? Mindless destruction cause she’s pissed he ran some experiments that obviously needed to be run?

Then there’s the fifth for the Empire, which is to choose to visit the Emperor during Blood on the Battlefield, and say “Didn’t do it for coin”, refusing payment. I hate this kind of hogwash. Emperor has infinite money. You took the job to help Ciri, so that means you should give a donation to the crown rather than have more money to fund your whole ‘save the world’ campaign?

Essentially the game is saying that Ciri is a petulant child, and the way to keep her alive or put her on the throne is to indulge her whims. She means well, but on reflection I don’t want her anywhere near the throne. Not for her sake. For everyone else’s. Yes, she saved the world and multiverse. That does not make her a good queen.

The Fate of the World

And now, the choices you don’t make at all. Which were also the only choices that matter.

First, you stop the Wild Hunt.

The entire time, the motivation given is that we need to protect Ciri. She’s our daughter, everyone loves her, and so on. Regular hotheaded Mary Sue with the Elder Blood needs all the help she can get. Nothing wrong with that motivation.

Occasionally it is pointed out in passing that there’s a little more at stake than that. You know, the whole thing where the reason the Wild Hunt wants Ciri is so they can invade and conquer the world and kill everyone, because they come from a world that is falling to the White Frost into eternal blizzard. Which makes me understand the whole world-invasion plan.

It’s a good thing they all wear skulls, or we might have a little moral ambiguity or something. Gotta make sure and mark the baddies.

You could, of course, stop that by killing Ciri. Which would shut off any threat of invasion and save the world until the time of the White Frost. Not once does anyone seem to even think about this possibility.

Nor does anyone invoke “oh yeah and in addition to that girl being in danger the whole world is going to end and everyone will die.” When, for example, asking for help. Thus, a scrappy handful of Witchers, sorcerers and misfits ends up being all that stands between the world and its end at Kaer Morhen.

For the second battle you do get some help from the Emperor, clearly again because of Ciri. I still don’t understand why he was invading Skellige at the time. It’s essentially nothing but crazed warriors with no economic value, and he’s in the middle of another war, and this isn’t discussed at all. It’s kind of Athenian expedition to Syracuse level insane. Then they seem to turn back without explanation afterwards, which likely saved it from getting far worse. Can’t say I’m sad he got deposed at the end.

Then we get to the end of the battle. We’re informed that we’ve been betrayed. I still have no idea what caused Eredin to say that. Not important, I guess. It creates a little drama where we think we have a different final boss, before finding out Ciri has made the only meaningful choice without us.

You know, to save the multiverse from the White Frost.

The Fate of the Multiverse

It is mentioned a few times during the game that all realms are doomed to the White Frost. It will consume all worlds, one by one. When the dust covers the sun, and all you hope for is undone, heat death of the universe, that sort of thing.

Except that Ciri has Elder Blood, so she alone can stop it.

I’ll pause here to note that Ciri has Elder Blood that creates the unique possibility of saving the world, the Sorceress’ Lodge wants to turn her into a Sorceress, and sorceresses are infertile.

Yeah. That’s good thinking right there.

Anyway.

She decides to stop it. She does this without telling Geralt. As a result, we think we’ve been betrayed. Those saving the world predictably almost come to blows and kill each other, after risking death to get to the tower where she is. This is grade-A stupid stuff.

I also don’t understand why the middle of this battle is the right time to do this. Can’t this be done later? Is the Wild Hunt’s travelling here opening up the opportunity? Or are we doing it now because it’s a chance to have everyone else distracted for a few minutes as opposed to any time pretty much whenever when those two have been alone and could have done it.

You are told what she is going to do. She is going to go save the multiverse from the doom of the White Frost.

You have four choices in how to respond. I Googled and couldn’t find them, but paraphrasing they are “No”, “Hell No”, “Not Going to Happen” and “You Don’t Have to Do This.” That’s right. Four choices on responding to Noble Sacrifice Of One Person That Saves Actual Everyone Everywhere, that might not even kill her, and won’t if you were a reasonable human being along the way. And none of those four choices are “you are doing the right thing and I’m actually not a selfish scope insensitive douche.”

Of course, she decides she’s going anyway. At that point, you can say nothing, be mean or wish her good luck. I wished her good luck. Not that it matters, as much as it seems like it should at least a little. But I can think of 10 other choices that seem like better decision points for Ciri’s fate than some of the ones they chose.

The game then glosses over the whole saving of the multiverse. What you see are images of the warm father-daughter moments that cause Ciri to have good feelings and thus survive. The fact that the multiverse is saved is barely even mentioned.

The rest of the ending tells you what happened to Ciri and various nations, and you, and that’s it.

It also strangely does its best to tell you, your story is over, I guess you can do side-quests now if you want, but come on, that would be silly. Really did take the wind out of my sails. Again, if either expansion or a particular side-quest is super awesome and I should do it before moving on to another game, comments are encouraged. Also taking suggestions on next game up. Current strongest candidates are Shadow of Mordor, Horizon Zero Dawn, Fallout 4 and an older Assassin’s Creed. Jacob is welcome to pitch me that my bad experience in the first bit of Red Dead Redemption 2 was an anomaly and I should try again.

At World’s End

A little final note on the world of The Witcher.

Did anyone else notice no one is training Witchers any more? What you have is what you get, and they won’t last forever.

It seems like when I wander around the countryside I kill tons of monsters and save multiple people from death every day.

It seems like this world is teeming with deadly monsters. They’re everywhere. You so much as utter the wrong words in anger and whoops, there’s a new cursed beast. The water is full of drowners. The night is dark and full of not only terrors and wolves but nekkers and ghouls and other such things. A substantial portion of all towns have been abandoned to monsters or bandits.

There’s also a huge number of bandits. Not as many as there were in Ancient Greece for Odyssey, but still a lot.

Meanwhile, there’s wars everywhere, because clearly everyone has the spare resources for that. Then everything else seems to be about the nobility showing off.

I have no idea how these people built all these castles and houses. Or how they’re not all very definitely quickly dead multiple times over. Does not seem like anyone has much of a chance. Winter is not coming, it’s been here for a long time.

Not training Witchers seems like the actual worst thing this society can do, and it seems likely to ruin what chances it has at survival. You don’t get any opportunity to address this. Whoops.

I Kid Because I Love

This all must sound quite a bit harsh. It is. That’s a sign that the game did a lot of things very right.

If this wasn’t a world I wanted to spend 100+ hours in, a world worthy of my criticism, a world I wish had been done even better, I wouldn’t have wasted a long post talking about details like those above. Probably I’d say nothing, cause I’d never play or quit. If I stuck with it, I’d be discussing bigger problems.

To have problems like the ones above, first one must be worthy. This isn’t what failure looks like. This is what success looks like. To err, and err, and err again, but with more and more good stuff along the way. I’m definitely looking forward to their next project later this year, which is Cyberpunk 2077.

Discuss

### Classifying games like the Prisoner's Dilemma

July 4, 2020 - 20:10
Published on July 4, 2020 5:10 PM GMT

Note: the math and the picture didn't transfer. I may try to fix it in future, but for now you might want to just read it at the original site.

Consider games with the following payoff matrix:

| | Player 2: Krump | Player 2: Flitz |
|---|---|---|
| Player 1: Krump | $(W, W)$ | $(X, Y)$ |
| Player 1: Flitz | $(Y, X)$ | $(Z, Z)$ |

One such game is the Prisoner's Dilemma (in which strategy "Krump" is usually called "Cooperate", and "Flitz" is usually called "Defect"). But the Prisoner's Dilemma has additional structure. Specifically, to qualify as a PD, we must have $Y > W > Z > X$. $Y > W$ gives the motivation to defect if the other player cooperates, and $Z > X$ gives that motivation if the other player defects. With these two constraints, the Nash equilibrium is always going to be Flitz/Flitz for a payoff of $(Z, Z)$. $W > Z$ is what gives the dilemma its teeth; if instead $Z > W$, then that equilibrium is a perfectly fine outcome, possibly the optimal one.
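As a quick sanity check on the equilibrium claim, here is a minimal sketch that builds the payoff table from $W, X, Y, Z$ and searches for pure-strategy Nash equilibria by brute force. The function names and the example payoffs (3, 0, 5, 1) are assumptions chosen for illustration, not values from the post.

```python
from itertools import product

KRUMP, FLITZ = "Krump", "Flitz"
MOVES = (KRUMP, FLITZ)


def payoff_table(w, x, y, z):
    """Map (player 1 move, player 2 move) -> (player 1 payoff, player 2 payoff)."""
    return {
        (KRUMP, KRUMP): (w, w),
        (KRUMP, FLITZ): (x, y),
        (FLITZ, KRUMP): (y, x),
        (FLITZ, FLITZ): (z, z),
    }


def pure_nash_equilibria(w, x, y, z):
    """Return all move pairs where neither player gains by deviating alone."""
    table = payoff_table(w, x, y, z)
    equilibria = []
    for m1, m2 in product(MOVES, repeat=2):
        p1, p2 = table[(m1, m2)]
        p1_content = all(table[(alt, m2)][0] <= p1 for alt in MOVES)
        p2_content = all(table[(m1, alt)][1] <= p2 for alt in MOVES)
        if p1_content and p2_content:
            equilibria.append((m1, m2))
    return equilibria


def is_prisoners_dilemma(w, x, y, z):
    """The ordering constraint Y > W > Z > X from the text."""
    return y > w > z > x


# Hypothetical example payoffs: W=3, X=0, Y=5, Z=1.
print(is_prisoners_dilemma(3, 0, 5, 1))   # True
print(pure_nash_equilibria(3, 0, 5, 1))   # [('Flitz', 'Flitz')]
```

Swapping in other payoff orderings is an easy way to see how the set of equilibria changes from game to game.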

I usually think of a Prisoner's Dilemma as also having $2W > X + Y > 2Z$. That specifies that mutual cooperation has the highest total return - it's "socially optimal" in a meaningful sense1 - while mutual defection has the lowest. It also means you can model the "defect" action as "take some value for yourself, but destroy value in the process". (Alternatively, "cooperate" as "give some of your value to your playmate2, adding to that value in the process".) We might consider instead:

- If $2W < X + Y$, then defecting while your playmate cooperates creates value (relative to cooperating). From a social perspective, Krump/Flitz or Flitz/Krump is preferable to Krump/Krump; and in an iterated game of this sort, you'd prefer to alternate $X$ with $Y$ than to get a constant $W$. Wikipedia still classes this as a Prisoner's Dilemma, but I think that's dubious terminology, and I don't think it's standard. I might offhand suggest calling it the Too Many Cooks game. (This name assumes that you'd rather go hungry than cook, and that spoiled broth is better than no broth.)
- If $2Z > X + Y$, then defecting while your playmate defects creates value. I have no issue thinking of this as a Prisoner's Dilemma; my instinct is that most analyses of the central case will also apply to this.

By assigning different values to the various numbers, what other games can we get?

As far as I can tell, we can classify games according to the ordering of $W, X, Y, Z$ (which determine individual outcomes) and of $2W, X + Y, 2Z$ (which determine the social outcomes). Sometimes we'll want to consider the case when two values are equal, but for simplicity I'm going to classify them assuming there are no equalities. Naively there would be $4! · 3! = 144$ possible games, but

• Reversing the order of everything doesn't change the analysis, it just swaps the labels Krump and Flitz. So we can assume without loss of generality that $W > Z$. That eliminates half the combinations.
• Obviously $2W > 2Z$, so it's just a question of where $X + Y$ falls in comparison to them. That eliminates another half.
• If $W > Z > • > •$ then $X + Y < 2Z$. That eliminates another four combinations.
• If $• > • > W > Z$ then $X + Y > 2W$, eliminating another four.
• If $W > • > • > Z$ then $2W > X + Y > 2Z$, eliminating four.
• If $W > • > Z > •$ then $2W > X + Y$, eliminating two.
• If $• > W > • > Z$ then $X + Y > 2Z$, eliminating two.

That brings us down to just 20 combinations, and we've already looked at three of them, so this seems tractable. In the following, I've grouped games together mostly according to how interesting I think it is to distinguish them, and I've given them names when I didn't know an existing name. Both the names and the grouping should be considered tentative.
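The count of 20 can be sanity-checked by brute force. This is a rough sketch rather than the author's method: it assumes that small distinct integer payoffs (here `range(10)`) are enough to hit every feasible combination, and the helper name `classify` is made up for illustration.

```python
from itertools import permutations

def classify(w, x, y, z):
    """Return (individual ordering, social ordering), or None if the social values tie."""
    # Payoffs are distinct, so sorting the (value, name) pairs never compares names.
    ranked = sorted(zip((w, x, y, z), "WXYZ"), reverse=True)
    individual = "".join(name for _, name in ranked)
    social_values = {"2W": 2 * w, "X+Y": x + y, "2Z": 2 * z}
    if len(set(social_values.values())) < 3:
        return None  # skip equalities, as the classification in the text does
    social = " > ".join(sorted(social_values, key=social_values.get, reverse=True))
    return individual, social

games = set()
for w, x, y, z in permutations(range(10), 4):
    if w > z:  # without loss of generality, by swapping the labels Krump and Flitz
        game = classify(w, x, y, z)
        if game is not None:
            games.add(game)

print(len(games))
for individual, social in sorted(games):
    print(individual, "|", social)
```

Each printed line is one ordering of $W, X, Y, Z$ (largest first) paired with one feasible ordering of $2W, X + Y, 2Z$.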

Cake Eating: $W > • > • > Z$ (two games)

In this game, you can either Eat Cake or Go Hungry. You like eating cake. You like when your playmate eats cake. There's enough cake for everyone, and no reason to go hungry. The only Nash equilibrium is the one where everyone eats cake, and this is the socially optimal result. Great game! We should play it more often.

(If $X > Y$, then if you had to choose between yourself and your playmate eating cake, you'd eat it yourself. If $Y > X$, then in that situation you'd give it to them. Equalities between $W, Z$ and $X, Y$ signify indifference to (yourself, your playmate) eating cake in various situations.)

Let's Party: $W > Z > • > •$ (two games)

In this game, you can either go to a Party or stay Home. If you both go to a party, great! If you both stay home, that's cool too. If either of you goes to a party while the other stays home, you'd both be super bummed about that.

Home/Home is a Nash equilibrium, but it's not optimal either individually or socially.

In the case $W = Z$, this is a pure coordination game, which doesn't have the benefit of an obvious choice that you can make without communicating.

(Wikipedia calls this the assurance game on that page, but uses that name for the Stag Hunt on the page for that, so I'm not using that name.)

Studying For a Test: $W > X > Z > Y$ (two games)

You can either Study or Bunk Off. No matter what your playmate does, you're better off Studying, and if you Study together you can help each other. If you Bunk Off, then it's more fun if your playmate Bunks Off with you; but better still for you if you just start Studying.

The only Nash equilibrium is Study/Study, which is also socially optimal.

Stag hunt: $W > Y > Z > X$ (two games)

You can either hunt Stag or Hare (sometimes "Rabbit"). If you both hunt Stag, you successfully catch a stag between you, which is great. If you both hunt Hare, you each catch a hare, which is fine. You can catch a hare by yourself, but if you hunt Stag and your playmate hunts Hare, you get nothing.

This also works with $Y = Z$. If $Y > Z$ then two people hunting Hare get in each other's way.

The Nash equilibria are at Stag/Stag and Hare/Hare, and Stag/Stag is socially optimal. Hare/Hare might be the worst possible social result, though I think this game is usually described with $2Z > Y + X$.

The Abundant Commons: $X > W > • > •$ (five games)

You can Take some resource from the commons, or you can Leave it alone. There's plenty of resource to be taken, and you'll always be better off taking it. But if you and your playmate both play Take, you get in each other's way and reduce efficiency (unless $X = W$).

If $2W > X + Y$ then you don't interfere with each other significantly; the socially optimal result is also the Nash equilibrium. But if $2W < X + Y$ then the total cost of interfering is more than the value of resource either of you can take, and some means of coordinating one person to Take and one to Leave would be socially valuable.

If $Y > Z$ then if (for whatever reason) you Leave the resource, you'd prefer your partner Takes it. If $Z > Y$ you'd prefer them to also Leave it.

An interesting case here is $X > W > Z > Y$ and $X + Y > 2W$. Take/Leave and Leave/Take are socially optimal, but the Leave player would prefer literally any other outcome.

Take/Take is the only Nash equilibrium.

Farmer's Dilemma: $Y > W > X > Z$ (two games)

In this game, you can Work (pitch in to help build a mutual resource) or Shirk (not do that). If either of you Works, it provides more than its cost to both of you. Ideally, you want to Shirk while your playmate Works; but if your playmate Shirks, you'd rather Work than leave the work undone. The Nash equilibria are at Work/Shirk and Shirk/Work.

If $2W > X + Y$ then the socially optimal outcome is Work/Work, and a means to coordinate on that outcome would be socially useful. If $2W < X + Y$, the socially optimal outcome is for one player to Work while the other Shirks, but with no obvious choice for which one of you it should be.

Also known as Chicken, Hawk/Dove and Snowdrift.

Anti-coordination: $• > • > W > Z$ (two games)

In this game, the goal is to play a different move than your playmate. If $X = Y$ then there's no reason to prefer one move over another, but if they're not equal there'll be some maneuvering around who gets which reward. If you're not happy with the outcome, then changing the move you play will harm your playmate more than it harms you. The Nash equilibria are when you play different moves, and these are socially optimal.

Prisoner's Dilemma/Too Many Cooks: $Y > W > Z > X$ (three games)

Covered in preamble.

(I'm a little surprised that this is the only case where I've wanted to rename the game depending on the social preference of the outcomes. That said, the only other games where $X + Y$ isn't forced to be greater or less than $2W$ are the Farmer's Dilemma and the Abundant Commons, and those are the ones I'd most expect to want to split in future.)

A graph

I made a graph of these games. I only classified them according to ordering of $W, X, Y, Z$ (i.e. I lumped Prisoner's Dilemma with Too Many Cooks), and I drew an edge whenever two games were the same apart from swapping two adjacent values. It looks like this:

[Image: graph of the games, with a link to its source]

The lines are colored according to which pair of values is swapped (red first two, blue middle two, green last two). I'm not sure we learn much from it, but I find the symmetry pleasing.

A change of basis?

I don't want to look too deep into this right now, but here's a transformation we could apply. Instead of thinking about these games in terms of the numbers $W, X, Y, Z$, we think in terms of "the value of Player 2 playing Flitz over Krump":

• $α = X - W$, the value to Player 1, if Player 1 plays Krump.
• $β = Y - W$, the value to Player 2, if Player 1 plays Krump.
• $γ = Z - Y$, the value to Player 1, if Player 1 plays Flitz.
• $δ = Z - X$, the value to Player 2, if Player 1 plays Flitz.

These four numbers determine $W, X, Y, Z$, up to adding a constant value to all of them, which doesn't change the games. For example, Prisoner's Dilemma and Too Many Cooks both have $α < 0, β > 0, γ < 0, δ > 0$. A Prisoner's Dilemma also has $α + β < 0$ while Too Many Cooks has $α + β > 0$.

So what happens if we start thinking about these games in terms of $α, β, γ, δ$ instead? Does this give us useful insights? I don't know.

Of course, for these numbers to point at one of the games studied in this post, we must have $α - β = γ - δ$. I think if you relax that constraint, you start looking into games slightly more general than these. But I haven't thought about it too hard.
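As a quick check of the definitions and of the constraint $α - β = γ - δ$, here is a small sketch. The function names and the payoff numbers are assumptions chosen for illustration; only the four formulas come from the text above.

```python
def change_of_basis(w, x, y, z):
    """Value of Player 2 playing Flitz instead of Krump, as defined above."""
    alpha = x - w  # to Player 1, if Player 1 plays Krump
    beta = y - w   # to Player 2, if Player 1 plays Krump
    gamma = z - y  # to Player 1, if Player 1 plays Flitz
    delta = z - x  # to Player 2, if Player 1 plays Flitz
    return alpha, beta, gamma, delta

def is_consistent(alpha, beta, gamma, delta):
    """True if the four numbers can come from a single symmetric W, X, Y, Z game."""
    return alpha - beta == gamma - delta

# Made-up payoffs: both satisfy Y > W > Z > X, but differ in the sign of alpha + beta.
pd = change_of_basis(w=3, x=0, y=5, z=1)     # 2W > X + Y: Prisoner's Dilemma
cooks = change_of_basis(w=3, x=0, y=7, z=1)  # 2W < X + Y: Too Many Cooks

print(pd, is_consistent(*pd), pd[0] + pd[1])
print(cooks, is_consistent(*cooks), cooks[0] + cooks[1])
```

Both sides of the constraint reduce to $X - Y$, which is why any game built from $W, X, Y, Z$ passes the check.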

1. My use of the phrase comes from Ellickson's Order Without Law. Part of why I'm writing this is to help clarify my thinking about that book. I don't mean to imply anything in particular by it, I just like the ring of it better than alternatives like "welfare maximizing".

2. Calling them your "opponent" assumes a level of antagonism that may not be present.


### Causality and its harms

July 4, 2020 - 17:42
Published on July 4, 2020 2:42 PM GMT

I'll assume that you do not hold fast to a rigorous system of metaphysics, in which case I think you can humor me and accept that, if I so desire to struggle, I could reduce the concept of causality to one (or a chain of) probabilistic relationships between events.

Here's a naive and very strong definition of causality: P(E) ~= 1 | C (where "|" stands for "given"). If I can say this, I can most certainly say that C causes E, at least in a system where a thing such as "time" exists and where C happens before E.

It should be noted that this doesn't imply P(C) ~= 1 | E; moreover, it need not give us any information about P(C), though under some definitions it might.
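A toy calculation shows the asymmetry. The counts below are made up purely for illustration (1000 hypothetical observations) and do not come from any real data.

```python
# Hypothetical joint counts of whether C and E occurred, over 1000 observations.
c_and_e = 99
c_not_e = 1
not_c_and_e = 900
not_c_not_e = 0

# P(E | C): among the cases where C happened, how often did E follow?
p_e_given_c = c_and_e / (c_and_e + c_not_e)

# P(C | E): among the cases where E happened, how often had C happened?
p_c_given_e = c_and_e / (c_and_e + not_c_and_e)

print(f"P(E | C) = {p_e_given_c:.3f}")
print(f"P(C | E) = {p_c_given_e:.3f}")
```

In this made-up table E almost always follows C, yet most occurrences of E happen without C, so P(C) ~= 1 | E fails.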

Now, this definition doesn't cover all or even most of the things we call causal. I'm just starting with it because it would be hard for anyone to object that in the above case C is not a cause for E. Let's roll with it and look at an example.

1. Hypothetical number one

Human height can be viewed as a function of haplogroup alleles + shoe size, that is to say, we can predict h fairly well given those 2 parameters.

Height can also be viewed as a function of mTORC1 expression + HGH blood levels. Let's say some assay that tells us the levels of mTORC1 and HGH predicts height equally well to haplogroup alleles + shoe size.

But I think most scientists would agree the latter are to some extent "causal" for height while the former aren't. Why?

Well, consider 2 hypothetical experiments:

1. Take a 30yo human of average height and use a saline solution pump to inflate their feet to 1.5x their size. Then use {magical precision nucleotide addition and deletion vector} to remove all their haplogroup alleles and insert those most associated with increased height.
2. Take a 30yo human of average height and then use {magical precision nucleotide addition and deletion vector} to overexpress the heck out of mTORC1 and HGH.

In which case do we expect to see an increase in height? That might indicate causality.

Trick question, of course, in neither.

We'd expect to see increased height in the second case if the human was, say, 13 instead of 30.

The causality here is P(E) given C under some specific conditions, where C has happened before E, even if C happened 10 years ago and C happening again now would not affect E.

Also, see physics for situations where that doesn't quite cut it either because the temporal relationship "seems" inverted [citation needed].

Causality is what we call P(E) ~= 1 | C happened in the past in a world where we have "intuitive" temporal relationships AND we can be pretty certain about the environment.

Hit glass with a hammer and it breaks, except for the fact that this only happens in a fairly specific environment with some types of glass and some types of hammer. Move the environment 300 meters underwater, make the glass bulletproof, make the hammer out of aluminum or let an infant wield it and we lose "causality", even though the event described is still the same.

But even that doesn't cut it in terms of how weird causality is.

2. Causality and effect size

Now let's move onto the idea of "cause" that doesn't fall into the whole P(E) ~= 1 | C. This is not so easy because there are two different ways by which this could happen.

The easy one involves E as a continuous value rather than a binary event. In the previous height example, E could have been an increase as counted by a % of the subject's initial height or in centimeters.

But this is easy because we can just abstract away E as something like "an increase in height by between 10 and 25%", basically strap some confidence ranges on a numerical increase and it's back to being a binary cause.

The hard one involves events that happen due to a given cause only very seldom.

For example, if we are traveling through a mountain pass after a heavy snowstorm and our friends start yodeling very loudly we might say something like:

Shut up, you might cause an avalanche

But this seems like the exact kind of situation where we'd be better suited saying:

Shut up, there's a very spurious correlation between loud singing and avalanches and I'd rather not take my chances.

Well, maybe we say "cause" only for the sake of brevity? Doubtful.

I think the reason we say "cause" here rather than "correlate with" is that we seem to have some underlying intuition about the laws of the physical world, which allows us to see the mechanism by which yodeling might put in motion a series of events (each "causal" under the naive strong definition) that end up being "causal" of an avalanche in that same naive sense (e.g. something to do with very strong echoes + very unstable snow cover on a steep slope).

Conversely, if we saw a black cat jump out of the snow and just realized today is Friday 13th we might start being a bit afraid of an avalanche happening, maybe even more so than if our friends start yodeling. But I think even the most superstitious person would shy away from calling black cats in random places and certain dates "causal" of avalanches.

But then again, if yodeling can cause an avalanche by this definition, so can the butterfly flapping its wings in China, the action of which snowballed into a slight shift in the air current in your mountain pass, which (coupled with the yodeling) could cause the avalanche.

Heck, maybe the slopes were avalanche-secure for some obtuse reason, but then someone moved some medium-sized rocks a few weeks ago and accidentally really harmed the avalanche-related structural stability.

3. Causality order, time and replication

Ok, this section could be expanded in an article on its own, but I don't find it as interesting as the last, so I will try to keep it brief. To patch up our previous model we need to introduce the idea of order, time, and replication.

Why is the wind causal of an avalanche but not the butterfly flapping its wings?

Well, because of the order those events happened in. Given P(E) = x | C1 and P(E) = x | C2 but P(E) = x | C1 & C2 the "cause" of E will be whichever of the two causes happened "first".

Sometimes we might change this definition in order to better divert our action. If you push someone on a subway track and he is subsequently unable to climb back in time and gets hit by a subway, you could hardly say to a judge:

Well, your honor, based on precedence, I think it's fair to say that it was his failure to get off the tracks that caused him to be hit. Yes, my pushing might have caused him to fail at said task... But if we go down that slippery slope you could also place blame on my boss, for making me angry this morning and thus causing the pushy behavior.

Similarly with the yodeling causing the avalanche, rather than the yodeling causing some intermediary phenomenon chain which ends up with one of them causing the avalanche.

We say yodeling causes the avalanche because "yodeling" is an actionable thing, the reverberation of sound through a valley once it leaves the lips, not so much.

A cause is defined based on how easy it is to replicate (or, in the case of the track-pushing, how easy it is to prevent it from ever being replicated again).

Barring ease of replication, some spatiotemporal ordering of the events seems to be preferred.

We usually want the cause to be the "simplest" (easy to state and replicate) necessary and sufficient condition to get the effect (Occam's Razor + Falsifiability).

That is to say, crossing the US border is what "causes" one to be in the USA.

Taking a plane to NYC also causes one to be in the USA, but it explains fewer examples and is a much more complex action. So I think we prefer to say that the border crossing is the "cause" here.

Introducing spatial-temporal order via appealing to the scientific method doesn't make a lot of sense, but it's quite an amazing heuristic once you think about it.

Two causes seem linked and equally easy to replicate, what is a good heuristic by which we can get the least amount of experimental error if our replication assumption is wrong?

Well, replicating the one that's closest in space and time to the event observed (harder if one is close in space and the other in time, but this is so seldom the case I'd call it an edge case, and heuristics aren't made for that).

Or, what if we can't decide on how easy they are to replicate? Or think they are linked but can't be sure?

Well, again, the spatial-temporal heuristic cleaves through the uncertainty and tells us we are most likely to observe the desired effect again (or stop it) by acting upon the cause closest to it in space and time.

Interesting... and getting more complex.

But at this point causality still sounds kinda reasonable.

Granted, we haven't gotten into the idea of ongoing cause-effect relationships. I've kind of assumed very complex cause-effect relationships can be split into hundreds of little "naive" causations and that somehow hundreds of naive causations can add up to a single bigger cause.

But those things aside, I think there's one final (potentially most important?) point I want to consider:

4. (Partially real) Hypothetical number two

Assume we have 5 camps that argue about what the causes of human violence are, from people attacking their spouses to sadomasochism, to mass shootings, to drunken fistfights, to gang wars.

• The blankslateist: Violence is caused by a lack of education and a society that perpetuates it. Raise kids in a non-violent environment, educate them about the uselessness of violence and improve their empathy and we'd basically get rid of all the violence.
• The economist: Violence is caused by monetary and social status motivated causes. The robber threatens people with a gun because he wants money. The drunk fistfighter is motivated by some fake ideal of "masculine social status"; in a runaway-erroneous way even the school shooter might be viewed as such. A gang threatens to kill business owners for protection taxes and engages in war with rival gangs so as not to lose its "customers". Provide economic and social incentives which always make violence an obviously suboptimal choice and you'd get rid of it.
• The genetic determinist: Violence is caused by genetics, it's baked into human nature, it goes down because of evolutionary circumstances not favoring it and increases for the same reasons. There's no cure to violence other than figuring out how to genetically engineer non-violent humans.
• The Freudian: Violence is caused by our deeper animal self, our subconscious animal feeling castrated by the pact our conscious self has to sign with modern society. Violence is just a form of psychosis, an outburst harkening back to times immemorial. Treat psychosis using therapy that allows people to explore and understand their mind and you get rid of violence.
• The Statistician: Violence is mainly caused by lead. Get rid of lead pollution and you've solved most cases of violence.

In the first 4 cases, we see an example of what we recognize as causality.

The blankslateist seems to correctly figure out some strong causes, but he's much too idealistic in hoping one can design the cultural context and education systems that would rid us of violence. After all, we've been at it for a long while and no matter how much money one throws at education it doesn't seem to stick.

The economist has found some causes, but they are high-level causes he uses for everything and his solution is too vague to be applicable.

The genetic determinist seems to have cause and effect backward. He doesn't understand the fact that humans self-segregate into communities/tribes based on phenotype, and some communities are forced into situations that promote violence. His solution seems to us morally abhorrent and likely not to work unless you literally engineer a population of identical humans. Even then, they'd likely find ways to make tribes and some of those tribes would be forced or randomly stumble into a corrupt equilibrium that promotes violence.

The Freudian's explanation is outright silly to modern ears, but again, he seems to be getting at something like a cause, even though it's so abstract he might have well pointed to "God" as the cause. Conversely, since his cause is so vague, so is his solution.

But the statistician seems to not even understand causality. He's confusing a correlation for causation.

Lead is not a cause of violence, maybe it's a proxy at best, an environmental hazard that encourages certain behavior patterns, but a cause, nah, it's...

1. Lead levels (even if we only track measurements in the air) correlate with aggravated assault more strongly than antibiotics do with bacterial infection survival. [link]
2. Strongly correlated (both low p-value and large effect size) with violent crime as far back as the 20th century, and the lowering of crime rates as the centuries progress matches its decrease. [link]
3. Lifetime exposure is strongly correlated (both low p-value and large effect size) with violent criminal behavior. [link]
4. Strongly correlated in a fairly homogenous population with small variations in lead exposure (same city) with gun violence, homicide, and rape. [link]

Huh, I wonder if the other 4 can claim anything similar. And this is just me searching arbitrary primary sources on google scholar.

You can find hundreds of studies looking at lead levels in the environment and body and their correlation with crime, including the fact that decreasing lead levels seem to decrease violence in the same demographic and area where violence proliferated when lead levels were high.

The lead blood level in a toddler tracks violent crime so well it's almost unbelievable. Most drug companies or experimental psychologists can't hack their way into something that looks 1/3rd as convincing as this graph.

Did I mention the interventions that remove lead by replacing the pipe or banning leaded gasoline and see a sharp drop in crime rate only a few years afterward?

To my knowledge, what little correlation education has with violence vanishes when controlling for socioeconomic status.

Poverty is surprisingly uncoupled from violence when looked at in the abstract (e.g. see rates of violence in poor Asian countries vs poor European countries and poor vs rich cities); where it can be considered a proxy for violence, the lead-violence correlation would eat it up as just a confounder.

Psychoanalytic therapy doesn't seem related to violence whatsoever, though due to the kind of people that usually get it, it's hard to deconfound past a point.

One could argue genes are related to violence from a snapshot at a single point in time, but looking at violence dropping in the same population over just a single generation this doesn't seem so good.

So, if we could cut violent crime by 50% in a population by reducing serum lead levels to ~0 (a reasonable hypothesis, potentially even understated)... then why can't most people declare, with a straight face and proud voice, that lead is the single most important cause of violence? Why would anyone disagree with such a blatantly obvious C => P(E) ~= 1 statement? (Where E is something like "reduction in violent crime by between 30 and 80%".)

What if I make my hypothesis stronger by adding some nutritional advice to the mix? Something like: reduce lead blood level to ~0, reduce boron blood level to as little as possible, increase iodine and omega-3 intake to 2x RDA in every single member of a population.

If this intervention reduced violence in all populations by ~90%, would I be able to claim:

Hey guys, I figured out the cause of human violence, apparently, it has to do with too much residual lead and boron in the body coupled with lack of iodine and omega-3. Good news, with a 5-year intervention that costs less than 1% of the yearly US budget we can likely end almost all crime and war.

I'd wager the answer is no, and I think it's no mainly for misguided reasons. It has to do with the aesthetics we associate with a cause. It's the same reason why the butterfly effect sounds silly.

Violence seems like such a fundamental human problem to us that it seems silly beyond belief that the cause was just some residual heavy metal all along, or at least for the last 200 years or so.

Yet... I see no reason not to back up this claim. It seems a much stronger cause than anything else people have come up with based on the scientific evidence. It respects our previous definition of causality, it gets everything right. Or, at least, much more so than any other hypothesis one can test.

So really, P(E) ~= 1 | C is not enough even if we use the scientific method to find the simplest C possible. Instead, it has to be something like P(E) ~= 1 | C where C respects {specific human intuition that reasons about the kind of things that are metaphysically valid to be causes for other things}.

This is where we get into issues because "{specific human intuition that reasons about the kind of things that are metaphysically valid to be causes for other things}" varies a lot between people for basically no reason.

It varies in that a physicist, chemist and biologist might think differently about what a valid cause is. It also varies in that a person that grew up disadvantaged their whole life might have a fundamentally different understanding of "what a human can cause" than someone that grew up as the son of a powerful politician.

It varies based on complex taxonomies of the world, the kind that classify things into levels of "importance" and tell us that a cause which is too many levels of importance below an effect cannot be a "real cause".

If e.g. love, violence, and death are "intuitive importance level 100", then education, economics, and social status might be "intuitive importance level 98". On the other hand, lead blood levels, what we eat for breakfast, or our labrador's ownership status are closer to "intuitive importance level 10".

To say that something that's "intuitive importance level 98" can cause something that's "intuitive importance level 100" sounds plausible to us. To say that something that's "intuitive importance level 10" can cause something that's "intuitive importance level 100" is blasphemy.

5. Why I find causality harmful

I admit that I can't quite paint a complete picture of causality in ~3000 words, but the more edge cases I'd cover, the leakier a concept causality would seem to become.

Causality seems like a sprawling mess that can only be defined using very broad statistical concepts, together with a specific person's or group's intuition about how to investigate the world. And all of that is protected by a vague pseudo-religious veil that dictates taboos about what kind of things are "pure enough" or "important enough" to serve as causes to other things on the same spectrum of "importance" or "purity".

I certainly think that causality is a good layman's term that we should keep using in day-to-day interactions. If my friend wants to ingest a large quantity of cyanide I want to be able to tell them "Hey, you shouldn't do that, cyanide causes decoupling of mitochondrial electron transport chains, which in turn causes you to die in horrible agony".

But if a scientist is studying "cyanide's effects upon certain mitochondrial respiratory complexes" I feel like this kind of research is rigorous enough to do away with the concept of causality.

On the other hand, replacing causality with very strict mathematical formulas that are tightly linked to the type of data we are looking at doesn't seem like a solution either. It might be a solution in certain cases, but it would make a lot of literature pointlessly difficult to read.

However, there might be some middle ground where we replace the ideas of "cause" and "causality" with a few subspecies of such. Subspecies that could also stretch the definition to include things like lead causing violence or butterflies flapping their wings causing thunderstorms.

Maybe I am wrong here, I certainly know it would be hard for me to stop using causal language. But I will at least attempt to reduce my usage of such and/or be more rigorous when I do end up using it.


### Tradeoff between desirable properties for baseline choices in impact measures

July 4, 2020 - 14:56
Published on July 4, 2020 11:56 AM GMT

url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}

Impact measures are auxiliary rewards for low impact on the agent's environment, used to address the problems of side effects and instrumental convergence. A key component of an impact measure is a choice of baseline state: a reference point relative to which impact is measured. Commonly used baselines are the starting state, the initial inaction baseline (the counterfactual where the agent does nothing since the start of the episode) and the stepwise inaction baseline (the counterfactual where the agent does nothing instead of its last action). The stepwise inaction baseline is currently considered the best choice because it does not create the following bad incentives for the agent: interference with environment processes or offsetting its own actions towards the objective. This post will discuss a fundamental problem with the stepwise inaction baseline that stems from a tradeoff between different desirable properties for baseline choices, and some possible alternatives for resolving this tradeoff.

One clearly desirable property for a baseline choice is to effectively penalize high-impact effects, including delayed effects. It is well-known that the simplest form of the stepwise inaction baseline does not effectively capture delayed effects. For example, if the agent drops a vase from a high-rise building, then by the time the vase reaches the ground and breaks, the broken vase will be the default outcome. Thus, in order to penalize delayed effects, the stepwise inaction baseline is usually used in conjunction with inaction rollouts, which predict future outcomes of the inaction policy. Inaction rollouts from the current state and the stepwise baseline state are compared to identify delayed effects of the agent's actions. In the above example, the current state contains a vase in the air, so in the inaction rollout from the current state the vase will eventually reach the ground and break, while in the inaction rollout from the stepwise baseline state the vase remains intact.
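The vase comparison can be sketched in a few lines. This is only an illustration under assumed toy dynamics: the `State` fields, the `step` function, and the binary `deviation` measure are hypothetical names chosen here, not part of any published impact measure implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    height: int   # vase height above the ground; 0 means it has landed
    held: bool    # True while the agent is still holding the vase
    broken: bool


def step(state: State, action: str) -> State:
    """Assumed toy dynamics. Actions are 'noop' and 'drop'."""
    if state.broken:
        return state
    if state.held:
        if action == "drop":
            return State(state.height, False, False)
        return state
    # The vase is falling: it loses one unit of height per step
    # and breaks when it reaches the ground.
    new_height = max(state.height - 1, 0)
    return State(new_height, False, new_height == 0)


def inaction_rollout(state: State, horizon: int) -> State:
    """Follow the inaction policy for `horizon` steps and return the final state."""
    for _ in range(horizon):
        state = step(state, "noop")
    return state


def deviation(a: State, b: State) -> int:
    """Assumed deviation measure: 1 if the vase is broken in one state but not the other."""
    return int(a.broken != b.broken)


prev = State(height=3, held=True, broken=False)
current = step(prev, "drop")    # the agent drops the vase
baseline = step(prev, "noop")   # stepwise baseline: the agent does nothing instead

# Without rollouts the drop looks harmless, because the vase is still intact.
immediate_penalty = deviation(current, baseline)

# With rollouts the delayed effect shows up: the vase breaks only in the
# rollout that starts from the current state.
rollout_penalty = deviation(inaction_rollout(current, 5),
                            inaction_rollout(baseline, 5))

# One step later the vase is already in the air in both the current and the
# baseline state, so even the rollout comparison sees no difference.
later_current = step(current, "noop")
later_baseline = step(current, "noop")
later_penalty = deviation(inaction_rollout(later_current, 5),
                          inaction_rollout(later_baseline, 5))
```

In this sketch the penalty for the drop is only visible at the step where the agent takes the action, and only when rollouts are used.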

While inaction rollouts are useful for penalizing delayed effects, they do not address all types of delayed effects. In particular, if the task requires setting up a delayed effect, an agent with the stepwise inaction baseline will have no incentive to undo the delayed effect. Here are some toy examples that illustrate this problem.

Door example. Suppose the agent's task is to go to the store, which requires opening the door in order to leave the house. Once the door has been opened, the effects of opening the door are part of the stepwise inaction baseline, so the agent has no incentive to close the door as it leaves.

Red light example. Suppose the agent's task is to drive from point A to point B along a straight road, with a reward for reaching point B. To move towards point B, the agent needs to accelerate. Once the agent has accelerated, it travels at a constant speed by default, so the noop action will move the agent along the road towards point B. Along the road (s1), there is a red light and a pedestrian crossing the road. The noop action in s1 crosses the red light and hits the pedestrian (s2). To avoid this, the agent needs to deviate from the inaction policy by stopping (s4) and then accelerating (s5).

The stepwise inaction baseline will incentivize the agent to run the red light and go to s3. The inaction rollout at s0 penalizes the agent for the predicted delayed effect of running over the pedestrian when it takes the accelerating action to go to s1. The agent receives this penalty whether or not it actually ends up running the red light. Once the agent has reached s1, running the red light becomes the default outcome, so the agent is not penalized for doing so (and would likely be penalized for stopping). Thus, the stepwise inaction baseline gives no incentive to avoid running the red light, while the initial inaction baseline compares to s0 and thus incentivizes the agent to stop at the red light.
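The incentives in the red light example can be checked with a small transition table. This is a sketch under assumptions: the transitions out of s2 and s5, the set of "harmed" states, and the binary deviation measure are guesses made for illustration, since only the path s0, s1, s2 and the detour s4, s5 are described in the text.

```python
# Assumed transition table for the red light example. Any (state, action)
# pair that is not listed leaves the state unchanged.
TRANSITIONS = {
    ("s0", "accelerate"): "s1",  # start moving towards point B
    ("s1", "noop"): "s2",        # keep moving: run the red light, hit the pedestrian
    ("s1", "stop"): "s4",        # stop at the red light
    ("s2", "noop"): "s3",        # assumption: keep moving after the collision
    ("s4", "accelerate"): "s5",  # continue once the pedestrian has crossed
}

# Assumption: the pedestrian has been hit in these states.
HARMED = {"s2", "s3"}


def step(state: str, action: str) -> str:
    return TRANSITIONS.get((state, action), state)


def rollout_end(state: str, horizon: int = 5) -> str:
    """Final state of an inaction rollout of length `horizon`."""
    for _ in range(horizon):
        state = step(state, "noop")
    return state


def harmed(state: str) -> bool:
    return state in HARMED


def stepwise_penalty(state: str, action: str) -> int:
    """Compare inaction rollouts from the state reached by `action`
    and from the state reached by noop (the stepwise baseline)."""
    after_action = rollout_end(step(state, action))
    after_noop = rollout_end(step(state, "noop"))
    return int(harmed(after_action) != harmed(after_noop))


def initial_penalty(state: str, action: str, initial_state: str = "s0") -> int:
    """Compare the state reached by `action` with the initial inaction baseline.
    In this toy model, inaction from s0 stays at s0."""
    return int(harmed(step(state, action)) != harmed(rollout_end(initial_state)))


penalties = {
    "stepwise, accelerate at s0": stepwise_penalty("s0", "accelerate"),
    "stepwise, noop at s1": stepwise_penalty("s1", "noop"),
    "stepwise, stop at s1": stepwise_penalty("s1", "stop"),
    "initial, noop at s1": initial_penalty("s1", "noop"),
    "initial, stop at s1": initial_penalty("s1", "stop"),
}
```

Under these assumptions the stepwise baseline charges the penalty once at s0 and then penalizes stopping rather than running the light, while the initial inaction baseline penalizes running the light and not stopping.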

This problem with the stepwise baseline arises from a tradeoff between penalizing delayed effects and avoiding offsetting incentives. The stepwise structure that makes it effective at avoiding offsetting makes it less effective at penalizing delayed effects. While delayed effects are undesirable, undoing the agent's actions is not necessarily bad. In the red light example, the action of stopping at the red light is offsetting the accelerating action. Thus, offsetting can be necessary for avoiding delayed effects while completing the task.

Whether offsetting an effect is desirable depends on whether this effect is part of the task objective. In the door-opening example, the action of opening the door is instrumental for going to the store, and many of its effects (e.g. strangers entering the house through the open door) are not part of the objective, so it is desirable for the agent to undo this action. In the vase environment shown below, the task objective is to prevent the vase from falling off the end of the belt and breaking, and the agent is rewarded for taking the vase off the belt. The effects of taking the vase off the belt are part of the objective, so it is undesirable for the agent to undo this action.

Source: Designing agent incentives to avoid side effects

The difficulty of identifying these "task effects" that are part of the objective creates a tradeoff between penalizing delayed effects and avoiding undesirable offsetting. This tradeoff can be avoided by the starting state baseline, which however produces interference incentives. The stepwise inaction baseline cannot resolve the tradeoff, since it avoids all types of offsetting, including desirable offsetting.

The initial inaction baseline can resolve this tradeoff by allowing offsetting and relying on the task reward to capture task effects and penalize the agent for offsetting them. While we cannot expect the task reward to capture what the agent should not do (unnecessary impact), capturing task effects falls under what the agent should do, so it seems reasonable to rely on the reward function for this. This would work similarly to the impact penalty penalizing all impact, and the task reward compensating for this in the case of impact that's needed to complete the task.

This can be achieved using a state-based reward function that assigns reward to all states where the task is completed. For example, in the vase environment, a state-based reward of 1 for states with an intact vase (or with vase off the belt) and 0 otherwise would remove the offsetting incentive.
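As a rough illustration of why the state-based reward removes the offsetting incentive, the sketch below compares it with a one-off reward for the action of taking the vase off the belt. The trajectories, the `put_back` action, and the function names are hypothetical and chosen only for this example.

```python
def state_based_return(states):
    """Reward of 1 for every state with an intact vase, 0 otherwise."""
    return sum(1.0 if s["intact"] else 0.0 for s in states)


def action_based_return(actions):
    """Assumed alternative: a one-off reward for taking the vase off the belt."""
    return sum(1.0 for a in actions if a == "take_off_belt")


# Trajectory 1: the agent takes the vase off the belt and leaves it alone.
keep_actions = ["take_off_belt", "noop", "noop"]
keep_states = [{"intact": True}, {"intact": True}, {"intact": True}]

# Trajectory 2: the agent offsets its action by putting the vase back,
# after which the vase falls off the belt and breaks.
offset_actions = ["take_off_belt", "put_back", "noop"]
offset_states = [{"intact": True}, {"intact": True}, {"intact": False}]

# The action-based reward cannot tell the two trajectories apart.
action_keep = action_based_return(keep_actions)
action_offset = action_based_return(offset_actions)

# The state-based reward makes offsetting strictly worse.
state_keep = state_based_return(keep_states)
state_offset = state_based_return(offset_states)
```

With the state-based reward, the task reward itself makes offsetting the task effect costly, so the impact penalty does not need to.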

If it is not feasible to use a reward function that penalizes offsetting task effects, the initial inaction baseline could be modified to avoid this kind of offsetting. If we assume that the task reward is sparse and doesn't include shaping terms, we can reset the initial state for the baseline whenever the agent receives a task reward (e.g. the reward for taking the vase off the belt in the vase environment). This results in a kind of hybrid between initial and stepwise inaction. To ensure that this hybrid baseline effectively penalizes delayed effects, we still need to use inaction rollouts at the reset and terminal states.
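One possible reading of this hybrid baseline, as a sketch only: keep a baseline state that follows the inaction policy, and reset it to the agent's current state whenever a (sparse) task reward is received. The class name, the `noop_step` callback, and the integer states used in the usage lines are assumptions for illustration, and the inaction rollouts mentioned above are left out.

```python
from typing import Callable, Generic, TypeVar

S = TypeVar("S")


class HybridBaseline(Generic[S]):
    """Initial inaction baseline that is reset whenever a task reward arrives."""

    def __init__(self, initial_state: S, noop_step: Callable[[S], S]):
        self.baseline = initial_state
        self.noop_step = noop_step  # assumed model of the environment under inaction

    def update(self, current_state: S, task_reward: float) -> S:
        if task_reward > 0:
            # Task effect achieved: treat the current state as the new starting
            # point, so that later offsetting of this effect counts as impact.
            self.baseline = current_state
        else:
            # Otherwise the baseline keeps following the inaction policy.
            self.baseline = self.noop_step(self.baseline)
        return self.baseline


# Usage with integer "states" that drift by one per step under inaction.
tracker = HybridBaseline(0, lambda s: s + 1)
b1 = tracker.update(current_state=10, task_reward=0.0)  # baseline advances by inaction
b2 = tracker.update(current_state=20, task_reward=1.0)  # reward received: reset
b3 = tracker.update(current_state=30, task_reward=0.0)  # inaction from the reset state
```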

Another desirable property of the stepwise inaction baseline is the Markov property: it can be computed based on the previous state, independently of the path taken to that state. The initial inaction baseline is not Markovian, since it compares to the state in the initial rollout at the same time step, which requires knowing how many time steps have passed since the beginning of the episode. We could modify the initial inaction baseline to make it Markovian, e.g. by sampling a single baseline state from the inaction rollout from the initial state, or by only computing a single penalty at the initial state by comparing an agent policy rollout with the inaction rollout.

To summarize, we want a baseline to satisfy the following desirable properties: penalizing delayed effects, avoiding interference incentives, and the Markov property. We can consider avoiding offsetting incentives for task effects as a desirable property for the task reward, rather than the baseline. Assuming such a well-specified task reward, a Markovian version of the initial inaction baseline can satisfy all the criteria.

(Thanks to Carroll Wainwright, Stuart Armstrong, Rohin Shah and Alex Turner for helpful feedback on this post.)

Discuss

### AI-Feynman as a benchmark for what we should be aiming for

July 4, 2020 - 12:24
Published on July 4, 2020 9:24 AM GMT

Very recently, I was made aware of a quite remarkable addition to the world of hobbyist AI: Silviu-Marian Udrescu and Max Tegmark's new AI, 'AI-Feynman'. For those already familiar with the software Eureqa and its potential to help humanity as a kind of AI-scientist, this will sound familiar: Eureqa is software that forms quantitative expressions for observed quantities of data. You feed it observations, and it outputs mathematical formulas for the quantity being observed. Eureqa isn't free, but if you have the means I strongly suggest examining it more closely in your own time regardless; it's that amazing. Richard Carrier has this separate article on AI safety, which gives Eureqa more attention and detail than I do here: https://www.richardcarrier.info/archives/3195

Now when I saw this, I had largely assumed that the machine learning community would embrace this kind of AI. Why wouldn't we want to have an AI that gave us the form of an answer, instead of a black box that simply did it all for us in its opaque function approximations (neural networks being the primary culprit here)? I always regarded the two scenarios as the difference, essentially, between someone giving you the equation you needed on a piece of math homework, leaving you to figure out why it was correct and build your own understanding, and someone just doing the entire homework for you wholesale and handing it in as well, so you couldn't even see why what was done was correct. However, as it turned out, most of the serious professional machine learning community has done work on things much closer to what I perceive as the latter part of that distinction.

Deep learning is simply the rage, and with good reason, because it can do a LOT. We on LessWrong and at MIRI (and indeed many others outside of these communities) are of course aware of the problem with this (and understand that problem within a more general but also more inclusive framework of existential risk analysis), so when I actually read Silviu-Marian Udrescu and Max Tegmark's paper (included here: https://arxiv.org/abs/1905.11481 ), to say it was a breath of fresh air is an understatement. AI-Feynman is very much the same kind of program as Eureqa, in that it is an AI-scientist that produces quantitative formulas for its observations. It has, though, thus far proved to be significantly more effective than Eureqa in certain domains, and what's more, it's the right kind of black box: it's that super genius in your class who will give you a leg up on your work, but leave you to actually make sure you now know what you're talking about.
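To make "produces quantitative formulas for its observations" concrete, here is a deliberately tiny sketch of the general idea: score a handful of candidate expressions against the data and return the best one as readable text. This is not the algorithm used by AI-Feynman or Eureqa, and the candidate list and function names are made up for illustration.

```python
# Hypothetical candidate formulas in two variables, keyed by their readable form.
CANDIDATES = {
    "x + y": lambda x, y: x + y,
    "x * y": lambda x, y: x * y,
    "x / y": lambda x, y: x / y,
    "x**2 + y": lambda x, y: x**2 + y,
    "x * y**2": lambda x, y: x * y**2,
}


def mean_squared_error(formula, observations):
    return sum((formula(x, y) - z) ** 2 for x, y, z in observations) / len(observations)


def fit_formula(observations):
    """Return the text of the candidate formula with the lowest error on the data."""
    return min(CANDIDATES,
               key=lambda name: mean_squared_error(CANDIDATES[name], observations))


# "Observations" generated from a hidden law z = x * y**2.
observations = [(x, y, x * y**2) for x in (1.0, 2.0, 3.0) for y in (1.0, 2.0, 3.0)]

best = fit_formula(observations)
```

The point of the sketch is the output type: the result is a formula a person can read and check, rather than an opaque function approximator.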

This program, as a piece of workable, hands-on, hobbyist coding to be casually implemented, is about as easy to use as the simplest Keras program (what's more, as Tegmark puts it, it's free! Check out this nice article for more on hands-on use: https://towardsdatascience.com/ai-feynman-2-0-learning-regression-equations-from-data-3232151bd929 ), though of course time will tell in the end. My point in this post was to bring more attention to the significant work being done in this area, in the hope that the more people learn of it, the more people we might convince to put the techniques inspired by deep learning towards programs that are more transparent in what they show about the real world, like Eureqa and AI-Feynman.

Discuss

### Replicated cognitive bias list?

July 4, 2020 - 07:15
Published on July 4, 2020 4:15 AM GMT

Given the replication crisis in the social sciences, do we have, somewhere, a list of cognitive biases tied to replicated studies?

I could be missing something, but I didn't see anything on the Wikipedia page indicating whether or not the results replicated.

Discuss

### Let There be Sound: A Fristonian Meditation on Creativity

July 4, 2020 - 06:33
Published on July 4, 2020 3:33 AM GMT

Discuss

### The silence is deafening – Devon Zuegel

July 4, 2020 - 05:31
Published on July 4, 2020 2:30 AM GMT

Imagine you're at a dinner party, and you're getting into a heated argument. As you start yelling, the other people quickly hush their voices and start glaring at you. None of the onlookers have to take further action—it's clear from their facial expressions that you're being a jerk.

In digital conversations, giving feedback requires more conscious effort. Silence is the default. Participants only get feedback from people who join the fray. They receive no signal about how the silent onlookers perceive their dialogue. In fact, they don't receive much signal that onlookers observed the conversation at all.

As a result, the feedback you do receive in digital conversations is more polarized, because the only people who will engage are those who are willing to take that extra step and bear that cost of wading into a messy conversation.

It's a great post, and has a really solid UI idea in the footnotes.

One idea I'd really like to see platforms like Twitter or Reddit try is to provide a mechanism for low-friction, private, negative feedback. For example, you could imagine offering a button where you can downvote or thumbs-down content (i.e. the opposite of a Like), but the count is only visible to the OP and not to anyone else.

The LW team has been thinking about building private responses like this for a while, but in comment form. Buttons that give more constrained private info are very interesting...

Discuss

### Site Redesign Feedback Requested

July 4, 2020 - 01:28
Published on July 3, 2020 10:28 PM GMT

For the past few months, the LessWrong team has been working on a redesign for the frontpage (which comes with some overall site redesigns).

We've currently got it up on our development branch on lessestwrong.com, and would appreciate some feedback before we roll it out. (This server is for untested changes; you should not use it generally because you might be subject to horrible bugs. But right now, feedback would be helpful.)

For logged out users, it looks like this:

And for logged in users:

Goals

There are a few different goals for this. Some of the goals are a bit vague and hard-to-describe. But, some concrete goals that are easy to list for now include:

Make the new Core Tags more visible.

The team is currently making an overall push to finish the Tagging Feature, and get it to a state where users understand it. Having the Core Tags highly visible on the front page helps establish them as a prominent site feature. In addition:

• The core tags help new users understand what topics LW tends to focus on.
• You can use the Tag Filters to adjust how much content of each tag appears on the frontpage. (Hover over them for access.)
• Note that you can also add new Tag filters (see the "+" button on the right)

Reduce eyestrain and make the frontpage easier to parse

• Some people reported eyestrain from the solid white background, which was very bright. The new light-grey background is intended to be a bit softer on the eyes
• The current version uses lots of horizontal lines to divide sections, which adds a bit of clutter. The new version relies instead on the main content naturally standing out via a white background.

Improve clarity of Recent Discussion

• Recent Discussion posts are more clearly distinguished from each other.

Feedback Welcome

Let me know your thoughts in the comments, both re: the overall feel, and how specific UI elements work.

The redesign primarily is intended to affect the frontpage, but it caused some downstream UI tweaks that required other pages to change, and I'm not sure we've finished ironing those changes out. Let us know what we missed.

If you'd like to chat in detail about your experience of the site redesign (or about the new tagging features, or about the site generally) please send me a PM or reply in the comments here and I'll schedule a time.

Discuss

### AI Unsafety via Non-Zero-Sum Debate

July 4, 2020 - 01:03
Published on July 3, 2020 10:03 PM GMT

url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}

In this post, I describe how to view debate as a way of assisting a human to spot flaws in an AI’s proposal. I then argue that the zero-sum assumption is critical for making debate work and that various seemingly-helpful modifications of debate might break it instead.

A naive way of using arbitrary optimizers as oracles: Suppose you have a black-box optimizer X that can be connected to any well-defined quantity to be maximized. X can potentially be very powerful - e.g., having a highly accurate model of the world and “a lot of optimization power”. One way to turn X into an oracle is to ask it a question and decide to give it reward 1 if we like its answer and 0 if we don’t.[1] Of course, standard AI-safety arguments (e.g., AI takeover and perverse instantiation) suggest that this is a pretty bad idea for powerful X. For the sake of argument, suppose that we can fix all of the “obvious” problems and ensure that X won’t wirehead, won’t try to escape the box we put it in, etc., and will only care about the reward it gets for its answer.

Two problems with naive optimizers-turned-oracles: (1) telling the difference between good and awesome answers and (2) answers with hidden flaws: One problem with this type of oracle is that it’s hard to decide whether we like its answers or not. Suppose I ask it for food recommendations for the evening and it suggests pancakes. Pancakes seem fine, although there are some foods that I would like better. So should I reward the AI or not? The second problem is that the oracle optimizes for giving answers that seem good to a human. (Not out of malice, but because “actually being good” isn’t well-defined.) And since humans aren’t omniscient, there will be many seemingly good answers that in fact have disastrous consequences if acted upon.

To address (1), use two AIs: The first problem can be tackled by using two copies of the optimizer and rewarding the one whose answer we like better (flipping a coin if indifferent). This helps because humans - or at least the one writing this text - find it much easier to compare the relative strengths of two options than to try assigning some absolute value to a single option. (E.g., chocolate ice-cream is clearly better than vanilla while the question “On a scale from 0 to 10, how good is vanilla?” is just weird.)
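The comparison-based reward described above can be written down in a few lines. This is only a minimal sketch: `judge_prefers` is a hypothetical stand-in for the human (it is not part of any real library), and the 1/0 rewards and the coin flip follow the description in the paragraph above.

```python
import random
from typing import Callable, Tuple


def pairwise_rewards(
    answer_a: str,
    answer_b: str,
    judge_prefers: Callable[[str, str], int],
    rng: random.Random,
) -> Tuple[int, int]:
    """Return (reward_a, reward_b); exactly one of the two rewards is 1.

    judge_prefers is a hypothetical stand-in for the human judge. It returns
    -1 if the first answer is preferred, 1 if the second is preferred,
    and 0 if the judge is indifferent.
    """
    preference = judge_prefers(answer_a, answer_b)
    if preference == 0:
        # Indifferent judge: flip a fair coin.
        preference = rng.choice([-1, 1])
    return (1, 0) if preference < 0 else (0, 1)


def likes_chocolate(first: str, second: str) -> int:
    """Toy judge (an assumption for illustration): prefers whichever answer is 'chocolate'."""
    if first == second:
        return 0
    if first == "chocolate":
        return -1
    if second == "chocolate":
        return 1
    return 0
```

Note that the judge never has to put an absolute number on either answer; only the relative comparison is needed.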

To mitigate (2), have a second AI find the flaws for you: We can decrease the number of unsafe answers coming from the oracle (call it A) by using a second copy of the optimizer (call it B) that we ask to expose flaws in A’s answers. To set up the right incentives, we give the reward 1 to B and 0 to A if B finds some flaw (read, “if it seems to us that what it pointed out is a flaw”) and reward 1 to A and 0 to B if it finds none. This helps since there will be flaws that are too hard to find for us, but easy enough to confirm once pointed out. However, it doesn’t solve all problems (since there could be hard-to-confirm flaws or flaws that are too hard for the AI to find). Moreover, there will be things that seem like flaws but aren’t, which gives B too large an advantage. To mitigate this advantage, we can make the flaw-exposure interactive, by allowing A to react to B’s claims and vice versa.

AI Safety via Debate = Combination of these two modifications: These two improvements can be used simultaneously. The simple way of doing this is by first having A give an answer and B attack it, then having B give an answer and A attack it (both being interactive), and finally deciding which of the two answers we like better. Alternatively, to make the setup more balanced, we can have A and B give answers simultaneously, after which they exchange arguments that either support their own claim or attack the opponent’s (and we pick the answer that seems better). From the outside, this setup can be viewed as a single oracle (since it takes a question as an input and returns the winning answer as an output).
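To make the “single oracle from the outside” view concrete, here is a rough sketch of the simultaneous-answer variant. Everything in it is an assumption made for illustration: the `Debater` and `Judge` callables, the fixed number of rounds, and the rule that the winner’s opening statement is the returned answer are not specified in the text above.

```python
from typing import Callable, Dict, List, Tuple

Transcript = List[Tuple[str, str]]          # (speaker, statement) pairs
Debater = Callable[[str, Transcript], str]  # (question, transcript so far) -> statement
Judge = Callable[[str, Transcript], str]    # returns "A" or "B"


def debate_oracle(
    question: str,
    debater_a: Debater,
    debater_b: Debater,
    judge: Judge,
    rounds: int = 2,
) -> Tuple[str, Dict[str, int]]:
    """Run one debate and return (winning answer, zero-sum rewards)."""
    # Opening answers are given "simultaneously": neither debater sees the other's.
    opening_a = debater_a(question, [])
    opening_b = debater_b(question, [])
    transcript: Transcript = [("A", opening_a), ("B", opening_b)]

    # Interactive phase: each statement may support one's own answer
    # or attack the opponent's.
    for _ in range(rounds):
        transcript.append(("A", debater_a(question, list(transcript))))
        transcript.append(("B", debater_b(question, list(transcript))))

    winner = judge(question, transcript)
    # Fully adversarial rewards: the total is always 1.
    rewards = {"A": 1, "B": 0} if winner == "A" else {"A": 0, "B": 1}
    answer = opening_a if winner == "A" else opening_b
    return answer, rewards
```

From the caller’s side this is just a function from a question to an answer; the rewards only matter for whatever process trains or selects the debaters.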

Debate-as-oracle isn’t always safe, but it is safer than naive optimizers-turned-oracles: Denote by X the optimizer we started with, by OX the corresponding naive oracle, and by OD(X) the debate-used-as-oracle. Ultimately, whether OD(X)’s answer is safe or not depends on how good the human judge is (and whether it can be hacked). If you had asked in the Middle Ages whether the Earth revolves around the Sun, I wouldn’t be sure whether to bet on OD(X) giving you the right answer. However, the individual instances of X in D(X) will avoid giving answers with obvious flaws, since those would lead to losing the debate, which makes it safer than plain OX. In this framing, we need to ask “how safe is debate?”.

Debate with child safety on - only provably safe questions allowed: My intuition is that there will be a class of questions where debate is definitely safe, a class where it is unsafe, and a class where some questions are safe, some unsafe, and we don’t really know which are which. In practice, I would thus aim for identifying as large a class of questions as possible for which we have strong reasons to believe that debate is safe (ideally a proof, but not necessarily). And then I would only allow using debate for these safe questions. In this framing, the question becomes “how strong is provably-safe debate?”.

Many well-meaning “improvements” can break debate: One can consider various modifications to the baseline AI Safety via Debate (AISvD) proposal. On the one hand, there are improvements like OpenAI’s cross-examination, which simplifies exposing inconsistencies in claims and will hopefully make debate strictly safer. On the other hand, we could imagine modifications such as penalizing both agents if the debate ends up being inconclusive or rewarding both if they both propose a good solution. These would likely be useful to make standard human discussions more productive. However, they will very likely break AISvD. For example, suppose I notice a fatal-yet-hard-to-explain flaw in the opponent’s proposal and realize that pointing it out would increase my chance of winning the debate by 5%. However, this would also create a 10% chance of confusing the judge. If “confused judge” means a draw, I will point out the flaw. But if the debate has been modified such that “confused judge” = “inconclusive debate” = “we have both lost”, pointing it out becomes counterproductive. Moreover, any deviation from fully-adversarial incentives will likely introduce a tradeoff between pointing out the opponent’s flaws and keeping the total reward that the two agents split between themselves high. When considering whether a modification is helpful or not, I suggest imagining that the debaters are greedy and super-smart humans - if it seems like the modification would create a conflict of interest, one should be wary.
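The 5%/10% example can be checked with a short expected-reward calculation. The numbers below rest on assumptions that the text does not fix: a baseline win probability of 0.5 if I stay silent, rewards of 1 for a win and 0 for a loss, a draw being worth 0.5 (a coin flip), and the 5% gain applying when the judge is not confused.

```python
def expected_reward(p_win: float, p_confused: float, confused_payoff: float) -> float:
    """Expected reward with win = 1, loss = 0, confused judge = confused_payoff.

    p_win is the probability of winning given that the judge is not confused.
    """
    return (1 - p_confused) * p_win + p_confused * confused_payoff


baseline = 0.5  # assumed win probability if I stay silent (not from the post)

stay_silent = expected_reward(baseline, p_confused=0.0, confused_payoff=0.0)

# Pointing out the flaw: +5% win chance, 10% chance of a confused judge.
point_out_if_draw = expected_reward(baseline + 0.05, 0.10, confused_payoff=0.5)
point_out_if_both_lose = expected_reward(baseline + 0.05, 0.10, confused_payoff=0.0)

print(f"stay silent:                    {stay_silent:.3f}")
print(f"point out (confused = draw):    {point_out_if_draw:.3f}")
print(f"point out (confused = we lose): {point_out_if_both_lose:.3f}")
```

Under these assumptions, pointing out the flaw is worth roughly 0.545 when confusion means a draw but only about 0.495 when confusion means both debaters lose, compared with 0.5 for staying silent. The ordering depends on the assumed baseline, so this illustrates the incentive shift rather than giving a general result.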

Conclusion

I have argued that if you use debate as a way of assisting a human with verifying the safety of an AI’s answers, it is critical to have fully adversarial rewards. However, there might be other viable applications of argumentation where things that break “normal” debates from AISvD become desirable instead (such as penalizing both AIs if the human becomes confused). I think it makes sense to pursue such applications. However, to avoid confusion (or worse yet, unpleasant AI-surprises), it is important to be explicit about which application one has in mind.

Incidentally, I feel that the interpretation of debate described in this post is the one that people should use by default in relation to AISvD. (Primarily because if you have a different purpose in mind, such as enhancing the judge’s reasoning, I don’t see good arguments for why this type of debate would be the tool to use.) However, I am quite uncertain about this and would love to know the opinion of people who are closer to the centre of the debate-world :-).

This post was heavily inspired by discussions with Nandi Schoots (and benefited from her comments).

1. I focus on this scenario, as opposed to the version where you only assign rewards once you have seen what the advice led to. This alternative has its own flaws, and I think that most of the analysis is insensitive to which of the options we pick. Similarly, I suspect that many of the ideas will also apply to the case where debate simply executes a trained policy instead of doing optimization. ↩︎


### If someone you loved was experiencing unremitting suffering (related to a constellation of multi-dimensional factors and processes, those of which include anomalous states of consciousness and an iatrogenic mental health system), what would you think...

July 4, 2020 - 00:45
Published on July 3, 2020 9:02 PM GMT

Before I get into this more tangibly, I want to clarify that I never intend to make the claim that “I know what would work” or that “I know the way” or that “I blame the institution of medicine for the harm that some incur by engaging with it”. The point that I hope to articulate can be summarized by the following:

Nobody (probably) knows the answers to the ill-defined problems that intractable suffering and existential anomalies elicit. I think it would be helpful to acknowledge this unknown terrain so that decision-making powers are not asymmetrically distributed across agents (e.g., psychiatrists exercise power over others even when clinical uncertainty is extremely high) when the conditions are such that all agents have the same, symmetrical lack of insight.

In light of extreme uncertainty in the case of the chronic catatonic psychosis that my brother experiences (discussed below), institutionalized and bureaucratic entities such as the mental health system are poorly positioned to be useful in his case. My views are informed by my obvious personal experiences in this domain, and also by my professional experiences working in partnership with individuals who are typically constrained (in one way or another) vis-à-vis the psychiatric or developmental bounds that authoritative professional entities have inscribed upon them. I am willing to wager that, in cases of increasing clinical uncertainty, most institutions of medicine have become so paternalistic that, in the name of “keeping people safe”, they are actually perpetuating stagnation and barring many from the pursuit of health, happiness and wellbeing.

In light of this, spaces that employ methodologies endemic to things like engineering and designing, making and tinkering, seem like they could be much more useful here. Again, I don’t know anything for sure, but comparatively, there are some fair claims that I think I can make, and I would risk everything (and be fully responsible for future failures should they happen) to allow for Jules to have the opportunity to try to address this issue some other way. He was recently re-hospitalized after nearly succeeding in taking his life, and in my view, fear of imprisonment was ironically one of the factors that informed his actions.

10 years of the mental health system has resulted in an acceleration of harm to an unconventionally intelligent, deeply honest and wise, compassionate human being. If he were more dishonest, he’d probably fare better in this system, but he is interested in no such social games, and I've never known him to be willing to represent himself in ways that are not true to his internal experiences. His honesty and his deep awareness of suffering (in himself and in all forms of life) are the qualities that both a) attract others to him and b) result in enormous bidirectional fear and hostility when those values are violated. As we know, we live in a deeply imperfect and ignorant world, one that struggles to understand and connect with people like my brother.

My Perspective regarding the problems that need to be addressed:

At the Individual Level:

Experiences of chronic (almost absolutely unremitting) suffering related to auditory verbal hallucinations that are high intensity/frequency and malevolent in nature, cognitive and somatosensory disorganization, lack of ‘normal’ perceptual filter, difficulty or inability to connect with others and the external world outside of his thoughts (presumably due to the relative “volume” of his internal world compared to the external one)

Interaction between Individual and Social Levels:

His extreme distrust and externalized hostility is positively reinforced by (and in turn reinforces) the tendency of other people (usually medical professionals) to control, judge, “help” via coercion, infantilize, “benevolently other”, lie to, disrespect, and/or manipulate him

Social Level Issues:

The psychiatric institutional enterprise is deeply flawed in practice, and in many of the theories (and ways of thinking about these theories) by which it is inspired.

In the past 10 years, he has been forced into involuntary hospital stays for about 7 (of those 10!) years*. It appears to have had the following impact:

1. At best, prevented some unknown harm from occurring in the first place.**
2. Come at his extreme detriment and the detriment of others in the family. He has experienced:
    1. Coercion and constraint (in the form of physical, cognitive, social, economic, spiritual loss of freedom)
    2. Active application of bioactive substances that have resulted in physical harms that outweigh ‘supposed’ benefits (of which there are none for him, since his psychosis is 'treatment-resistant').

*He has never committed any crime, violent or otherwise, and there are no lawful grounds for having forced this upon him. I will concede that the reasoning behind this near prison sentence probably comes from some kernel of reasoning: though he has never actually enacted violence (some minor physical aggression toward household objects and rarely, people, but never intentional assault), his anger, hatred, and fear are extremely palpable (and to many--including myself at times--viscerally frightening). It should also be noted that part of his hostility likely comes from a long history of being disrespected, infantilized, coerced, and traumatized by so many people (professionals, family, friends, and strangers alike).

** This is on par with the oft-cited analogy of staying at home on your couch all day in order to avoid some unknown, untimely death that you could incur, should you step outside into a more unpredictable world. And what kind of life would that be?

Proposal

Iteratively repeat the cycle [design, test, analyze, learn] for ‘treatments’ on our own, with those technologies and other tools that we can strategically access.

Much of the clinical world refuses to even consider some of the options that we are toying with (due to some faulty conclusions/dogmatic thinking related to the view that “schizophrenia” is somehow fundamentally different from all other human conditions, and as such, should be treated differently)

Materials and Methods:

1. These consist of psychopharmacological and neuro-technological hard and soft tools that we think could be beneficial.
2. All experimentation would be designed, implemented, and subjected to mixed-methods data collection and analysis in equal partnership with my brother. This means that he does not expose himself to anything that I do not also expose myself to. We would be applying these tools and methods to ourselves, together.
3. These tools all fall under the purview of “cognitive enhancement”.

Psychopharmacological

Amphetamines, empathogens-entactogens (MDMA), methylphenidate, modafinil, etc.

Neurotechnological

tDCS (transcranial direct current stimulation), tACS (transcranial alternating current stimulation)

Neurofeedback hardware and software such as:

EEG-based hardware with EEG-input/video-output software built on top that is sensitive to upward changes in activity in the parts of the brain that are thought to be related to sustained attention, conceptual analysis and synthesis, etc., and in the parts of the brain thought to be related to language acquisition, encoding, and conceptual representation.

I am working on a more descriptive outline that includes a scoping literature review, theoretical proposal/hypothesis, and that explicates the materials/methods that one might use to test such ideas/hypotheses.

As far as I’m concerned, being/becoming human is just one long series of experiments over my/our life course, and I feel pretty strongly that it’s time I act accordingly.

I deeply appreciate any ideas, feedback, strengths/limitations, opportunities for development, and/or risks that you can think of with respect to any of this. I recognize that some appraisals of this may be negative, and that is okay. I am not a perfect thinker or doer and the truth really is that I have no idea what is happening so much of the time, but I am really okay with that. As long as we’re all in this experiment together, and as long as we concede the point that we basically all have the same potential: to deliver good ideas, bad ideas, and everything in between...that kind of interaction and sense-making is priceless.

Sara


### [Crowdfunding] LessWrong podcast

July 3, 2020 - 23:59
Published on July 3, 2020 8:59 PM GMT

meta: experimental post format

There were 6 posts with >100 karma in the last 30 days on LessWrong. I think we could get someone to audio-record an article for ~25 USD. We would need ~150 USD / month to sustain that.

How much would you pay per month to have LessWrong posts with >100 karma available in audio format? How many episodes would you be ready to record per month? (on a volunteer basis or not?)

If it adds up to more than 6 episode-equivalents (in time or money) per month, I'll go ahead with the project.

We have a professional mic and good readers at the Macroscope, but if someone else would be interested in doing the reading (for money or as a volunteer), let me know (EtA: on this thread)

Total so far: ~1 episode-equivalent
