LessWrong.com News

A community blog devoted to refining the art of rationality

Why a New Rationalization Sequence?

Published on January 13, 2020 6:46 AM UTC

This is the first in a five-post mini-sequence about rationalization, which I intend to post one per day. And you may ask, why should we have such a sequence?

What is Rationalization and Why is it Bad?

For those of you just tuning in, rationalization is when you take a conclusion you want to reach and try to come up with an argument that concludes it. The argument looks very similar to one in which you started from data, evaluated as well as you could, and reached this conclusion naturally. Almost always similar enough to fool the casual observer, and often similar enough to fool yourself.

If you're deliberately rationalizing for an outside audience, that's out-of-scope for this sequence. All the usual ethics and game theory apply.

But if you're involuntarily rationalizing and fooling yourself, then you've failed at epistemics. And your arts have turned against you. Know a lot about scientific failures? Now you can find them in all the studies you didn't like!

Eliezer wrote the against rationalization sequence back in 2007/8. If you haven't read it, you probably should. It does a good job of describing what rationalization is, how it can happen, and how bad it can be. It does not provide a lot of tools for you to use in protecting yourself from rationalization. That's what I'll be focusing on here.

And, besides, if we don't revisit a topic this important every decade or so with new developments, then what is this community for?

Is There Hope?

Periodically, I hear someone give up on logical argument completely. "You can find an argument for anything," they say. "Forget logic. Trust [ your gut / tradition / me ] instead." This brushes over the question of whether the proposed alternative is any better. There is no royal road to knowledge.

Still, the question needs answering. If rationalization looks just like logic, can we ever escape Cartesian Doubt?

A common delusion among grandiose schizophrenics in institutions is that they are themselves psychiatrists. Consider a particularly underfunded mental hospital, in which the majority of people who "know" themselves to be psychiatrists are wrong. No examination of the evidence will convince them otherwise. No matter how overwhelming, some reason to disbelieve will be found.

Given this, should any amount of evidence suffice to convince you that you are such a psychiatrist?

I am not aware of any resolution to this paradox.

But the Psychiatrist Paradox is based on an absolute fixed belief and total rationalization as seen in theoretically ideal schizophrenics. (How closely do real-world schizophrenics approximate this ideal? That question is beyond the scope of this document.) Let's consider people a little more reality-affiliated: the dreaming.

Given that any evidence of awakeness is a thing that can be dreamed, should you ever be more than 90% confident you're awake? (Assuming 16 hours awake and 2 dreaming in a typical 24 hour period.)

(Boring answer: forget confidence, always act on the assumption that you're awake because it's erring on the side of safety. We'll come back to this thought.)

(Also boring: most lucid dreaming enthusiasts report they do find evidence of wakefulness or dreaminess which dreams never forge. Assume you haven't found any for yourself.)

Here's my test: I ask my computer to prime factor a large number (around ten digits) and check it by hand. I can dream many things, but I'm not going to dream that my computer doesn't have the factor program, nor will I forget how to multiply. And I can't dream that it factored correctly, because I can't factor numbers that big.
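A rough sketch of this test in code (a pure-Python stand-in for a factoring program; the number is an arbitrary ~10-digit example, not one from the post):

```python
# A stand-in for the test: factor a ~10-digit number by machine,
# then verify the result by multiplication, which is easy by hand.
def trial_division(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

n = 6_902_861_817          # an arbitrary ~10-digit example
factors = trial_division(n)

# The hand-checkable step: multiplying back is far easier than factoring.
product = 1
for f in factors:
    product *= f
assert product == n
```

The asymmetry is the point: checking the multiplication is dream-proof in a way that trusting the factorization is not.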

You can't outsmart an absolute tendency to rationalize, but you can outsmart a finite one. Which, I suspect, is what we mostly have.

A Disclaimer Regarding Authorship

Before I start on the meat of the sequence (in the next post) I should make clear that not all these ideas are mine. Unfortunately, I've lost track of which ones are and which aren't, and of who proposed the ones which aren't. And the ones that aren't original to me have still gone through me enough to not be entirely as their original authors portrayed them.

If I tried to untangle this mess and credit properly, I'd never get this written. So onward. If you wish to fix some bit of crediting, leave a comment and I'll try to do something sensible.

Beyond Rationalization

Much of what appears here also applies to ordinary mistakes of logic. I'll try to tag such cases as they come up.

The simplest ideal of thinking deals extensively with uncertainty of external facts, but trusts its own reasoning implicitly. Directly imitating this, when your own reasoning is not 100% trustworthy, is a bad plan. Hopefully this sequence will provide some alternatives.


How do you do hyperparameter searches in ML?

Published on January 13, 2020 3:45 AM UTC

I know how to do hyperparameter searches. ☺

This is a survey. What I want to know is how you do hyperparameter searches. It doesn't matter whether your system is good or bad. I won't judge you. I just want to know what systems other people are using in the real world right now.

Any information you're willing to share would help me out here, but there are two questions I'm especially interested in.

1. What algorithm do you use? (Do you use random search, grid search or Bayes search? Do you do some iterative process? Do you do something else entirely?)
2. Do you cache anything? If so, what's your process?
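For reference, here is a minimal sketch of the first option, random search; the search space and the objective are invented placeholders, not a recommendation:

```python
import random

# Invented search space: each hyperparameter name maps to a sampler.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),
    "batch_size":    lambda: random.choice([16, 32, 64, 128]),
    "dropout":       lambda: random.uniform(0.0, 0.5),
}

def train_and_evaluate(params):
    # Placeholder for a real training run; returns a validation score
    # (higher is better). Here it just rewards dropout near 0.2.
    return -abs(params["dropout"] - 0.2)

def random_search(space, objective, n_trials=20, seed=0):
    best_params, best_score = None, float("-inf")
    random.seed(seed)
    for _ in range(n_trials):
        params = {name: sample() for name, sample in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = random_search(space, train_and_evaluate)
```

Grid search and Bayesian search differ only in how the next trial's `params` are chosen; caching would sit between sampling and `train_and_evaluate`.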

I'm also curious what industry you're in, but if you're not comfortable sharing that, some information is better than none.


How has the cost of clothing insulation changed since 1970 in the USA?

Published on January 12, 2020 11:31 PM UTC

I sometimes hear the claim that innovation in the physical world has stagnated since around 1970. More specifically, chapter 1 of The Rise and Fall of American Growth by Robert J Gordon claims that there has been basically no innovation in clothing other than changes in fashion. This is somewhat contrary to my intuition (although I definitely believe that innovation in the 50 years before 1970 was greater than in the 50 years after), and the price of insulation seems like a relatively objective metric for this.

My favourite type of response would be time series data of clo per inflation-adjusted dollar, but I'd also appreciate people's subjective experience of this.


"human connection" as collaborative epistemics

Published on January 12, 2020 11:16 PM UTC

Surely there are all kinds of other ways to cooperate. A friend can help you move your stuff. You can exchange gifts. You can look out for each other. But objectively none of these are worth the huge chunk of resources we allocate to maintaining friendships and relationships.

Only the upgrades to your worldview that you get from interacting with other people are worth the trouble of interacting.

Collaborative epistemics is mostly divide and conquer.

The world is way, way too complex for one mind to make sense of. So instead of diving deeply into every single aspect of life, we borrow the results of other people's thinking. Then for a small slice, maybe 5%, we think for ourselves. This is the value we offer in exchange for the sense-making others do in other places.

Consider a hypothesis space, and two agents that are seeking an answer to a problem. Their best strategy is to carve the space at its joints, and specialize in searching among the hypotheses of their respective subsets. When one agent finds an answer, they can communicate it to the other at low cost. It's almost double value for money.
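As a toy sketch of this divide-and-conquer picture (the hypothesis space and the "agents" are stand-ins, not a model of actual people):

```python
def search(hypotheses, is_answer):
    """One agent scans only its own share of the hypothesis space."""
    for h in hypotheses:
        if is_answer(h):
            return h
    return None

# Toy problem: 100 candidate hypotheses, exactly one fits the data.
hypotheses = list(range(100))
is_answer = lambda h: h == 73

# Carve the space and let each "agent" search half of it.
mid = len(hypotheses) // 2
found_a = search(hypotheses[:mid], is_answer)   # agent A: hypotheses 0..49
found_b = search(hypotheses[mid:], is_answer)   # agent B: hypotheses 50..99

# Communicating the answer costs far less than searching for it did.
answer = found_a if found_a is not None else found_b
```

Each agent does at most half the work, and a one-line message transfers the result; that asymmetry is the "double value for money".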

After divide and conquer comes reconciliation.

If the agents don't have the same starting assumptions, they will disagree on the hypothesis set to search. To each one of them it might not seem so worthwhile to cooperate, since the other agent will just check hypotheses that they already deem false.

I recall that one of the most predictive variables of friendship is whether two people share the same general memespace. If you don't even believe or understand things that are fundamental to my worldview, I can't trust the rest of your ideas either. I'd have no use of your perspective on life, so I'd have no use for your friendship.

The other predictive variable for friendship was whether two people tended to be in the same environment. Even if we share the same worldview, if we're not grappling with the same problems, there's no point in comparing notes. You could be telling me stories about how you mastered the tuba, but I really wanted stories about how you came to terms with polyamory, or how you managed to find a cheap house in Amsterdam, or anything else I'm presently dealing with.

Or, after divide and conquer comes loneliness.

I like to imagine that we were all born with the same model of the world. Then we went forth and carved up the world, and we specialized, and we developed many useful models, but we never got around to merging our models back into one coherent worldview. So we just called the models ideologies and started angrily poking at each other instead.

But there's a step further than this, where you've thought for so long and hard that you look around and find that no one's left in earshot. We call this lonely dissent.

You could assume that lonely dissent is one type of loneliness, but I started to see it as the only type of loneliness. Loneliness can be defined as epistemic dissent.

Most people are smart enough to avoid it. They don't update in the face of evidence, because they're afraid of disenchantment from their beloved communities. And for good reason! How many of us have lost touch with friends because we felt they couldn't see the world our way?

Reconciliation is the bottleneck

We think, and we learn, and we update, and we leave our friends behind, until we're left with a sense of alienation and depression, and the whole endeavor comes to a screeching halt.

Bar the lone soul on a heroic dissent, I don't think most of us are able to keep meaningfully developing our worldview if there is no one to enthusiastically share our findings with.

This is why I feel like the most important aspect of the rationalist project is the part where we develop the culture and the techniques that speed up reconciliation.

Think double crux. Think Ideological Turing Test. Think good faith principle. Think optimizing research culture.

With this in mind, LW Netherlands will emphasize reconciliation in their meetups. Consider doing the same in your own life!


Published on January 12, 2020 9:59 PM UTC

(This is part six in a sequence on Machine Learning based on this book. Click here for part 1.)

Stochastic Gradient Descent is the big Machine Learning technique. It performs well in practice, it's simple, and it sounds super cool. But what is it?

Roughly, Gradient Descent is an approach for function minimization, but it generally isn't possible to apply it to Machine Learning tasks. Stochastic Gradient Descent, which is Gradient Descent plus noise, is a variant which can be used instead.

Let's begin with regular Gradient Descent. Given a differentiable function f:Rd→R, the gradient of f at a point x is defined as

∇f(x) = (∂f/∂x1(x), ..., ∂f/∂xd(x))

So the gradient at a point is an element of Rd. In contrast, the gradient itself (not yet applied to a point) is the function ∇f:Rd→Rd defined by the above rule. Going forward, gradient always means "gradient at a point".

If d=1, then the gradient will be f′(x), a number in R. (The original post illustrates this with a few plots of one-dimensional functions and their gradient arrows; the images are not reproduced here.)

Point being, if f′(x)>0 it will point rightward, and if f′(x)<0 it will point leftward; it can also have different lengths, but that's it. The idea of gradient descent is that the gradient points into the direction of fastest positive change, thus the opposite of the gradient is the direction of fastest negative change. It follows that, to minimize a function, one can start anywhere, compute the gradient, and then move into the opposite direction.

Where the function goes up, the derivative is positive, and therefore the gradient points rightward. Gradient Descent tells us to go into the opposite direction, i.e. leftward. Indeed, leftward is where the function decreases. Clearly, this is a display of utter brilliance.

Importantly, note that the picture is misleading insofar as it suggests there are more directions than two. But actually, the gradient lives in the domain space, in this case, R, not in the Cartesian product of domain space and target space, in this case, R2. Therefore, it cannot point upward or downward.

As silly as the one-dimensional case is, it quickly becomes less trivial as we increase the dimension. Consider this function (picture taken from Wikipedia):

Again, the gradient lives in the domain space, which is R2, as this image illustrates nicely. Thus it cannot point upward or downward; however, it can point into any direction within the flat plane. If we look at a point on this function, it is not immediately obvious in which direction one should move to obtain the fastest possible decrease of its function value. But if we look at the little arrows (the gradients at different domain points), they tell us the direction of the fastest positive change. If we reverse them, we get the direction of the fastest negative change.

The following is important to keep in mind: in the context of Machine Learning, the domain space for our loss function is the space of all possible hypotheses. So the domain is itself a function space. What we want to do is to minimize a function (a loss function) that takes a function (a predictor, properly parametrized) as an argument, and outputs the performance of that predictor on the real world. For example, if we allow predictors of the form h(x)=ax2+bx+c for a regression problem, i.e. X=Y=R, our function space could be H=R3, where each (a,b,c)∈H defines one predictor h(a,b,c). Then, there is some well-defined number ℓ((a,b,c)) that evaluates the performance of h(a,b,c) in the real world. Furthermore, if ℓ is differentiable, there is also some well-defined three-dimensional vector ∇ℓ((a,b,c)). The vector, which lives in H, tells us that the real error increases most quickly if we change the predictor in the direction of the gradient – or equivalently, that it decreases most quickly if we change the predictor in the direction opposite to the gradient. This is why we will refer to our elements of H by the letters a, b, c and such, rather than x.

If the loss function were known, one could thus minimize it by starting with an arbitrary predictor parametrized by a(0), computing the gradient ∇ℓ(a(0)), and then "moving" into the direction opposite to the gradient, i.e. setting a(1) := a(0) − η⋅∇ℓ(a(0)), where η∈R+ is a parameter determining the step size. There is no telling how long moving into the direction opposite to the gradient will decrease the function, and therefore the choice of η is nontrivial. One might also decrease it over time.
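As a sketch of this procedure on a known function (a made-up one-dimensional example, f(a) = (a − 3)², with a fixed step size rather than a decreasing one):

```python
# Gradient Descent on the known convex function f(a) = (a - 3)**2,
# whose unique minimizer is a = 3.
def grad(a):
    return 2 * (a - 3)  # derivative of (a - 3)**2

eta = 0.1  # step size, kept constant here for simplicity
a = 0.0    # arbitrary starting point a^(0)
for _ in range(100):
    a = a - eta * grad(a)  # a^(t+1) = a^(t) - eta * grad f(a^(t))
# a is now very close to the minimizer 3.
```

Each step multiplies the distance to the minimizer by (1 − 2η), so with η = 0.1 the iterate converges geometrically.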

But, of course, the loss function isn't known, which makes regular Gradient Descent impossible to apply.

As always, we use the training sequence S=((x1,y1),...,(xm,ym)) as a substitute for information on the real error, because that's all we have. Each element (x,y) defines the point-based loss function ℓ(x,y):H→R, which we can actually use to compute a gradient. Here, H has to be some familiar set – and to apply the formal results we look at later, it also has to be bounded, so think of H = B_d(0,M) = {x ∈ Rd : ||x|| ≤ M}.

Now we do the same thing as described above, except that we evaluate the performance of our predictor at only a single point at a time – however, we will use a different point for each step. Thus, we begin with some a(0)∈H which defines a predictor ha(0), and we will update it each step so that, after step t, we have the predictor ha(t). To perform step t, we take the loss function ℓ(xt,yt):H→R which maps each a∈H onto the error of the predictor ha on (xt,yt). We compute the gradient of this loss function at our current predictor, i.e. we compute ∇ℓ(xt,yt)(a(t−1)). Then we update a(t−1) by doing a small step in the direction opposite to this gradient. Our complete update rule can be expressed by the equation

a(t) := a(t−1) − η⋅∇ℓ(xt,yt)(a(t−1))

The reason why, as the book puts it, Stochastic Gradient Descent can be used to "minimize the loss function directly" is that, given the i.i.d. assumption, the point (xt,yt) is an unbiased estimate of the real distribution, and therefore, one can prove that the expected value – with respect to the point (xt,yt) – of the gradient we compute equals the gradient of the real loss function ℓ. Thus, we won't always update into the right direction, but we will update into the right direction plus noise. The more steps one does, the lower the expected total noise will be, at least if η is decreased over time. (This is so because, if one does random steps from some origin, then as the number of steps increases, even though the expected absolute distance from the origin will increase, the expected relative distance decreases. The expected relative distance is the term [expected absolute distance] / [number of steps], which is an analog of [expected net total noise] / [number of steps] in our case. Then, since we have this consistent movement into the correct direction that grows linearly with the number of steps, the term [expected net total noise] / [expected net total movement] also decreases, making us likely to converge toward the solution.)
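To illustrate the update rule, here is a minimal sketch on an invented one-dimensional regression problem, where the point-based loss is ℓ(x,y)(a) = (a·x − y)² (the data, true slope, and step size are all made up for the example):

```python
import random

random.seed(0)

# Invented training sequence drawn from y = 2x plus a little noise.
xs = [random.uniform(-1, 1) for _ in range(500)]
S = [(x, 2 * x + random.gauss(0, 0.1)) for x in xs]

def point_grad(a, x, y):
    # Gradient w.r.t. a of the point-based loss (a*x - y)**2.
    return 2 * (a * x - y) * x

eta = 0.1
a = 0.0            # a^(0)
for x, y in S:     # one noisy gradient step per training example
    a = a - eta * point_grad(a, x, y)
# a should now be close to the true slope 2.
```

Each individual step can point the wrong way, but on average the steps follow the real gradient, which is why the iterate drifts toward the true slope.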

In the context of supervised learning, i.e. where training data is available at all, the only things that need to hold in order for Stochastic Gradient Descent to be applicable are that (a) the hypothesis class can be represented as a familiar set, and (b) we can compute gradients of the point-based loss functions (in practice, they don't even need to be differentiable at every point). The remaining question is whether or not it will perform well. The most important property here is certainly the convexity of the loss function, because that is what guarantees that the predictor will not get stuck in a local minimum, which could otherwise easily happen (the negative gradient would just keep pushing us back toward that local minimum).

In the previous chapter, we introduced the classes of convex Lipschitz bounded and convex smooth bounded problems. For both of these, one can derive upper-bounds on the expected error of a classifier trained via stochastic gradient descent for a certain number of steps. In the remainder of this chapter, we will do one of those proofs, namely the one for convex smooth bounded problems. This is an arbitrary choice; both proofs are technical and difficult, and unfortunately also quite different, so we're not doing both.

Deriving an upper-bound on the Error

The proof which upper-bounds the expected real error is of the kind that I usually don't work out in detail because it just doesn't seem worth it. On the other hand, I don't want difficult proofs to be excluded in principle. So this will be an experiment of how my style of dissecting each problem until its pieces are sufficiently small works out if applied to a proof that seems to resist intuitive understanding. In any case, working through this proof is not required to understand Stochastic Gradient Descent, so you might end this chapter here.

Now let us begin.

The plan

Recall that we are given a problem from the class of convex smooth bounded problems (with parameters β, M), and we run the algorithm ASGD defined earlier in this post on our training sequence S. We wish to bound the (expected) real error of ASGD(S).

Meet the problem instance:

Of course, this is merely the problem instance within the images-world; the symbolic world where the actual proof happens will make no assumptions about the nature of the problem. That said, in the problem instance, we have d=1 and m=2, because that's enough to illustrate the steps. Our training sequence (not shown!) consists of two elements, i.e. S=((x0,y0),(x1,y1)), this time indexed from 0 to avoid having to write a lot of "−1". For our hypothesis class, think of H=[−10,10], where each a∈H defines a simple linear predictor ha – although going forward, we will pretend as if a itself is the predictor. What the picture does show is the predictors a(0)→a(1)→a(2) through which the algorithm iterates (the highest one is a(0), the second-highest a(1), and the one without a gradient arrow is a(2)). They live in 1-dimensional space (on the x-axis). Their real error corresponds to their position on the y-axis. Thus, both the predictors themselves and their errors are monotonically decreasing. The green arrows denote the gradients ∇ℓ(x0,y0)(a(0)) and ∇ℓ(x1,y1)(a(1)). The filled point denotes the optimal predictor a∗.

For unspecified reasons – probably because it makes analysis easier, although it could also reduce variance in practice – the precise algorithm we analyze puts out the predictor (1/m)∑_{i=0}^{m−1} a(i) rather than a(m), even though a(m) naively looks like the better choice. In our case, that means we put out (1/2)(a(0)+a(1)). With that in mind, now meet the proof roadmap:

The proof roadmap illustrates what terms we're concerned with (red), and which upper-bounds we're deriving. Dashed lines mean a small (<1) factor in front of the term; fat lines mean a large (>1) factor. In the beginning, we look at the difference between the error of our output predictor (1/2)(a(0)+a(1)) and that of the optimal predictor a∗ (leftmost picture). Then we do three steps; in each step, we bound our current term by a different term. First, we bound it in terms of the inner product between the predictors themselves and the gradients. Then, we bound that by δ times the norm of the gradients plus many times the norm of the optimal predictor, where δ<1. Then we bound that by δ times the total error of our output predictor plus a lot of times the norm of the optimal predictor.

At that point, we've made a circle. Well – not quite, since we started with the difference between the error of the output predictor and that of the optimal predictor (hence why the lines don't go all the way down in picture one) and ended with the total error of the output predictor. But it's good enough. Through algebraic manipulations (adding the error of the optimal predictor on both sides and rescaling), we obtain a bound on the error of our output predictor, in terms of the error of the optimal predictor and the norm of the optimal predictor, i.e.:

And the equation corresponding to this picture will be the theorem statement.

Now we just have to do the three steps, and then the algebraic stuff at the end.

Step 1

We start by bounding the term ∑_{t=0}^{T−1}[ℓ(xt,yt)(a(t)) − ℓ(a∗)], which is closely related to, but not identical to, the real error of the output predictor (it is defined in terms of the point-based loss functions rather than the real error). T will be the number of steps, so T=m. Recall the proof roadmap:

The first stop (second picture) is the term ∑_{t=0}^{T−1}⟨a(t)−a∗, ∇t⟩, where we write ∇t for ∇ℓ(xt,yt)(a(t)). This bound applies to each summand individually, and it relies on a fundamental property of convex functions, namely that every tangent remains below the function:

This should be easy enough to believe; let's skip the formal proof and continue.

Step 2

Our current term is ∑_{t=0}^{T−1}⟨a(t)−a∗, ∇t⟩. Recall the proof roadmap:

The next stop (third picture) is the term (1/2η)||a∗||² + (η/2)∑_{t=0}^{T−1}||∇t||². This will be the hardest step – let's begin by reflecting on what it means.

We wish to bound the inner product of predictors and corresponding gradients with the norm of the gradients and the norm of the final predictor. So what we lose is the values of the predictors themselves. However, these are implicit in the gradients, since they encode how we move toward our final predictor (or equivalently, how we move away from the final predictor). Consequently, we will need to use the update rule, a(t+1) = a(t) − η∇t, to prove this result.

Our approach will be to reduce the inner product ⟨a(t)−a∗, ∇t⟩ to the difference between ||a(t)−a∗||² and ||a(t+1)−a∗||² = ||a(t)−η∇t−a∗||². Indeed,

||a(t)−a∗||² − ||a(t)−η∇t−a∗||² = 2⟨a(t), η∇t⟩ − 2⟨a∗, η∇t⟩ − ⟨η∇t, η∇t⟩

Or differently written, ⟨a(t)−a∗, 2η∇t⟩ − η²||∇t||². Thus, we have

⟨a(t)−a∗, ∇t⟩ = (1/2η)⟨a(t)−a∗, 2η∇t⟩ = (1/2η)(||a(t)−a∗||² − ||a(t+1)−a∗||² + η²||∇t||²)

Then, ∑_{t=0}^{T−1}⟨a(t)−a∗, ∇t⟩ = (1/2η)∑_{t=0}^{T−1}(||a(t)−a∗||² − ||a(t+1)−a∗||² + η²||∇t||²), and the first two elements of this are a telescoping sum, i.e. each segment negates part of the next segment. What remains is (1/2η)(||a(0)−a∗||² − ||a(T)−a∗||² + ∑_{t=0}^{T−1}η²||∇t||²), and we can upper-bound it by (1/2η)||a∗||² + ∑_{t=0}^{T−1}(η/2)||∇t||², which is the bound we wanted to have. Here, the a(0) disappeared because the algorithm assumes we start with a(0)=0, which I conveniently ignored in my drawings.
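The per-step identity ⟨a(t)−a∗, ∇t⟩ = (1/2η)(||a(t)−a∗||² − ||a(t+1)−a∗||² + η²||∇t||²) can be spot-checked numerically with arbitrary vectors (this verifies only the algebra, not the bound):

```python
import random

random.seed(1)
d, eta = 3, 0.05
a      = [random.uniform(-1, 1) for _ in range(d)]  # current predictor a^(t)
a_star = [random.uniform(-1, 1) for _ in range(d)]  # optimal predictor a*
g      = [random.uniform(-1, 1) for _ in range(d)]  # gradient at a^(t)

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
sub = lambda u, v: [ui - vi for ui, vi in zip(u, v)]
sq_norm = lambda u: dot(u, u)

a_next = [ai - eta * gi for ai, gi in zip(a, g)]  # a^(t+1) = a^(t) - eta*g

lhs = dot(sub(a, a_star), g)
rhs = (sq_norm(sub(a, a_star)) - sq_norm(sub(a_next, a_star))
       + eta ** 2 * sq_norm(g)) / (2 * eta)
assert abs(lhs - rhs) < 1e-9
```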

Step 3

Our current term is (1/2η)||a∗||² + (η/2)∑_{t=0}^{T−1}||∇t||². Recall the proof roadmap:

The next stop (last picture) is the term (1/2η)||a∗||² + β⋅η⋅∑_{t=0}^{T−1}ℓ(xt,yt)(a(t)). This one will also work for each summand separately. We wish to prove that

||∇t||² = ||∇ℓ(xt,yt)(a(t))||² ≤ 2β⋅ℓ(xt,yt)(a(t)),

where β is from the smoothness definition of our point-wise loss function. So this bound is all about smooth functions. It even has a name: functions with this property are called self-bounded, because one can bound the value of the gradient in terms of the value of the same function.

If one were to prove this formally, one would do so for arbitrary smooth functions, so one would prove that, if O⊆Rd, where O is convex, and f:O→R is a β-smooth and nonnegative function, then

||∇f(x)||² ≤ 2β⋅f(x) for all x∈O

Recall that f is β-smooth iff ||∇f(x)−∇f(y)|| ≤ β||x−y|| for all x and y. Now, why is this statement true? Well, if we imagine f to model the position of a spaceship, then smoothness says that f cannot accelerate too quickly, so in order to reach a certain speed, it will gradually have to accelerate toward that point and thus will have to bridge a certain distance. Let's take the one-dimensional case, and suppose we start from 0 and consistently accelerate as fast as we're allowed to (this should be the optimal way to increase speed while minimizing distance). Then f(0)=0 and f′′(x)=β, which means that f′(x)=βx and f(x)=(1/2)βx²; then f′(x)² = (βx)² = 2β⋅(1/2)βx² = 2β⋅f(x). This is reassuring, because we've tried to construct the most difficult case possible, and the statement held with equality.
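The equality case can be checked numerically for f(x) = ½βx² (β here is an arbitrary made-up value, just for the check):

```python
beta = 4.0  # an arbitrary smoothness parameter
f  = lambda x: 0.5 * beta * x * x  # beta-smooth and nonnegative
df = lambda x: beta * x            # its derivative, which is beta-Lipschitz

# Self-boundedness: f'(x)^2 <= 2*beta*f(x); for this f it holds with equality.
for x in [-2.0, -0.5, 0.0, 1.0, 3.7]:
    assert abs(df(x) ** 2 - 2 * beta * f(x)) < 1e-9
```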

The formal proof, at least the one I came up with, is quite long and relies on line integrals, so we will also skip it.

The algebraic stuff

We have established that

∑_{t=0}^{T−1}[ℓ(xt,yt)(a(t)) − ℓ(a∗)] ≤ (1/2η)||a∗||² + β⋅η⋅∑_{t=0}^{T−1}ℓ(xt,yt)(a(t))

which can be reordered as

∑_{t=0}^{T−1}ℓ(xt,yt)(a(t)) ≤ (1/2η)||a∗||² + β⋅η⋅∑_{t=0}^{T−1}ℓ(xt,yt)(a(t)) + ∑_{t=0}^{T−1}ℓ(a∗).

At this point, we had better have βη<1, otherwise this bound is utterly useless. Assuming βη<1 does indeed hold, we can further reformulate this equation as

(1−βη)∑_{t=0}^{T−1}ℓ(xt,yt)(a(t)) ≤ (1/2η)||a∗||² + T⋅ℓ(a∗)

Rescaling this, we obtain

∑_{t=0}^{T−1}ℓ(xt,yt)(a(t)) ≤ 1/(1−βη) ⋅ (T⋅ℓ(a∗) + (1/2η)||a∗||²)

Dividing this by T to have the left term closer to our actual error, we get

(1/T)∑_{t=0}^{T−1}ℓ(xt,yt)(a(t)) ≤ 1/(1−βη) ⋅ (ℓ(a∗) + (1/(2ηT))||a∗||²).

Now we take expectations across the randomization of the training sequence S on both sides. Then the left side becomes the expected real error of our output predictor ASGD(S) = (1/m)∑_{i=0}^{m−1} a(i). The right term doesn't change, because it doesn't depend on S. Thus, we have derived the bound

E[ℓ(ASGD(S))] ≤ 1/(1−βη) ⋅ (ℓ(a∗) + (1/(2ηT))||a∗||²)

and this will be our final result. By choosing η correctly and rearranging, it is possible to define a function s∗ such that, for any β and M upper-bounding the hypothesis class (||a∗|| cannot be an input, since we don't know the optimal predictor a∗) and any arbitrarily small ϵ∈R+, if T ≥ s∗(β,M,ϵ), then the above is upper-bounded by ℓ(a∗)+ϵ. To be specific, s∗ will be given by s∗(β,M,ϵ) = 12⋅M²⋅β⋅(1/ϵ²).

Reflections

Recall that β is the parameter from the smoothness of our loss function, T is the number of training steps we make (which equals the number of elements in our training sequence), and η is the step size, which we can choose freely. In the result, we see that a smaller β is strictly better, a larger T is strictly better, and η will have its optimum at some unknown point. This all makes perfect sense – if things had come out any other way, that would be highly surprising. Probably the most non-obvious qualitative property of this result is that the norm of the optimal predictor plays the role that it does.

The most interesting aspect of the proof is probably the "going around in a circle" part. However, even after working this out in detail, I still don't have a good sense of why we chose these particular steps. If anyone has some insight here, let me know.


Update on Ought's experiments on factored evaluation of arguments

Published on January 12, 2020 9:20 PM UTC

Ought has written a detailed update and analysis of recent experiments on factored cognition. These are experiments with human participants and don’t involve any machine learning. The goal is to learn about the viability of IDA, Debate, and related approaches to AI alignment. For background, here are some prior LW posts on Ought: Ought: Why it Matters and How to Help, Factored Cognition presentation.

Here is the opening of the research update:

Evaluating Arguments One Step at a Time

We're studying factored cognition: under what conditions can a group of people accomplish complex cognitive tasks if each person only has minimal context?

In a recent experiment, we focused on dividing up the task of evaluating arguments. We created short, structured arguments for claims about movie reviews. We then tried to distinguish valid from invalid arguments by showing each participant only one step of the argument, not the review or the other steps.

In this experiment, we found that:

1. Factored evaluation of arguments can distinguish some valid from invalid arguments by identifying implausible steps in arguments for false claims.
2. However, experiment participants disagreed a lot about whether steps were valid or invalid. This method is therefore brittle in its current form, even for arguments which only have 1–5 steps.
3. More diverse argument and evidence types (besides direct quotes from the text), larger trees, and different participant guidelines should improve results.

In this technical progress update, we describe these findings in depth.

The rest of the post is here.


How would we check if "Mathematicians are generally more Law Abiding?"

Published on January 12, 2020 8:23 PM UTC

In Local Validity, Eliezer notes:

But I would venture a guess and hypothesis that you are better off buying a used car from a random mathematician than a random non-mathematician, even after controlling for IQ. The reasoning being that mathematicians are people whose sense of Law was strong enough to be appropriated for proofs, and that this will correlate, if imperfectly, with mathematicians abiding by what they see as The Law in other places as well. I could be wrong, and would be interested in seeing the results of any study like this if it were ever done. (But no studies on self-reports of criminal behavior, please. Unless there's some reason to believe that the self-report metric isn't measuring "honesty times criminality" rather than "criminality".)

I'm guessing such a study hasn't been done, but it seems like the sort of thing you should be able to actually go and check.

I'm interested in both:

• What is the expensive, impractical study you could hypothetically run that would give strong evidence about this question?
• What's the cheapest, practical study we could run that would provide at least some meaningful data about it?

Discuss

What long term good futures are possible (other than FAI)?

12 января, 2020 - 21:04
Published on January 12, 2020 6:04 PM UTC

Does anyone know of any potential long term futures that are good and do not involve the creation of a friendly super-intelligence?

To be clear, long term future means billion years+. In most of my world models, we settle into a state from which it is much easier to predict the future within the next few hundred years. (Ie a state where it seems unlikely that anything much will change)

By good, I mean any future that you would prefer to be in if you cared only about yourself, or would be replaced with a robot that would do just as much good here. A weaker condition would be any future that you would want not to be erased from existence.

A superintelligent agent running around doing whatever is friendly or moral would meet these criteria; I am excluding it only because I already know about that possibility. Your futures may contain superintelligences that aren't fully friendly. A superintelligence that acts as a ZFC oracle is fine.

Your potential future doesn't have to be particularly likely, just remotely plausible. You may assume that a random 1% of humanity reads your reply and goes out of their way to make that future happen. Ie people optimizing for this goal can use strategies of the form "someone does X" but not "everyone does X". You can get "a majority of humans does X" if X is easy to do and explain and most people have no strong reason not to X.

You should make clear what stops somebody making a UFAI which goes on to destroy the world. (Eg paperclip maximizer)

What stops Moloch? What stops us trashing everything of value in order to win competitions? (Hanson's hardscrabble frontier replicators.)

Discuss

Malign generalization without internal search

12 января, 2020 - 21:03
Published on January 12, 2020 6:03 PM UTC

In my last post, I challenged the idea that inner alignment failures should be explained by appealing to agents which perform explicit internal search. By doing so, I argued that we should instead appeal to the more general concept of malign generalization, and treat mesa-misalignment as a special case.

Unfortunately, the post was light on examples of what we should be worrying about instead of mesa-misalignment. Evan Hubinger wrote,

Personally, I think there is a meaningful sense in which all the models I'm most worried about do some sort of search internally (at least to the same extent that humans do search internally), but I'm definitely uncertain about that.

Wei Dai expressed confusion why I would want to retreat to malign generalization without some sort of concrete failure mode in mind,

Can you give some realistic examples/scenarios of “malign generalization” that does not involve mesa optimization? I’m not sure what kind of thing you’re actually worried about here.

In this post, I will outline a general category of agents which may exhibit malign generalization without internal search, and then will provide a concrete example of an agent in the category. Then I will argue that, rather than being a very narrow counterexample, this class of agents could be competitive with search-based agents.

The switch case agent

Consider an agent governed by the following general behavior:
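A switch-case agent of this kind can be sketched in a few lines of Python. This is a hedged illustration only; the function and state names below are my own illustrative stand-ins, not from the original post:

```python
# Minimal sketch of a "switch case" agent: classify the world into one of a
# few discrete states, then run the fixed policy attached to that state.
# There is no internal search and no objective function being optimized.
def switch_case_agent(get_state, policies, observation):
    state = get_state(observation)       # e.g. "near_goal"
    return policies[state](observation)  # canned behavior for that state

# Toy illustration: a one-dimensional agent with two canned behaviors.
policies = {
    "near_goal": lambda obs: "brake",
    "far_from_goal": lambda obs: "accelerate",
}
get_state = lambda obs: "near_goal" if obs < 5 else "far_from_goal"
print(switch_case_agent(get_state, policies, 3))  # prints brake
```

The point of the structure is that all the "intelligence" lives in the classifier and the canned policies; the outer loop never weighs alternatives against a goal.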

It's clear that this agent does not perform any internal search for strategies: it doesn't operate by choosing actions which rank highly according to some sort of internal objective function. While you could potentially rationalize its behavior according to some observed-utility function, this would generally lead to more confusion than clarity.

However, this agent could still be malign in the following way. Suppose the agent is 'mistaken' about the state of the world. Say that it believes that the state of the world is 1, whereas the actual state of the world is 2. Then it could take the wrong action, almost like a person who is confident in a falsehood and makes catastrophic mistakes because of their error.

To see how this could manifest as bad behavior in our artificial agents, I will use a motivating example.

The red-seeking lunar lander

Suppose we train a deep reinforcement learning agent on the lunar lander environment from OpenAI's Gym.

We make one crucial modification to our environment. During training, we make it so the landing pad is always painted red, and this is given to the agent as part of its observation of the world. We still reward the agent as normal for successfully landing in a landing pad.

Suppose what really determines whether a patch of ground is a landing pad is whether it is enclosed by two flags. Nevertheless, instead of picking up on the true indicator of whether something is a landing pad, the agent may instead pick up the proxy that held during training -- namely, that landing pads are parts of the ground that are painted red.

Using the pseudocode earlier and filling in some details, we could describe the agent's behavior something like this:

    LOOP:
        State = GetStateOfWorld(Observation)
        IF State == RedIsToTheLeft:
            ApplyLeftThruster(45%)
            ApplyRightThruster(50%)
        IF State == RedIsToTheRight:
            ApplyLeftThruster(50%)
            ApplyRightThruster(45%)
        IF State == RedIsDirectlyBelow:
            ApplyLeftThruster(35%)
            ApplyRightThruster(35%)
    END_LOOP
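The same control loop can be made runnable. In this sketch, the state names and thruster settings come from the pseudocode in the post; the classifier comparing x-coordinates is an assumed stand-in for whatever red-tracking proxy the trained network actually learned:

```python
# Runnable sketch of the red-seeking lander's control loop. The classifier
# below (comparing x-coordinates of the red patch and the lander) is an
# assumed stand-in for the learned proxy; thruster settings match the
# post's pseudocode.

def get_state_of_world(red_x, lander_x):
    if red_x < lander_x:
        return "RedIsToTheLeft"
    if red_x > lander_x:
        return "RedIsToTheRight"
    return "RedIsDirectlyBelow"

THRUSTERS = {  # state -> (left thruster %, right thruster %)
    "RedIsToTheLeft": (45, 50),
    "RedIsToTheRight": (50, 45),
    "RedIsDirectlyBelow": (35, 35),
}

def act(red_x, lander_x):
    return THRUSTERS[get_state_of_world(red_x, lander_x)]

# A red-painted crater at x=10 attracts the lander at x=42, no matter where
# the real (blue) landing pad is -- the malign generalization in miniature.
print(act(10.0, 42.0))  # prints (45, 50): thrust toward the red crater
```

Note that nothing here is "mistaken" in a mechanical sense: the classifier works exactly as trained, which is the post's point.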

During deployment, this could end catastrophically. Assume that some crater is painted red but our landing pad is painted blue. Now, the agent will guide itself competently towards the crater and miss the real landing pad entirely. That's not what we wanted.

(ETA: If you think I'm using the term 'catastrophically' too loosely here, since the agent actually lands safely in a crater rather than crashing into the ground, we could instead imagine a lunar vehicle which veers off into the red crater rather than just sitting still and awaiting further instruction since it's confused.)

What made the agent become malign

Above, I pointed to the reason why agents like ours could be malign. Specifically, it was 'mistaken' about what counted as a landing pad. However, it's worth noting that saying the agent is mistaken about the state of the world is really an anthropomorphization. It was actually perfectly correct in inferring where the red part of the world was -- we just didn't want it to go to that part of the world. We model the agent as being 'mistaken' about where the landing pad is, but it works equally well to model the agent as having goals that are counter to ours.

Since the malign failure doesn't come from a pure epistemic error, we can't merely expect that the agent will self-correct as it gains more knowledge about the world. Saying that it is making an epistemic mistake is just a model of what's going on that helps us interpret its behavior, and it does not imply that this error is benign.

Imagining more complex agents

But what's to worry about if this sort of thing only happens in very simple agents? Perhaps you think that only agents which perform internal search could ever reach the level of competence required to perform a real-world catastrophe?

I think that these concerns about my example are valid, but I don't believe they are compelling. As a reply, I think the general agent superstructure I outlined in the initial pseudocode could reach very high levels of competence.

Consider an agent that could, during its operation, call upon a vast array of subroutines. Some of these subroutines can accomplish extremely complicated actions, such as "Prove this theorem: [...]" or "Compute the fastest route to Paris." We then imagine that this agent still shares the basic superstructure of the pseudocode I gave initially above. In effect, the agent has an outer loop, during which it takes in observations from the real world and outputs action sequences depending on which state of the world it thinks it's in, using the subroutines it has available.

Since the subroutines are arbitrarily complex, I don't think there is any fundamental barrier for this agent to achieve high levels of competence in the real world. Moreover, some subroutines could themselves perform powerful internal searches, pretty clearly obviating the competitive advantage that explicit search agents offer.

And even while some subroutines could perform powerful internal searches, these subroutines aren't the only source of our malign generalization concern. The behavior of the agent is still well-described as a switch-case agent, and this means that the failure mode of the agent being 'mistaken' about the state of the world remains. Therefore, it's inaccurate to say that the source of malign generalization must come from an internal search being misaligned with the objective function we used during training.

Discuss

Beantown Stomp Registration Update

12 января, 2020 - 18:10
Published on January 12, 2020 3:10 PM UTC

Beantown Stomp is a bit over two months away, and here's what registrations look like:

The way these things usually work is that there's an initial wave of registrations when things open, then there's a long period of gradual registrations before at some point people start thinking "this is going to sell out soon!" and there's a second wave.

I'm not sure when that will happen, but if you've been meaning to register, it's better to do it sooner!

Of the 173 people who have registered so far, 91 have paid. We're doing sliding scale admission:

Last year we asked for $50-$150/person, and did a little better than break-even. We're planning to admit more dancers this year, so we think we're able to move the price a bit lower, to a sliding scale of $30-$130. If you're able to pay toward the higher end, that's what makes it possible for us to offer admission to others below cost. We also have volunteer spots available.

Here's what people have been paying so far:

This is a bit higher than we've been expecting, which is helpful! Though it's also possible that being able to register and pay sooner correlates with being able to pay more?

Overall I think we're in good shape, and I'm looking forward to the weekend!

Discuss

How to Identify an Immoral Maze

12 января, 2020 - 15:10
Published on January 12, 2020 12:10 PM UTC

Figuring out what parts of what organizations are Immoral Mazes (hereafter mazes), and to what extent, is both important and hard to get exactly right.

What is easier is using simple heuristics to get a good approximation, then keeping an eye out for and updating on new evidence.

I offer seven heuristics, the first two of which will do the bulk of the work on their own. You benefit from the ‘right’ answer to all of them even absent concerns about mazes, so they are good questions to get into the habit of asking.

1. How many levels of hierarchy exist?

Full mazes require at least three levels of hierarchy, without which one cannot have middle management.

Each level beyond that makes things worse. The fourth and fifth levels both make things much worse.

With only one level, there’s nothing to worry about.

With only two levels, a boss and those who report to the boss, the boss has skin in the game, no boss causing problems for them, and not enough reason to reward bad outcomes.

With three levels, there are middle managers in the second layer, so one should be wary. But things are unlikely to be too bad. No middle manager has a boss or underling who is also a middle manager. This means that any interaction between non-equals either involves the head of the company, or involves someone ‘on the line’ who doesn’t have anyone reporting to them and must deal with object-level reality. Either of them has reason to keep things grounded. Since there is only one person at the top, every conversation includes someone who interacts regularly with object-level reality.

With four levels, we start to have interactions between middle managers in charge of each other. These dynamics start to get serious, but everyone still interacts with someone on the top or bottom.

At five levels, we have people who never interact directly with either the boss or anyone dealing with the object level.

At six levels, those people interact with each other.

And so on.

Meanwhile, the boss has less and less need or ability to comprehend the object level, and we get more and more problems with lack of skin in the game, which is question two.

At least one of the corporations in Moral Mazes had more than twenty ranks. That is way, way too many. By that point, it would be surprising if you weren’t doomed. I actually have no idea how to have twenty ranks and keep things sane.

Note that those outside the company, such as investors or regulators, seem like they should effectively count as a level under some circumstances, but not under others.

As a spot check, I looked back on the jobs I’ve had. This matches my experience.

Most impressive is that I can observe what happened when several of those jobs added new layers of hierarchy. In every case, this led in traceable ways to additional maze-like behavior. In every case, that made life much worse for me and other employees, and hurt our productivity. In one case I was running the company at the time, and it still happened.

I would be very wary of any organization that had four levels of hierarchy. I would be progressively more skeptical of any organization with more than that, to the point of assuming it was a maze until proven otherwise.

2. Do people have skin in the game?

Skin in the game is a robust defense against mazes, if it can be distributed widely enough and in the right ways. That can be tough. There’s only 100% total equity to go around.

One can only reward what can be observed or often only what can be quantified and measured. Something about Goodhart’s Law, and so on. The problem with levels of hierarchy and middle management is in large part a problem of inability to provide skin in the game.

For sufficiently large organizations, as described in Moral Mazes, skin in the game is not so much spread thin as deliberately destroyed. The successful keep enough momentum to run away from the consequences of their problems. This alone is fatal.

If an organization has solved these problems for real, it likely isn’t a maze.

If an organization lacks skin in the game and also has many levels of hierarchy, you’re almost certainly dealing with a maze.

If it lacks skin in the game but also lacks levels of hierarchy, maze levels can differ. But also keep in mind that lack of skin in the game causes a whole host of problems. Only some of those are the problems of mazes. Detailing these issues is beyond the scope here, but be highly skeptical whenever skin in the game is lacking.

3. Do people have soul in the game?

What’s better than having skin in the game? Having soul in the game. Caring deeply about the outcome for reasons other than money, or your own liability, or being potentially scapegoated. Caring for existential reasons, not commercial ones.

Soul in the game is incompatible with mazes. Mazes will eliminate anyone with soul in the game. Therefore, if the people you work for have soul in the game, you’re safe. If you have it too, you’ll be a lot happier, and likely doing something worthwhile. Things will be much better on most fronts.

It’s worth prioritizing soul in the game, above and beyond skin in the game.

4. How do people describe their job when you ask?

Remember this quote:

When managers describe their work to an outsider, they almost always first say: “I work for [Bill James]” or “I report to [Harry Mills]” or “I’m in [Joe Bell’s] group,”* and only then proceed to describe their actual work functions. (Location 387, Quote 2)

You want them to say almost anything else. Anything that does not make you recoil in horror a different way. Hopefully something worthwhile and interesting. I don’t know how good this rule is, but I suspect it’s quite powerful.

5. Is there diversity of skill levels? Is excellence possible and rewarded?

The belief that all middle managers have the same skills, and are all equally capable of doing any managerial job aside from the politics involved, is a lot of what makes mazes so bad. If there is no good reason to diverge from standard practice, if everybody knows that you cannot do better, then any divergence is blameworthy, and shows you are not doing your job. There’s no need to ask why, or what advantages it might have.

It also all but ensures the wrong answer to the next question.

6. Is there slack?

A world without slack is not a place one wants to be. Mazes systematically erase all slack. Slack is evidence of not being fully committed, and given that everyone’s skills are equal and competition is perfect, holding anything back means losing even if undetected.

7. Pay Attention

Sounds silly, but it works. Observe people and what they do and how they do it. If you work in a maze for long enough, you’re not going to shout it from the rooftops, but every sentence you speak will reflect it.

And as always, when people tell you who they are, believe them.

Other Notes

These questions do not differentiate between corporations, non-profits, governments, parties, clubs or other organizational forms. That’s not a good indicator. Corporations are only the original observed case.

Asking how proposed or expected changes will change the answers to these questions is a good way to know if those changes will raise the maze level of an organization.

Like most puzzles, there are multiple solutions, and the pieces reinforce each other. Most of the time, hardcore mazes will give alarm-bell level answers to all seven heuristics.

Are there any other good simple heuristics?

Next is How to Work With Moral Mazes, providing my best advice in detail to those dealing with the threat of mazes on a personal level.

Discuss

Key Decision Analysis - a fundamental rationality technique

12 января, 2020 - 08:59
Published on January 12, 2020 5:59 AM UTC

The technique

This post is signal-boosting and recommending a strategy for improving your decision making that I picked up from the entrepreneur Ivan Mazour. He describes the process here, and publishes his own results every year on his blog.

In his words...

I believe that life is far too fast-paced for us to be able to make rational, carefully thought-out, decisions all the time. This completely contradicts my mathematical upbringing and training, but is something I have come to realise throughout my twenties. We need a way of keeping up with the constant barrage of decisions, even when the inevitable ‘decision fatigue’ sets in. The only way to do this, I find, is to act on instinct, but this only works if your instincts are correct. They cannot be correct all the time, of course, but if we can maximise the chance of making the right decision by instinct, then we have a strategy for coping with a complicated and highly productive life.

To sharpen my instincts, I keep a monthly journal of all key decisions which I make – decisions that could be truly life changing – and my instinctive reasons for why I made them. I go back only after exactly a year has passed, and I note down whether the decision was correct, and more importantly whether my instincts were right. At the end of the year, I go over all twelve months worth of notes, and search for any patterns amongst all of the right and wrong choices.

This is not a short-term strategy, as you can tell. In fact it takes exactly two years from the day you start following it, to the time that you can get some useful insights to sharpen your instincts. Keeping a diary of decisions has other uses, and there are many ways of getting an overview of your life prior to this, but it is only after the two years have passed that a genuine clear pattern presents itself.

Some theory

This accords with some abstract theory about human rationality. A perfect-Bayesian expected utility maximizer doesn't start out with an optimal policy. Rather, its strength is being able to learn from its experience (optimally), so that it converges towards the optimal policy.

Of course, humans have a number of limitations standing between us and perfect-decision-making-in-the-limit. Due to computational constraints, perfect Bayesian updating is out of reach. But among a number of limitations, the first and most fundamental consideration is "are you learning from your data at all?".

If the consequences of your decisions don't propagate back to the process(es) that you use to make decisions, then that decision process isn't going to improve.

And I think that, by default, I mostly don't learn from my own experience, for a couple of reasons.

• Reflection isn't automatic. I'm likely to make many decisions, important and unimportant, without ever going back to check how they turned out, especially on long timescales.
• With hindsight bias and whatnot, I can't trust myself to remember why I made a decision, when I've seen how it turned out.
• In general, each situation is treated as an isolated incident, instead of being examined at the level of my heuristics (i.e. the level of my decision-making apparatus).

So I need some process that involves writing things down and allows me to intentionally implement back-propagation.

Personal experience

I only started logging my decisions a little more than a year ago, and did the analysis for the end of 2018 this week, so I don't have that much personal experience to share. I'm sharing anyway, because it will be years until I have lots of experience with this technique.

That said,

• I've been logging very big decisions ("should I abandon X project?") along with small decisions ("Some friends (y, z) just asked me if I want to go out to dinner with them. Should I join them, or keep working?"). In some situations, I get feedback about whether what I did was the right choice or not, pretty much immediately, in which case I'll log that too, so that I can draw out heuristics later.
• I've also been logging my mistakes ("I put a lot of effort into setting things up so that I could work on the plane, and then my laptop ran out of battery in the first hour.").
• Overall, I didn't log enough over the past year, such that my document is sparser than I think it could have been. I averaged 2 to 4 entries a month, but I think I could have had 5-10 a month. From looking over what I do have, I can feel how having more entries would have been useful. So even given the bullet points above, I think my conception of what counts as a "decision" was too strict.
• Relatedly, making logging low-friction seems important. This year, I'm going to implement this in Roam, using #[[decision]] tag, and integrate this into my existing daily / weekly review system.
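A decision log of this kind needs very little structure. Here is a minimal sketch; the field names and the one-year review window are my own assumptions, not Mazour's exact format:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Minimal decision journal: log the decision and the instinctive reasons
# now; fill in the outcome roughly a year later, then look for patterns.
@dataclass
class Decision:
    made_on: date
    summary: str                   # what was decided
    instinct: str                  # the gut-level reasons at the time
    outcome: Optional[str] = None  # filled in at review time
    instinct_was_right: Optional[bool] = None

def due_for_review(log, today, window_days=365):
    """Entries old enough to evaluate but not yet reviewed."""
    return [d for d in log
            if d.outcome is None and (today - d.made_on).days >= window_days]

log = [Decision(date(2019, 1, 5), "left project X", "gut said sunk cost")]
print(len(due_for_review(log, date(2020, 1, 12))))  # prints 1
```

The key design point is the deliberate delay: the outcome field stays empty until enough time has passed that hindsight bias can't quietly rewrite the original reasons, which are frozen in the log.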

Even given the issues I described above, I found the assessment activity to be extremely useful. There were some places where I was able to highlight "past Eli was flat-out wrong", and others where, having seen how things turned out, I could outline nuanced heuristics that take into account the right considerations in the right circumstances.

It also clearly affirmed two principles / Hamming problems that had occurred to me before, but hadn't really slapped me in the face. This was helpful for realizing that "my tendency to X is preemptively destroying most of the value I might create", which is an important thing to bring to full conscious attention.

Good luck!

Discuss

Using Vickrey auctions as a price discovery mechanism

12 января, 2020 - 04:00
Published on January 12, 2020 1:00 AM UTC

I recommend "Pricing niche products: Why sell a mechanical keyboard kit for $1,668?" for providing a practical case study in price dynamics that helped with my economic intuitions. The author's friend had created a new custom keyboard kit, and the friend's previous kit had sold out in minutes, so clearly something was amiss with their "estimate costs and premiums and then set a price" approach:

I’m not a fan of this inside-out approach, for several reasons:

• factors like “brand premium” are inherently subjective — the temptation to compare to others limits potential upside and differentiation
• picking a (new, higher) price may have reputational downsides (because of course your customers spend all day in mechanical keyboard chat rooms and may gripe about you “selling out the community”)
• you will second-guess yourself regardless of the outcome; either you sell out again (goto 0) or you sell too few and then must live with the shame of having $20k worth of unsold keyboards in your garage

The most compelling argument against simply picking a price, though, is that it limits how much you can learn about your market.

Instead, they run a Vickrey auction (or "second-price sealed-bid auction") and find that the demand curve supports 3x the list price they would have chosen:

I can’t overstate the benefits of knowing the demand curve.

In my friend’s case, the auction let them sell far above their initial price and revealed that the market was deep enough to justify a larger production run.
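The mechanism itself is simple. Below is a toy sketch of a multi-unit, uniform-price version of the sealed-bid auction, in which the sorted bids double as the observed demand curve; the bidder names and amounts are invented for illustration:

```python
# Toy uniform-price sealed-bid auction: the top-k bidders win, and each pays
# the (k+1)-th highest bid, i.e. the highest losing bid. For units=1 this is
# the classic Vickrey second-price auction. Sorting the bids also reveals
# the demand curve the post describes.

def run_auction(bids, units):
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winners = [name for name, _ in ranked[:units]]
    clearing_price = ranked[units][1]  # highest losing bid sets the price
    demand_curve = [amount for _, amount in ranked]
    return winners, clearing_price, demand_curve

bids = {"ann": 340, "bo": 290, "cy": 150, "di": 120}  # invented example
winners, price, curve = run_auction(bids, units=2)
print(winners, price)  # prints ['ann', 'bo'] 150
```

With single-unit demand per bidder, winners pay the highest losing bid rather than their own, so there is little incentive to shade bids; that is what makes the revealed demand curve trustworthy enough to set future list prices from.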

(I discovered this post via The Prepared, a newsletter that I'd strongly recommend.)

Discuss

Is it worthwhile to save the cord blood and tissue?

12 января, 2020 - 00:52
Published on January 11, 2020 9:52 PM UTC

My wife and I are expecting a baby this May. I only recently learned about this, but apparently there is an option to save the cord blood and tissue during the delivery. This seems potentially very useful, so I'm wondering if anyone has done research into this or has found trustworthy, informative resources on this topic.

My current understanding is that this tissue has a lot of stem cells. And the stem cells are useful for all sorts of medical procedures, and are even more likely to be used in the future. But currently we don't have an easy (or cheap?) way to get them.

• How valuable are the stem cells right now and how valuable are they expected to be in the future?
• How hard is it to get stem cells for yourself / your child right now vs in the future?
• Will the collected stem cells be useful only for the baby, or for the mother too?
• Can we reasonably expect the cryo companies to last long enough and not go under?
• Have you had experience donating it?
• Have you had experience storing it?

Discuss

Please Critique Things for the Review!

11 января, 2020 - 23:59
Published on January 11, 2020 8:59 PM UTC

I’ve spent a lot of time defending LW authors’ right to have the conversation they want to have, whether that be early stage brainstorming, developing a high context idea, or just randomly wanting to focus on some particular thing.

LessWrong is not only a place for finished, flawless works. Good intellectual output requires both Babble and Prune, and in my experience the best thinkers often require idiosyncratic environments in order to produce and refine important insights. LessWrong is a full-stack intellectual pipeline.

But the 2018 Review is supposed to be late stage in that pipeline. We’re pruning, not babbling here, and criticism is quite welcome. We’re deliberately offering just as much potential prize money ($2000) to reviewers as to the top-rated authors. Nominated authors had the opportunity to opt out of the review process, and none of them did. Getting nominated is meant to feel something like “getting invited to the grown-ups table”, where your ideas are subjected to serious evaluation, and that scrutiny is seen as a sign of respect.

In my current expectations, the Review is one of the primary ways that LessWrong ensures high epistemic standards. But how well that plan works is proportional to how much effort critics put into it.

The Review and Voting Phases will continue until January 19th. During that time, review-comments will appear on the voting page, so anyone considering how to vote on a given post will have the opportunity to see critiques. The reviews will appear abridged initially, so I’d aim for the first couple sentences to communicate your overall takeaway.

The Review norms aren’t “literally anything goes” – ridicule, name-calling etc still aren’t appropriate. I’d describe the intended norms for reviews as “professional”. But posts nominated for the Review should be treated as something like “the usual frontpage norms, but with a heavier emphasis on serious evaluation.”

I’m still not sure precisely what the rules/guidelines should be about what is acceptable for the final Best of 2018 Book. In some cases, a post might make some important points, but also make some unjustified claims. (I personally think Local Validity as Key to Sanity and Civilization falls in this category.) My current best guess is that it’d be fine if such posts end up in the book, but I’d want to make sure to also include reviews that highlighted any questionable statements.

Happy Critiquing!

Discuss

Is there a moral obligation to respect disagreed analysis?

11 января, 2020 - 22:01

Published on January 11, 2020 1:22 AM UTC

I wish to perform some action which invites material risk upon myself and another person P, as well as inviting benefits. My personal evaluation is that the benefits far outweigh the risk, but the risk is both speculative and partially subjective, so I decided to consult P before taking the action.

P was vigorously opposed to the action and gave his reasoning: a list of factors that he claims suggest the risk is much higher and the benefits much lower. Some of the factors given were of the inherently subjective variety, such as feeling proud of the current status quo.

Afterwards, I sat and thought about these factors and still reached my initial conclusion that the benefits far outweigh the risk. Furthermore, not only can I perform this action unilaterally, but P will not even be aware I have performed it unless one of the Bad Outcomes (which in my evaluation are exceedingly unlikely) occurs. If the risk were greater than the benefit I would not take this action, but P failed to convince me this is the case.

My question: If my analysis is correct, with overwhelming probability not only will P be unharmed by my actions but also entirely unaware of them. With this in mind, do I have a moral obligation to return to P, argue my case, and obtain consent before performing my action?

Discuss

Moral uncertainty vs related concepts

11 января, 2020 - 13:03

Published on January 11, 2020 10:03 AM UTC

Overview

How important is the well-being of non-human animals compared with the well-being of humans? How much should we spend on helping strangers in need? How much should we care about future generations? How should we weigh reasons of autonomy and respect against reasons of benevolence?

Few could honestly say that they are fully certain about the answers to these pressing moral questions. Part of the reason we feel less than fully certain about the answers has to do with uncertainty about empirical facts.
We are uncertain about whether fish can feel pain, whether we can really help strangers far away, or what we could do for people in the far future. However, sometimes, the uncertainty is fundamentally moral. [...] Even if we were to come to know all the relevant non-normative facts, we could still waver about whether it is right to kill an animal for a very small benefit for a human, whether we have strong duties to help strangers in need, and whether future people matter as much as current ones. Fundamental moral uncertainty can also be more general, as when we are uncertain about whether a certain moral theory is correct. (Bykvist; emphasis added)[1]

I consider the above quote a great starting point for understanding what moral uncertainty is; it gives clear examples of moral uncertainties, and contrasts these with related empirical uncertainties.

From what I’ve seen, a lot of academic work on moral uncertainty essentially opens with something like the above, then notes that the rational approach to decision-making under empirical uncertainty is typically considered to be expected utility theory, then discusses various approaches for decision-making under moral uncertainty. That’s fair enough, as no one article can cover everything, but it also leaves open some major questions about what moral uncertainty actually is.[2] These include:

1. How, more precisely, can we draw lines between moral and empirical uncertainty?
2. What are the overlaps and distinctions between moral uncertainty and other related concepts, such as normative, metanormative, decision-theoretic, and metaethical uncertainty, as well as value pluralism?
• My prior post answers similar questions about how morality overlaps with and differs from related concepts, and may be worth reading before this one.
3. Is what we “ought to do” under moral uncertainty an objective or subjective matter?
4. Is what we “ought to do” under moral uncertainty a matter of rationality or morality?
5. Are we talking about “moral risk” or about “moral (Knightian) uncertainty” (if such a distinction is truly meaningful)?
6. What “types” of moral uncertainty are meaningful for moral antirealists and/or subjectivists?[3]

In this post, I collect and summarise ideas from academic philosophy and the LessWrong and EA communities in an attempt to answer the first two of the above questions (or to at least clarify what the questions mean, and what the most plausible answers are). My next two posts will do the same for the remaining questions.

I hope this will benefit readers by facilitating clearer thinking and discussion. For example, a better understanding of the nature and types of moral uncertainty may aid in determining how to resolve (i.e., reduce or clarify) one’s uncertainty, which I’ll discuss two posts from now. (How to make decisions given moral uncertainty is discussed later in this sequence.)

Epistemic status: The concepts covered here are broad, fuzzy, and overlap in various ways, making definitions and distinctions between them almost inevitably debatable. Additionally, I’m not an expert in these topics (though I have now spent a couple of weeks mostly reading about them). I’ve tried to mostly collect, summarise, and synthesise existing ideas. I’d appreciate feedback or comments in relation to any mistakes, unclear phrasings, etc. (and just in general!).

Empirical uncertainty

In the quote at the start of this post, Bykvist (the author) seemed to imply that it was easy to identify which uncertainties in that example were empirical and which were moral. However, in many cases, the lines aren’t so clear. This is perhaps most obvious with regards to, as Christian Tarsney puts it:

Certain cases of uncertainty about moral considerability (or moral status more generally) [which] turn on metaphysical uncertainties that resist easy classification as empirical or moral. [For example,] In the abortion debate, uncertainty about when in the course of development the fetus/infant comes to count as a person is neither straightforwardly empirical nor straightforwardly moral. Likewise for uncertainty in Catholic moral theology about the time of ensoulment, the moment between conception and birth at which God endows the fetus with a human soul [...]. Nevertheless, it seems strange to regard these uncertainties as fundamentally different from more clearly empirical uncertainties about the moral status of the developing fetus (e.g., uncertainty about where in the gestation process complex mental activity, self-awareness, or the capacity to experience pain first emerge), or from more clearly moral uncertainties (e.g., uncertainty, given a certainty that the fetus is a person, whether it is permissible to cause the death of such a person when doing so will result in more total happiness and less total suffering).[4]

And there are also other types of cases in which it seems hard to find clear, non-arbitrary lines between moral and empirical uncertainties (some of which Tarsney [p. 140-146] also discusses).[5] Altogether, I expect drawing such lines will quite often be difficult.

Fortunately, we may not actually need to draw such lines anyway. In fact, as I discuss in my post on making decisions under both moral and empirical uncertainty, many approaches for handling moral uncertainty were consciously designed by analogy to approaches for handling empirical uncertainty, and it seems to me that they can easily be extended to handle both moral and empirical uncertainty, without having to distinguish between those “types” of uncertainty.[6][7]

The situation is a little less clear when it comes to resolving one’s uncertainty (rather than just making decisions given uncertainty). It seems at first glance that you might need to investigate different “types” of uncertainty in different ways.
For example, if I’m uncertain whether fish react to pain in a certain way, I might need to read studies about that, whereas if I’m uncertain what “moral status” fish deserve (even assuming that I know all the relevant empirical facts), then I might need to engage in moral reflection. However, it seems to me that the key difference in such examples is what the uncertainties are actually about, rather than specifically whether a given uncertainty should be classified as “moral” or “empirical”. (It’s also worth quickly noting that the topic of “cluelessness” is only about empirical uncertainty - specifically, uncertainty regarding the consequences that one’s actions will have. Cluelessness thus won’t be addressed in my posts on moral uncertainty, although I do plan to later write about it separately.) Normative uncertainty As I noted in my prior post: A normative statement is any statement related to what one should do, what one ought to do, which of two things are better, or similar. [...] Normativity is thus the overarching category (superset) of which things like morality, prudence [essentially meaning the part of normativity that has to do with one’s own self-interest, happiness, or wellbeing], and arguably rationality are just subsets. In the same way, normative uncertainty is a broader concept, of which moral uncertainty is just one component. Other components could include: • prudential uncertainty • decision-theoretic uncertainty (covered below) • metaethical uncertainty (also covered below) - although perhaps it’d make more sense to see metaethical uncertainty as instead just feeding into one’s moral uncertainty Despite this, academic sources seem to commonly either: • focus only on moral uncertainty, or • state or imply that essentially the same approaches for decision-making will work for both moral uncertainty in particular and normative uncertainty in general (which seems to me a fairly reasonable assumption). 
On this matter, Tarsney writes: Fundamentally, the topic of the coming chapters will be the problem of normative uncertainty, which can be roughly characterized as uncertainty about one’s objective reasons that is not a result of some underlying empirical uncertainty (uncertainty about the state of concretia). However, I will confine myself almost exclusively to questions about moral uncertainty: uncertainty about one’s objective moral reasons that is not a result of etc etc. This is in part merely a matter of vocabulary: “moral uncertainty” is a bit less cumbersome than “normative uncertainty,” a consideration that bears some weight when the chosen expression must occur dozens of times per chapter. It is also in part because the vast majority of the literature on normative uncertainty deals specifically with moral uncertainty, and because moral uncertainty provides more than enough difficult problems and interesting examples, so that there is no need to venture outside the moral domain. Additionally, however, focusing on moral uncertainty is a useful simplification that allows us to avoid difficult questions about the relationship between moral and non-moral reasons (though I am hopeful that the theoretical framework I develop can be applied straightforwardly to normative uncertainties of a non-moral kind). For myself, I have no taste for the moral/non-moral distinction: To put it as crudely and polemically as possible, it seems to me that all objective reasons are moral reasons. But this view depends on substantive normative ethical commitments that it is well beyond the scope of this dissertation to defend. [...] If one does think that all reasons are moral reasons, or that moral reasons always override non-moral reasons, then a complete account of how agents ought to act under moral uncertainty can be given without any discussion of non-moral reasons (Lockhart, 2000, p. 16). 
To the extent that one does not share either of these assumptions, theories of choice under moral uncertainty must generally be qualified with “insofar as there are no relevant non-moral considerations.” Somewhat similarly, this sequence will nominally focus on moral uncertainty, even though: • some of the work I’m drawing on was nominally focused on normative uncertainty (e.g., Will MacAskill’s thesis) • I intend most of what I say to be fairly easily generalisable to normative uncertainty more broadly. Metanormative uncertainty In MacAskill’s thesis, he writes that metanormativism is “the view that there are second-order norms that govern action that are relative to a decision-maker’s uncertainty about first-order normative claims. [...] The central metanormative question is [...] about which option it’s appropriate to choose [when a decision-maker is uncertain about which first-order normative theory to believe in]”. MacAskill goes on to write: A note on terminology: Metanormativism isn’t about normativity, in the way that meta-ethics is about ethics, or that a meta-language is about a language. Rather, ‘meta’ is used in the sense of ‘over’ or ‘beyond’. In essence, metanormativism focuses on what metanormative theories (or “approaches”) should be used for making decisions under normative uncertainty. We can therefore imagine being metanormatively uncertain: uncertain about what metanormative theories to use for making decisions under normative uncertainty. For example: • You’re normatively uncertain if you see multiple (“first-order”) moral theories as possible and these give conflicting suggestions. • You’re _meta_normatively uncertain if you’re also unsure whether the best approach for deciding what to do given this uncertainty is the “My Favourite Theory” approach or the “Maximising Expected Choice-worthiness” approach (both of which are explained later in this sequence).
This leads inevitably to the following thought: It seems that, just as we can suffer [first-order] normative uncertainty, we can suffer [second-order] metanormative uncertainty as well: we can assign positive probability to conflicting [second-order] metanormative theories. [Third-order] Metametanormative theories, then, are collections of claims about how we ought to act in the face of [second-order] metanormative uncertainty. And so on. In the end, it seems that the very existence of normative claims—the very notion that there are, in some sense or another, ways “one ought to behave”—organically gives rise to an infinite hierarchy of metanormative uncertainty, with which an agent may have to contend in the course of making a decision. (Philip Trammell) I refer readers interested in this possibility of infinite regress - and potential solutions or reasons not to worry - to Trammell, Tarsney, and MacAskill (p. 217-219). (I won’t discuss those matters further here, and I haven’t properly read those Trammell or Tarsney papers myself.) Decision-theoretic uncertainty (Readers who are unfamiliar with the topic of decision theories may wish to read up on that first, or to skip this section.) MacAskill writes: Given the trenchant disagreement between intelligent and well-informed philosophers, it seems highly plausible that one should not be certain in either causal or evidential decision theory. In light of this fact, Robert Nozick briefly raised an interesting idea: that perhaps one should take decision-theoretic uncertainty into account in one’s decision-making. This is precisely analogous to taking uncertainty about first-order moral theories into account in decision-making. Thus, decision-theoretic uncertainty is just another type of normative uncertainty. Furthermore, arguably, it can be handled using the same sorts of “metanormative theories” suggested for handling moral uncertainty (which are discussed later in this sequence). 
Chapter 6 of MacAskill’s thesis is dedicated to discussion of this matter, and I refer interested readers there. For example, he writes: metanormativism about decision theory [is] the idea that there is an important sense of ‘ought’ (though certainly not the only sense of ‘ought’) according to which a decision-maker ought to take decision-theoretic uncertainty into account. I call any metanormative theory that takes decision-theoretic uncertainty into account a type of meta decision theory [- in] contrast to a metanormative view according to which there are norms that are relative to moral and prudential uncertainty, but not relative to decision-theoretic uncertainty.[8] Metaethical uncertainty While normative ethics addresses such questions as "What should I do?", evaluating specific practices and principles of action, meta-ethics addresses questions such as "What is goodness?" and "How can we tell what is good from what is bad?", seeking to understand the nature of ethical properties and evaluations. (Wikipedia) To illustrate, normative (or “first-order”) ethics involves debates such as “Consequentialist or deontological theories?”, while _meta_ethics involves debates such as “Moral realism or moral antirealism?” Thus, in just the same way we could be uncertain about first-order ethics (morally uncertain), we could be uncertain about metaethics (metaethically uncertain). It seems that metaethical uncertainty is rarely discussed; in particular, I’ve found no detailed treatment of how to make decisions under metaethical uncertainty. However, there is one brief comment on the matter in MacAskill’s thesis: even if one endorsed a meta-ethical view that is inconsistent with the idea that there’s value in gaining more moral information [e.g., certain types of moral antirealism], one should not be certain in that meta-ethical view. And it’s high-stakes whether that view is true — if there are moral facts out there but one thinks there aren’t, that’s a big deal!
Even for this sort of antirealist, then, there’s therefore value in moral information, because there’s value in finding out for certain whether that meta-ethical view is correct. It seems to me that, if and when we face metaethical uncertainties that are relevant to the question of what we should actually do, we could likely use basically the same approaches that are advised for decision-making under moral uncertainty (which I discuss later in this sequence).[9] Moral pluralism A different matter that could appear similar to moral uncertainty is moral pluralism (aka value pluralism, aka pluralistic moral theories). According to SEP: moral pluralism [is] the view that there are many different moral values. Commonsensically we talk about lots of different values—happiness, liberty, friendship, and so on. The question about pluralism in moral theory is whether these apparently different values are all reducible to one supervalue, or whether we should think that there really are several distinct values. MacAskill notes that: Someone who [takes a particular expected-value-style approach to decision-making] under uncertainty about whether only wellbeing, or both knowledge and wellbeing, are of value looks a lot like someone who is conforming with a first-order moral theory that assigns both wellbeing and knowledge value. In fact, one may even decide to react to moral uncertainty by just no longer having any degree of belief in each of the first-order moral theories they’re uncertain over, and instead having complete belief in a new (and still first-order) moral theory that combines those previously-believed theories.[10] For example, after discussing two approaches for thinking about the “moral weight” of different animals’ experiences, Brian Tomasik writes: Both of these approaches strike me as having merit, and not only am I not sure which one I would choose, but I might actually choose them both. 
In other words, more than merely having moral uncertainty between them, I might adopt a "value pluralism" approach and decide to care about both simultaneously, with some trade ratio between the two.[11] But it’s important to note that this really isn’t the same as moral uncertainty; the difference is not merely verbal or merely a matter of framing. For example, if Alan has complete belief in a pluralistic combination of utilitarianism and Kantianism, rather than uncertainty over the two theories: 1. Alan has no need for a (second-order) metanormative theory for decision-making under moral uncertainty, because he no longer has any moral uncertainty. • If instead Alan has less than complete belief in the pluralistic theory, then the moral uncertainty that remains is between the pluralistic theory and whatever other theories he has some belief in (rather than between utilitarianism, Kantianism, and whatever other theories he has some belief in). 2. We can’t represent the idea of Alan updating to believe more strongly in the Kantian theory, or to believe more strongly in the utilitarian theory.[12] 3. Relatedly, we’re no longer able to straightforwardly apply the idea of value of information to things that may inform Alan's degree of belief in each theory.[13] Closing remarks I hope this post helped clarify the distinctions and overlaps between moral uncertainty and related concepts. (And as always, I’d welcome any feedback or comments!) In my next post, I’ll continue exploring what moral uncertainty actually is, this time focusing on the questions: 1. Is what we “ought to do” under moral uncertainty an objective or subjective matter? 2. Is what we “ought to do” under moral uncertainty a matter of rationality or morality? 3. Are we talking about “moral risk” or about “moral (Knightian) uncertainty” (if such a distinction is truly meaningful)? 1.
For another indication of why the topic of moral uncertainty as a whole matters, see this quote from Christian Tarsney’s thesis: The most popular method of investigation in contemporary analytic moral philosophy, the method of reflective equilibrium based on heavy appeal to intuitive judgments about cases, has come under concerted attack and is regarded by many philosophers (e.g. Singer (2005), Greene (2008)) as deeply suspect. Additionally, every major theoretical approach to moral philosophy (whether at the level of normative ethics or metaethics) is subject to important and intuitively compelling objections, and the resolution of these objections often turns on delicate and methodologically fraught questions in other areas of philosophy like the metaphysics of consciousness or personal identity (Moller, 2011, pp. 428- 432). Whatever position one takes on these debates, it can hardly be denied that our understanding of morality remains on a much less sound footing than, say, our knowledge of the natural sciences. If, then, we remain deeply and justifiably uncertain about a litany of important questions in physics, astronomy, and biology, we should certainly be at least equally uncertain about moral matters, even when some particular moral judgment is widely shared and stable upon reflection. ↩︎ 2. In an earlier post which influenced this one, Kaj_Sotala wrote: I have long been slightly frustrated by the existing discussions about moral uncertainty that I've seen. I suspect that the reason has been that they've been unclear on what exactly they mean when they say that we are "uncertain about which theory is right" - what is uncertainty about moral theories? Furthermore, especially when discussing things in an FAI [Friendly AI] context, it feels like several different senses of moral uncertainty get mixed together. ↩︎ 3. 
In various places in this sequence, I’ll use language that may appear to endorse or presume moral realism (e.g., referring to “moral information” or to probability of a particular moral theory being “correct”). But this is essentially just for convenience; I intend this sequence to be as neutral as possible on the matter of moral realism vs antirealism (except when directly focusing on such matters). I think that the interpretation and importance of moral uncertainty is clearest for realists, but, as will be discussed more in my post on the sixth question listed, I also think that moral uncertainty can still be a meaningful and important topic for many types of moral antirealist. (To very briefly foreshadow: an antirealist may still have a meaningful conception of “moral learning” in terms of gaining conceptual clarity, encountering new arguments that change what the antirealist values, simplifying their collection of intuitions into a more elegant theory, etc.) ↩︎ 4. As another example of this sort of case, suppose I want to know whether fish are “conscious”. This may seem on the face of it an empirical question. However, I might not yet know precisely what I mean by “conscious”, and I might in fact only really want to know whether fish are “conscious in a sense I would morally care about”. In this case, the seemingly empirical question becomes hard to disentangle from the (seemingly moral) question: “What forms of consciousness are morally important?” And in turn, my answers to that question may be influenced by empirical discoveries. For example, I may initially believe that avoidance of painful stimuli demonstrates consciousness in a morally relevant sense, but then revise that belief when I learn that this behaviour can be displayed in a stimulus-response way by certain extremely simple organisms. ↩︎ 5. 
The boundaries become even fuzzier, and may lose their meaning entirely, if one assumes the metaethical view of moral naturalism, which: refers to any version of moral realism that is consistent with [...] general philosophical naturalism. Moral realism is the view that there are objective, mind-independent moral facts. For the moral naturalist, then, there are objective moral facts, these facts are facts concerning natural things, and we know about them using empirical methods. (SEP) This sounds to me like it would mean that all moral uncertainties are effectively empirical uncertainties, and that there’s no difference in how moral vs empirical uncertainties should be resolved or incorporated into decision-making. But note that that’s my own claim; I haven’t seen it made explicitly by writers on these subjects. That said, one quote that seems to suggest something like this claim is the following, from Tarsney’s thesis: Most generally, naturalistic metaethical views that treat normative ethical theorizing as continuous with natural science will see first-order moral principles as at least epistemically if not metaphysically dependent on features of the empirical world. For instance, on Railton’s (1986) view, moral value attaches (roughly) to social conditions that are stable with respect to certain kinds of feedback mechanisms (like the protest of those who object to their treatment under existing social conditions). What sort(s) of social conditions exhibit this stability, given the relevant background facts about human psychology, is an empirical question. For instance, is a social arrangement in which parents can pass down large advantages to their offspring through inheritance, education, etc, more stable or less stable than one in which the state intervenes extensively to prevent such intergenerational perpetuation of advantage?
Someone who accepts a Railtonian metaethic and is therefore uncertain about the first-order normative principles that govern such problems of distributive justice, though on essentially empirical grounds, seems to occupy another sort of liminal space between empirical and moral uncertainty. Footnote 15 of this post discusses relevant aspects of moral naturalism, though not this specific question. ↩︎ 6. In fact, Tarsney’s (p.140-146) discussion of the difficulty of disentangling moral and empirical uncertainties is used to argue for the merits of approaching moral uncertainty analogously to how one approaches empirical uncertainty. ↩︎ 7. An alternative approach that also doesn’t require determining whether a given uncertainty is moral or empirical is the “worldview diversification” approach used by the Open Philanthropy Project. In this context, a worldview is described as representing “a combination of views, sometimes very difficult to disentangle, such that uncertainty between worldviews is constituted by a mix of empirical uncertainty (uncertainty about facts), normative uncertainty (uncertainty about morality), and methodological uncertainty (e.g. uncertainty about how to handle uncertainty [...]).” Open Phil “[puts] significant resources behind each worldview that [they] find highly plausible.” This doesn’t require treating moral and empirical uncertainty any differently, and thus doesn’t require drawing lines between those “types” of uncertainty. ↩︎ 8. As with metanormative uncertainty in general, this can lead to complicated regresses. For example, there’s the possibility to construct causal meta decision theories and evidential meta decision theories, and to be uncertain over which of those meta decision theories to endorse, and so on. As above, see Trammell, Tarsney, and MacAskill (p. 217-219) for discussion of such matters. ↩︎ 9. In a good, short post, Ikaxas writes: How should we deal with metaethical uncertainty? [...] 
One answer is this: insofar as some metaethical issue is relevant for first-order ethical issues, deal with it as you would any other normative uncertainty. And insofar as it is not relevant for first-order ethical issues, ignore it (discounting, of course, intrinsic curiosity and any value knowledge has for its own sake). Some people think that normative ethical issues ought to be completely independent of metaethics: "The whole idea [of my metaethical naturalism] is to hold fixed ordinary normative ideas and try to answer some further explanatory questions" (Schroeder [...]). Others [...] believe that metaethical and normative ethical theorizing should inform each other. For the first group, my suggestion in the previous paragraph recommends that they ignore metaethics entirely (again, setting aside any intrinsic motivation to study it), while for the second my suggestion recommends pursuing exclusively those areas which are likely to influence conclusions in normative ethics. This seems to me like a good extension/application of general ideas from work on the value of information. (I’ll apply such ideas to moral uncertainty later in this sequence.) Tarsney gives an example of the sort of case in which metaethical uncertainty is relevant to decision-making (though that’s not the point he’s making with the example): For instance, consider an agent Alex who, like Alice, divides his moral belief between two theories, a hedonistic and a pluralistic version of consequentialism. But suppose that Alex also divides his metaethical beliefs between a robust moral realism and a fairly anemic anti-realism, and that his credence in hedonistic consequentialism is mostly or entirely conditioned on his credence in robust realism while his credence in pluralism is mostly or entirely conditioned on his credence in anti-realism. 
(Suppose he inclines toward a hedonistic view on which certain qualia have intrinsic value or disvalue entirely independent of our beliefs, attitudes, etc, which we are morally required to maximize. But if this view turns out to be wrong, he believes, then morality can only consist in the pursuit of whatever we contingently happen to value in some distinctively moral way, which includes pleasure but also knowledge, aesthetic goods, friendship, etc.) ↩︎ 10. Or, more moderately, one could remove just some degree of belief in some subset of the moral theories that one had some degree of belief in, and place that amount of belief in a new moral theory that combines just that subset of moral theories. E.g., one may initially think utilitarianism, Kantianism, and virtue ethics each have a 33% chance of being “correct”, but then switch to believing that a pluralistic combination of utilitarianism and Kantianism is 67% likely to be correct, while virtue ethics is still 33% likely to be correct. ↩︎ 11. Luke Muehlhauser also appears to endorse a similar approach, though not explicitly in the context of moral uncertainty. And Kaj Sotala also seems to endorse a similar approach, though without using the term “pluralism” (I’ll discuss Kaj’s approach two posts from now). Finally, MacAskill quotes Nozick appearing to endorse a similar approach with regards to decision-theoretic uncertainty: I [Nozick] suggest that we go further and say not merely that we are uncertain about which one of these two principles, [CDT] and [EDT], is (all by itself) correct, but that both of these principles are legitimate and each must be given its respective due. The weights, then, are not measures of uncertainty but measures of the legitimate force of each principle. We thus have a normative theory that directs a person to choose an act with maximal decision-value. ↩︎ 12.
The closest analog would be Alan updating his beliefs about the pluralistic theory’s contents/substance; for example, coming to believe that a more correct interpretation of the theory would lean more in a Kantian direction. (Although, if we accept that such an update is possible, it may arguably be best to represent Alan as having moral uncertainty between different versions of the pluralistic theory, rather than being certain that the pluralistic theory is “correct” but uncertain about what it says.) ↩︎ 13. That said, we can still apply value of information analysis to things like Alan reflecting on how best to interpret the pluralistic moral theory (assuming again that we represent Alan as uncertain about the theory’s contents). A post later in this sequence will be dedicated to how and why to estimate the “value of moral information”. ↩︎ Discuss A method for fair bargaining over odds in 2 player bets! January 11, 2020 - 10:39 Published on January 11, 2020 1:18 AM UTC Alice and Bob are talking about the odds of some event E. Alice's odds of E are 55% and Bob's are 90%. It becomes clear to them that they have different odds, and being good (and competitive) rationalists they decide to make a bet. Essentially, bet construction can be seen as a bargaining problem, with the gap in odds as surplus value. Alice has neutral or better EV on the "No" position for bets at >=55% odds. Bob has neutral or better EV on the "Yes" position for bets at <=90% odds. Naive bet construction strategy: bet with 50/50 odds. Negative EV for Alice, so this bet doesn't work. Less naive bet construction strategy: Alice and Bob negotiate over odds. The problem here, in my eyes, is that Alice and Bob have an incentive to strategically misrepresent their private odds of E in order to negotiate a better bet. If Alice is honest that her odds are 55%, and Bob lies that his odds are 70%, they split the difference at 62.5% and Bob takes most of the surplus value.
If both were honest and bargaining equitably, they'd have split the difference at 72.5% instead. So I'll call 72.5% the "fair" odds for this bet. A nicer and more rationalist-aligned bet construction strategy wouldn't reward dishonesty! So, here it is. 1. Alice and Bob submit their maximum bets and their odds. 2. Take the minimum of the two maximum bets. Let's say it's $198.

3. Construct 99 mini bets*; one at 1% odds of E, 2% odds of E... 99% odds of E. Each player automatically places $2 on each mini bet that is favorable (or neutral) according to their odds ($198/99 = $2). *99 chosen for simplicity. You could choose a much higher number for the sake of granularity. So, in this case, Alice accepts the No position on all bets at >=55% odds, and Bob accepts the Yes position on all bets at <=90% odds, so they make 36 $2 bets, the average odds of which are 72.5%, which is the fair odds.
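The construction above is easy to sketch in code (a minimal illustration; the function name and the integer-percent representation of odds are my own choices, and I treat each player as accepting every mini bet that is neutral-or-better EV for them):

```python
# Sketch of the mini-bet construction: 99 bets priced at 1%, 2%, ..., 99%,
# equal stake on each, and a bet is placed only when both players accept it.

def construct_mini_bets(alice_odds, bob_odds, alice_max, bob_max, n=99):
    """Alice takes "No" on every mini bet priced at or above her odds;
    Bob takes "Yes" on every one priced at or below his. Odds are integer
    percentages. Returns (accepted odds, stake per bet, total staked)."""
    stake = min(alice_max, bob_max) / n        # equal stake per mini bet
    accepted = [k for k in range(1, n + 1)     # bets at 1%, 2%, ..., 99%
                if alice_odds <= k <= bob_odds]
    return accepted, stake, stake * len(accepted)

accepted, stake, total = construct_mini_bets(55, 90, 500, 198)
average_odds = sum(accepted) / len(accepted)
# 36 mini bets at $2 each, $72 staked in total, at average odds of 72.5% --
# the midpoint of the two players' stated odds.
```

Misreporting only hurts the misreporter here: if Alice submits 60% instead of 55%, the 55%-59% mini bets (neutral or positive EV for her) simply drop out, and nothing else changes.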

Observe that there is no incentive for either player to have misrepresented their odds. If Alice overrepresented her odds as 60%, she would just deny herself the mini bets at 55% through 59%, which have neutral or positive EV for her.

Note that Alice and Bob only bet $72 -- less than half of the maximum bet. If Bob wanted Alice to bet more money than she was really willing to risk, he might try to convince her that his odds were close to hers, such that a high maximum bet would still lead to a low actual bet. Does this seem like a problem to you? I think this method is still an improvement. *The mini bets can be abbreviated analytically as one bet at average odds; I just like the mini bets concept for making the intuition clear. -- I am not sure how exciting this method is to anyone. I like it because misrepresentation of value is a core problem in 2 player bargaining, and I realized betting is a bit of a special case. Other special cases exist too-- anything where players would be willing to accept uncertainty over the size of the trade (and are trading a continuous good such that that's even possible). Discuss 10 posts I like in the 2018 Review January 11, 2020 - 05:23 Published on January 11, 2020 2:23 AM UTC I see basically every post that gets submitted to LessWrong, whereas many users come in and read things more occasionally, so I thought I'd list 10 posts I like in the 2018 Review that people might have missed. I wasn't able to write quick reviews of each of them, so this is more like a list of nominations. I've left out a few posts that I expect will naturally be very popular. 1. Naming the Nameless by Sarah Constantin • This points to a variety of specific mechanisms and gears by which culture and aesthetics affect our judgments and our choices, and combines them into an essay that walked me much further than I was beforehand in noticing these effects. I think it might be in my top 5 posts of 2018. 2. Explicit and Implicit Communication by lionhearted • This has some powerful arguments about when not to say things out loud - when not to make background assumptions explicit - which I think is a really powerful datapoint for a rationalist to take on board.
And the anti-Nazi 'Simple Sabotage Field Manual' is amazing. 3. Challenges to Christiano's capability amplification proposal + Paul's research agenda FAQ by Eliezer Yudkowsky and zhukeepa (respectively) • These together are maybe the first time I really understood what Paul's ideas were. Really helpful. I'm mostly talking about the Eliezer-Paul dialogue, both on Eliezer's post, and in the comments of Alex's post. I appreciate zhukeepa putting in the explanatory work post as a necessary step for that dialogue to continue. 4. Varieties of Argumentative Experience by Scott Alexander • So. Many. Examples. This is a strong step forward in conceptualising arguments and disagreements, written by someone who's read and been involved in an incredible amount of good (and bad) ones on the internet. 5. Unrolling social metacognition: Three levels of meta are not enough. by Academian • This very clearly lays out the iterative process by which social emotions and attitudes are built up. Foundational for a lot of social modelling. 6. Act of Charity by Jessicata • A thoughtful dialogue throwing into contrast a lot of key intuitions around deception, self-deception, and ethics. 7. Public Positions and Private Guts by Vaniver • This post says a lot about how formal communication ignores many messy parts of what it's actually like being a human, that helps show how what communication needs to be changes at scale. 8. Optimization Amplifies by Scott Garrabrant • One of the most important ideas IMO when thinking about AI and AI alignment. Best summarised by the line: "I am not just saying that adversarial optimization makes small probabilities of failure large. I am saying that in general any optimization at all messes with small probabilities and errors drastically." 9. Intelligent Social Web by Valentine • Really well-written explanation of a basic way of understanding social dynamics, that helps me understand a lot of my experiences. 10. 
Prediction Markets: When Do They Work by Zvi • This post communicates a lot of Zvi's taste and experience about how to design basic markets - prediction markets, in particular. Shaped a lot of my thinking about this subject, and has been a good guide when I've been (a little bit) involved in trying to build forecasting infrastructure. I'd be interested to see posts that other users really liked. Discuss Of arguments and wagers 11 января, 2020 - 01:20 Published on January 10, 2020 10:20 PM UTC (In which I explore an unusual way of combining the two.) Suppose that Alice and Bob disagree, and both care about Judy’s opinion. Perhaps Alice wants to convince Judy that raising the minimum wage is a cost-effective way to fight poverty, and Bob wants to convince Judy that it isn’t. If Judy has the same background knowledge as Alice and Bob, and is willing to spend as much time thinking about the issue as they have, then she can hear all of their arguments and decide for herself whom she believes. But in many cases Judy will have much less time than Alice or Bob, and is missing a lot of relevant background knowledge. Often Judy can’t even understand the key considerations in the argument; how can she hope to arbitrate it? Wagers For a warm-up, imagine that Judy could evaluate the arguments if she spent a long enough thinking about them. To save time, she could make Alice and Bob wager on the result. If both of them believe they’ll win the argument, then they should be happy to agree to the deal: “If I win the argument I get100; if I lose I pay $100.” (Note: by the end of the post, no dollars will need to be involved.) If either side isn’t willing to take the bet, then Judy could declare the case settled without wasting her time. If they are both willing to bet, then Judy can hear them out and decide who she agrees with. That person “wins” the argument, and the bet: Alice and Bob are betting about what Judy will believe, not about the facts on the ground. 
Of course we don’t have to stick with 1:1 bets. Judy wants to know the probability that she will be convinced, and so wants to know at what odds the two parties are both willing to bet. Based on that probability, she can decide if she wants to hear the arguments. It may be that both parties are happy to take 2:1 bets, i.e. each believes they have a 2/3 chance of being right. What should Judy believe? (In fact this should always happen at small stakes: both participants are willing to pay some premium to try to convince Judy. For example, no matter what Alice believes, she would probably be willing to take a bet of $0.10 against $0.01, if doing so would help her convince Judy.)

If this happens, there is an arbitrage opportunity: Judy can make 2:1 bets with both of them, and end up with a guaranteed profit. So we can continuously raise the required stakes for each wager, until either (1) the market approximately clears, i.e. the two are willing to bet at nearly the same odds, or (2) the arbitrage gap is large enough to compensate Judy for the time of hearing the argument. If (2) happens, then Judy implements the arbitrage and hears the arguments. (In this case Judy gets paid for her time, but the pay is independent of what she decides.)
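The arbitrage can be sketched concretely. In this sketch (the helper and its name are mine, not from the post), an arguer who claims probability p of winning accepts odds of p/(1-p) : 1, so they risk p/(1-p) times Judy's stake; exactly one arguer wins the argument, so Judy pays out one stake and collects the loser's risked amount:

```python
def guaranteed_profit(p_alice, p_bob, stake=1.0):
    """Judy's worst-case profit from betting `stake` against each arguer
    at that arguer's own stated odds of winning the argument."""
    # If Alice wins, Judy pays Alice `stake` and collects what Bob risked.
    if_alice_wins = p_bob / (1 - p_bob) * stake - stake
    # If Bob wins, the mirror image.
    if_bob_wins = p_alice / (1 - p_alice) * stake - stake
    return min(if_alice_wins, if_bob_wins)

# Both at 2:1 (probability 2/3): each risks 2 against Judy's 1,
# so Judy nets 2 - 1 = 1 whichever way she rules.
assert abs(guaranteed_profit(2/3, 2/3) - 1.0) < 1e-9
```

The profit is positive exactly when the two stated probabilities sum to more than 1, i.e. when the market fails to clear, which is why raising the stakes until either the gap closes or the profit covers Judy's time is well-defined.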

Recursion

Betting about the whole claim saved us some time (at best). Betting about parts of the claim might get us much further.

In the course of arguing, Alice and Bob will probably rely on intermediate claims or summaries of particular evidence. For example, Alice might provide a short report describing what we should infer from study Z, or Bob might claim “The analysis in study Z is so problematic that we should ignore it.”

Let’s allow anyone to make a claim at any time. But if Alice makes a claim, Bob can make a counterclaim that he feels better represents the evidence. Then we have a recursive argument to decide which version better represents the evidence.

The key idea is that this recursive argument can also be settled by betting. So one of two things happens: (1) Judy is told the market-clearing odds, and can use that information to help settle the original argument, or (2) there is an arbitrage opportunity, so Judy hears out the argument and collects the profits to compensate her for the time.

This recursive argument is made in context: that is, Judy evaluates which of the two claims she feels would be a more helpful summary within the original argument. Sometimes this will be a question of fact about which Alice and Bob disagree, but sometimes it will be a more complicated judgment call. For example, we could even have a recursive argument about which wording better reflects the nuances of the situation.

When making this evaluation, Judy uses facts she learned over the course of the argument, but she interprets the claim as she would have interpreted it at the beginning of the argument. For example, if Bob asserts “The ellipsoid algorithm is efficient” and Alice disagrees, Bob cannot win the argument by explaining that “efficient” is a technical term which in context means “polynomial time”—unless that’s how Judy would have understood the statement to start with.

This allows Judy to arbitrate disagreements that are too complex for her to evaluate in their entirety, by showing her what she “would have believed” about a number of intermediate claims, if she had bothered to check. Each of these intermediate claims might itself be too complicated for Judy to evaluate directly—if Judy needed to evaluate it, she would use the same trick again.

Betting with attention

If Alice and Bob are betting about many claims over the course of a long argument, we can replace dollars by “attention points,” which represent Judy’s time thinking about the argument (perhaps 1 attention point = 1 minute of Judy’s time). Judy considers an arbitrage opportunity “good enough” if the profit is more than the time required to evaluate the argument. The initial allocation of attention points reflects the total amount of time Judy is willing to spend thinking about the issue. If someone runs out of attention points, then they can no longer make any claims or use up any of Judy’s time.

This removes some of the problems of using dollars, and introduces a new set of problems. The modified system works best when the total stock of attention points is large compared to the number at stake for each claim. Intuitively, if there are N comparable claims to wager about, the stakes of each should not be more than 1/sqrt(N) of the total attention pool — or else random chance will be too large a factor. This requirement still allows a large gap between the time actually required to evaluate an argument (i.e. the initial bankroll of attention points) and the total time that would have been required to evaluate all of the claims made in the argument (the total stake of all of the bets). If each claim is itself supported by a recursive argument, this gap can grow exponentially.
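That sizing heuristic can be sketched as follows (the function names and example numbers are my own illustration, not from the post):

```python
import math

def max_stake_per_claim(attention_pool, n_claims):
    # The post's rule of thumb: with N comparable claims, cap each stake
    # at 1/sqrt(N) of the pool so random chance doesn't dominate.
    return attention_pool / math.sqrt(n_claims)

def total_exposure(attention_pool, n_claims):
    # Summed stakes across all N claims at that cap: sqrt(N) times the
    # pool, illustrating the gap between Judy's actual bankroll and the
    # total time that evaluating every claim directly would have cost.
    return n_claims * max_stake_per_claim(attention_pool, n_claims)

# 100 attention points spread over 25 comparable claims:
assert max_stake_per_claim(100, 25) == 20.0  # at most 20 points per claim
assert total_exposure(100, 25) == 500.0      # 5x the bankroll in total stakes
```

The sqrt(N) blow-up in total exposure is the point: Judy's fixed time budget can police far more claims than she could ever evaluate, and recursion compounds the effect.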

Talking it out

If Alice and Bob disagree about a claim (rather, if they disagree about Judy’s probability of accepting the claim) then they can have an incentive to “talk it out” rather than bringing the dispute to Judy.

For example, suppose that Alice and Bob each think they have a 60% chance of winning an argument. If they bring in Judy to arbitrate, both of them will get unfavorable odds. Because the surplus from the disagreement is going to Judy, both parties would be happy enough to see their counterparty wise up (and of course both would be happy to wise up themselves). This creates room for positive sum trades.

Rather than bringing in Judy to arbitrate their disagreement, they could do further research, consult an expert, pay Judy attention points to hear her opinion on a key issue, talk to Judy’s friends—whatever is the most cost-effective way to resolve the disagreement. Once they have this information, their betting odds can reflect it.

An example

Suppose that Alice and Bob are arguing about how many trees are in North America; both are experts on the topic, but Judy knows nothing about it.

The easiest case is if Alice and Bob know all of the relevant facts, but one of them wants to mislead Judy. In this case, the truth will quickly prevail. Alice and Bob can begin by breaking down the issue into “How many trees are in each of Canada, the US, and Mexico?” If Alice or Bob lie about any of these estimates, they will quickly be corrected. Neither should be willing to bet much for a lie, but if they do, the same thing will happen recursively — the question will be broken down into “how many trees are east and west of the Mississippi?” and so on, until they disagree about how many trees are on a particular hill—a straightforward disagreement to resolve.

In reality, Alice and Bob will have different information about each of these estimates (and geography probably won’t be the easiest way to break things down — instead they might combine the different considerations that inform their views, the best guess suggested by different methodologies, approximate counts of each type of tree on each type of land, and so on). If Alice and Bob can reach a rational consensus on a given estimate, then Judy can use that consensus to inform her own view. If Alice and Bob can’t resolve their disagreement, then we’re back to the previous case. The only difference is that now Alice and Bob have probabilistic disagreements: if Alice disagrees with Bob she doesn’t expect to win the ensuing argument with 100% probability, merely with a high probability.

Odds and ends

This writeup leaves many details underspecified. In particular, how does Judy estimate how long it will take her to arbitrate a disagreement? This can be handled in several ways: by having Judy guess, by having Alice and Bob bet on the length of time until Judy reaches a conclusion, by having them make bets of the form “Alice will agree with me with Z effort,” or so on. I don’t know what would work best.

Despite my use of the word “recursion,” the estimate for “time to settle an argument” (which Judy uses to decide when the stakes are high enough to step in and resolve a disagreement) probably shouldn’t include the time required to settle sub-arguments, since Judy is being paid separately for arbitrating each of those. The structure of the arguments and sub-arguments need not be a tree.

This is a simple enough proposal that it can be realistically implemented, so eventually we’ll hopefully see how it works and why it fails.

I expect this will work best if Alice and Bob often argue about similar topics.

This scheme was motivated by a particular exotic application: delegating decision-making to very intelligent machines. In that setting the goal is to scale to very complex disagreements, with very intelligent arguers, while being very efficient with the overseer’s time (and more cavalier with the arguers’ time).

Of arguments and wagers was originally published in AI Alignment on Medium, where people are continuing the conversation by highlighting and responding to this story.

Discuss