# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 5 minutes 15 seconds ago

### Realigning Housing Coalitions

October 8, 2019 - 14:10
Published on October 8, 2019 11:10 AM UTC

There are currently two main coalitions in US housing politics in high cost-of-living areas:
• Developer / Pro-business / YIMBY: make it easier to build more housing. Allow higher densities, more subdivision, and smaller units. Lower affordable housing minimums, lower parking minimums, lower anything else that acts as a tax on development.

• Anti-displacement / Affordable housing / NIMBY: new housing is luxury housing, they're building units that are too small for families, this construction only benefits the small slice of people rich enough to afford it. There's already too much traffic, not enough parking, and construction means tree removal. Stop condo conversions and evictions. Impose a transfer tax to stop speculators.

This division has been shifting some, however, with the development of density bonus programs. The idea is, developers are allowed to build more units than zoning would normally allow, but they also have to make a larger fraction of them available below market rate. For example, an affordable housing density bonus program could apply to a property where current zoning would allow eight $1M units and two affordable $400k units, and offer an option of ten $1M units and ten affordable $400k units. Programs like this have several advantages:

• The number of new affordable units built is higher than it would be under either current zoning or a new zoning that simply increased the required fraction of affordable units.

• The total number of units available, regardless of affordability, is higher, taking pressure off the market elsewhere.

• The housing actually gets built, because construction is profitable.
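To make the arithmetic of the example above concrete, here is a quick sketch (the unit counts and prices are the hypothetical ones from the example, not data from any real program):

```python
# Unit mix under the hypothetical zoning scenarios described above.
scenarios = {
    "current zoning": {"market_1M": 8, "affordable_400k": 2},
    "density bonus":  {"market_1M": 10, "affordable_400k": 10},
}

for name, units in scenarios.items():
    total = sum(units.values())
    affordable = units["affordable_400k"]
    print(f"{name}: {total} units total, "
          f"{affordable} affordable ({affordable / total:.0%})")
```

The bonus option yields five times as many affordable units (10 vs. 2) and twice the total housing stock, which is the coalition-friendly property the bullets above describe.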

Other programs along these lines could include:

• Remove parking minimums for new housing that's near transit, allowing more housing to be built in a way that doesn't cause as much traffic.

• Remove parking minimums for new housing that isn't entitled to street parking and pre-pays for decades of public transit passes.

• Raise the cap on floor-area-ratio more than the cap on allowed units, leading developers to build more family-sized units.

• For buildings with tenants, allow higher density for landlords that retain their tenants while rebuilding, providing interim housing and similarly affordable units in the new building.

These have the same structure, allowing higher density in exchange for improvements elsewhere: affordability, traffic, parking, number of family-sized units, displacement. A coalition along these lines could be pretty broad, and include most of the anti-displacement, affordable housing, developer, business, and YIMBY camps. I'm sad that Cambridge's affordable housing overlay proposal has been put on hold after only getting 5/9 councillors when it needed 6/9, but I think future efforts in this direction are very promising.

Discuss

### South Bay Meetup

October 8, 2019 - 05:17
Published on October 8, 2019 2:17 AM UTC

Discuss

### AI Alignment Writing Day Roundup #2

October 8, 2019 - 02:36
Published on October 7, 2019 11:36 PM UTC

Here are some of the posts from the AI Alignment Forum writing day. Due to the participants writing 34 posts in less than 24 hours (!), I'm re-airing them to give people a proper chance to read (and comment on) them, in roughly chronological order.

1) Computational Model: Causal Diagrams with Symmetry by John Wentworth

This post is about representing logic, mathematics, and functions with causal models.

For our purposes, the central idea of embedded agency is to take these black-box systems which we call “agents”, and break open the black boxes to see what’s going on inside. Causal DAGs with symmetry are how we do this for Turing-computable functions in general. They show the actual cause-and-effect process which computes the result; conceptually they represent the computation rather than a black-box function.

2) Towards a mechanistic understanding of corrigibility by Evan Hubinger

This post builds off of Paul Christiano's post on Worst Case Guarantees. That post claims:

Even if we are very careful about how we deploy ML, we may reach the point where a small number of correlated failures could quickly become catastrophic... I think the long-term safety of ML systems requires being able to rule out this kind of behavior, which I’ll call unacceptable, even for inputs which are extremely rare on the input distribution.

Paul then proposes a procedure built around adversarial search, where one part of the system searches for inputs that produce unacceptable outputs in the trained agent, and talks more about how one might build such a system.

Evan's post tries to make progress on finding a good notion of acceptable behaviour from an ML system. Paul's post offers two conditions about the ease of training an acceptable model (in particular, that it should not stop the agent achieving a high average reward and that it shouldn't make hard problems much harder), but Evan's conditions are about the ease of choosing an acceptable action. His two conditions are:

1. It must be not that hard for an amplified overseer to verify that a model is acceptable.
2. It must be not that hard to find such an acceptable model during training.
If you want to be able to do some form of informed oversight to produce an acceptable model, however, these are some of the most important conditions to pay attention to. Thus, I generally think about choosing an acceptability condition as trying to answer the question: what is the easiest-to-train-and-verify property such that all models that satisfy that property (and achieve high average reward) are safe?

The post then explores two possible approaches, act-based corrigibility and indifference corrigibility.

3) Logical Optimizers by Donald Hobson

This approach offers a solution to a simpler version of the FAI problem:

Suppose I was handed a hypercomputer and allowed to run code on it without worrying about mindcrime, then the hypercomputer is removed, allowing me to keep 1Gb of data from the computations. Then I am handed a magic human utility function, as code on a memory stick. [The approach below] would allow me to use the situation to make a FAI.

4) Deconfuse Yourself about Agency by VojtaKovarik

This post offers some cute formalisations, for example, generalising the notion of anthropomorphism to A(Θ)-morphization, about morphing/modelling any system by using an alternative architecture A(Θ).

This is an attempt to remove the need to explicitly use the term 'agency' in conversation, out of a sense that the use of the word is lacking in substance. I'm not sure I agree with this; I think people are using it to talk about a substantive thing they don't know how to formalise yet. Nonetheless I liked all the various technical ideas offered.

My favourite part personally was the opening list of concrete architectures organised by how 'agenty' they feel, which I will quote in full:

1. Architectures I would intuitively call "agenty":
1. Monte Carlo tree search algorithm, parametrized by the number of rollouts made each move and utility function (or heuristic) used to evaluate positions.
2. (semi-vague) "Classical AI-agent" with several interconnected modules (utility function and world model, actions, planning algorithm, and observations used for learning and updating the world model).
3. (vague) Human parametrized by their goals, knowledge, and skills (and, of course, many other details).
2. Architectures I would intuitively call "non-agenty":
1. A hard-coded sequence of actions.
2. Look-up table.
3. Random generator (outputting x ∼ π on every input, for some probability distribution π).
3. Multi-agent architectures:
1. Ant colony.
2. Company (consisting of individual employees, operating within an economy).
3. Comprehensive AI services.

5) Thoughts from a Two-Boxer by jaek

I really liked this post, even though the author ends by saying the post might not have much of a purpose any more.

Having written that last paragraph I suddenly understand why decision theory in the AI community is the way it is. I guess I wasn't properly engaging with the premises of the thought experiment.

The post was (according to me) someone thinking for themselves about decision theory and putting in the effort to clearly explain their thoughts as they went along.

My understanding of the main disagreement between academia's CDT/EDT and the AI alignment community's UDT/FDT alternatives is the same as Paul Christiano's understanding, which is that they are motivated by asking slightly different questions (the former being more human-focused and the latter being motivated by what code to put into an AI). This post shows someone thinking through that and coming to that same realisation for themselves. I expect to link to it in the future as an example of this.

Discuss

### Bets and updating

October 8, 2019 - 02:06
Published on October 7, 2019 11:06 PM UTC

Suppose the US presidential election is tomorrow. You currently assign a probability of 50% to each outcome. (We are ignoring the small possibility that neither of the main party candidates will win).

A man approaches you, and offers you a bet of $10, at 2-1 odds. In other words, if candidate one wins, he pays you $20; if candidate two wins, you pay him $10. Should you accept this bet? What if the bet was for $10,000 instead? Assume that your utility is linear in dollars (or assume that the bet is for utilons instead, whatever). If not, why not? Try to think about this before reading on.

The answer is that it depends on your priors - in particular, it depends on how you interpret the evidence of being offered the bet. In general, if someone offers you a large bet on some outcome, it's probably safe to assume they have access to a reasonable amount of information about the outcome. Depending on how much information your own probability estimate is based on, you should update towards the odds they offered you.

If you update too little, and accept too many bets, you will lose a lot of money to people with better information than you. On the other hand, you can also go too far in the other direction. If your response to being offered a five-cent bet is to immediately update to accept their probabilities (and refuse the bet), you will be very easy to fool (although hard to exploit by betting).
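A toy calculation of when accepting is worthwhile, assuming the $10 bet at 2-1 odds from above and utility linear in dollars (the posterior probabilities below are illustrative):

```python
def expected_value(p_win, payout_if_win, loss_if_lose):
    """Expected dollar value of accepting, with utility linear in dollars."""
    return p_win * payout_if_win - (1 - p_win) * loss_if_lose

# Win $20 if candidate one wins, lose $10 otherwise.
print(expected_value(0.5, 20, 10))    # prior of 50%: EV = +$5, so accept
print(expected_value(1 / 3, 20, 10))  # break-even posterior at p = 1/3: EV = $0
print(expected_value(0.2, 20, 10))    # after updating hard toward the offerer: EV = -$4
```

The question in the post is exactly how far the offer itself should move you from 0.5 toward (or past) the 1/3 break-even point.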

Now suppose there are two superintelligences, Omega and Omicron. They are both excellent at modelling both you and the presidential election. Omega has a strong preference for money, and a weak preference for having you believe false things about the presidential election. Omicron has this swapped - it wants you to believe that the actual outcome of the election (which it has predicted) is extremely unlikely, and has a weak preference for money.

Omega executes the following plan: It looks through a large number of possible bets, looking for ones that will give it a lot of money (according to its predicted outcome of the election). For each of them, it predicts whether you will, if offered, take the bet, or simply update your belief to be more accurate (if the bet is such that Omega wins money, it must be "in the correct direction"). It finds the best bet you will take (if any), and offers you this bet.

Omicron does a similar thing, but instead looks for bets that you won't take - bets which will instead cause you to update strongly in the wrong direction, and not take the bet (since it doesn't want to give you money). Again, it finds the best bet you won't take (if any) and approaches you.

You are approached by someone and offered a bet, although you don't know if it's Omega or Omicron. What should your policy be?

If you accept a bet, the bet is likely to have come from Omega, and thus be extremely costly for you. So you shouldn't take the bet.

On the other hand, if you update strongly in the direction of the bet, it most likely came from Omicron, and this means it's probably an update in the wrong direction. So you shouldn't update.

This leaves you in the perplexing situation of believing that the bet is probably extremely good, but not wanting to take it.

Discuss

### Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann

October 7, 2019 - 22:52
Published on October 7, 2019 7:52 PM UTC

[Epistemic Status: My inside view feels confident, but I’ve only discussed this with one other person so far, so I won't be surprised if it turns out to be confused.]

Armstrong and Mindermann (A&M) argue "that even with a reasonable simplicity prior/Occam’s razor on the set of decompositions, we cannot distinguish between the true decomposition and others that lead to high regret. To address this, we need simple ‘normative’ assumptions, which cannot be deduced exclusively from observations."

I explain why I think their argument is faulty, concluding that maybe Occam's Razor is sufficient to do the job after all.

In what follows I assume the reader is familiar with the paper already or at least with the concepts within it.

Brief summary of A&M's argument:

(This is merely a brief sketch of A&M’s argument; I’ll engage with it in more detail below. For the full story, read their paper.)

Take a human policy pi = P(R) that we are trying to represent in the planner-reward formalism. R is the human’s reward function, which encodes their desires/preferences/values/goals. P() is the human’s planner function, which encodes how they take their experiences as input and try to choose outputs that achieve their reward. Pi, then, encodes the overall behavior of the human in question.

Step 1: In any reasonable language, for any plausible policy, you can construct “degenerate” planner-reward pairs that are almost as simple as the simplest possible way to generate the policy, yet yield high regret (i.e. have a reward component which is very different from the "true"/"intended" one).

• Example: The planner deontologically follows the policy, despite a buddha-like empty utility function
• Example: The planner greedily maximizes the reward function "obedience-to-the-policy."
• Example: Double-negated version of example 2.

It’s easy to see that these examples, being constructed from the policy, are at most slightly more complex than the simplest possible way to generate the policy, since they could make use of that way.

Step 2: The "intended" planner-reward pair--the one that humans would judge to be a reasonable decomposition of the human policy in question--is likely to be significantly more complex than the simplest possible planner-reward pair.

• Argument: It's really complicated.
• Argument: The pair contains more information than the policy, so it should be more complicated.
• Argument: Philosophers and economists have been trying for years and haven't succeeded yet.

Conclusion: If we use Occam’s Razor alone to find planner-reward pairs that fit a particular human’s behavior, we’ll settle on one of the degenerate ones (or something else entirely) rather than a reasonable one. This could be very dangerous if we are building an AI to maximize the reward.

Methinks the argument proves too much:

My first point is that A&M’s argument probably works just as well for other uses of Occam’s Razor. In particular it works just as well for the canonical use: finding the Laws and Initial Conditions that describe our universe!

Take a sequence of events we are trying to predict/represent with the lawlike-universe formalism, which posits C (the initial conditions) and then L() the dynamical laws, a function that takes initial conditions and extrapolates everything else from them. L(C) = E, the sequence of events/conditions/world-states we are trying to predict/represent.

Step 1: In any reasonable language, for any plausible sequence of events, we can construct "degenerate" initial condition + laws pairs that are almost as simple as the simplest pair.

• Example: The initial conditions are an empty void, but the laws say "And then the sequence of events that happens is E"
• Example: The initial conditions are simply E, and L() doesn’t do anything.

It’s easy to see that these examples, being constructed from E, are at most slightly more complex than the simplest possible pair, since they could use the simplest pair to generate E.

Step 2: The "intended" initial condition+law pair is likely to be significantly more complex than the simplest pair.

• Argument: It's really complicated.
• Argument: The pair contains more information than the sequence of events, so it should be more complicated.
• Argument: Physicists have been trying for years and haven't succeeded yet.

Conclusion: If we use Occam’s Razor alone to find law-condition pairs that fit all the world’s events, we’ll settle on one of the degenerate ones (or something else entirely) rather than a reasonable one. This could be very dangerous if we are e.g. building an AI to do science for us and answer counterfactual questions like “If we had posted the nuclear launch codes on the Internet, would any nukes have been launched?”

This conclusion may actually be true, but it’s a pretty controversial claim and I predict most philosophers of science wouldn’t be impressed by this argument for it--even the ones who agree with the conclusion.

Objecting to the three arguments for Step 2

Consider the following hypothesis, which is basically equivalent to the claim A&M are trying to disprove:

Occam Sufficiency Hypothesis: The “Intended” pair happens to be the simplest way to generate the policy.

Notice that everything in Step 1 is consistent with this hypothesis. The degenerate pairs from Step 1 are constructed from the policy, so they are more complicated than the simplest way to generate it, so if that way is via the intended pair, they are more complicated (albeit only slightly) than the intended pair.

Next, notice that the three arguments in support of Step 2 don’t really hurt this hypothesis:

Re: first argument: The intended pair can be both very complex and the simplest way to generate the policy; no contradiction there. Indeed that’s not even surprising: since the policy is generated by a massive messy neural net in an extremely diverse environment, we should expect it to be complex. What matters for our purposes is not how complex the intended pair is, but rather how complex it is relative to the simplest possible way to generate the policy. A&M need to argue that the simplest possible way to generate the policy is simpler than the intended pair; arguing that the intended pair is complex is at best only half the argument.

Compare to the case of physics: Sure, the laws of physics are complex. They probably take at least a page of code to write up. And that’s aspirational; we haven’t even got to that point yet. But that doesn’t mean Occam’s Razor is insufficient to find the laws of physics.

Re: second argument: The inference from “This pair contains more information than the policy” to “this pair is more complex than the policy” is fallacious. Of course the intended pair contains more information than the policy! All ways of generating the policy contain more information than it. This is because there are many ways (e.g. planner-reward pairs) to get any given policy, and thus specifying any particular way is giving you strictly more information than simply specifying the policy.

Compare to the case of physics: Even once we’ve been given the complete history of the world (or a complete history of some arbitrarily large set of experiment-events) there will still be additional things left to specify about what the laws and initial conditions truly are. Do the laws contain a double negation in them, for example? Do they have some weird clause that creates infinite energy but only when a certain extremely rare interaction occurs that never in fact occurs? What language are the laws written in, anyway? And what about the initial conditions? Lots of things left to specify that aren’t determined by the complete history of the world. Yet this does not mean that the Laws + Initial Conditions are more complex than the complete history of the world, and it certainly doesn’t mean we’ll be led astray if we believe in the Laws+Conditions pair that is simplest.

Re: third argument: Yes, people have been trying to find planner-reward pairs to explain human behavior for many years, and yes, no one has managed to build a simple algorithm to do it yet. Instead we rely on all sorts of implicit and intuitive heuristics, and we still don’t succeed fully. But all of this can be said about Physics too. It’s not like physicists are literally following the Occam’s Razor algorithm--iterating through all possible Law+Condition pairs in order from simplest to most complex and checking each one to see if it outputs a universe consistent with all our observations. And moreover, physicists haven’t succeeded fully either. Nevertheless, many of us are still confident that Occam’s Razor is in principle sufficient: If we were to follow the algorithm exactly, with enough data and compute, we would eventually settle on a Law+Condition pair that accurately describes reality, and it would be the true pair. Again, maybe we are wrong about that, but the arguments A&M have given so far aren’t convincing.
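The "Occam's Razor algorithm" gestured at here can be written down directly. This is a toy rendering only: an explicit finite hypothesis list stands in for the (uncomputable) enumeration of all programs in order of complexity, and the strings and consistency check are mine, not A&M's.

```python
def occams_razor(hypotheses, consistent, complexity):
    """Consider candidate explanations in order of increasing complexity
    and return the first one consistent with all observations."""
    for h in sorted(hypotheses, key=complexity):
        if consistent(h):
            return h
    return None

# Toy example: candidate "explanations" are strings, complexity is string
# length, and an explanation counts as consistent if it mentions the
# observed events E. The degenerate, maximally short explanation wins.
hypotheses = ["E happens because L is applied to C", "E", "L(C) = E"]
print(occams_razor(hypotheses, lambda h: "E" in h, len))  # prints "E"
```

Whether the intended explanation or a degenerate one falls out of this loop depends entirely on their relative complexities, which is precisely what the Occam Sufficiency Hypothesis above is about.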

Conclusion

Perhaps Occam’s Razor is insufficient after all. (Indeed I suspect as much, for reasons I’ll sketch in the appendix) But as far as I can tell, A&M’s arguments are at best very weak evidence against the sufficiency of Occam’s Razor for inferring human preferences, and moreover they work pretty much just as well against the canonical use of Occam’s Razor too.

This is a bold claim, so I won’t be surprised if it turns out I was confused. I look forward to hearing people’s feedback. Thanks in advance! And thanks especially to Armstrong and Mindermann if they take the time to reply.

Appendix: So, is Occam’s Razor sufficient or not?

--A priori, we should expect something more like a speed prior to be appropriate for identifying the mechanisms of a finite mind, rather than a pure complexity prior.

--Sure enough, we can think of scenarios in which e.g. a deterministic universe with somewhat simple laws develops consequentialists who run massive simulations including of our universe and then write down Daniel’s policy in flaming letters somewhere, such that the algorithm “Run this deterministic universe until you find big flaming letters, then read out that policy” becomes a very simple way to generate Daniel’s policy. (This is basically just the “Universal Prior is Malign” idea applied in a new way.)

--So yeah, pure complexity prior is probably not good. But maybe a speed prior would work, or something like it. Or maybe not. I don’t know.

--One case that seems useful to me: Suppose we are considering two explanations of someone’s behavior: (A) They desire the well-being of the poor, but [insert epicycles here to explain why they aren’t donating much, are donating conspicuously, are donating ineffectively] and (B) They desire their peers (and their selves) to believe that they desire the well-being of the poor. Thanks to the epicycles in (A), both theories fit the data equally well. But theory B is much more simple. Do we conclude that this person really does desire the well-being of the poor, or not? If we think that even though (A) is more complex it is also more accurate, then yeah it seems like Occam’s Razor is insufficient to infer human preferences. But if we instead think “Yeah, this person just really doesn’t care, and the proof is how much simpler B is than A” then it seems we really are using something like Occam’s Razor to infer human preferences. Of course, this is just one case, so the only way it could prove anything is as a counterexample. To me it doesn’t seem like a counterexample to Occam’s sufficiency, but I could perhaps be convinced to change my mind about that.

--Also, I'm pretty sure that once we have better theories of the brain and mind, we’ll have new concepts and theoretical posits to explain human behavior. (e.g. something something Karl Friston something something free energy?) Thus, the simplest generator of a given human’s behavior will probably not divide automatically into a planner and a reward; it’ll probably have many components and there will be debates about which components the AI should be faithful to (dub these components the reward) and which components the AI should seek to surpass (dub these components the planner.) These debates may be intractable, turning on subjective and/or philosophical considerations. So this is another sense in which I think yeah, definitely Occam’s Razor isn’t sufficient--for we will also need to have a philosophical debate about what rationality is.

Discuss

### The sentence structure of mathematics

October 7, 2019 - 21:58
Published on October 7, 2019 6:58 PM UTC

"Alice pushes Bob."

"Cat drinks milk."

"Comment hurts feelings."

These are all different sentences that describe wildly different things. People are very different from cats, and cats are very different from comments. Bob, milk, and feelings don't have much to do with each other. Pushing, drinking, and (emotionally) hurting are also really different things.

But I bet these sentences all feel really similar to you.

They should feel similar. They all have the same structure. Specifically, that structure is "Noun verb noun."

Because these sentences all share the same fundamental underlying structure, they all feel quite similar even though they are very different on the surface. (The mathematical term for "fundamentally the same but different on the surface" is isomorphic.)

When you studied sentence structure back in grammar school (it wasn't just me, right?) you learned to break down sentences into their parts of speech. You learned that nouns are persons, places, or things, and verbs are the activities that nouns do. Adjectives describe nouns, and adverbs describe pretty much anything. Prepositions tell you where nouns go. Etc.

Parts of speech are really abstract and really general. When you look at the surface, the sentence

the ant crawls on the ground

and the sentence

the spaceship flies through space

could not possibly be more different. But when you look at the sentence structure, they're nearly identical.

The concept of "parts of speech" emerges when we notice certain general patterns arising in the way we speak. We notice that whether we're talking about ants or spaceships, we're always talking about things. And whether we're talking about crawling or flying, we're always talking about actions.

And so on for adjectives, adverbs, conjunctions, etc., which always seem to relate back to nouns and verbs—adjectives modify nouns, for example.

Next we simply give things and actions, descriptors and relational terms some confusing names to make sure the peons can't catch on—nouns and verbs, adjectives and prepositions—and we have a way of breaking down any English sentence into its fundamental parts.

That is to say, if you know the abstract rules governing sentence structure—the types of pieces and their connections—you can come up with structures that any English sentence is but a particular example of.

Like how "Alice pushes Bob" is but a particular example of "Noun verb noun."

At the most basic level, category theory breaks down mathematics into its parts of speech. It turns out that mathematics is pretty much just nouns and verbs at its simplest—just like how, if you read between the lines a bit, any English sentence can be boiled down to its nouns and verbs. Those are the "main players" which everything else just modifies in some fashion.

In mathematics, a noun is called an object.

A verb is called a morphism or arrow. We'll explore the terminology of morphism a bit more next time. As to why they can also be called arrows, that's because verbs appear to have directions: One noun does the verb, and another noun (potentially the same noun, like pinching yourself) receives the verb. So you could draw that as an arrow like so:

Alice --push--> Bob.

This is exactly how we diagram objects and morphisms in category theory, with one difference: we typically use single letters in place of full names. (I'd explain the value of concision here, but it seems hypocritical.) So if Alice and Bob are objects in our category, and Alice's push of Bob is the morphism, then we might write it this way:

A --p--> B.

Equally legitimate is to highlight the morphism up front. (We'll see they're the real stars of the show):

p: A → B.

So now you understand objects and morphisms, the basic pieces of any category, just like how nouns and verbs are the basic pieces of any sentence.
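As a minimal sketch of this vocabulary in code (the representation is mine, not from the post; category theory itself doesn't care how objects and morphisms are encoded):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Object:
    name: str

@dataclass(frozen=True)
class Morphism:
    name: str
    source: Object  # the noun doing the verb
    target: Object  # the noun receiving it (may equal source, like pinching yourself)

    def __str__(self):
        return f"{self.name}: {self.source.name} -> {self.target.name}"

A, B = Object("Alice"), Object("Bob")
p = Morphism("push", A, B)
print(p)  # push: Alice -> Bob
```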

Of course, making a sentence isn't as simple as mashing nouns and verbs together. We need to make sure that the sentence makes sense. To paraphrase Harrison Ford, you can write "colorless green ideas sleep furiously", but you sure can't think it.

We'll explore the rules that define a category in the next post.

Discuss

### [Link] Book Review: ‘The AI Does Not Hate You’ by Tom Chivers (Scott Aaronson)

October 7, 2019 - 21:16
Published on October 7, 2019 6:16 PM UTC

Scott Aaronson posted a review of The AI Does Not Hate You, a book by Tom Chivers.

The book:

This is a book about AI and AI risk. But it's also more importantly about a community of people who are trying to think rationally about intelligence, and the places that these thoughts are taking them, and what insight they can and can't give us about the future of the human race over the next few years.

The book has a dual purpose: it gives an account of the most important events that happened in the rationalist community while reporting on the current state of the AI risk field. There's a LessWrong discussion here.

This post is about Scott Aaronson's review. In it, he gives his opinions about the book, including his relationship with the rationalist community and his somewhat changing views on AI risk.

Reading Chivers’s book prompted me to reflect on my own relationship to the rationalist community. [...] The astounding progress in deep learning and reinforcement learning and GANs, which caused me (like everyone else, perhaps) to update in the direction of human-level AI in our lifetimes being an actual live possibility.

Discuss

### [AN #67]: Creating environments in which to study inner alignment failures

October 7, 2019 - 20:10
Published on October 7, 2019 5:10 PM UTC


Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Audio version here (may not be up yet).

Highlights

Towards an empirical investigation of inner alignment (Evan Hubinger) (summarized by Rohin): Last week, we saw that the worrying thing about mesa optimizers (AN #58) was that they could have robust capabilities, but not robust alignment (AN #66). This leads to an inner alignment failure: the agent will take competent, highly-optimized actions in pursuit of a goal that you didn't want.

This post proposes that we empirically investigate what kinds of mesa objective functions are likely to be learned, by trying to construct mesa optimizers. To do this, we need two ingredients: first, an environment in which there are many distinct proxies that lead to good behavior on the training environment, and second, an architecture that will actually learn a model that is itself performing search, so that it has robust capabilities. Then, the experiment is simple: train the model using deep RL, and investigate its behavior off distribution to distinguish between the various possible proxy reward functions it could have learned. (The next summary has an example.)

Some desirable properties:

- The proxies should not be identical on the training distribution.

- There shouldn't be too many reasonable proxies, since then it would be hard to identify which proxy was learned by the neural net.

- Proxies should differ on "interesting" properties, such as how hard the proxy is to compute from the model's observations, so that we can figure out how a particular property influences whether the proxy will be learned by the model.

Rohin's opinion: I'm very excited by this general line of research: in fact, I developed my own proposal along the same lines. As a result, I have a lot of opinions, many of which I wrote up in this comment, but I'll give a summary here.

I agree pretty strongly with the high level details (focusing on robust capabilities without robust alignment, identifying multiple proxies as the key issue, and focusing on environment design and architecture choice as the hard problems). I do differ in the details though. I'm more interested in producing a compelling example of mesa optimization, and so I care about having a sufficiently complex environment, like Minecraft. I also don't expect there to be a "part" of the neural net that is actually computing the mesa objective; I simply expect that the heuristics learned by the neural net will be consistent with optimization of some proxy reward function. As a result, I'm less excited about studying properties like "how hard is the mesa objective to compute".

A simple environment for showing mesa misalignment (Matthew Barnett) (summarized by Rohin): This post proposes a concrete environment in which we can run the experiments suggested in the previous post. The environment is a maze which contains keys and chests. The true objective is to open chests, but opening a chest requires you to already have a key (and uses up the key). During training, there will be far fewer keys than chests, and so we would expect the learned model to develop an "urge" to pick up keys. If we then test it in mazes with lots of keys, it would go around competently picking up keys while potentially ignoring chests, which would count as a failure of inner alignment. This predicted behavior is similar to how humans developed an "urge" for food because food was scarce in the ancestral environment, even though now food is abundant.
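The environment is simple enough to sketch. Here is a minimal, hypothetical version (class and variable names are my own; the post describes a 2-D maze, which I collapse to a line of cells for brevity):

```python
import random

class KeysChestsEnv:
    """Illustrative sketch of the keys-and-chests environment: the true
    objective is opening chests, but opening a chest requires (and uses
    up) a previously collected key."""

    SIZE = 20  # cells in our simplified 1-D "maze"

    def __init__(self, n_keys, n_chests, seed=None):
        rng = random.Random(seed)
        cells = list(range(self.SIZE))
        rng.shuffle(cells)
        self.keys = set(cells[:n_keys])
        self.chests = set(cells[n_keys:n_keys + n_chests])
        self.pos = 0
        self.held_keys = 0
        self.opened = 0

    def step(self, action):
        """action is -1 (left) or +1 (right); returns the true reward."""
        self.pos = max(0, min(self.SIZE - 1, self.pos + action))
        if self.pos in self.keys:
            self.keys.remove(self.pos)
            self.held_keys += 1
        if self.pos in self.chests and self.held_keys > 0:
            self.chests.remove(self.pos)
            self.held_keys -= 1  # opening a chest consumes a key
            self.opened += 1
            return 1.0
        return 0.0
```

Training would use `n_keys` much smaller than `n_chests`, so "collect keys" is a near-perfect proxy for reward; testing with `n_keys` much larger than `n_chests` is what would expose which objective the model actually learned.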

Rohin's opinion: While I would prefer a more complex environment to make a more compelling case that this will be a problem in realistic environments, I do think that this would be a great environment to start testing in. In general, I like the pattern of "the true objective is Y, but during training you need to do X to get Y": it seems particularly likely that even current systems would learn to competently pursue X in such a situation.

Technical AI alignment

Iterated amplification

Machine Learning Projects on IDA (Owain Evans et al) (summarized by Nicholas): This document describes three suggested projects building on Iterated Distillation and Amplification (IDA), a method for training ML systems while preserving alignment. The first project is to apply IDA to solving mathematical problems. The second is to apply IDA to neural program interpretation, the problem of replicating the internal behavior of other programs as well as their outputs. The third is to experiment with adaptive computation where computational power is directed to where it is most useful. For each project, they also include motivation, directions, and related work.

Nicholas's opinion: Figuring out an interesting and useful project to work on is one of the major challenges of any research project, and it may require a distinct skill set from the project's implementation. As a result, I appreciate the authors enabling other researchers to jump straight into solving the problems. Given how detailed the motivation, instructions, and related work are, this document strikes me as an excellent way for someone to begin her first research project on IDA or AI safety more broadly. Additionally, while there are many public explanations of IDA, I found this to be one of the most clear and complete descriptions I have read.

Read more: Alignment Forum summary post

List of resolved confusions about IDA (Wei Dai) (summarized by Rohin): This is a useful post clarifying some of the terms around IDA. I'm not summarizing it because each point is already quite short.

Mesa optimization

Concrete experiments in inner alignment (Evan Hubinger) (summarized by Matthew): While the highlighted posts above go into detail about one particular experiment that could clarify the inner alignment problem, this post briefly lays out several experiments that could be useful. One example experiment is giving an RL trained agent direct access to its reward as part of its observation. During testing, we could try putting the model in a confusing situation by altering its observed reward so that it doesn't match the real one. The hope is that we could gain insight into when RL trained agents internally represent 'goals' and how they relate to the environment, if they do at all. You'll have to read the post to see all the experiments.
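The reward-in-observation experiment could be set up with a simple wrapper. The sketch below only loosely imitates the Gym interface; it is my own illustration, not code from the post:

```python
class RewardInObsWrapper:
    """Sketch: append the previous step's reward to the observation, so
    the reward channel can later be spoofed at test time. This loosely
    follows the Gym step/reset convention but is not a real gym.Wrapper."""

    def __init__(self, env, spoof=None):
        self.env = env
        self.spoof = spoof  # optional function: true reward -> observed reward

    def reset(self):
        # Observation is a pair: (env observation, last observed reward).
        return (self.env.reset(), 0.0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        observed = reward if self.spoof is None else self.spoof(reward)
        # The agent sees the (possibly spoofed) reward; training still
        # uses the true reward, creating the confusing test situation.
        return (obs, observed), reward, done, info
```

At test time, passing something like `spoof=lambda r: -r` produces the mismatch between observed and real reward described above.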

Matthew's opinion: I'm currently convinced that doing empirical work right now will help us understand mesa optimization, and this was one of the posts that led me to that conclusion. I'm still a bit skeptical that current techniques are sufficient to demonstrate the type of powerful learned search algorithms which could characterize the worst outcomes for failures in inner alignment. Regardless, I think at this point classifying failure modes is quite beneficial, and conducting tests like the ones in this post will make that a lot easier.

Learning human intent

Fine-Tuning GPT-2 from Human Preferences (Daniel M. Ziegler et al) (summarized by Sudhanshu): This blog post and its associated paper describe the results of several text generation/continuation experiments, where human feedback on initial/older samples was used in the form of a reinforcement learning reward signal to finetune the base 774-million parameter GPT-2 language model (AN #46). The key motivation here was to understand whether interactions with humans can help algorithms better learn and adapt to human preferences in natural language generation tasks.

They report mixed results. For the tasks of continuing text with positive sentiment or physically descriptive language, they report improved performance over the baseline (as assessed by external examiners) after fine-tuning on only 5,000 human judgments of samples generated from the base model. The summarization task required 60,000 samples of online human feedback to perform similarly to a simple baseline (lead-3, which returns the first three sentences as the summary), as assessed by humans.
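The lead-3 baseline is worth seeing concretely, since it's striking that such a trivial procedure is competitive; a rough sketch with deliberately naive sentence splitting:

```python
import re

def lead3(document: str) -> str:
    """The "lead-3" summarization baseline: return the first three
    sentences of the document. Splitting on whitespace after .!? is
    deliberately naive (it mishandles abbreviations, etc.)."""
    sentences = re.split(r'(?<=[.!?])\s+', document.strip())
    return ' '.join(sentences[:3])
```

That a summarizer fine-tuned on 60,000 human judgments only matches this function is a useful calibration point for how hard learned summarization is.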

Some of the lessons learned while performing this research include 1) the need for better, less ambiguous tasks and labelling protocols for sourcing higher quality annotations, and 2) a reminder that "bugs can optimize for bad behaviour", as a sign error propagated through the training process to generate "not gibberish but maximally bad output". The work concludes on the note that it is a step towards scalable AI alignment methods such as debate and amplification.

Sudhanshu's opinion: It is good to see research on mainstream NLProc/ML tasks that includes discussions on challenges, failure modes and relevance to the broader motivating goals of AI research.

The work opens up interesting avenues within OpenAI's alignment agenda, for example learning a diversity of preferences (A OR B), or a hierarchy of preferences (A AND B) sequentially without catastrophic forgetting.

In order to scale, we would want to generate automated labelers through semi-supervised reinforcement learning, to derive the most gains from every piece of human input. The robustness of this needs further empirical and conceptual investigation before we can be confident that such a system can work to form a hierarchy of learners, e.g. in amplification.

Rohin's opinion: One thing I particularly like here is that the evaluation is done by humans. This seems significantly more robust as an evaluation metric than any automated system we could come up with, and I hope that more people use human evaluation in the future.

Robust Change Captioning (Dong Huk Park et al) (summarized by Dan H): Safe exploration requires that agents avoid disrupting their environment. Previous work, such as Krakovna et al. (AN #10), penalizes an agent's needless side effects on the environment. For such techniques to work in the real world, agents must also estimate environment disruptions, side effects, and changes while not being distracted by peripheral, inconsequential changes. This paper proposes a dataset to further the study of "Change Captioning," where scene changes are described by a machine learning system in natural language. That is, given before and after images, a system describes the salient change in the scene. Work on systems that can estimate changes will likely advance safe exploration.

Interpretability

Learning Representations by Humans, for Humans (Sophie Hilgard, Nir Rosenfeld et al) (summarized by Asya): Historically, interpretability approaches have involved machines acting as experts, making decisions and generating explanations for their decisions. This paper takes a slightly different approach, instead using machines as advisers who are trying to give the best possible advice to humans, the final decision makers. Models are given input data and trained to generate visual representations based on the data that cause humans to take the best possible actions. In the main experiment in this paper, humans are tasked with deciding whether to approve or deny loans based on details of a loan application. Advising networks generate realistic-looking faces whose expressions represent multivariate information that's important for the loan decision. Humans do better when provided the facial expression 'advice', and furthermore can justify their decisions with analogical reasoning based on the faces, e.g. "x will likely be repaid because x is similar to x', and x' was repaid".

Asya's opinion: This seems to me like a very plausible story for how AI systems get incorporated into human decision-making in the near-term future. I do worry that further down the line, AI systems where AIs are merely advising will get outcompeted by AI systems doing the entire decision-making process. From an interpretability perspective, it also seems to me like having 'advice' that represents complicated multivariate data still hides a lot of reasoning that could be important if we were worried about misaligned AI. I like that the paper emphasizes having humans-in-the-loop during training and presents an effective mechanism for doing gradient descent with human choices.

Rohin's opinion: One interesting thing about this paper is its similarity to Deep RL from Human Preferences: it also trains a human model, that is improved over time by collecting more data from real humans. The difference is that DRLHP produces a model of the human reward function, whereas the model in this paper predicts human actions.

Other progress in AI

Reinforcement learning

The Principle of Unchanged Optimality in Reinforcement Learning Generalization (Alex Irpan and Xingyou Song) (summarized by Flo): In image recognition tasks, there is usually only one label per image, such that there exists an optimal solution that maps every image to the correct label. Good generalization of a model can therefore straightforwardly be defined as a good approximation of the image-to-label mapping for previously unseen data.

In reinforcement learning, our models usually don't map environments to the optimal policy, but states in a given environment to the corresponding optimal action. The optimal action in a state can depend on the environment. This means that there is a tradeoff regarding the performance of a model in different environments.

The authors suggest the principle of unchanged optimality: in a benchmark for generalization in reinforcement learning, there should be at least one policy that is optimal for all environments in the train and test sets. With this in place, generalization does not conflict with good performance in individual environments. If the principle does not initially hold for a given set of environments, we can change that by giving the agent more information. For example, the agent could receive a parameter that indicates which environment it is currently interacting with.
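The suggested fix amounts to augmenting the observation with an environment indicator; a minimal sketch (the function name and the one-hot encoding are my own choices, not from the paper):

```python
def augment_with_env_id(observation, env_id, n_envs):
    """Concatenate a one-hot environment indicator onto the observation,
    so that a single policy can in principle be optimal across all
    training and test environments (restoring unchanged optimality)."""
    one_hot = [0.0] * n_envs
    one_hot[env_id] = 1.0
    return list(observation) + one_hot
```

With this extra input, the optimal action no longer has to trade off between environments, since the policy can condition on which environment it is in.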

Flo's opinion: I am a bit torn here: On one hand, the principle makes it plausible for us to find the globally optimal solution by solving our task on a finite set of training environments. This way the generalization problem feels more well-defined and amenable to theoretical analysis, which seems useful for advancing our understanding of reinforcement learning.

On the other hand, I don't expect the principle to hold for most real-world problems. For example, in interactions with other adapting agents, performance will depend on those agents' policies, which can be hard to infer and change dynamically. This means that the principle of unchanged optimality won't hold without precise information about the other agents' policies, and this information can be very difficult to obtain.

More generally, with this and some of the criticism of the AI safety gridworlds that framed them as an ill-defined benchmark, I am a bit worried that too much focus on very "clean" benchmarks might divert from issues associated with the messiness of the real world. I would have liked to see a more conditional conclusion for the paper, instead of a general principle.


Discuss

October 7, 2019 - 16:20
Published on October 7, 2019 1:20 PM UTC

Last year we organized a new contra dance weekend, Beantown Stomp (retrospective). It sold out, and some people were frustrated that they didn't hear about it before we went to a waitlist. Since we're going to have a second one (March 20-22nd, put it in your calendar!), several people have suggested that we do a lottery this year. Looking at last year's applications, however, I'm not so sure:

We admitted people first-come first-served and went to waitlist at the end of February for a weekend in late March. Everyone who applied before the weekend did end up getting in off the waitlist for at least part of it, though there were also people who would have liked to come who didn't try registering because they knew the weekend was sold out.

This year we're going to allow more people to attend, 250 instead of 200: there was no time last year where the hall was too full, and my worries about not being able to ventilate well enough were unfounded. So there should be less pressure on tickets, with more of them to go around.

So instead of jumping right in with a lottery, here's what I'm thinking: when we open registration the first 125 applications are accepted. Then if we get application #126 before registration has been open for a month, we pause registration and hold a lottery for the remaining 125 spots. This keeps the simplicity and immediacy of first-come first-served, unless demand is high enough that we need a lottery.
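Concretely, the hybrid scheme could be implemented like this (a sketch only: the one-month timing check is omitted, and the helper names and lottery pool are my own choices, not settled details):

```python
import random

def admit(applications, capacity=250, fcfs_cap=125, lottery_trigger=126,
          seed=None):
    """Hybrid admission: first-come first-served up to fcfs_cap; if
    application #lottery_trigger arrives (here, regardless of timing),
    the remaining spots are filled by lottery among later applicants.
    `applications` is an ordered list of applicant identifiers."""
    fcfs = applications[:fcfs_cap]
    rest = applications[fcfs_cap:]
    if len(applications) >= lottery_trigger:
        rng = random.Random(seed)
        n_spots = min(capacity - fcfs_cap, len(rest))
        return fcfs + rng.sample(rest, n_spots)
    # Demand was low enough that everyone gets in first-come first-served.
    return fcfs + rest
```

The nice property is visible in the code: the lottery branch only exists at all when demand exceeds the trigger, so in the low-demand case the scheme is indistinguishable from plain first-come first-served.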

Thoughts? I don't know of other events that have done this: most have stayed first-come first-served until they start seeing all their tickets selling out right away, and then switched to a lottery. The hybrid proposal here is a way to get most of the benefits of both when you don't know which situation you're in yet.

Discuss

### The Gears of Impact

October 7, 2019 - 08:25
Published on October 7, 2019 5:25 AM UTC

Scheduling: The remainder of the sequence will be released after some delay.

Exercise: Why does instrumental convergence happen? Would it be coherent to imagine a reality without it?

Notes

• Here, our descriptive theory relies on our ability to have reasonable beliefs about what we'll do, and how things in the world will affect our later decision-making process. No one knows how to formalize that kind of reasoning, so I'm leaving it a black box: we somehow have these reasonable beliefs which are apparently used to calculate AU.
• In technical terms, AU calculated with the "could" criterion would be closer to an optimal value function, while actual AU seems to be an on-policy prediction, whatever that means in the embedded context. Felt impact corresponds to TD error.
• Framed as a kind of EU, we plausibly use AU to make decisions.
• I'm not claiming normatively that "embedded agentic" EU should be AU; I'm simply using "embedded agentic" as an adjective.
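For reference, the TD error mentioned in the notes is the standard temporal-difference term from reinforcement learning (this definition is standard background, not something from the post): the gap between the old value estimate and the one-step-updated estimate,

```latex
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
```

so "felt impact corresponds to TD error" says that impact is experienced exactly when an event shifts your estimate of attainable utility.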

Discuss

### Replace judges with Keynesian beauty contests?

October 7, 2019 - 07:00
Published on October 7, 2019 4:00 AM UTC

The job of the judicial system is to interpret the law. It's valuable for society that the law is interpreted consistently--for example, so we can make legal agreements and trust that judges will enforce those agreements in a way that's similar to how they enforced such agreements in the past. This is why legal precedents exist.

I think one possible problem facing democracy, at least here in the US, is that incentives for judges to follow precedents aren't strong enough. Because judges have so much flexibility in how they interpret the law, the appointment of judges has become a high-stakes political contest that our democracy wasn't designed to handle robustly. Because judges have significant leeway in sentencing, their apparent capriciousness leads people to distrust the judicial system. Here's a proposal for fixing these problems and reducing noise in the judicial system--which could be useful, but is probably flawed and mainly just fun to think about.

A Keynesian beauty contest is a scenario John Maynard Keynes used to describe the behavior of financial markets. The idea is that a newspaper prints photographs of 100 faces and challenges readers to pick the 6 faces who will be picked most frequently by the readership as a whole. The interesting thing is that choosing the faces you personally find attractive isn't a good strategy. You're better off choosing faces which are attractive to the population at large--or, if you believe other newspaper readers are savvy, you might instead choose faces people think people think people are attracted to (and so on).

Let's suppose that the newspaper editor makes a mistake and happens to run the same 100 faces a second time. You'd be a fool not to pick the six winning faces from last time! Not only does the data show that those faces are likely to win--everyone else knows that the data shows that those faces are likely to win, and they're going to make their choices accordingly.

In other words, the Keynesian beauty contest is structured to incentivize predictability and conformism in one's judgments.

Now for fixing the judicial system.

Imagine if anyone who passed the bar was allowed to sign up to be a "Keynesian beauty contest judge" (KBCJ hereafter). Every month, they'd be sent some legal cases to make decisions on. Perhaps there'd be a transcript of some kind of court investigation to review, with identifying information removed. (The KBCJs should remain anonymous too, so they would be harder to intimidate if, for example, the defendant is a mafia boss. And they should remain unknown to each other so they can't coordinate.) Then the KBCJs review the transcript, pronounce guilt or innocence, and if the person is thought guilty, they sentence them. But there's a catch--they only get paid if their judgments are close enough to the average of the other KBCJs. So just like the beauty contest participants, they have a natural incentive to look at precedent information in order to make predictable judgments.
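The pay-for-conformity rule is easy to make concrete. A sketch, where every threshold is an arbitrary illustration rather than part of the proposal:

```python
def paid_judges(sentences, tolerance=0.25):
    """Sketch of the KBCJ payment rule: a judge is paid only if their
    sentence is within `tolerance` (as a fraction of the median) of the
    median of all judges' sentences. The 25% band is arbitrary.
    `sentences` maps judge id -> sentence length (e.g. months)."""
    ordered = sorted(sentences.values())
    n = len(ordered)
    median = (ordered[n // 2] if n % 2
              else (ordered[n // 2 - 1] + ordered[n // 2]) / 2)
    return {judge for judge, s in sentences.items()
            if abs(s - median) <= tolerance * median}
```

Using the median rather than the mean is one way to keep a single extreme judgment from dragging the consensus point around, which matters when the payoff itself depends on that point.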

Of course there are a lot of issues that would need to be worked out. For example, the KBCJs might gradually get lazy over time. If all the KBCJs know that all the KBCJs usually just look at the first page of the transcript to determine guilt or innocence, ignoring subsequent pages is actually the best strategy for getting paid! To deal with this problem, maybe the KBCJs could justify why they made each decision in writing, and those justifications could be subject to random checks. They could be limited in the number of cases they took on, and fired if their opinions deviated from the majority too often.

Additionally, there might be cases that cover new legal ground. In those cases, however, I'd argue that trying to squeeze the case into some sort of existing law or precedent is the wrong approach, and the right strategy is for someone to do some original thinking about what's best for society, which feels like it ought to be a different process. Perhaps in addition to standard outcomes like "guilty" and "innocent", the KBCJs would have a new option available, "send it to a citizen's assembly". If enough KBCJs voted for this option, a citizen's assembly (advised by experts) would be spun up to try & figure out what to do in that kind of case.

Another possible objection to the KBCJ system is that it could work too well, and the KBCJs might faithfully continue to enforce precedents long after the population at large decided they were unjust. One hopes that the legislative arm of the government is competent to repeal unjust laws, but it's good to build in some redundancy if that isn't the case. Perhaps the KBCJs could also make use of the "send it to a citizen's assembly" option in cases where it seems obvious that a particular law could use a second look.

I'm also not sure about the idea of working from a transcript instead of seeing live people in front of you. Working from a transcript has the advantage that, for example, the decisions of the KBCJs could not be influenced by the defendant's race or gender (assuming those identifying details were removed, which should be possible in most cases). However, the KBCJs might feel their work isn't meaningful if it feels too much like shuffling papers, and they might grow cynical about their job. I have a theory that the reason why public healthcare systems tend to work OK (compared to nationalizing most other industries) is that medical workers are intrinsically motivated to help patients, so profit incentives aren't essential. On the other hand, very few people are intrinsically motivated by shuffling papers at a bank, and that's why state-run banks don't work as well.

Discuss

### What do the baby eaters tell us about ethics?

October 7, 2019 - 02:26
Published on October 6, 2019 10:27 PM UTC

I just finished the baby eater sequence (https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8), and aside from it being an incredibly engaging story, I feel like there is a deeper message about ethical systems I don't fully understand yet.

In the sequence introduction Eliezer says it makes points about "naturalistic metaethics", but I wonder what those points are specifically, since after reading the SEP page on moral naturalism (https://plato.stanford.edu/entries/naturalism-moral/) I can't really figure out what the mind-independent moral facts are in the story.

Another thought I've had since I read the story is that a lot of human-human interactions are really human-babyeater interactions. If you're religious and talking to an atheist about God, both of you will look like baby eaters to the other. Likewise, if you watch Fox News, everyone on CNN or MSNBC will look like baby eaters; the same is true in reverse, and everyone watching CNN will think Fox News viewers are the baby eaters.

I have to say, this feels like some kind of ethical nihilism, but I would be curious to know if there are any canonical meta-ethical or ethical theories that correspond to the _message_ of the baby eater sequence, because if there is one, I think I agree with it.

Discuss

### Introduction to Introduction to Category Theory

October 6, 2019 - 17:43
Published on October 6, 2019 2:43 PM UTC

Category theory is so general in its application that it really feels like everyone, even non-mathematicians, ought to at least conceptually grok that it exists, like how everyone ought to understand the idea of the laws of physics even if they don't know what those laws are.

We expect educated people to know that the Earth is round and the Sun is big, even though those facts don't have any direct relevance to the lives of most people. I think people should know about Yoneda and adjunction in at least the same broad way people are aware of the existence and use of calculus.

But no one outside of mathematics and maybe programming/data science has heard of category theory, and I think a big part of that is because all of the examples in textbooks assume you already know what Sierpinski spaces and abelian groups are.

That is to say: all expositions of category theory assume you know math.

Which makes sense. Category theory is the mathematics of math. Trying to learn category theory without having most of an undergraduate education in math already under your belt is like trying to study Newton's laws without having ever seen an apple fall from a tree. You can...you're just going to have absolutely no intuition to rely on.

Category theory generalizes the things you do in the various fields of mathematics, just like how Newton's laws generalize the things you do when you toss a rock or push yourself off the ground. Except really, category theory generalizes what you do when you generalize with Newton's laws. Category theory generalizes generalizing.

Therefore, without knowing about any specific generalizations, like algebra or topology, it's hard to understand general generalities—which are categories.

As a result, there are no category theory texts (that I know of) that teach category theory to the educated and intelligent but mathematically ignorant person.

Which is a shame, because you totally can.

Sure, if you've never learned topology, plenty of standard examples will fly over your head. But every educated person has encountered the idea of generalization, and they've seen generalizations of generalizations. In fact, category theory is very intuitive, and I don't think it necessarily benefits from relating it all as quickly as possible to more familiar fields of mathematics. Instead, you should grasp the flow of category theory itself, as its own field.

So this is (tentatively, hopefully, unless I get busy, bored, or it just doesn't work out) a series on the basics of category theory without assuming you know any math. I'm thinking specifically of high school seniors.

There is no schedule for the posts. They'll just be up whenever I make them.

Why category theory? And why lesswrong?

Well, category theory is a super-general theory of everything. Rationality is also a super-general theory of everything. In fact, we'll see how category theory tells us a lot about what rationality really is, in a certain rigorous sense.

Basically...rationality comes from noticing certain general laws that seem to emerge every time you try to do something the "right" way. After a while, instead of focusing so much on the specifics, it starts being worth it to take a step back and study the general rules that seem to be emerging. And you start to notice that doing things the "right" way gets a lot easier when you start with the general rules and simply fill in the specifics, like how the quadratic formula makes quadratic equations a cinch to solve.
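To make that concrete: the quadratic formula is exactly such a general rule, derived once and then applied by filling in the specific values of a, b, and c,

```latex
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
```

which turns every individual quadratic equation from a puzzle into a lookup.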

Category theory gives us all the general rules for doing things the "right" way.

(Don't actually hold me to demonstrating this claim.)

Why should you be interested in category theory?

One is that category theory is going to rise in importance in the future. It offers powerful new ways of doing math and science. So get started!

Two is that category theory makes it much easier to learn the rest of math. Well, maybe—this is an experiment, and a big motivation for doing this. How fast and well do people learn regular math if they can just say, "Oh, it's an adjunction" every time they learn a new concept?

Three is that referencing homotopy type theory in conversation will make you sound cool and mysterious.

Please let me know if there's any interest in this.

Discuss

### What is category theory?

October 6, 2019 - 17:33
Published on October 6, 2019 2:33 PM UTC

Category theory is the mathematics of math—specifically, it's a mathematical theory of mathematical structure. It turns out that every kind of mathematics you're likely to encounter in a normal university education is just another kind of category—group theory falls under the category of groups, topology falls under the category of topological spaces, etc.

Specifically, category theory shows that all of the highly diverse mathematical structures that we know of can be broken down into nodes and arrows between nodes. Nodes and arrows define a category in a nutshell—plus something called composition, which basically says that if a bird can fly from Mexico to the USA to Canada, then the bird can also travel "directly" from Mexico to Canada in a way that is equal to the Mexico —> USA —> Canada path. (See "the right to use a name" section.)
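In the category of sets and functions, composition is just function composition, which makes the bird example easy to sketch (the "flight legs" below are my own toy encoding, purely for illustration):

```python
def compose(g, f):
    """Categorical composition for sets-and-functions: (g after f)(x)
    = g(f(x)). For functions, associativity and identity arrows come
    for free."""
    return lambda x: g(f(x))

# Two "arrows" between countries, as in the bird example.
mexico_to_usa = lambda bird: bird + " in USA"
usa_to_canada = lambda bird: bird.replace("USA", "Canada")

# The composite arrow: flying Mexico -> USA -> Canada in one step.
mexico_to_canada = compose(usa_to_canada, mexico_to_usa)
```

The composition axiom just says this composite arrow always exists and equals the two-leg path, which is exactly the "direct flight" in the bird analogy.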

Breaking any and every mathematical structure down to nodes and arrows between nodes is a super-general, super-abstract way of studying mathematics. It's like trying to study Shakespeare by breaking down all of his sentences into nouns and verbs. Category theory has a reputation for being painfully abstract—in fact, it's been called "abstract nonsense" by one of its founders. Because of this, it's typically recommended that you have a few mathematical structures under your belt—algebra, groups, topology, etc.—before studying category theory so that you have specific examples to relate the abstractions to. (It would be tough to study verbs if you didn't know about things like running and jumping!)

But while there's only so much to learn about Shakespeare by breaking "to be or not to be" into infinitives, conjunctions, and adverbs, it turns out that the super-general perspective of category theory is incredibly useful in concrete ways. In particular, it turns out that pretty much every cool idea in math is something called an adjoint functor—a special construction that can only be accessed through category theory. A lot of category theorists will tell you that adjoint functors are kind of the point of category theory. Adjoints, or adjunction, generalizes optimization itself.

Then there is the Yoneda lemma, which is as deep as it is elegant and powerful. We will explore it in depth. (If this series works out.)

You might be wondering what success category theory has found in applications to the sciences. How can you even apply something so general and abstract to our highly specific and concrete reality?

Well, category theory is super general, so whenever you are studying super-general phenomena, it makes sense to think of category theory. What's a super-general phenomenon? For example, the laws of physics! They govern everything, presumably. If you're looking for fundamental rules that apply to everything from tiny particles to enormous planets and the complex living creatures in between, category theory immediately comes to mind.

Then there is biology, which is less super-general (unless there really are Martians hiding from our rovers), but organisms have to survive and reproduce under wildly diverse conditions: the planet Earth can throw a lot of stuff at you, from volcanoes to Ice Ages. On some level, organic life clearly has the ability to adapt to all of these conditions, and adapting the same basic thing to lots of different contexts with powerful results is basically what category theory is about.

Definitely the biggest applied success for category theory has been in programming. I'd encourage you to look up functional programming, lambda calculus, or just Google something like "programming category theory." It's fascinating, though I'm actually going to deemphasize the programming side of things if anything, as I don't want to distract from the fundamentals.

So what is category theory? Nothing other than the formal generalization of everything. Why should you be interested in it? Because it gives you an incredible bird's-eye view of all of mathematics, and a particular perch, adjunction, that can't be found anywhere else.

This series will be very slow-paced relative to other introductions—I will not assume you know what sets and functions are, to give just one example. If you're comfortable with math or just want to plunge into things a little more, I strongly encourage you to look up the many fantastic introductions to category theory that already exist on the Internet for free as videos, textbooks, blog posts, and papers. This is a series meant for people who either have no exposure to mathematics beyond the high school level or actively want to avoid it! (I'll put it this way: if there were a "Baby Rudin" for category theory, this series would be aiming to be a "Fetal Rudin.")

There's no schedule for these posts, which isn't ideal for learning, but that's just the reality of how this series is going to be made. Coming up is a sequence of posts on the most basic details of defining a category, with an emphasis on developing intuition at each step.

Discuss

October 6, 2019 - 14:10
Published on October 6, 2019 11:10 AM UTC

BIDA is hosting its first advanced dance this afternoon, before our regular dance.

I've been thinking a lot about it. If you had asked me ten years ago whether BIDA would ever do advanced dances I would probably have said no. A major goal with BIDA is to be welcoming, especially to newcomers, and a dance where people aren't welcome unless they already know how to dance seems antithetical. So why do I think having one is good for our community?

Over time I've gotten a better understanding of the different roles contra dancing plays in people's lives, and a common pattern is that someone will start coming, get excited about it, dance a lot for a while, come less often, and then stop attending, over the course of about five years. The pattern is not universal, but it is pretty common, and one driver is no longer finding contra to be as interesting as before. Some people stop dancing much, others move on to more complex styles.

This is where I see an occasional advanced dance fitting in: when you have a hall where everyone already knows how to dance the caller can choose more interesting dances. Or run more no-walkthroughs and spend less time teaching. The dancers spend less time helping people around them learn to dance, and more time coming up with interesting variations. People also want to be at dances with their friends, and if you have a pool of people who only want to come a few times a year they'll normally not see much of each other. But an advanced dance can bring coordination, where many of them attend at once without a "how about we all come to this specific dance" discussion. A community where people stay involved longer is a healthier one, and helping people maintain ties they formed dancing is good on its own.

On the other hand, this needs to be a rare event. Not only does it lose its coordination power if scheduled too frequently, but if advanced dances started to take the place of regular dances then new people wouldn't be able to get into contra. The thing where you can walk in off the street and start dancing, without attending any sort of lesson, learning entirely as you go, is one of the best things about the dance; it's really important that we don't lose this. So for now this is a one-off event, but we may have others if this feels like it's meeting a need.

Discuss

### Mako's Notes from Skeptoid's 13 Hour 13th Birthday Stream

October 6, 2019 - 12:43
Published on October 6, 2019 9:43 AM UTC

Recently, the pretty great skeptic podcast, Skeptoid, celebrated its 13th anniversary with a 13 hour-long livestream, in which host and founder Brian Dunning wore a party hat and received video calls from 13 of his dearest friends and peers.

It was a bit of a wreck, and the simultaneous viewers never topped 40, but it was also extremely cozy and intimate. It was great seeing Brian interacting with his friends and colleagues and reflecting on their shared projects. It also lined up perfectly with my Aotearoan timezone, so I uh, I actually watched the entire thing... o.o;

But I'm glad I did! I've watched much sillier streamed events! It was actually very informative.

Here are the notes I took.

• Comedian Rachel Bloom came by. The video call quality was terrible.

• Apparently Sony funded a pilot of a show called Truth Hurts, where Brian Dunning and Shira Lazar made a fun time of testing woo. It felt a lot like Mythbusters, with a greater focus on, you know, busting myths, and mainly myths that might really matter to someone. I think I would have loved it, but obviously it didn't go into production and I don't think Sony is funding shows any more.
• Archaeologist Ken Feder apparently wrote a book about weird archaeological sites and the eerie stories wooists tell about them. Usually these stories are something along the lines of "old humans couldn't have made these artefacts, so it must have been aliens". Ken notes that this theory is actually kind of racist; the old people were very clever. He then explains the techniques by which the old people did make those artefacts, which tend to be very interesting.
• Magician Brian Brushwood seemed to have a really good attitude about promoting rationality. He hosts something called Scam School where, if I've gathered correctly, he teaches people how to commit acts of fooling. He says that only once you've fooled someone yourself will you learn to spot the signs that someone is trying to fool you, and in many cases those signs will be obvious once you know what to look for. He had a lot of valuable things to say.
• On this note, I recommend Penn and Teller's Fool Us. Magic is the study of wrong-seeing, it should be interesting to any aspiring rationalist, and Fool Us is a great place to see it.
• Kiki Sanford, host of the extremely long-running science news podcast Scishow, mentioned being a part of a show called What On Earth, where they'd all look at some of the weird/spooky things people find in Google Earth, and then investigate them, sometimes going so far as to visit those locations, and find out what's really there. This sounds extremely fun to me.
• Brian's special Principles of Curiosity was shown. It was a well-polished short documentary about fundamental principles of scientific thinking, featuring the travelling stones of Death Valley as a central case. I hope it is/was shown in many schools, and I hope it has its intended effects.
• Former psychic Mark Edward talked about his time working at a psychics' hotline (written about in Psychic Blues). He said that most of the people there were just cynical frauds, but he described a type of person... "intuitive", those who were just "telling their own truth", if I understood correctly... he was describing a type of person who passed for a psychic because they could cold-read, and they could cold-read because they could see through people in ways that their clients might find invasive if it weren't for the shrouds of ghosts and magic covering their shame. They'd tell their audience that the reason they could see so much wasn't because people have streams of pathos overflowing out of their orifices that they're failing to hide from anyone who truly wants to see, no no, it's because of this here comforting illusion: you have a guardian angel in the spirit realm and it is speaking to me now because it trusts me, now, here's how your angel says we can fix your problem, and improve your life. Again, I'm not giving this advice, no, far be it from me to give you advice, it's the angel. You have no reason to feel embarrassed, I'm not telling you incredibly basic things that anyone ought to know, these are esoteric predictions from the spirit realm. Now, that ought to cure your troubles. I hope you don't need to call me again, but if you do, I'll always be here.
• And I wept for a bit. The notion that there are these great souls who see so much more human truth than anyone could believe or accept, and who feel like the only way they can convey any of it is to cast these digestible illusions for us... I can believe it. I've got friends like that, and there's nothing a priori unlikely about it. Some people, often through trauma, end up having a lot of introspection forced upon them. Being forced at an early age to look into dark places, lose someone when they're young, contend with monsters, or maybe mental illness visits them, and they have to sear their whole psyche with light. And sometimes those people take all of that strength and direct it towards helping the people around them.
• But idk I should read the book

Overall it was a lovely, awkward, interesting time, and Brian will probably never do it again. Here's to another 13 years!

Discuss

### [Event] Meeting in Myrhorod, November 17

October 6, 2019 - 11:45
Published on October 6, 2019 8:45 AM UTC

On November 17th (Saturday) we go to Myrhorod (Poltava region) and meet with LW readers from Kharkiv, and maybe other places too. The attendance in Kyiv is low enough to do something crazy without worrying too much.

The meetup is at 16:00, unless the Intercity train is late, near Gogol's statue at the railway station.

We shall go from there somewhere more comfortable. It will be all the more interesting since nobody who's said they'd come knows a comfortable place there. As usual, you can reach me at chernyshenko123@gmail dot com, +38097-667-29-70, Marichka, but if you bring people there, please do everything to help them return home safely and share your own contact information. Given the political situation, be brave. Write me ASAP if you set out from Kyiv and want to come together, I'll be taking a train.

We shall only have several hours and probably some people will have to leave earlier than others, which means we might want to just hang out and introduce ourselves. The coolest outcome I hope for is for Kharkiv to start their own meetup afterwards, but the sky is not the limit. I also have some hopes for Odessa, although not in the near future.

Likely it will be cold, so charge your phones, take enough money to eat at least twice and bundle up. Do not hesitate to tell me directly that you need whatever, whenever it comes up.

And if you think you are not the typical LWer... does it signify so much about you?

Discuss

### Who lacks the qualia of consciousness?

October 5, 2019 - 22:49
Published on October 5, 2019 7:49 PM UTC

Over on Facebook (I don't know if it's possible to link to a Facebook post, but h/t Alexander Kruel) and Twitter, the subject of missing qualia has come up. Some people are color-blind. This deficiency can be objectively demonstrated by tasks such as reading the Ishihara plates. Some people cannot smell, and sometimes do not discover this until well into adulthood. Some people cannot form mental imagery, a condition that went unnoticed until Galton wrote of it, but is now well-known enough to have a Wikipedia article. Until they discover that others really do see with the mind's eye, aphantasics take the expression to be some sort of metaphor. But it is not. Some people, I think most, do see things in their mind's eye.

More recently of note is that some people lack the qualia of long-term memory (see section 1.4): they can know that things involving them happened, but not re-experience them as a participant.

I want to put the following question: Does anyone here lack the qualia of consciousness?

If you do lack this then you won't know what I'm talking about. So I shall try to describe the experience. I have a vivid sensation of my own presence, my own self. This is the thing I am pointing at when I say that I am conscious. Whether I sit in meditation or in the midst of life, there I am. Indeed, more vividly in meditation, because then, that is where I direct my attention. But only in dreamless sleep is it absent.

Some people claim by meditation to have seen through what they claim is the illusion of consciousness. I am uncertain whether they have self-modified to ablate the faculty of having this experience, or merely philosophised themselves into believing there can't be any such thing, and insisting that they are not experiencing what they are experiencing.

But there may be some people out there who have never had any experience of themselves such as I have described. In effect, almost p-zombies. The original p-zombies are by definition indistinguishable in behaviour from everyone else, including talk about consciousness. But people without this experience of self, quasi-p-zombies, or q-zombies for short, may imitate the discourse as aphantasics or anosmics may, but without real understanding. I invite anyone who recognises themselves to be a q-zombie to put their hand up. Note that this is a question about whether you actually have this experience, not what you think about its possibility or nature.

Discuss

### Eight O'Clock is Relative

October 5, 2019 - 19:20
Published on October 5, 2019 4:20 PM UTC

When Lily was little we used an analog rotary timer outlet to turn on a light so she would know when it was ok to get up. The goal wasn't to wake her, but to let her know when it was ok to come downstairs. This worked ok, but it wasn't very precise: it was accurate to maybe the nearest fifteen minutes, and would drift.

As she got better at reading we switched to a clock, and taught her that when the first digit was "8" she could get up:

pictured: a clock that is not saying "time to get up"

This worked pretty well, but as she got close to starting kindergarten we needed to move her bedtime and wake time earlier. We did this gradually, about fifteen minutes at a time, over the course of a couple weeks. Since she couldn't tell time yet and was only reading the most significant digit, she wasn't going to be able to learn that 7:45, then 7:30, then 7:15, etc. meant time to get up. So I did the easy thing and started setting her clock fast. First fifteen minutes fast, then thirty minutes fast, until eventually it was an hour fast.

It's now the first week of school and her clock reads "8:00" at 7am. At this point switching it to the correct time and teaching her she can get up if it is "7:??" would have made sense, but there was enough going on that we didn't get to it.

Now we're a month in, she's gotten faster with her morning routine, and she could be sleeping later. So I've pushed her clock to only be 48 minutes fast instead of an hour, and it reads "8:00" at 7:12am.

All of this is pretty silly: she's pretty close to learning to tell time, and having one clock in the house that doesn't match the others is going to start confusing her. And it already confuses adults sometimes (mostly me) since who expects a clock to be 48min fast?

On the other hand I'm not sure where to go next. I could just leave things alone until she points out that it's weird, at which point she'll know time well enough that I can set it correctly and suggest she not get up before 7:12am. Or I could get a more accurate outlet timer, which would have the benefit of also working for our younger daughter? Not sure yet.

(When we were little we had analog clocks in the house, and my dad made a wooden demonstration clock for us. When the hands on the real clock matched the hands on the wooden clock, it was time to get up. I think this worked well, though I remember one weekend when my youngest sister seemed to sleep a really long time, and it turned out she had slept past her normal waking time of 7am and so was still waiting for the hands to come around again.)

Discuss

### How do I view my replies?

October 5, 2019 - 19:05
Published on October 5, 2019 4:05 PM UTC

On the old site (like on Reddit), there was an envelope in the upper right that led to all replies I'd received to my posts and comments. Now there's a bell; but all its tabs currently say "You don't have any notifications yet!". What am I missing?

Discuss