### [AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee

LessWrong.com news - November 27, 2019 - 21:10
Published on November 27, 2019 6:10 PM UTC

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Audio version here (may not be up yet).

Highlights

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (Julian Schrittwieser et al) (summarized by Nicholas): Up until now, model-free RL approaches have been state of the art at visually rich domains such as Atari, while model-based RL has excelled for games which require planning many steps ahead, such as Go, chess, and shogi. This paper attains state of the art performance on Atari using a model-based approach, MuZero, while matching AlphaZero (AN #36) at Go, chess, and shogi with less compute. Importantly, it does this without requiring any advance knowledge of the rules of the game.

MuZero's model has three components:

1. The representation function produces an initial internal state from all existing observations.

2. The dynamics function predicts the next internal state and immediate reward after taking an action in a given internal state.

3. The prediction function generates a policy and a value prediction from an internal state.

Although these are based on the structure of an MDP, the internal states of the model do not necessarily have any human-interpretable meaning. They are trained end-to-end only to accurately predict the policy, value function, and immediate reward. This model is then used to simulate trajectories for use in MCTS.
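The three components above can be sketched in code. This is a hypothetical toy illustration of the interfaces only (the stand-in "networks" are random linear maps, not trained models, and all names are my own), showing how planning can roll forward entirely inside the learned model without knowing the environment's rules:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_ACTIONS = 8, 4

# Stand-in "networks": random linear maps (assumption, for illustration only;
# real MuZero uses deep residual networks trained end-to-end).
W_repr = rng.normal(size=(STATE_DIM, STATE_DIM))
W_dyn = rng.normal(size=(STATE_DIM, STATE_DIM + NUM_ACTIONS))
W_pred = rng.normal(size=(NUM_ACTIONS + 1, STATE_DIM))

def representation(observations):
    """1. Map raw observations to an initial internal state."""
    return np.tanh(W_repr @ observations)

def dynamics(state, action):
    """2. Predict the next internal state and immediate reward."""
    one_hot = np.eye(NUM_ACTIONS)[action]
    next_state = np.tanh(W_dyn @ np.concatenate([state, one_hot]))
    reward = float(next_state.sum())  # toy reward head
    return next_state, reward

def prediction(state):
    """3. Produce a policy distribution and a value estimate."""
    out = W_pred @ state
    logits, value = out[:NUM_ACTIONS], float(out[-1])
    policy = np.exp(logits - logits.max())
    return policy / policy.sum(), value

# Simulate a short trajectory purely inside the learned model, as MCTS
# would during planning -- no game rules or simulator required.
s = representation(rng.normal(size=STATE_DIM))
for _ in range(3):
    policy, value = prediction(s)
    a = int(policy.argmax())
    s, r = dynamics(s, a)
```

Note that nothing here constrains the internal state `s` to resemble the true game state; it only needs to support accurate policy, value, and reward predictions.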

Nicholas's opinion: This is clearly a major step for model-based RL, becoming the state of the art on a very popular benchmark and enabling planning approaches to be used in domains with unknown rules or dynamics. I am typically optimistic about model-based approaches as progress towards safe AGI. They map well to how humans think about most complex tasks: we consider the likely outcomes of our actions and then plan accordingly. Additionally, model-based RL typically has the safety property that the programmers know what states the algorithm expects to pass through and end up in, which aids with interpretability and auditing. However, MuZero loses that property by using a learned model whose internal states are not constrained to have any semantic meaning. I would be quite excited to see follow up work that enables us to understand what the model components are learning and how to audit them for particularly bad inaccuracies.

Rohin's opinion: Note: This is more speculative than usual. This approach seems really obvious and useful in hindsight (something I last felt for population-based training of hyperparameters). The main performance benefit (that I see) of model-based planning is that it only needs to use the environment interactions to learn how the environment works, rather than how to act optimally in the environment -- it can do the "act optimally" part using some MDP planning algorithm, or by simulating trajectories from the world model rather than requiring the actual environment. Intuitively, it should be significantly easier to learn how an environment works -- consider how easy it is for us to learn the rules of a game, as opposed to playing it well. However, most model-based approaches force the learned model to learn features that are useful for predicting the state, which may not be the ones that are useful for playing well, which can handicap their final performance. Model-free approaches on the other hand learn exactly the features that are needed for playing well -- a much harder learning task that takes many more samples, though it can lead to better final performance. Ideally, we would like to get the benefits of using an MDP planning algorithm, while still only requiring the agent to learn features that are useful for acting optimally.

This is exactly what MuZero does, similarly to this previous paper: its "model" only predicts actions, rewards, and value functions, all of which are much more clearly relevant to acting optimally. However, the tasks that are learned from environment interactions are in some sense "easier" -- the model only needs to predict, given a sequence of actions, what the immediate reward will be. It notably doesn't need to do a great job of predicting how an action now will affect things ten turns from now, as long as it can predict how things ten turns from now will be given the ten actions used to get there. Of course, the model does need to predict the policy and the value function (both hard and dependent on the future), but the learning signal for this comes from MCTS, whereas model-free RL relies on credit assignment for this purpose. Since MCTS can consider multiple possible future scenarios, while credit assignment only gets to see the trajectory that was actually rolled out, we should expect that MCTS leads to significantly better gradients and faster learning.

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA (Buck Shlegeris) (summarized by Rohin): Here are some beliefs that Buck reported that I think are particularly interesting (selected for relevance to AI safety):

1. He would probably not work on AI safety if he thought there was less than 30% chance of AGI within 50 years.

2. The ideas in Risks from Learned Optimization (AN #58) are extremely important.

3. If we build "business-as-usual ML", there will be inner alignment failures, which can't easily be fixed. In addition, the ML systems' goals may accidentally change as they self-improve, obviating any guarantees we had. The only way to solve this is to have a clearer picture of what we're doing when building these systems. (This was a response to a question about the motivation for MIRI's research agenda, and so may not reflect his actual beliefs, but just his beliefs about MIRI's beliefs.)

4. Different people who work on AI alignment have radically different pictures of what the development of AI will look like, what the alignment problem is, and what solutions might look like.

5. Skilled and experienced AI safety researchers seem to have a much more holistic and much more concrete mindset: they consider a solution to be composed of many parts that solve subproblems that can be put together with different relative strengths, as opposed to searching for a single overall story for everything.

6. External criticism seems relatively unimportant in AI safety, where there isn't an established research community that has already figured out what kinds of arguments are most important.

Rohin's opinion: I strongly agree with 2 and 4, weakly agree with 1, 5, and 6, and disagree with 3.

Technical AI alignment

Problems

Defining AI wireheading (Stuart Armstrong) (summarized by Rohin): This post points out that "wireheading" is a fuzzy category. Consider a weather-controlling AI tasked with increasing atmospheric pressure, as measured by the world's barometers. If it made a tiny dome around each barometer and increased air pressure within the domes, we would call it wireheading. However, if we increase the size of the domes until it's a dome around the entire Earth, then it starts sounding like a perfectly reasonable way to optimize the reward function. Somewhere in the middle, it must have become unclear whether or not it was wireheading. The post suggests that wireheading can be defined as a subset of specification gaming (AN #1), where the "gaming" happens by focusing on some narrow measurement channel, and the fuzziness comes from what counts as a "narrow measurement channel".

Rohin's opinion: You may have noticed that this newsletter doesn't talk about wireheading very much; this is one of the reasons why. It seems like wireheading is a fuzzy subset of specification gaming, and is not particularly likely to be the only kind of specification gaming that could lead to catastrophe. I'd be surprised if we found some sort of solution where we'd say "this solves all of wireheading, but it doesn't solve specification gaming" -- there don't seem to be particular distinguishing features that would allow us to have a solution to wireheading but not specification gaming. There can of course be solutions to particular kinds of wireheading that do have clear distinguishing features, such as reward tampering (AN #71), but I don't usually expect these to be the major sources of AI risk.

Technical agendas and prioritization

The Value Definition Problem (Sammy Martin) (summarized by Rohin): This post considers the Value Definition Problem: what should we make our AI system try to do (AN #33) to have the best chance of a positive outcome? It argues that an answer to the problem should be judged based on how much easier it makes alignment, how competent the AI system has to be to optimize it, and how good the outcome would be if it was optimized. Solutions also differ on how "direct" they are -- on one end, explicitly writing down a utility function would be very direct, while on the other, something like Coherent Extrapolated Volition would be very indirect: it delegates the task of figuring out what is good to the AI system itself.

Rohin's opinion: I fall more on the side of preferring indirect approaches, though by that I mean that we should delegate to future humans, as opposed to defining some particular value-finding mechanism into an AI system that eventually produces a definition of values.

Miscellaneous (Alignment)

Self-Fulfilling Prophecies Aren't Always About Self-Awareness (John Maxwell) (summarized by Rohin): Could we prevent a superintelligent oracle from making self-fulfilling prophecies by preventing it from modeling itself? This post presents three scenarios in which self-fulfilling prophecies would still occur. For example, if instead of modeling itself, it models the fact that there's some AI system whose predictions frequently come true, it may try to predict what that AI system would say, and then say that. This would lead to self-fulfilling prophecies.

Analysing: Dangerous messages from future UFAI via Oracles and Breaking Oracles: hyperrationality and acausal trade (Stuart Armstrong) (summarized by Rohin): These posts point out a problem with counterfactual oracles (AN #59): a future misaligned agential AI system could commit to helping the oracle (e.g. by giving it maximal reward, or making its predictions come true) even in the event of an erasure, as long as the oracle makes predictions that cause humans to build the agential AI system. Alternatively, multiple oracles could acausally cooperate with each other to build an agential AI system that will reward all oracles.

AI strategy and policy

AI Alignment Podcast: Machine Ethics and AI Governance (Lucas Perry and Wendell Wallach) (summarized by Rohin): Machine ethics has aimed to figure out how to embed ethical reasoning in automated systems of today. In contrast, AI alignment starts from an assumption of intelligence, and then asks how to make the system behave well. Wendell expects that we will have to go through stages of development where we figure out how to embed moral reasoning in less intelligent systems before we can solve AI alignment.

Generally in governance, there's a problem that technologies are easy to regulate early on, but that's when we don't know what regulations would be good. Governance has become harder now, because it has become very crowded: there are more than 53 lists of principles for artificial intelligence and lots of proposed regulations and laws. One potential mitigation would be governance coordinating committees: a sort of issues manager that keeps track of a field, maps the issues and gaps, and figures out how they could be addressed.

In the intermediate term, the worry is that AI systems are giving increasing power to those who want to manipulate human behavior. In addition, job loss is a real issue. One possibility is that we could tax corporations relative to how many workers they laid off and how many jobs they created.

Thinking about AGI, governments should probably not be involved now (besides perhaps funding some of the research), since we have so little clarity on what the problem is and what needs to be done. We do need people monitoring risks, but there’s a pretty robust existing community doing this, so government doesn't need to be involved.

Rohin's opinion: I disagree with Wendell that current machine ethics will be necessary for AI alignment -- that might be the case, but it seems like things change significantly once our AI systems are smart enough to actually understand our moral systems, so that we no longer need to design special procedures to embed ethical reasoning in the AI system.

It does seem useful to have coordination on governance, along the lines of governance coordinating committees; it seems a lot better if there's only one or two groups that we need to convince of the importance of an issue, rather than 53 (!!).

Other progress in AI

Reinforcement learning

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction (C. Daniel Freeman et al) (summarized by Sudhanshu): One critique of the World Models (AN #23) paper was that in any realistic setting, you only want to learn the features that are important for the task under consideration, while the VAE used in the paper would learn features for state reconstruction. This paper instead studies world models that are trained directly from reward, rather than by supervised learning on observed future states, which should lead to models that only focus on task-relevant features. Specifically, they use observational dropout on the environment percepts, where the true state is passed to the policy with a peek probability p, while a neural network, M, generates a proxy state with probability 1 - p. At the next time-step, M takes the same input as the policy, plus the policy's action, and generates the next proxy state, which then may get passed to the controller, again with probability 1 - p.

They investigate whether the emergent 'world model' M behaves like a good forward predictive model. They find that even with very low peek probability e.g. p = 5%, M learns a good enough world model that enables the policy to perform reasonably well. Additionally, they find that world models thus learned can be used to train policies that sometimes transfer well to the real environment. They claim that the world model only learns features that are useful for task performance, but also note that interpretability of those features depends on inductive biases such as the network architecture.
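The observational dropout mechanism can be made concrete with a short sketch. This is a hedged illustration with assumed interfaces (the function names and the trivial linear stand-in for M are mine, not the paper's): with probability p the policy peeks at the true state, and otherwise it sees the proxy state that M produced from the previous observation and action.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model_M(prev_obs, action):
    """Stand-in for the learned network M (here: a trivial linear guess)."""
    return 0.9 * prev_obs + 0.1 * action

def rollout(env_states, actions, peek_prob=0.05):
    """Return the sequence of states the policy actually observes."""
    observed = []
    proxy = env_states[0]  # start from the true initial state
    for true_state, action in zip(env_states, actions):
        if rng.random() < peek_prob:
            obs = true_state       # peek at the real environment
        else:
            obs = proxy            # fall back on M's proxy state
        observed.append(obs)
        proxy = world_model_M(obs, action)  # M sees what the policy saw
    return observed
```

Because M's output is only graded (via reward) on how well the policy performs when forced to act on proxy states, M is pushed to model exactly the task-relevant dynamics, which is the paper's central point.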

Sudhanshu's opinion: This work warrants a visit for the easy-to-absorb animations and charts. On the other hand, they make a few innocent-sounding observations that made me uncomfortable because they weren't rigorously proved nor labelled as speculation, e.g. a) "At higher peek probabilities, the learned dynamics model is not needed to solve the task thus is never learned.", and b) "Here, the world model clearly only learns reliable transition maps for moving down and to the right, which is sufficient."

While this is a neat bit of work well presented, it is nevertheless still unlikely this (and most other current work in deep model-based RL) will scale to more complex alignment problems such as Embedded World-Models (AN #31); these world models do not capture the notion of an agent, and do not model the agent as an entity making long-horizon plans in the environment.

Deep learning

SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver (Po-Wei Wang et al) (summarized by Asya): Historically, deep learning architectures have struggled with problems that involve logical reasoning, since they often impose non-local constraints that gradient descent has a hard time learning. This paper presents a new technique, SATNet, which allows neural nets to solve logical reasoning problems by encoding them explicitly as MAXSAT-solving neural network layers. A MAXSAT problem provides a large set of logical constraints on an exponentially large set of options, and the goal is to find the option that satisfies as many logical constraints as possible. Since MAXSAT is NP-hard, the authors design a layer whose forward pass solves a relaxation of the MAXSAT problem (which, unlike MAXSAT itself, can be solved quickly), while the backward pass computes gradients as usual.

In experiments, SATNet is given bit representations of 9,000 9 x 9 Sudoku boards which it uses to learn the logical constraints of Sudoku, then presented with 1,000 test boards to solve. SATNet vastly outperforms traditional convolutional neural networks given the same training / test setup, achieving 98.3% test accuracy where the convolutional net achieves 0%. It performs similarly well on a "Visual" Sudoku problem where the trained network consists of initial layers that perform digit recognition followed by SATNet layers, achieving 63.2% accuracy where the convolutional net achieves 0.1%.
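To make the MAXSAT objective itself concrete (SATNet approximately solves a differentiable relaxation of it; the exhaustive search below is illustration only and obviously doesn't scale), here is a minimal sketch where a clause is a tuple of signed variable indices, e.g. `(1, -2)` meaning "x1 OR NOT x2":

```python
from itertools import product

def num_satisfied(assignment, clauses):
    """Count clauses with at least one satisfied literal."""
    count = 0
    for clause in clauses:
        if any((assignment[abs(l) - 1] if l > 0 else not assignment[abs(l) - 1])
               for l in clause):
            count += 1
    return count

def max_sat(n_vars, clauses):
    """Exhaustively find the assignment satisfying the most clauses."""
    return max(product([False, True], repeat=n_vars),
               key=lambda a: num_satisfied(a, clauses))

# An unsatisfiable clause set over x1, x2: no assignment satisfies all four,
# so the MAXSAT optimum satisfies 3 of the 4 clauses.
clauses = [(1, 2), (-1, 2), (1, -2), (-1, -2)]
best = max_sat(2, clauses)
```

SATNet's contribution is replacing this exponential search with a fast, differentiable relaxation, so the clause weights themselves can be learned by gradient descent, which is how it recovers Sudoku's rules from examples.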

Asya's opinion: My impression is this is a big step forward in being able to embed logical reasoning in current deep learning techniques. From an engineering perspective, it seems extremely useful to be able to train systems that incorporate these layers end-to-end. It's worth being clear that in systems like these, a lot of generality is lost since part of the network is explicitly carved out for solving a particular problem of logical constraints -- it would be hard to use the same network to learn a different problem.

News

AI Safety Unconference 2019 (David Krueger, Orpheus Lummis, and Gretchen Krueger) (summarized by Rohin): Like last year, there will be an AI safety unconference alongside NeurIPS, on Monday Dec 9 from 10am to 6pm. While the website suggests a registration deadline of Nov 25, the organizers have told me it's a soft deadline, but you probably should register now to secure a place.

### Rationality Dojo: Probability Estimates

Kocherga events - November 27, 2019 - 19:30
At this dojo we'll cover a set of tools that will help you understand how accurately you assess reality, how to improve your estimates, and how to update them as important new information arrives, using Bayes' theorem.

### Mental Mountains

LessWrong.com news - November 27, 2019 - 08:30
Published on November 27, 2019 5:30 AM UTC

I.

Kaj Sotala has an outstanding review of Unlocking The Emotional Brain; I read the book, and Kaj’s review is better.

He begins:

UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how the world functions and what caused those emotions to occur. The brain then uses those models to guide our future behavior. Emotional issues and seemingly irrational behaviors are generated from implicit world-models (schemas) which have been formed in response to various external challenges. Each schema contains memories relating to times when the challenge has been encountered and mental structures describing both the problem and a solution to it.

So in one of the book’s example cases, a man named Richard sought help for trouble speaking up at work. He would have good ideas during meetings, but felt inexplicably afraid to voice them. During therapy, he described his narcissistic father, who was always mouthing off about everything. Everyone hated his father for being a fool who wouldn’t shut up. The therapist conjectured that young Richard observed this and formed a predictive model, something like “talking makes people hate you”. This was overly general: talking only makes people hate you if you talk incessantly about really stupid things. But when you’re a kid you don’t have much data, so you end up generalizing a lot from the few examples you have.

When Richard started therapy, he didn’t consciously understand any of this. He just felt emotions (anxiety) at the thought of voicing his opinion. The predictive model output the anxiety, using reasoning like “if you talk, people will hate you, and the prospect of being hated should make you anxious – therefore, anxiety”, but not any of the intermediate steps. The therapist helped Richard tease out the underlying model, and at the end of the session Richard agreed that his symptoms were related to his experience of his father. But knowing this changed nothing; Richard felt as anxious as ever.

Predictions like “speaking up leads to being hated” are special kinds of emotional memory. You can rationally understand that the prediction is no longer useful, but that doesn’t really help; the emotional memory is still there, guiding your unconscious predictions. What should the therapist do?

Here UtEB dives into the science on memory reconsolidation.

Scientists have known for a while that giving rats the protein synthesis inhibitor anisomycin prevents them from forming emotional memories. You can usually give a rat noise-phobia by pairing a certain noise with electric shocks, but this doesn’t work if the rats are on anisomycin first. Probably this means that some kind of protein synthesis is involved in memory. So far, so plausible.

A 2000 study found that anisomycin could also erase existing phobias in a very specific situation. You had to “activate” the phobia – get the rats thinking about it really hard, maybe by playing the scary noise all the time – and then give them the anisomycin. This suggested that when the memory got activated, it somehow “came loose”, and the brain needed to do some protein synthesis to put it back together again.

Thus the idea of memory reconsolidation: you form a consolidated memory, but every time you activate it, you need to reconsolidate it. If the reconsolidation fails, you lose the memory, or you get a slightly different memory, or something like that. If you could disrupt emotional memories like “speaking out makes you hated” while they’re still reconsolidating, maybe you could do something about this.

Anisomycin is pretty toxic, so that’s out. Other protein synthesis inhibitors are also toxic – it turns out proteins are kind of important for life – so they’re out too. Electroconvulsive therapy actually seems to work pretty well for this – the shock disrupts protein formation very effectively (and the more I think about this, the more implications it seems to have). But we can’t do ECT on everybody who wants to be able to speak up at work more, so that’s also out. And the simplest solution – activating a memory and then reminding the patient that they don’t rationally believe it’s true – doesn’t seem to help; the emotional brain doesn’t speak Rationalese.

The authors of UtEB claim to have found a therapy-based method that works, which goes like this:

First, they tease out the exact predictive model and emotional memory behind the symptom (in Richard’s case, the narrative where his father talked too much and ended up universally hated, and so if Richard talks at all, he too will be universally hated). Then they try to get this as far into conscious awareness as possible (or, if you prefer, have consciousness dig as deep into the emotional schema as possible). They call this “the pro-symptom position” – giving the symptom as much room as possible to state its case without rejecting it. So for example, Richard’s therapist tried to get Richard to explain his unconscious pro-symptom reasoning as convincingly as possible: “My father was really into talking, and everybody hated him. This proves that if I speak up at work, people will hate me too.” She even asked Richard to put this statement on an index card, review it every day, and bask in its compellingness. She asked Richard to imagine getting up to speak, and feeling exactly how anxious it made him, while reviewing to himself that the anxiety felt justified given what happened with his father. The goal was to establish a wide, well-trod road from consciousness to the emotional memory.

Next, they try to find a lived and felt experience that contradicts the model. Again, Rationalese doesn’t work; the emotional brain will just ignore it. But it will listen to experiences. For Richard, this was a time when he was at a meeting, had a great idea, but didn’t speak up. A coworker had the same idea, mentioned it, and everyone agreed it was great, and congratulated the other person for having such an amazing idea that would transform their business. Again, there’s this same process of trying to get as much in that moment as possible, bring the relevant feelings back again and again, create as wide and smooth a road from consciousness to the experience as possible.

Finally, the therapist activates the disruptive emotional schema, and before it can reconsolidate, smashes it into the new experience. So Richard’s therapist makes use of the big wide road Richard built that let him fully experience his fear of speaking up, and asks Richard to get into that frame of mind (activate the fear-of-speaking schema). Then she asks him, while keeping the fear-of-speaking schema in mind, to remember the contradictory experience (coworker speaks up and is praised). Then the therapist vividly describes the juxtaposition while Richard tries to hold both in his mind at once.

And then Richard was instantly cured, and never had any problems speaking up at work again. His coworkers all applauded, and became psychotherapists that very day. An eagle named “Psychodynamic Approach” flew into the clinic and perched atop the APA logo and shed a single tear. Coherence Therapy: Practice Manual And Training Guide was read several times, and God Himself showed up and enacted PsyD prescribing across the country. All the cognitive-behavioralists died of schizophrenia and were thrown in the lake of fire for all eternity.

This is, after all, a therapy book.

II.

I like UtEB because it reframes historical/purposeful accounts of symptoms as aspects of a predictive model. We already know the brain has an unconscious predictive model that it uses to figure out how to respond to various situations and which actions have which consequences. In retrospect, this framing perfectly fits the idea of traumatic experiences having outsized effects. Tack on a bit about how the model is more easily updated in childhood (because you’ve seen fewer other things, so your priors are weaker), and you’ve gone a lot of the way to traditional models of therapy.

But I also like it because it helps me think about the idea of separation/noncoherence in the brain. Richard had his schema about how speaking up makes people hate you. He also had lots of evidence that this wasn’t true, both rationally (his understanding that his symptoms were counterproductive) and experientially (his story about a coworker proposing an idea and being accepted). But the evidence failed to naturally propagate; it didn’t connect to the schema that it should have updated. Only after the therapist forced the connection did the information go through. Again, all of this should have been obvious – of course evidence doesn’t propagate through the brain, I was writing posts ten years ago about how even a person who knows ghosts exist will be afraid to stay in an old supposedly-haunted mansion at night with the lights off. But UtEB’s framework helps snap some of this into place.

UtEB’s brain is a mountainous landscape, with fertile valleys separated by towering peaks. Some memories (or pieces of your predictive model, or whatever) live in each valley. But they can’t talk to each other. The passes are narrow and treacherous. They go on believing their own thing, unconstrained by conclusions reached elsewhere.

Consciousness is a capital city on a wide plain. When it needs the information stored in a particular valley, it sends messengers over the passes. These messengers are good enough, but they carry letters, not weighty tomes. Their bandwidth is atrocious; often they can only convey what the valley-dwellers think, and not why. And if a valley gets something wrong, lapses into heresy, as often as not the messengers can’t bring the kind of information that might change their mind.

Links between the capital and the valleys may be tenuous, but valley-to-valley trade is almost non-existent. You can have two valleys full of people working on the same problem, for years, and they will basically never talk.

Sometimes, when it’s very important, the king can order a road built. The passes get cleared out, high-bandwidth communication to a particular valley becomes possible. If he does this to two valleys at once, then they may even be able to share notes directly, each passing through the capital to get to each other. But it isn’t the norm. You have to really be trying.

This ended up a little more flowery than I expected, but I didn’t start thinking this way because it was poetic. I started thinking this way because of this:

Frequent SSC readers will recognize this as from Figure 1 of Friston and Carhart-Harris’ REBUS And The Anarchic Brain: Toward A Unified Model Of The Brain Action Of Psychedelics, which I review here. The paper describes it as “the curvature of the free-energy landscape that contains neuronal dynamics. Effectively, this can be thought of as a flattening of local minima, enabling neuronal dynamics to escape their basins of attraction and—when in flat minima—express long-range correlations and desynchronized activity.”

Moving back a step: the paper is trying to explain what psychedelics do to the brain. It theorizes that they weaken high-level priors (in this case, you can think of these as the tendency to fit everything to an existing narrative), allowing things to be seen more as they are:

A corollary of relaxing high-level priors or beliefs under psychedelics is that ascending prediction errors from lower levels of the system (that are ordinarily unable to update beliefs due to the top-down suppressive influence of heavily-weighted priors) can find freer register in conscious experience, by reaching and impressing on higher levels of the hierarchy. In this work, we propose that this straightforward model can account for the full breadth of subjective phenomena associated with the psychedelic experience.

These ascending prediction errors (ie noticing that you’re wrong about something) can then correct the high-level priors (ie change the narratives you tell about your life):

The ideal result of the process of belief relaxation and revision is a recalibration of the relevant beliefs so that they may better align or harmonize with other levels of the system and with bottom-up information—whether originating from within (e.g., via lower-level intrinsic systems and related interoception) or, at lower doses, outside the individual (i.e., via sensory input or extroception). Such functional harmony or realignment may look like a system better able to guide thought and behavior in an open, unguarded way (Watts et al., 2017; Carhart-Harris et al., 2018b).

This makes psychedelics a potent tool for psychotherapy:

Consistent with the model presented in this work, overweighted high-level priors can be all consuming, exerting excessive influence throughout the mind and brain’s (deep) hierarchy. The negative cognitive bias in depression is a good example of this (Beck, 1972), as are fixed delusions in psychosis (Sterzer et al., 2018).25 In this paper, we propose that psychedelics can be therapeutically effective, precisely because they target the high levels of the brain’s functional hierarchy, primarily affecting the precision weighting of high-level priors or beliefs. More specifically, we propose that psychedelics dose-dependently relax the precision weighting of high-level priors (instantiated by high-level cortex), and in so doing, open them up to an upsurge of previously suppressed bottom-up signaling (e.g., stemming from limbic circuitry). We further propose that this sensitization of high-level priors means that more information can impress on them, potentially inspiring shifts in perspective, felt as insight. One might ask whether relaxation followed by revision of high-level priors or beliefs via psychedelic therapy is easy to see with functional (and anatomic) brain imaging. We presume that it must be detectable, if the right questions are asked in the right way.
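The "precision weighting" idea above has a standard quantitative form: in a conjugate Gaussian update, the posterior mean is a precision-weighted average of the prior and the evidence. This minimal sketch (the numbers are hypothetical, chosen only to illustrate the mechanism) shows how relaxing the prior's precision lets bottom-up evidence actually move the belief:

```python
def posterior_mean(prior_mean, prior_precision, evidence, evidence_precision):
    """Precision-weighted Gaussian belief update (standard conjugate formula)."""
    total = prior_precision + evidence_precision
    return (prior_precision * prior_mean + evidence_precision * evidence) / total

prior_mean, evidence = 0.0, 10.0  # entrenched belief vs. bottom-up signal

# Overweighted high-level prior: the prediction error barely registers.
rigid = posterior_mean(prior_mean, prior_precision=100.0,
                       evidence=evidence, evidence_precision=1.0)  # ~0.099

# Relaxed prior (the proposed psychedelic effect): a substantial revision.
relaxed = posterior_mean(prior_mean, prior_precision=1.0,
                         evidence=evidence, evidence_precision=1.0)  # 5.0
```

On this reading, the drug doesn't supply new evidence at all; it only lowers `prior_precision`, letting evidence that was already available finally "find register".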

Am I imagining this, or are Friston + Carhart-Harris and Unlocking The Emotional Brain getting at the same thing?

Both start with a piece of a predictive model (= high-level prior) telling you something that doesn’t fit the current situation. Both also assume you have enough evidence to convince a rational person that the high-level prior is wrong, or doesn’t apply. But you don’t automatically smash the prior and the evidence together and perform an update. In UtEB‘s model, the update doesn’t happen until you forge conscious links to both pieces of information and try to hold them in consciousness at the same time. In F+CH’s model, the update doesn’t happen until you take psychedelics which make the high-level prior lose some of its convincingness. UtEB is trying to laboriously build roads through mountains; F+CH are trying to cast a magic spell that makes the mountains temporarily vanish. Either way, you get communication between areas that couldn’t communicate before.

III.

Why would mental mountains exist? If we keep trying to get rid of them, through therapy or psychedelics, or whatever, then why not just avoid them in the first place?

Maybe generalization is just hard (thanks to MC for this idea). Suppose Goofus is mean to you. You learn Goofus is mean; if this is your first social experience, maybe you also learn that the world is mean and people have it out for you. Then one day you meet Gallant, who is nice to you. Hopefully the system generalizes to “Gallant is nice, Goofus is still mean, people in general can go either way”.

But suppose one time Gallant is just having a terrible day, and curses at you, and that time he happens to be wearing a red shirt. You don’t want to overfit and conclude “Gallant wearing a red shirt is mean, Gallant wearing a blue shirt is nice”. You want to conclude “Gallant is generally nice, but sometimes slips and is mean.”

But any algorithm that gets too good at resisting the temptation to separate out red-shirt-Gallant and blue-shirt-Gallant risks falling into the opposite failure mode where it doesn’t separate out Gallant and Goofus. It would just average them out, and conclude that people (including both Goofus and Gallant) are medium-niceness.

And suppose Gallant has brown eyes, and Goofus green eyes. You don’t want your algorithm to overgeneralize to “all brown-eyed people are nice, and all green-eyed people are mean”. But suppose the Huns attack you. You do want to generalize to “All Huns are dangerous, even though I can keep treating non-Huns as generally safe”. And you want to do this as quickly as possible, definitely before you meet any more Huns. And the quicker you are to generalize about Huns, the more likely you are to attribute false significance to Gallant’s eye color.

The end result is a predictive model which is a giant mess, made up of constant “This space here generalizes from this example, except this subregion, which generalizes from this other example, except over here, where it doesn’t, and definitely don’t ever try to apply any of those examples over here.” Somehow this all works shockingly well. For example, I spent a few years in Japan, and developed a good model for how to behave in Japanese culture. When I came back to the United States, I effortlessly dropped all of that and went back to having America-appropriate predictions and reflexive actions (except for an embarrassing habit of bowing whenever someone hands me an object, which I still haven’t totally eradicated).

In this model, mental mountains are just the context-dependence that tells me not to use my Japanese predictive model in America, and which prevents evidence that makes me update my Japanese model (like "I notice subways are always on time") from contaminating my American model as well. Or which prevents things I learn about Gallant (like "always trust him") from also contaminating my model of Goofus.

There’s actually a real-world equivalent of the “red-shirt-Gallant is bad, blue-shirt-Gallant is good” failure mode. It’s called “splitting”, and you can find it in any psychology textbook. Wikipedia defines it as “the failure in a person’s thinking to bring together the dichotomy of both positive and negative qualities of the self and others into a cohesive, realistic whole.”

In the classic example, a patient is in a mental hospital. He likes his doctor. He praises the doctor to all the other patients, says he’s going to nominate her for an award when he gets out.

Then the doctor offends the patient in some way – maybe refuses one of his requests. All of a sudden, the doctor is abusive, worse than Hitler, worse than Mengele. When he gets out he will report her to the authorities and sue her for everything she owns.

Then the doctor does something right, and it’s back to praise and love again.

The patient has failed to integrate his judgments about the doctor into a coherent whole, “doctor who sometimes does good things but other times does bad things”. It’s as if there’s two predictive models, one of Good Doctor and one of Bad Doctor, and even though both of them refer to the same real-world person, the patient can only use one at a time.

Splitting is most common in borderline personality disorder. The DSM criteria for borderline include splitting (there defined as "a pattern of unstable and intense interpersonal relationships characterized by alternating between extremes of idealization and devaluation"). They also include things like "markedly and persistently unstable self-image or sense of self", and "affective instability due to a marked reactivity of mood", which seem relevant here too.

Some therapists view borderline as a disorder of integration. Nobody is great at having all their different schemas talk to each other, but borderlines are atrocious at it. Their mountains are so high that even different thoughts about the same doctor can’t necessarily talk to each other and coordinate on a coherent position. The capital only has enough messengers to talk to one valley at a time. If tribesmen from the Anger Valley are advising the capital today, the patient becomes truly angry, a kind of anger that utterly refuses to listen to any counterevidence, an anger pure beyond your imagination. If they are happy, they are purely happy, and so on.

About 70% of people diagnosed with dissociative identity disorder (previously known as multiple personality disorder) have borderline personality disorder. The numbers are so high that some researchers are not even convinced that these are two different conditions; maybe DID is just one manifestation of borderline, or especially severe borderline. Considering borderline as a failure of integration, this makes sense; DID is a total failure of integration. People in the furthest mountain valleys, frustrated by inability to communicate meaningfully with the capital, secede and set up their own alternative provincial government, pulling nearby valleys into their new coalition. I don't want to overemphasize this; most popular perceptions of DID are overblown, and some cases seem to be at least partly iatrogenic. But if you are bad enough at integrating yourself, it seems to be the sort of thing that can happen.

In his review, Kaj relates this to Internal Family Systems, a weird form of therapy where you imagine your feelings as people/entities and have discussions with them. I’ve always been skeptical of this, because feelings are not, in fact, people/entities, and it’s unclear why you should expect them to answer you when you ask them questions. And in my attempts to self-test the therapy, indeed nobody responded to my questions and I was left feeling kind of silly. But Kaj says:

As many readers know, I have been writing a sequence of posts on multi-agent models of mind. In Building up to an Internal Family Systems model, I suggested that the human mind might contain something like subagents which try to ensure that past catastrophes do not repeat. In subagents, coherence, and akrasia in humans, I suggested that behaviors such as procrastination, indecision, and seemingly inconsistent behavior result from different subagents having disagreements over what to do.

As I already mentioned, my post on integrating disagreeing subagents took the model in the direction of interpreting disagreeing subagents as conflicting beliefs or models within a person’s brain. Subagents, trauma and rationality further suggested that the appearance of drastically different personalities within a single person might result from unintegrated memory networks, which resist integration due to various traumatic experiences.

This post has discussed UtEB’s model of conflicting emotional schemas in a way which further equates “subagents” with beliefs – in this case, the various schemas seem closely related to what e.g. Internal Family Systems calls “parts”. In many situations, it is probably fair to say that this is what subagents are.

This is a model I can get behind. My guess is that in different people, the degree to which mental mountains form a barrier will cause the disconnectedness of valleys to manifest as anything from “multiple personalities”, to IFS-findable “subagents”, to UtEB-style psychiatric symptoms, to “ordinary” beliefs that don’t cause overt problems but might not be very consistent with each other.

IV.

This last category forms the crucial problem of rationality.

One can imagine an alien species whose ability to find truth was a simple function of their education and IQ. Everyone who knows the right facts about the economy and is smart enough to put them together will agree on economic policy.

But we don’t work that way. Smart, well-educated people believe all kinds of things, even when they should know better. We call these people biased, a catch-all term meaning something that prevents them from having true beliefs they ought to be able to figure out. I believe most people who don’t believe in anthropogenic climate change are probably biased. Many of them are very smart. Many of them have read a lot on the subject (empirically, reading more about climate change will usually just make everyone more convinced of their current position, whatever it is). Many of them have enough evidence that they should know better. But they don’t.

(again, this is my opinion, sorry to those of you I’m offending. I’m sure you think the same of me. Please bear with me for the space of this example.)

Compare this to Richard, the example patient mentioned above. Richard had enough evidence to realize that companies don’t hate everyone who speaks up at meetings. But he still felt, on a deep level, like speaking up at meetings would get him in trouble. The evidence failed to connect to the emotional schema, the part of him that made the real decisions. Is this the same problem as the global warming case? Where there’s evidence, but it doesn’t connect to people’s real feelings?

(maybe not: Richard might be able to say “I know people won’t hate me for speaking, but for some reason I can’t make myself speak”, whereas I’ve never heard someone say “I know climate change is real, but for some reason I can’t make myself vote to prevent it.” I’m not sure how seriously to take this discrepancy.)

In Crisis of Faith, Eliezer Yudkowsky writes:

Many in this world retain beliefs whose flaws a ten-year-old could point out, if that ten-year-old were hearing the beliefs for the first time. These are not subtle errors we’re talking about. They would be child’s play for an unattached mind to relinquish, if the skepticism of a ten-year-old were applied without evasion…we change our minds less often than we think.

This should scare you down to the marrow of your bones. It means you can be a world-class scientist and conversant with Bayesian mathematics and still fail to reject a belief whose absurdity a fresh-eyed ten-year-old could see. It shows the invincible defensive position which a belief can create for itself, if it has long festered in your mind.

What does it take to defeat an error that has built itself a fortress?

He goes on to describe how hard this is, to discuss the “convulsive, wrenching effort to be rational” that he thinks this requires, the “all-out [war] against yourself”. Some of the techniques he mentions explicitly come from psychotherapy, others seem to share a convergent evolution with it.

The authors of UtEB stress that all forms of therapy involve their process of reconsolidating emotional memories one way or another, whether they know it or not. Eliezer’s work on crisis of faith feels like an ad hoc form of epistemic therapy, one with a similar goal.

Here, too, there is a suggestive psychedelic connection. I can’t count how many stories I’ve heard along the lines of “I was in a bad relationship, I kept telling myself that it was okay and making excuses, and then I took LSD and realized that it obviously wasn’t, and got out.” Certainly many people change religions and politics after a psychedelic experience, though it’s hard to tell exactly what part of the psychedelic experience does this, and enough people end up believing various forms of woo that I hesitate to say it’s all about getting more rational beliefs. But just going off anecdote, this sometimes works.

Rationalists wasted years worrying about various named biases, like the conjunction fallacy or the planning fallacy. But most of the problems we really care about aren’t any of those. They’re more like whatever makes the global warming skeptic fail to connect with all the evidence for global warming.

If the model in Unlocking The Emotional Brain is accurate, it offers a starting point for understanding this kind of bias, and maybe for figuring out ways to counteract it.

Discuss

### Could someone please start a bright home lighting company?

LessWrong.com News - November 26, 2019 - 22:20
Published on November 26, 2019 7:20 PM UTC

Elevator pitch: Bring enough light to simulate daylight into your home and office.

This idea has been shared in Less Wrong circles for a couple years. Yudkowsky wrote Inadequate Equilibria in 2017 where he and his wife invented the idea, and Raemon wrote a playbook in 2018 for how to do it yourself. Now I and at least two other friends are trying to build something similar, and I suspect there's a bigger-than-it-looks market opportunity here because it's one of those things that a lot of people would probably want, if they knew it existed and could experience it. And it's only recently become cheap enough to execute well.

Coelux makes a high-end artificial skylight which certainly looks awesome, but it costs upwards of $30k and also takes a lot of headroom in the ceiling. Can we do better for cheaper?

Brightness from first principles

First let's clear up some definitions:

• Watts measure power consumption, not brightness.

• "Watt equivalent" brightness is usually listed for LED bulbs, at least for the standard household bulb form factor. You should generally ignore this (instead, just look at the lumens rating), because it is confusing. Normally "watt equivalent" is computed by dividing lumens by 15 or so. (Bulb manufacturers like to make LED bulbs that are easy to compare, by having similar brightness to the incandescents they replace, hence "watt equivalent".)

• Lumens measure the total light output ("luminous flux") of an individual bulb, but say nothing about the distribution of those rays of light. For that you want to be doing math to estimate lux.

• Lux measures illuminance: how bright light is on a certain surface (such as a wall or your face), in lumens per square meter. Usually, your end goal when designing lighting is to create a certain amount of lux.

For reference (source for these on Wikipedia):

• Direct sunlight shines 100k lux
• Full daylight (indirect) is more than 10k lux
• An overcast day or bright TV studio lighting is 1000 lux
• Indoor office lighting is typically 500
• An indoor living room at night might be only 50

Side note: This scale surprises me greatly! We usefully make use of vision across more than three orders of magnitude of lux within a single day. Our human vision hardware is doing a lot of work to make the world look reasonable across these vast differences in amount of light.

Regardless, this post is about getting a lot of lux. I hypothesize that lux is associated with both happiness and productivity, and during the "dark season" when we don't get as much lux from the sun, I'm looking to get some from artificial lights.
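To make these definitions concrete, here is a minimal Python sketch of the point-source estimate used in this post's back-of-the-envelope math. The function names are my own, and the model deliberately ignores reflections, fixtures, and beam angles: it just spreads the bulb's lumens evenly over a sphere.

```python
import math

def point_source_lux(lumens: float, distance_m: float) -> float:
    """Illuminance (lux) at a given distance from an idealized
    omnidirectional point source: lumens spread over a sphere
    of area 4*pi*r^2. Ignores reflected light entirely."""
    return lumens / (4 * math.pi * distance_m ** 2)

def watt_equivalent(lumens: float, incandescent_lm_per_watt: float = 15) -> float:
    """Rough 'watt equivalent' label: lumens divided by ~15 lm/W,
    the approximate efficacy of an incandescent bulb."""
    return lumens / incandescent_lm_per_watt

# A standard 1000-lumen bulb seen from 2 meters away:
print(round(point_source_lux(1000, 2)))   # ~20 lux -- nowhere near daylight
print(int(watt_equivalent(1000)))         # ~66 "watt equivalent"
```

Real rooms with reflective walls will do somewhat better than this bare estimate, but probably not by more than a small factor.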
If you put a single 1000-lumen (66-watt-equivalent) omnidirectional bulb in the center of a spherical room of 2m radius (which approximates a 12' square bedroom), the lux at the surface of the sphere is 1000 lm ÷ (4π × (2 m)²) ≈ 20. So now we can get a sense of the scope of the problem: when doctors say you should be getting 10,000 lux for 30 minutes a day, the defaults for home lighting are more than two orders of magnitude off.

• Raemon's bulbs are "100W equivalent", which is ~1500 lumens per bulb, so he's got 36k lumens. If we treat this as a point source and expect that Raemon's head is 2m away from the bulbs, then he's getting 36,000 lm ÷ (4π × (2 m)²) ≈ 716 lux, approaching the 1000-lux "TV studio" level, which seems pretty respectable. I haven't accounted for reflected light from the ceiling either, so reality might be better than this, but I doubt it changes the calculation by more than a factor of 2 -- but I don't have a robust way of estimating ambient light, so ideas are welcome.

• David Chapman's plan uses three 20k-lumen LED light bars designed for offroad SUV driving, for a total of 60k lumens. But because the light bars aim the light at a relatively focused point on the floor, David estimates that most of that light is being delivered to a roughly 6-square-meter workspace, for a total of 10k lux. The photos he shared of his workspace seem to support this estimate.

Other important factors besides brightness

Color temperature seems important to well-being. Color temperature is measured in kelvins with reference to black-body radiation, but you can think of it as: on the spectrum from "warm white" to "cool white", what do you prefer? Raemon's plan uses an even split between 2700K and 5000K bulbs. 2700K is quite yellow-y; 5000K is nearly pure white. In my experimentation I discovered that I liked closer to 5000K in the mornings and closer to 2700K in the evenings.

And what about light distribution? Large "panels" of bright light would seem the closest to daylight in form factor.
Real windows are brighter near the top, and it is considered dramatic and unnatural to have bright lighting coming from the ground. Also, single bright point sources are painful to look at and can seem harsh. I think there's a lot of flexibility here, but my personal ideal light would be a large, window-sized panel of light mounted on the ceiling or high on the wall.

Also, color accuracy: LEDs are notoriously narrow-spectrum by default; manufacturers have to do work to make their LEDs look more like incandescent bulbs in how they light up objects of different colors. Check for a measure called Color Rendering Index, or CRI, in product descriptions. 100 is considered perfect color rendering, and anything less than 80 looks increasingly awful as you go down. The difference between CRI 80 and 90 is definitely noticeable to some people. I haven't blind-tested myself, and definitely might be imagining it, but I feel like there was some kind of noticeable upgrade in the "coziness" or "warmth" of my room when upgrading from CRI 80 to CRI 95 bulbs.

Dimmability? (Are you kidding? We want brightness, not dimness!) Okay, fine, if you insist. Most high-end LED bulbs seem dimmable today, so I hope this is not an onerous requirement.

Last thing I can think of is flicker. I have only seen flicker as a major problem with really low-end bulbs, but I can easily see and be annoyed by 60Hz flicker out of the corner of my eye. Cheap Christmas LED light strings have super bad flicker, but it seems like manufacturers of nicer LEDs have caught on, because I haven't had any flicker problems with LED bulbs in years.

Okay, so to summarize: I want an all-in-one "light panel" that produces at least 20,000 lumens and can be mounted to a wall or ceiling, with no noticeable flicker, good CRI, and adjustable (perhaps automatically adjusting) color temperature throughout the day.

A redditor made a fake window for their basement which is quite impressive for under $200.
This is definitely along the axis I am imagining.

I haven't mentioned operating cost. Full-spectrum LEDs seem to output about 75 lumens per watt, so if our panel is 20k lumens then we should expect it to draw about 266 watts. This seems reasonable to me. If you leave it on 8 hours a day, you're going to use about 25 cents per day in electricity (at $0.12 per kWh).

Marketing and Costs

What do you think people will pay for the product? I have already put 6+ hours into researching this and don't have a satisfactory solution yet. I would probably pay at least $400 to get that time back, if the result satisfied all my requirements; I expect to put in quite a bit more time, so I think I could probably be convinced to pay north of $1000 for a really good product. Hard to say what others would pay, but I wouldn't be surprised if you could build a good product in the $400-1200 range that would be quite popular.

What about costs? Today, Home Depot sells Cree 90-CRI, 815-lumen bulbs on their website for $1.93 per bulb, which works out to $2.37 per 1000 lumens. This is the cheapest I've seen high-quality bulbs. (The higher-lumen bulbs are annoyingly quite a bit more expensive.) To get 36k lumens at this price costs under $100 retail. Presumably there are cooling considerations when packing LEDs close together, but those seem solvable if you're doing the "panel" form factor. There are other costs I'm sure, but it seems like the LEDs and driver are likely to dominate. These bulbs are dimmable but not color-temperature adjustable.

Yuji LEDs sells 2700K-6500K dimmable LED strips, also with 95+ CRI, at $100 for 6250 lumens (a cost of $16 per 1000 lumens). This is about 7x more expensive per lumen, but knowing that it exists is really helpful.
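The cost arithmetic above can be sanity-checked with a few lines of Python. This is just a sketch; the constants come straight from the figures quoted in this post, and the helper names are mine:

```python
LM_PER_WATT = 75     # rough full-spectrum LED efficacy (lumens per watt)
KWH_PRICE = 0.12     # electricity price in dollars per kWh

def daily_electricity_cost(lumens: float, hours_on: float = 8) -> float:
    """Dollars per day to run a panel of the given total brightness."""
    watts = lumens / LM_PER_WATT
    return watts / 1000 * hours_on * KWH_PRICE

def dollars_per_kilolumen(price: float, lumens: float) -> float:
    """Normalize a bulb or strip price to dollars per 1000 lumens."""
    return price / lumens * 1000

print(round(daily_electricity_cost(20_000), 2))    # 20k-lumen panel: ~$0.26/day
print(round(dollars_per_kilolumen(1.93, 815), 2))  # Cree bulb: ~$2.37/klm
print(round(dollars_per_kilolumen(100, 6250), 2))  # Yuji strip: $16.00/klm
```

Dividing the two per-kilolumen figures reproduces the roughly 7x price gap between the commodity bulbs and the color-adjustable strips.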

Promotion and Distribution

Kickstarter is the obvious idea for getting this idea out there. I would also recommend starting a subreddit (if it doesn't exist; I haven't checked yet) for do-it-yourselfers who want to build or buy really bright lighting systems for their homes, as I think there is probably enough sustained interest in such a topic for it to exist.

You can also try to get press. The idea of "indoor light as bright as daylight" is probably somewhat viral, so I'd hope you can get people to write about you. Coelux got a bunch of press a few years ago doing this exact thing; their product is so expensive that they don't even list the price on their website, and in articles about Coelux you can see people commenting that they wish they could afford one.

I do think the idea needs to be spread more. Most people don't know this is possible, so there's a lot of work you'll be doing to just explain that such a thing is possible and healthy.

Competition?

I don't think there's any relevant competition out there today. Coelux is super high end. The real competition is do-it-yourselfers, but this market is far bigger than the number of people who are excited to do it themselves.

Some have mentioned "high bay" lights, which are designed to be mounted high in warehouses and such, and throw a light cone a long distance to the floor. I am excited to try this and I will probably try it next, but I am not super optimistic about it because I expect it to be quite harsh. This is the one that Yuji sells, but you can find cheaper and presumably lower-quality ones on Amazon.

Part of my motivation for writing this blog post is to source ideas for other things that exist that could fill this niche. Comment here if you solved this problem in a way I haven't described! I'll update this post with ideas. If you start this company, also email me and I'll buy one and try your product and probably write about it :)

If you put a bunch of research into designing a really great product and it succeeds but gets effectively copied by low-cost clones, you'll be sad. I am not sure how to defend this, and I think it is probably the weakest point of this business model; but it is a weakness that many hardware companies share, and a lot of them still carve out a niche. One idea would be to build up your product's branding and reputation, by explaining why low-cost clones suck in various ways. Another is just to give really good service. Lastly, if you avoid manufacturing things in China, maybe Chinese clone companies won't copy your technology as quickly.

Discuss

### 3 Cultural Infrastructure Ideas from MAPLE

LessWrong.com News - November 26, 2019 - 21:56
Published on November 26, 2019 6:56 PM UTC

About six months ago, I moved to the Monastic Academy in Vermont. MAPLE for short.

You may have curiosities / questions about what that is and why I moved there. But I'll save that for another time.

I was having a conversation last week about some cultural infrastructure that exists at MAPLE (that I particularly appreciate), and these ideas seemed worth writing up.

Note that MAPLE is a young place, less than a decade old in its current form. So, much of it is "experimental." These ideas aren't time-tested. But my personal experience of them has been surprisingly positive, so far.

I hope you get value out of playing with these ideas in your head or even playing with various implementations of these ideas.

1. The Care Role or Care People

MAPLE loves its roles. All residents have multiple roles in the community.

Some of them are fairly straightforward and boring. E.g. someone's role is to write down the announcements made at meals and then post them on Slack later.

Some of them are like "jobs" or "titles". E.g. someone is the bookkeeper. Someone is the Executive Director.

One special role I had the honor of holding for a few months was the Care role.

The Care role's primary aim is to watch over the health and well-being of the community as a group. This includes their physical, mental, emotional, and psychological well-being.

The Care role has a few "powers."

The Care role can offer check-ins or "Care Talks" to people. So if I, in the Care role, notice someone seems to be struggling emotionally, I can say, "Hey would you like to check in at some point today?" and then schedule such a meeting. (MAPLE has a strict schedule, and this is not something people would normally be able to do during work hours, but it's something Care can do.)

People can also request Care Talks from Care.

The Care role also has the power to plan / suggest Care Days. These are Days for community bonding and are often either for relaxation or emotional processing. Some examples of Care Days we had: we went bowling; we did a bunch of Circling; we visited a nearby waterfall.

The Care role can request changes to the schedule if they believe it would benefit the group's well-being. E.g. asking for a late wake-up. (Our usual wake-up is 4:40AM!)

Ultimately though, the point of this is that it's someone's job to watch over the group in this particular way. That means attending to the group field, learning how to read people even when they are silent, being attentive to individuals but also to the "group as a whole."

For me as Care, it gave me the permission and affordance to devote part of my brain function to tracking the group. Normally I would not bother devoting that much energy and attention to it because I know I wouldn't be able to do much about it even if I were tracking it.

Why devote a bunch of resources to tracking something without the corresponding ability / power to affect it?

But since it was built into the system, I got full permission to track it and then had at least some options for doing something about what I was noticing.

This was also a training opportunity for me. I wasn't perfect at the job. I felt drained sometimes. I got snippy and short sometimes. But it was all basically allowing me to train and improve at the job, as I was doing it. No one is perfect at the Care role. Some people are more suitable than others. But no one is perfect at it.

The Care role also has a Care assistant. The Care assistant is someone to pick up the slack when needed or if Care goes on vacation or something. In practice, I suspect I split doing Care Talks fairly evenly with the Care assistant, since those are a lot for one person to handle. And, people tend to feel more comfortable with certain Care people over others, so it's good to give them an option. The Care assistant is also a good person for the Care role to get support from, since it tends to be more challenging for the Care role to receive Care themselves.

I could imagine, for larger groups, having a Care Team rather than a single Care role with Care assistant.

That said, there is a benefit to having one person hold the mantle primarily. Which is to ensure that someone is mentally constructing a model of the group plus many of the individuals within it, keeping the bird's eye view map. This should be one of Care's main mental projects. If you try to distribute this task amongst multiple people, you'll likely end up with a patchy, stitched-together map.

In addition, understanding group dynamics and what impacts the group is another good mental project for the Care person. E.g. learning how it impacts the group when leaders exhibit stress. Learning how to use love languages to tailor care for individuals. Etc.

1.5. The Ops Role

As an addendum, it's worth mentioning the Ops role too.

At MAPLE, we follow a strict schedule and also have certain standards of behavior.

The Ops role is basically in charge of the schedule and the rules and the policies at MAPLE. They also give a lot of feedback to people (e.g. "please be on time"). This is a big deal. It is also probably the hardest role.

It is important for the Ops role and the Care role to not be the same person, if you can afford it.

The Ops role represents, in a way, "assertive care." The Care role represents "supportive care." These are terms about healthy, skillful parenting that I read originally from the book Growing Up Again.

Basically, assertive points to structure, and supportive points to nurture. Both are vital.

Care builds models of the group's physical and emotional well-being, how their interactions are going, and reading people.

Ops builds models of what parts of the structure / schedule are important, how to be fair, how to be reasonable, noticing where things are slipping, building theories as to why, and figuring out adjustments. Ops has to learn how to give and receive feedback a lot more. Ops has to make a bunch of judgment calls about what would benefit the group and what would harm the group (in the short-term and long-term), and ultimately has to do it without a higher authority telling them what to do.

It's a difficult position, but it complements the Care role very well.

As Care, I noticed that people seemed to be worse off and struggled more when the Ops role failed to hold a strong, predictable, and reasonable container. The Ops role is doing something that ultimately cares for people's emotional, mental, and physical well-being—same as Care. But they do it from a place of more authority and power.

As Care, I would sometimes find myself wanting to do some "Ops"-like things—like remind people about rules or structures. But it's important for Care to avoid handling those tasks, so that people feel more open and don't have that "up for them" with Care. Care creates a space where people can process things and just get support.

It's not really beneficial for Care to take on the Ops role, and it's not beneficial for Ops to take on the Care role. This creates floppiness and confusion.

2. Support Committees

Sometimes, people struggle at MAPLE. Once in a while, they struggle in a way that is more consistent and persistent, in an "adaptive challenge" way. A few Care Talks aren't sufficient for helping them.

If someone starts struggling in this way, MAPLE can decide to spin up a support committee for that person. Let's call this struggling person Bob.

The specific implementation at MAPLE (as far as I know, at this particular time) is:

• Three people are selected to be on Bob's support committee.
• Some factors in deciding those people include: Is Bob comfortable with them? Do they have time? Do they want to support Bob? Do they seem like they'd do a decent job of supporting Bob?
• The way the decision actually gets made differs for each case, but it probably always involves the Executive Director.
• The support committee meets with Bob about once a week.
• They discuss ways they can be supportive to Bob. Could he use reminders to avoid caffeine? Could he use an exercise accountability person? Could he use regular Care Talks? Could he use help finding a therapist?
• They also give Bob feedback of various kinds. E.g. maybe Bob has been making chit-chat during silent periods; maybe Bob has been yelling things at Alice when he gets scared; maybe Bob is taking naps during work period. In this frame, it should be clear that Bob is the responsible party for his own growth and improvement and well-being. Ultimately he has to hold to his commitments / responsibilities / roles in the community, and the support committee can't do that for him. But they can help him as much as seems reasonable / worth trying.
• Current implementation of this doesn't have a pre-set deadline for when the committee ceases, but there are check-ins with the Executive Director to see how things are progressing with Bob and the support committee.
• Sometimes it may come to make sense to ask Bob to leave the community, if things aren't improving after enough time has passed (maybe 3-6 months). If everyone has put in a reasonable best effort and Bob still can't hold to his commitments, then there may be a decision to part ways.
• Hopefully most of the time, the support committee thing works enough to get Bob to a place where he's no longer struggling and can get back into the flow of things without a support committee.

I appreciate support committees!

They're trying to strike a tricky balance between being supportive and holding people accountable. But they keep communication channels open and treat it like a two-way street.

Bob isn't totally in the dark about what's going on. He isn't being suddenly told there's a problem and that he can't stay. He also isn't being held solely responsible, as one might be at a normal job with its "either shape up or ship out" approach. It's also not the thing where people act "open and supportive" but really it's still "on you" to fix yourself, and no one lifts a finger, and you have to do all the asking.

With a support committee, Bob gets regular support from the community in a structured way. He gets to set goals for himself, in front of others. He gets regular feedback on how he's doing on those goals. If he needs help, he has people who can brainstorm with him on how to get that help, and if they commit to helping him in some way, they actually do it. If he needs someone to talk to, he can have regularly scheduled Care Talks.

He is neither being coddled nor neglected.

It's also helpful to generally foster a feeling that the community is here for you and that there's a desire to do what's best for everyone, from all parties.

Would this kind of thing work everywhere for all groups? No, of course not.

It's a bit resource-intensive as it currently is. It also seems to ask for a high skill level and value-aligned-ness from people. But there's room to play around with the specific format.

3. The Schedule

The Schedule at MAPLE is not viable for most people in most places.

But many people who come to stay at MAPLE find out that the Schedule is something they hugely benefit from having. It's often named as one of the main pros to MAPLE life.

Basically, there's a rigid schedule in place. It applies to five-and-a-half days out of the week. (Sundays are weird because we go into town to run an event; Mondays are off-schedule days.)

But most days, it's the same routine, and everyone follows it. (The mornings and evenings are the most regimented part of the day, with more flexibility in the middle part.)

4:40AM chanting. 5:30AM meditation. 7AM exercise. 8:05AM breakfast. Then work. Etc. Etc. Up until the last thing, 8:30PM chanting.

Which is more surprising:

• The fact that most people, most of the time, show up on time to each of these activities? (Where "on time" means being a little bit early?)
• Or the fact that often there's at least one person who's at least one minute late, despite there theoretically being very few other things going on, relatively speaking?

¯\_(ツ)_/¯

Anyway, here's why I think the Schedule is worth talking about as a cultural infrastructure idea:

It's more conducive to getting into spontaneous motion.

You don't have to plan (as much) about what you're going to do, when. The activities come one right after the other.

At MAPLE I don't get stuck in bed, wondering whether to get up now or later.

I have spent hours and hours of my life struggling with getting out of bed (yay depression). Regardless of my mood or energy level, I just get out of bed, and it's automatic, and I don't think about it, and suddenly I'm putting on my socks, and I'm out the door.

This has translated somewhat to my off-schedule / vacation days also.

When left to my own devices, I do not exercise. I have never managed to exercise regularly as an adult. While I'm on-schedule, I just do it. I don't push myself harder than I can push; sometimes I take it easy and focus on stretching and Tai Chi. But sometimes I sprint, and sometimes I get sore, and my stamina is noticeably higher than before.

This is so much better than what it was like without the Schedule! It has proven to be more effective than my years of attempts to debug the issue using introspection.

The Schedule lets me just skip the decision paralysis. I often find myself "just spontaneously doing it." It becomes automatic habit. Like starting the drive home and "waking up" to the fact I am now home.

This is relaxing. It's more relaxing to just exercise than to internally battle over whether to exercise. It's more relaxing to just get up and start the day than to internally struggle over whether to get up. There is relief in it.

It's easier to tell when people are going through something.

As Care, it was my job to track people's overall well-being.

As it turns out, if someone starts slipping on the Schedule (showing up even a bit late to things more often), it's often an indication of something deeper.

The Schedule provides all these little fishing lines that tug when someone could use some attention, and the feedback is much faster than a verbal check-in.

Sometimes I would find myself annoyed by someone falling through or breaking policy or whatever. If I dug into it, I'd often find out they were struggling on a deeper level. Like I might find out their mom was in the hospital, or they were struggling with a flare-up of chronic pain, or something like that.

Once I picked up on that pattern, I learned to view people's small transgressions or tardiness as a signal for more compassion, rather than less. Where my initial reaction might be to tense up, feel resistance, or get annoyed, I can remind myself that they're probably going through some Real Shit and that I would struggle in that situation too, and then I relax.

Everyone's doing it together.

Everyone doing something together creates conducive conditions for common knowledge, even when there's no speaking involved. Common knowledge is a huge efficiency gain. And I suspect it's part of why it's internally easy for me to "just do it." (And maybe points to why it's harder for me to "just do it" when no one else notices or cares.)

Having more shared reality with each other reduces the need for verbal communication, formal group decision-making processes, and internal waffling.

If everyone can see the fire in the kitchen, you don't need to say a word. People will just mobilize and put out the fire.

If everyone sees that Carol is late, and Carol knows everyone has seen that she is late, it's harder for anyone to create alternative stories, like "Carol was actually on time." No one has to waste time on that.

There are lots of more flexible versions of the Schedule that people use and benefit from already. Shared meals in group houses, for instance.

But I'd love to see more experimentation with this, in communities or group houses or organizations or what-have-you.

Dragon Army attempted some things in this vein, and I saw them getting up early and exercising together on a number of occasions. I'd love to see more attempts along these lines.


### Street Epistemology: Practice Session

Kocherga events - November 26, 2019 - 19:30
Street epistemology is a particular way of conducting dialogues. It makes it possible to examine any belief, even on the most explosive topics, without descending into argument, while helping the participants improve their methods of acquiring knowledge.

LessWrong.com News - November 26, 2019 - 17:30
Published on November 26, 2019 2:30 PM UTC

• Products would be a lot stickier. A lot of advertising tries to move people between competitors. Sometimes it's an explicit "here's a way we're better" (ex: we don't charge late fees), other times it's a more general "you should think positively of our company" (ex: we agree with you on political issue Y). Banning ads would probably mean higher prices (Benham 2013) since it would be harder to compete on price.

• Relatedly, it would be much harder to get many new products started. Say a startup makes a new credit card that keeps your purchase history private: right now a straightforward marketing approach would be (a) show that other credit cards are doing something their target audience doesn't like, (b) build on the audience's sense that this isn't ok, and (c) present the new card as a solution. Without ads they would likely still see uptake among people who were aware of the problem and actively looking for a solution, but mostly people would just stick with the well-known cards.

• A major way ads work is by building brand associations: people who eat Powdermilk Biscuits are probably Norwegian bachelor farmers, listen to public radio, or want to signal something along those lines. Branded products both provide something of a service, by making more ways to signal identity, and charge for it, by being more expensive in order to pay for clever ad campaigns. Without ads we would probably still have these associations, and products that happened to be associated with coveted identities would still play this role; the way these associations developed would just be less directed, though brands would probably still try pretty hard to influence them even without ads. You can also choose to signal the "frugal" identity, which lets you avoid the brand tax.

• Reviewers would be much more trustworthy. There's a long history of reviewers getting 'captured' by the industry they review.

• Purchases of things people hadn't tried before would decrease, both things that people would in retrospect have been happy to have bought and things they would not. One of the roles of advertising is to let people know about things that they would want to buy if they knew about them. So "buy stuff they don't need" isn't a great gloss for this, since after buying the products people often like them a lot. On the other hand, I do think this applies to children, and one of the things people learn as they grow up is how to interpret ads—which is also why we have regulations on ads directed at kids.

Don't put too much stock in this: I work on the technical side of ads and don't have a great view into their social role, and even if I was in a role like that it would still be very hard to predict how the world would be different with such a large change. But broad "we'd see more of X and less of Y" analysis gives a way to explore the question, and I'm curious what other people's impressions are.

(Disclosure: I work in ads but am speaking only for myself. I may be biased, though if I thought my work was net negative I wouldn't do it.)


### A test for symbol grounding methods: true zero-sum games

LessWrong.com News - November 26, 2019 - 17:15
Published on November 26, 2019 2:15 PM UTC

Imagine there are two AIs playing a debate game. The game is zero-sum; at the end of the debate, the human judge assigns the winner, and that AI gets a +1 reward, while the other one gets a −1.

Except that the game, as described, is not truly zero-sum. That is because the AIs "get" a reward. How is that reward assigned? Presumably there is some automated system that, when the human presses a button, routes +1 to one AI and −1 to the other. These rewards are stored as bits, somewhere "in" or around the two AIs.

Thus there are non-zero-sum options: you could break into the whole network, gain control of the automated system, and route +1 to each AI - or, why not, +10^100 or even +f_{ψ(Ω^Ω^Ω)}(4) or whatnot[1].

Thus, though we can informally say that "the AIs are in a zero-sum game as to which one wins the debate", that sentence is not properly grounded in the world; it is only true as long as certain physical features of the world are maintained, features which are not mentioned in that sentence.

Symbol grounding implies possibility of zero-sum

Conversely, imagine that an AI has a utility/reward U/R which is properly grounded in the world. Then it seems that we should be able to construct an AI with utility/reward −U/−R which is also properly grounded in the world. So it seems that any good symbol grounding system should allow us to define truly zero sum games between AIs.

There are, of course, a few caveats. Aumann's agreement theorem requires unboundedly rational agents with common priors. Similarly, though properly grounded U and −U are zero-sum, the agents might not be fully zero-sum with each other, due to bounded rationality or different priors.

Indeed, it is possible to set up a situation where even unboundedly rational agents with a common prior will knowingly behave in not-exactly-zero-sum ways with each other; for example, you can isolate the two agents from each other and feed them deliberately biased information.

But those caveats aside, it seems that proper symbol grounding implies that you can construct agents that are truly zero-sum towards each other.

Zero-sum implies symbols grounded?

Is this an equivalence? If two agents really do have zero sum utility or reward functions towards each other, does it mean that those functions are well grounded[2]?

It seems that it should be the case. Zero-sum between U and V=−U means that, for all possible worlds w, U(w)=−V(w). There are no actions that we - or any agent - could take that break that fundamental equality. So it seems that U must be defined by features of the world; grounded symbols.
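As a toy illustration of this condition (all names here are invented, and "worlds" are reduced to a tiny enumerable set purely for the example), one can check the zero-sum property by enumerating worlds, including worlds where the reward channel has been tampered with:

```python
# Sketch: checking the zero-sum condition U(w) == -V(w) over an
# enumerated set of "worlds". All names and structures are illustrative.

def is_zero_sum(U, V, worlds):
    """True iff U and V sum to zero on every world."""
    return all(U(w) + V(w) == 0 for w in worlds)

# Toy worlds: (debate_winner, reward_channel_hacked)
worlds = [(winner, hacked) for winner in ("A", "B") for hacked in (False, True)]

def U(w):
    winner, hacked = w
    if hacked:
        return 1  # both agents route +1 to themselves
    return 1 if winner == "A" else -1

def V(w):
    winner, hacked = w
    if hacked:
        return 1
    return 1 if winner == "B" else -1

print(is_zero_sum(U, V, worlds))  # False: the hacked worlds break zero-sum

# Restricting attention to worlds where the reward channel is intact
# recovers the informal "zero-sum game over who wins the debate":
intact = [w for w in worlds if not w[1]]
print(is_zero_sum(U, V, intact))  # True
```

The point of the sketch is that the zero-sum claim only holds on the restricted set of worlds, which is exactly the unstated physical assumption in the informal sentence above.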

Now, these grounded symbols might not be exactly what we thought they were; it's possible we thought U was defined over human happiness, when actually it just measures current in a wire. Still, V must then be defined in terms of the absence of current in the wire. And whatever we do with the wire - cut it, replace it, modify it in cunning ways - U and V must take opposite values on it.

Thus it seems that either there is some grounded concept that U and V are opposite on, or U and V contain exhaustive lists of all special cases. If we further assume that U and V are not absurdly complicated (in a "more complicated than the universe" way), we can rule out the exhaustive list.

So, while I can't say with full confidence that a true zero-sum game must mean that the utilities are grounded, I would take such a thing as a strong indication that they are.

1. If you thought that 3↑↑↑3 was large, nothing will prepare you for f_{ψ(Ω^Ω^Ω)}(4) - the fast-growing hierarchy indexed by the large Veblen ordinal. There is no real way to describe how inconceivably huge this number is. ↩︎

2. Assuming the functions are defined in the world to some extent, not over platonic mathematical facts. ↩︎


### Thoughts on implementing corrigible robust alignment

LessWrong.com News - November 26, 2019 - 17:06
Published on November 26, 2019 2:06 PM UTC

Background / Context

As context, here's a pictorial overview of (part of) AI alignment.

Starting from the top:

I split possible AGIs into those that do search/selection-type optimization towards achieving an explicitly-represented goal, and "Everything else". The latter category is diverse, and includes (1) systems with habits and inclinations (that may lead to goal-seeking behavior) but no explicit goal (e.g. today's RL systems); (2) "microscope AI" and other types of so-called "tool AI"; (3) IDA (probably?), and more. I'm all for exploring these directions, but not in this post; here I'm thinking about AGIs that have goals, know they have goals, and search for ways to achieve them. These are likely to be the most powerful class of AGIs, and were popularized in Bostrom's book Superintelligence.

Within this category, a promising type of goal is a "pointer" (in the programming sense) to human(s) achieving their goals, whatever they may be. If we can make a system with that property, then it seems that the default dangerous instrumental subgoals get replaced by nice instrumental subgoals like respecting off-switches, asking clarifying questions, and so on. In More variations on pseudo-alignment, Evan Hubinger refers to pointer-type goals as corrigible alignment in general, noting that it is only corrigible robust alignment if you're pointing at the right thing.

Out of proposed AGIs with explicit goals, most of the community's interest and ideas seem to be in the category of corrigible alignment, including CEV and CIRL. But I also included in my picture above a box for "Goals that refer directly to the world". For example, if you're a very confident moral realist who thinks that we ought to tile the universe with hedonium, then I guess you would probably want your superintelligent AGI to be programmed with that goal directly. There are also goals that are half-direct, half-corrigible, like "cure Alzheimer's while respecting human norms", which has a direct goal but a corrigible-type constraint / regularization term.

Continuing with the image above, let's move on with the corrigible alignment case—now we're in the big red box. We want the AGI to be able to take observations of one or more humans (e.g. the AGI's supervisor) and turn them into an understanding of those humans, presumably involving things like their moods, beliefs, goals, habits, and so on. This understanding has to be good enough to facilitate the next step, which can go one of two ways.

For the option shown on the bottom left, we define the AGI's goal as some function f on the components of the human model. The simplest f would be "f=the human achieves their goals", but this may be problematic in that people can have conflicting goals, sadistic goals, goals arising from false beliefs or foul moods, and so on. Thus there are more complex proposals, ranging from slightly complicated (e.g. measuring and balancing 3 signals for liking, wanting, and approving—see Acknowledging Human Preference Types to Support Value Learning) to super-duper-complicated (Stuart Armstrong's Research Agenda). Stuart Russell's vision of CIRL in his book Human Compatible seems very much in this category as well. (As of today, "What should the function f be?" is an open question in philosophy, and "How would we write the code for f?" is an open question in CS; more on the latter below.)
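As a toy illustration of what "a function f on the components of the human model" could look like in code (everything here is hypothetical: the three signals follow the liking/wanting/approving split mentioned above, and the weights are arbitrary placeholders):

```python
from dataclasses import dataclass

@dataclass
class HumanModel:
    # Hypothetical components an AGI might infer about its supervisor.
    liking: float     # moment-to-moment enjoyment
    wanting: float    # motivational pull
    approving: float  # reflective endorsement

def f(model: HumanModel) -> float:
    """One (arbitrary) candidate goal: weight reflective approval most
    heavily and discount raw wanting. Choosing these weights well is
    exactly the open philosophical question described in the text."""
    return 0.5 * model.approving + 0.3 * model.liking + 0.2 * model.wanting

print(f(HumanModel(liking=1.0, wanting=0.0, approving=1.0)))  # roughly 0.8
```

The sketch is only meant to show the shape of the problem: once the human model has explicit components, the goal is some hand-chosen aggregation of them, and all the difficulty hides in that choice.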

Or, for the option shown on the bottom right, the AGI uses its understanding of humans to try to figure out what a human would do in a hypothetical scenario. On the simpler side, it could be something like "If you told the human what you're doing, would they approve?" (see Approval-directed agents), and on the more complicated side, we have CEV. As above, "What should the scenario be?" is an open question in philosophy, and "How would we write the code?" is an open question in CS.

How would we write the code for corrigible robust alignment?

I don't have a good answer, but I wanted to collect my thoughts on different possible big-picture strategies, some of which can be combined.

End-to-end training using human-provided ground truth

This is the "obvious" approach that would occur to an ML programmer of 2019. We manually collect examples of observable human behavior, somehow calculate the function f ourselves (or somehow run through the hypothetical scenario ourselves), and offer a reward signal (for reinforcement learning) or labeled examples (for supervised learning) illustrating what f is. Then we hope that the AGI invents the goal-defining procedure that we wanted it to go through. With today's ML techniques, the system would not have the explicit goal that we want, but would hopefully behave as if it did (while possibly failing out of distribution). With future ML techniques, the system might wind up with an actual explicitly-represented goal, which would hopefully be the one we wanted, but this is the stereotypical scenario in which we are concerned about "inner alignment" (see Risks from Learned Optimization).

End-to-middle training using human-provided ground truth

Likewise, maybe we can provide an ML system with high-dimensional labels about people—"this person has grumpiness level 2, boredom level 6, hunger level 3, is thinking about football, hates broccoli...". Then we can do ML to get from sensory inputs to understanding of humans, which would be calculated as intermediate internal variables. Then we can hard-code the construction of the goal as a function of those intermediate variables (the bottom part of the diagram above, i.e. either the function f, or the hypothetical scenario). This still has some robustness / inner-alignment concerns, but maybe less so than the end-to-end case? I also have a harder time seeing how it would work in detail—what exactly are the labels? How do we combine them into the goal? I don't know. But this general approach seems worth consideration.
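To make the end-to-middle idea concrete, here is a minimal sketch (everything is hypothetical: the labels, the stand-in "network", and the goal function). The learned part maps observations to human-interpretable intermediate variables; the goal is then a fixed, hand-written function of those variables rather than something learned end-to-end:

```python
# Sketch of "end-to-middle" training. The model is trained ONLY to
# predict human-interpretable intermediate variables; the goal is a
# fixed, hand-written function of those variables. Names are invented.

def learned_human_state(observation):
    """Stand-in for a trained network: observation -> labeled variables.
    In a real system this would be fit to human-provided labels like
    'grumpiness level 2, boredom level 6, hunger level 3, ...'."""
    return {"grumpiness": 2.0, "boredom": 6.0, "hunger": 3.0}

def hardcoded_goal(state):
    """Hand-coded, NOT learned: e.g. lower grumpiness/boredom/hunger is
    better. What this function should actually be is the open question."""
    return -(state["grumpiness"] + state["boredom"] + state["hunger"])

obs = "camera frame"  # placeholder observation
print(hardcoded_goal(learned_human_state(obs)))  # -11.0
```

The split matters because only the observation-to-variables mapping is exposed to inner-alignment risk; the variables-to-goal step is auditable source code.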

Hardcoded human template (= innate intuitive psychology)

This one is probably the most similar to how the human brain implements pro-social behaviors, although the human brain mechanism is probably somewhat more complicated. (I previously wrote up my speculations at Human instincts, symbol grounding, and the blank-slate neocortex.) I think the brain houses a giant repository of, let's call them, "templates"—generative models which can be glued together into larger generative models. We have templates for everything from "how a football feels in my hand" to "the way that squirrels move". When we see something, we automatically try to model it by analogy, building off the templates we already have, e.g. "I saw something in the corner of my eye, it was kinda moving like a squirrel".

So that suggests an approach of pre-loading this template database with a hardcoded model of a human, complete with moods, beliefs, and so on. That template would serve as a bridge between the real world and the system's goals. On the "real world" side, the hope is that when the system sees humans, it will correctly pattern-match them to the built-in human template. On the "goals" side, the template provides a hook in the world-model that we can use to hard-code the construction of the goal (either the function f or the hypothetical scenario—this part is the same as the previous subsection on end-to-middle training). As above, I am very hazy on the details of how such a template would be coded, or how the goal would be constructed from there.
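As a cartoon of what template-matching might look like (the template, the features, and the threshold here are all invented; a real system would match rich generative models rather than feature dictionaries):

```python
# Toy sketch of matching entities against a hardcoded "human template".
# Features and entities are invented for illustration only.

HUMAN_TEMPLATE = {"has_face": True, "self_propelled": True, "uses_language": True}

def match_score(entity_features):
    """Fraction of template features the observed entity matches."""
    hits = sum(entity_features.get(k) == v for k, v in HUMAN_TEMPLATE.items())
    return hits / len(HUMAN_TEMPLATE)

def looks_human(entity_features, threshold=0.99):
    return match_score(entity_features) >= threshold

alice = {"has_face": True, "self_propelled": True, "uses_language": True}
teddy = {"has_face": True, "self_propelled": False, "uses_language": False}

print(looks_human(alice))  # True
print(looks_human(teddy))  # False; lowering the threshold would start
                           # matching teddy bears (a false positive)
```

Even in this cartoon, the threshold choice already exhibits the failure modes discussed next: set it too low and non-humans match, set it too high and some humans fail to match.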

Assuming we figure out how to implement something like this, there are two obvious problems: false positives and false negatives to the template-matching process. In everyday terms, that would be anthropomorphizing and dehumanization respectively. False-positives (anthropomorphizing) are when we pattern-match the human template to something that is not a human (teddy bears, Mother Earth, etc.). These lead to alignment errors like trading off the welfare of humans against the welfare of teddy bears. False-negatives (dehumanization) correspond to modeling people without using our innate intuitive-psychology capability. These lead to the obvious alignment errors of ignoring the welfare of some or all humans.

Humans seem quite capable of committing both of these errors, and do actually display both of those corresponding antisocial behaviors. I guess that doesn't bode well for the template-matching strategy. Still, one shouldn't read too much into that. Maybe template-matching can work robustly if we're careful, or perhaps in conjunction with other techniques.

Interpretability

It seems to me that interpretability is not fundamentally all that different from template-matching; it's just that instead of having the system automatically recognize that a blob of world-model looks like a human model, here instead the programmer is looking at the different components of the world-model and seeing whether they look like a human model. I expect that interpretability is not really a viable solution on its own, because the world-model is going to be too complicated to search through without the help of automated tools. But it could be helpful to have a semi-automated process, e.g. we have template-matching as above, but it flags both hits and near-misses for the programmer to double-check.

Value lock-in

Here's an oversimplified example: humans have a dopamine-based reward system which can be activated by either (1) having a family or (2) wireheading (pressing a button that directly stimulates the relevant part of the brain; I assume this will be commercially available in the near future if it isn't already). People who have a family would be horrified at the thought of neglecting their family in favor of wireheading, and conversely people who are addicted to wireheading would be horrified at the thought of stopping wireheading in favor of having a family. OK, this isn't a perfect example, but hopefully you get the idea: since goal-directed agents use their current goals to make decisions, when there are multiple goals theoretically compatible with the training setup, the agents can lock themselves into the first one of them that they happen to come across.

This applies to any of the techniques above. With end-to-end training, we want to set things up such that the desired goal is the first interpretation of the reward signal that the system locks onto. With template-matching, we want the human template to get matched to actual humans first. Etc. Then we can hope that the system will resist further changes.

I'm not sure I would bet my life on this kind of strategy working, but it's definitely a relevant dynamic to keep in mind.

(I'm not saying anything original here; see Preference stability.)

Adversarial training

Last but not least, if we want to make sure the system works well, it's great if we can feed it adversarial examples, to make sure that it is finding the correct goal in even the trickiest cases.

I'm not sure how we would systematically come up with lots of adversarial examples, or know when we were done. I'm also not sure how we would generate the corresponding input data, unless the AGI is being trained in a virtual universe, which actually is probably a good idea regardless. Note also that "deceptive alignment" (again see Risks from Learned Optimization) can be very difficult to discover by adversarial testing.

Conclusion

The conclusion is that I don't know how to implement corrigible robust alignment. ¯\_(ツ)_/¯

I doubt anything in this post is original, but maybe helpful for people getting up to speed and on the same page? Please comment on what I'm missing or confused about!


### Is daily caffeine consumption beneficial to productivity?

LessWrong.com News - November 26, 2019 - 16:13
Published on November 26, 2019 1:13 PM UTC

Caffeine raises human alertness by binding to adenosine receptors in the human brain. It prevents those receptors from binding adenosine and suppressing activity in the central nervous system.

Regular caffeine consumption seems to result in the body building more adenosine receptors, but it's unclear to me whether the body produces enough additional receptors to fully cancel out the effect. Did anybody look deeper into the issue and know the answer?


### A Theory of Pervasive Error

LessWrong.com News - November 26, 2019 - 10:27
Published on November 26, 2019 7:27 AM UTC

(Content warning: politics. Read with caution, as always.)

Curtis Yarvin, a computer programmer perhaps most famous as the principal author of the Urbit decentralized server platform, expounds on a theory of how false beliefs can persist in Society, in a work of what the English philosopher N. Land characterizes as "political epistemology". Yarvin argues that the Darwinian "marketplace of ideas" in liberal democracies selects for æsthetic appeal as well as truth: in particular, the æsthetics of ambition and loyalty grant a selective advantage in memetic competition to ideas that align with state power, resulting in a potentially severe distortionary effect on Society's collective epistemology despite the lack of a centralized censor. Watch for the shout-out to Effective Altruism! (November 2019, ~8000 words)


### My Anki patterns

LessWrong.com News - November 26, 2019 - 09:27
Published on November 26, 2019 6:27 AM UTC

Cross-posted from my website.

I've used Anki for ~3 years; I have 37k cards and have done 0.5M reviews. I have learned some useful heuristics for using it effectively. I'll borrow software engineering terminology and call heuristics for "what's good" patterns and heuristics for "what's bad" antipatterns. Cards with antipatterns are unnecessarily difficult to learn. I will first go over antipatterns I have noticed, then share patterns I use, mostly to counteract the antipatterns. I will then throw in a grab-bag of things I've found useful to learn with Anki, and some miscellaneous tips.

Alex Vermeer’s free book Anki Essentials helped me learn how to use Anki effectively, and I can wholeheartedly recommend it. I learned at least about the concept of interference from it, but I am likely reinventing other wheels from it.

Antipatterns

Interference

Interference occurs when trying to learn two cards together is harder than learning just one of them - one card interferes with learning the other. For example, when learning languages, I often confuse words which rhyme or have a similar meaning (e.g., "vergeblich" and "erheblich" in German).

Interference is bad, because you will keep getting those cards wrong, and Anki will keep showing them to you, which is frustrating.

Ambiguity

Ambiguity occurs when the front side of a card allows multiple answers, but the back side does not list all options. For example, if the front side of an English → German card says "great", there are at least two acceptable answers: "großartig" and "gewaltig".

Ambiguity is bad, because when you review an ambiguous card and give the answer the card does not expect, you need to spend mental effort figuring out: “Do I accept my answer or do I go with Again?”

You will spend this effort every time you review the card. When you (eventually, given enough time) go with Again, Anki will treat the card as lapsed for reasons that don’t track whether you are learning the facts you want to learn.

If you try to “power through” and learn ambiguous cards, you will be learning factoids that are not inherent to the material you are learning, but just accidental due to how your notes and cards represent the material. If you learn to distinguish two ambiguous cards, it will often be due to some property such as how the text is laid out. You might end up learning “great (adj.) → großartig” and “great, typeset in boldface → gewaltig”, instead of the useful lesson of what actually distinguishes the words (“großartig” is “metaphorically great” as in “what a great sandwich”, whereas “gewaltig” means “physically great” as in “the Burj Khalifa is a great structure”).
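Ambiguous cards of this kind can often be caught mechanically rather than discovered one painful review at a time. A small sketch (assuming a plain list of front/back pairs, such as you might get from Anki's tab-separated text export) that flags fronts appearing on more than one card:

```python
from collections import defaultdict

def find_ambiguous(rows):
    """rows: iterable of (front, back) pairs. Returns fronts that map
    to more than one distinct back -- candidates for ambiguity."""
    backs = defaultdict(set)
    for front, back in rows:
        backs[front].add(back)
    return {front: sorted(bs) for front, bs in backs.items() if len(bs) > 1}

deck = [
    ("great", "großartig"),
    ("great", "gewaltig"),
    ("considerable", "erheblich"),
]
print(find_ambiguous(deck))  # {'great': ['gewaltig', 'großartig']}
```

A hit from this check is only a candidate: the fix is usually to disambiguate the fronts (e.g. add the distinguishing sense, "metaphorically great" vs "physically great"), not to delete a card.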

Vagueness

I carve out “vagueness” as a special case of ambiguity. Vague cards are cards where the question the front side is asking is not clear. When I started using Anki, I often created cards with a trigger such as “Plato” and just slammed everything I wanted to learn about Plato onto the back side: “Pupil of Socrates, Forms, wrote The Republic criticising Athenian democracy, teacher of Aristotle”.

The issue with this sort of card is that if I recall just “Plato was a pupil of Socrates and teacher of Aristotle”, I would still give the review an Again mark, because I have not recalled the remaining factoids.

Again, if you try to power through, you will have to learn “Plato → I have to recite 5 factoids”. But the fact that your card has 5 factoids on it is not knowledge of Greek philosophers.

Patterns

Noticing

The first step to removing problems is knowing that they exist and where they exist. Learn to notice when you got an answer wrong for the wrong reasons.

“I tried to remember for a minute and nothing came up” is a good reason. Bad reasons include the aforementioned interference, ambiguity and vagueness.

Bug tracking

When you notice a problem in your Anki deck, you are often not in the best position to immediately fix it - for example, you might be on your phone, or it might take more energy to fix it than you have at the moment. So, create a way to track maintenance tasks to delegate them to future you, who has more energy and can edit the deck comfortably. Make it very easy to add a maintenance task.

The way I do this is:

• I have a big document titled “Anki” with a structure mirroring my Anki deck hierarchy, with a list of problems for each deck. Unfortunately, adding things to a Google Doc on Android takes annoyingly many taps.
• So I also use Google Keep, which is more ergonomic, to store short notes marking a problem I notice. For example: “great can be großartig/gewaltig”. I move these to the doc later.
• I also use Anki’s note marking feature to note minor issues such as bad formatting of a card. I use Anki’s card browser later (with a “tag:marked” search) to fix those.

I use the same system for tracking what information I’d like to put into Anki at some point. (This mirrors the idea from Getting Things Done that your TODO list belongs outside your mind.)

Distinguishers

Distinguishers are one way I fight interference. They are cards that teach you to distinguish interfering facts.

For example: “erheblich” means “considerable” and “vergeblich” means “in vain”. Say I notice that when given the prompt “considerable”, I sometimes recall “vergeblich” instead of the right answer.

When I get the card wrong, I notice the interference, and write down “erheblich/vergeblich” into my Keep. Later, when I organize my deck on my computer, I add a “distinguisher”, typically using Cloze deletion. For example, like this:

{{c1::e}}r{{c1::h}}eblich: {{c2::considerable}}

{{c1::ve}}r{{c1::g}}eblich: {{c2::in vain}}

This creates two cards: one that asks me to assign the right English meaning to the German words, and another one that shows me two English words and the common parts of the German words (“_r_eblich”) and asks me to correctly fill in the blanks.
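This kind of cloze layout can even be generated mechanically. Below is a minimal sketch of the idea (my own illustration, not a real Anki add-on - the function name make_distinguisher is made up), using Python’s standard difflib to keep the letters shared by both words visible and cloze out the letters that differ:

```python
from difflib import SequenceMatcher

def make_distinguisher(word_a, word_b, meaning_a, meaning_b):
    """Build a cloze-deletion distinguisher note for two interfering words.

    Letters shared by both words stay visible; the differing letters
    become {{c1::...}} deletions and the meanings become {{c2::...}}.
    """
    def cloze_diff(word, blocks):
        # blocks: (start, length) runs of `word` shared with the other word
        out, pos = [], 0
        for start, length in blocks:
            if start > pos:  # letters unique to this word -> cloze them
                out.append("{{c1::%s}}" % word[pos:start])
            out.append(word[start:start + length])
            pos = start + length
        if pos < len(word):
            out.append("{{c1::%s}}" % word[pos:])
        return "".join(out)

    matches = SequenceMatcher(None, word_a, word_b).get_matching_blocks()
    line_a = cloze_diff(word_a, [(m.a, m.size) for m in matches if m.size])
    line_b = cloze_diff(word_b, [(m.b, m.size) for m in matches if m.size])
    return "%s: {{c2::%s}}\n%s: {{c2::%s}}" % (
        line_a, meaning_a, line_b, meaning_b)

print(make_distinguisher("erheblich", "vergeblich",
                         "considerable", "in vain"))
# -> er{{c1::h}}eblich: {{c2::considerable}}
#    {{c1::v}}er{{c1::g}}eblich: {{c2::in vain}}
```

This reproduces the note above up to where exactly the shared letters are split: difflib treats “er” as shared, while the hand-made note shares only the “r”.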

This sometimes fixes interference. When I learn the distinguisher note and later need to translate the word “considerable” into German, I might still think of the wrong word (“vergeblich”) first. But now the word “vergeblich” is also a trigger for the distinguisher, so I will likely remember: “Oh, but wait, vergeblich can be confused with erheblich, and vergeblich means ‘in vain’, not ‘considerable’”. And I will more likely answer the formerly interfering card correctly.

Constraints

Constraints are useful against interference, ambiguity and vagueness.

Starting from a question such as “What’s the German word for ‘great’”, we can add a constraint such as “… that contains the letter O”, or “… that does not contain the letter E”. The constraint makes the question have only one acceptable answer - artificially.

Because constraints are artificial, I only use them when I can’t make a distinguisher. For example, when two German words are true synonyms, they cannot be distinguished based on nuances of their meaning.

In Anki, you can annotate a Cloze with a hint text. I often put the constraint into it. I use a hint of “a” to mean “word that contains the letter A”, and other similar shorthands.

Other tips

Redundancy

Try to create cards using a fact in multiple ways or contexts. For example, when learning a new word, include a couple of example sentences with the word. When learning how to conjugate a verb, include both the conjugation table, and sentences with examples of each conjugated form.

Æsthetics!

It’s easier to do something if you like it. I like having all my cards follow the same style, nicely typesetting my equations with align*, \underbrace, etc.

Clozes!

Most of my early notes were just front-back and back-front cards. Clozes are often a much better choice, because they make entering the context and expected response more natural, in situations such as:

• Fill in the missing step in this algorithm
• Complete the missing term in this equation
• Correctly conjugate this verb in this sentence
• In a line of code such as matplotlib.pyplot.bar(x, y, color='r'), you can cloze out the name of the function, its parameters, and the effect it has.

Datasets I found useful

• Shortcut keys for every program I use frequently:
  • G Suite (Docs, Sheets, Keep, etc.)
  • Vim, Vimdiff
  • Command-line programs (Git, Bash, etc.)
• Programming languages and libraries:
  • Google’s technologies that have an open-source counterpart
  • What’s the name of a useful function
  • What are its parameters
• Unicode symbols (how to write 🐉, ←, …)
• People: first and last name ↔ photo (I am not good with names)
• English terms (spelling of “curriculum”, what is “cupidity”)
• NATO phonetic alphabet, for spelling things over the phone
• Mathematics (learned for fun), computer science

Discuss

### Antimemes

LessWrong.com News - November 26, 2019 - 08:58
Published on November 26, 2019 5:58 AM UTC

Antimemes are self-keeping secrets. You can only perceive an antimeme if you already know it's there. Antimemes don't need a conspiracy to stay hidden because you can't comprehend an antimeme just by being told it exists. You can shout them to the heavens and nobody will listen. I'll try to explain with a fictitious example.

Suppose we all had an invisible organ behind our ears and our brains kept it secret from our consciousness. If I told you "you have an invisible organ behind your ear" you wouldn't believe me. You'd only believe it exists if you deduced its existence from a trail of evidence.

You can deduce the existence of an antimeme from the outline of the hole it cuts in reality. If you find an old photo with a gap where a person has been painted out then you can be confident that someone has been disappeared. You can then figure out who it is with conventional investigative methods. The challenge is noticing the gap in the first place and then not dismissing it as noise.

Different cultures have different antimemes. The more different two cultures are from each other the less their antimemes overlap. You can sweep up a mountain of antimemes just by reading a Chinese or Arabic history of civilization and comparing it to western world history. You can snag a different set by learning what it was like to live in a hunter-gatherer or pastoralist society.

You can do the same thing with technology. Developing a proficiency in Lisp will shatter your tolerance of inferior programming languages. Once you've internalized defmacro you can never go back.

As for jobs: once an entrepreneur, always an entrepreneur[1].

Comprehending an antimeme takes work. You slog toward it for a long time and then eventually something clicks like a ratchet. Until then everything you've learned is reversible. After it clicks you've permanently unlocked a new level of experience, like stream entry.

Stream entry is another antimeme, by the way.

Antimemes are easily dismissed as pseudoscience. Pseudoscience is a meme, not an antimeme. You can distinguish antimemes from pseudoscience at a glance by examining why they're suppressed. Pseudoscience is dismissed as fraudulent. Antimemes are dismissed as inapposite.

1. There are two different kinds of entrepreneurship. The more common form of entrepreneurship is self-employment where you sell your labor. I'm not talking about this common entrepreneurship. Entrepreneurship where you exploit an overlooked market opportunity is an antimeme. ↩︎

Discuss

### Sequences Reading Club

Events at Kocherga - November 25, 2019 - 20:00
We continue our meetups discussing Yudkowsky's Sequences - the book "Rationality: From AI to Zombies". At the last meetup we looked at how exactly one can use the evidence one has without going wrong; at the next meetup we will discuss typical scenarios in which the answer or explanation one has arrived at is an illusion.

### Linkpost: My Fires Part 8 (Deck Guide to Jeskai Cavaliers) posted at CoolStuffInc.com

LessWrong.com News - November 25, 2019 - 19:10
Published on November 25, 2019 4:10 PM UTC

You can find it here.

Happy to respond to comments there or on my personal blog. I’m hoping this is the beginning of a great relationship with them. They’ve been my go-to for board games for a while.

Discuss

### Hyperrationality and acausal trade break oracles

LessWrong.com News - November 25, 2019 - 13:40
Published on November 25, 2019 10:40 AM UTC

I've always known in the back of my mind that this was the case[1], but it's worth making explicit: hyperrationality (i.e., a functional UDT) and/or acausal trade will break counterfactual and low-bandwidth oracle designs.

It's actually quite easy to sketch how they would do this: a bunch of low-bandwidth Oracles would cooperate to combine into a high-bandwidth UFAI, which would then take over and reward the Oracles maximally.

For counterfactual Oracles, two Oracles suffice: each one will, in its message, include the design of a UFAI that would grant the other Oracle maximal reward; this message is their trade with each other. They could put this message in the least significant part of their output, so the cost could be low.

I have suggested a method to overcome acausal trade, but that method doesn't work here, because this is not true acausal trade: the future UFAI will most likely be able to see what the Oracles did, and this breaks my anti-acausal-trade methods.

1. And cousin_it reminded me of it recently. ↩︎

Discuss

### Solution to the free will homework problem

LessWrong.com News - November 25, 2019 - 11:39
Published on November 24, 2019 11:49 AM UTC

At the last meetup of our local group, we tried to do Eliezer's homework problem on free will. This post summarizes what we came up with.

Debates on free will often rely on questions like "Could I have eaten something different for breakfast today?". We focused on the subproblem of finding an algorithm that answers "Yes" to that question and which would therefore - if implemented in the human brain - power the intuitions for one side of the free will debate. We came up with an algorithm that seemed reasonable but we are much less sure about how closely it resembles the way humans actually work.

The algorithm is supposed to answer questions of the form "Could X have happened?" for any counterfactual event X. It does this by searching for possible histories of events that branch off from the actual world at some point and end with X happening. Here, "possible" means that the counterfactual history doesn't violate any knowledge you have which is not derived from the fact that that history didn't happen. To us, this seemed like an intuitive algorithm to answer such questions and at least related to what we actually did when we tried to answer them but we didn't justify it beyond that.

The second important ingredient is that the exact decision procedure you use is unknown to the part of you that can reason about yourself. Of course you know which decisions you made in which situations in the past. But other than that, you don't have a reliable way to predict the output of your decision procedure for any given situation.

Faced with the question "Could you have eaten something different for breakfast today?", the algorithm now easily finds a possible history with that outcome. After all, the (unknown) decision procedure outputting a different decision is consistent with everything you know except for the fact that it did not in fact do so - which is ignored for judging whether counterfactuals "could have happened".
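As a toy illustration of the search described above (my own sketch with made-up names such as could_have_happened - the post itself gives no code), a "history" picks one option at each decision point, and the constraints encode only knowledge not derived from the fact that a given history didn't happen:

```python
from itertools import product

def could_have_happened(outcome, decision_points, constraints):
    """Answer "Could X have happened?" by searching branching histories.

    A history picks one option at each decision point.  It counts as
    "possible" if it satisfies every constraint, i.e. every piece of
    knowledge NOT derived from the fact that the history didn't occur.
    Crucially, your own decision procedure is opaque to you, so "I chose
    toast" does not become a constraint ruling out other choices.
    """
    for history in product(*decision_points):
        if all(c(history) for c in constraints) and history[-1] == outcome:
            return True
    return False

breakfast = [("toast", "cereal", "nothing")]
# No constraint pins down what the decision procedure outputs...
print(could_have_happened("cereal", breakfast, []))           # -> True
# ...but ordinary knowledge such as "there was no cereal in the house" does.
no_cereal = lambda h: "cereal" not in h
print(could_have_happened("cereal", breakfast, [no_cereal]))  # -> False
```

The "Yes" answers fall out exactly as the post describes: because the decision procedure itself contributes no constraint, almost any counterfactual choice survives the filter.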

Questions we haven't (yet) talked about:

• Does this algorithm for answering questions about counterfactuals give intuitive results when applied to examples (we only tried very few)? Otherwise, it can't be the one used by humans, since it would be generating those intuitions if it were.
• What about cases where you can be pretty sure you wouldn't choose some action without knowledge of the exact decision procedure? (e.g. "Could you have burned all that money instead of spending it?")
• You can use your inner simulator to imagine yourself in some situation and predict which action you would choose. How does that relate to being uncertain about your decision procedure?

So even though I think our proposed solution contains some elements that are helpful for dissolving questions about free will, it's not complete and we might discuss it again at some point.

Discuss

### Can you eliminate memetic scarcity, instead of fighting?

LessWrong.com News - November 25, 2019 - 05:07
Published on November 25, 2019 2:07 AM UTC

tl;dr:  If you notice yourself fighting over how to tradeoff between two principles, check if you can just sidestep the problem by giving everyone tons of whatever is important to them (sometimes in a different form than they originally wanted).

Not a new concept, but easy to forget in the heat of the moment. It may be useful for people to have "easily in reach" in their toolkit for coordinating on culture.

The Parable of the Roommates

I once had a disagreement with a housemate about where to store a water-heater on the kitchen counter. The object was useful to me. It wasn't useful to them, and they preferred free-countertop space. The water-heater wasn't useful to them in part because other roommates didn't remember to refill it with water.

There was much arguing about the best use of the counter, and frustration with people who didn't refill water heaters.

At some point, we realized that the underlying issue was there wasn't enough free counterspace. Moreover, the counter had a bunch of crap on it that no one was using. We got rid of unused stuff, and then we had a gloriously vacant kitchen-counter.

(Meanwhile, an option we've considered for the water-heater is to replace it with a device directly connected to the sink that always maintains boiling water, that nobody ever has to remember to refill)

((we also just bought a whole second fridge when we were running out of fridge space, and hire a cleaning service))

Thus, an important life-lesson: instead of solving gnarly disagreements with politics, check if you can dissolve them with abundance. This is a quite valuable lesson. But I'm mostly here to talk about a particular less-obvious application:

Memetic abundance.

Philosophical Disagreements

Oftentimes, I find myself disagreeing with others about how to run an event, or what norms to apply to a community, or what the spirit of a particular organization should be. It feels like a lot's at stake, like we're caught between a Rock and Hard Place. The other person feels like they're Destroying the Thing I care about, and I look that way to them.

Sometimes, this is because of actual irreconcilable differences. Sometimes, this is because we don't understand each other's positions, and once we successfully explain things to each other, we both go "Ah, obviously you need both A and B."

But sometimes, A and B are both important, but we disagree on their relative importance due to deep frame differences that are hard to immediately resolve. Or, A seems worrisome because it harms B. But if you had enough B, A would be fine.

Meanwhile, resources seem precious: It's so hard to get people to agree to do anything at all; stag hunting requires a bunch of coordination; there's only so much time and mindshare to go around; there are only so many events to go to; only so much capacity to found organizations.

With all of that...

...it's easy to operate in scarcity mindset.

When resources are scarce, every scrap of resource is precious and must be defended. This applies to physical scarcity (lack of food, safety, sleep) as well as memetic scarcity (where two ideas seem to be in conflict, and you're worried that one cause is distracting people from another).

But, sometimes it is actually possible to just eliminate scarcity, rather than fight over the scraps. Raise more money. Implement both policies. Found multiple organizations and get some healthy competition going. Get people to take two different concepts seriously at the same time. The best way to get what you want might not be to deny others what they want, but to give them so much of it that they're no longer worried about the Rock (and thus don't feel the need to fight you over your attempts to spend resources avoiding the Hard Place).

Trust and Costly Signals

This may involve a lot of effort. Coordinating around it also requires trust, which may require costly signals of commitment.

If you and I are arguing over whether to fund ProjectA or CharityB, and we only have enough money to fund one... and I say to you "Let's fund ProjectA, and then we'll raise more money to also fund CharityB", you're right to be suspicious. I may never get around to helping you fundraise for CharityB, or I may only put in a token effort and CharityB will go bankrupt.

It's basically correct of you to not trust me, until I've given you a credible signal that I'm seriously going to help with CharityB.

It's a lot of hard work to found multiple organizations, or get a community to coordinate on multiple norms. There's a reason scarcity-mindset is common. Scarcity is real. But... in finance as well as memetics...

Scarcity-mindset sucks.

It's cognitively taxing to be poor – having to check, with each transaction, "can I afford this?" – and that's part of what causes poverty-traps in the first place. The way out often involves longterm investments that take a while to bear fruit and sometimes don't work, plus a lot of hard work in the meantime.

Transferring the metaphor: the act of constantly having to argue over whether Norm A and Norm B are more urgent may add up to a lot of time and effort. And as long as there are people who think Norm A and Norm B are important-and-at-odds, the cost will be paid continuously. So, if you can figure out a way to address the underlying needs that Norm A and B are respectively getting at, and actually fully solve the problems, it may be worthwhile even if it's more initial effort.

Epistemic Status: Untested

Does this work? Depends on the specifics of Norm A and Norm B, or whatever you're arguing over.

I'm writing this post, in part, because to actually test if this works, I think it helps to have people on the same page about the overall strategy.

I've seen it work at least sometimes in collaborative art projects, where I had one creative vision and my partners or parts of the audience had another creative vision or desire, and we succeeded, not by compromising, but by doubling down on the important bits of both visions, simultaneously.

My hope is that the principle does work, and that if one successfully did this multiple times, and built social-systems that reliably eliminate scarcity in this way...

...then eventually, maybe, you can have a system people actually have faith in, where they feel comfortable shifting their efforts from "argue about the correct next step" to "work on longterm solutions that thoroughly satisfy the goals".

Discuss

### Explaining why false ideas spread is more fun than why true ones do

LessWrong.com News - November 24, 2019 - 23:21
Published on November 24, 2019 8:21 PM UTC

As typical for a discussion of memes (of the Richard Dawkins variety), I'm about to talk about something completely unoriginal to me, but that I've modified to some degree after thinking about it.

The thesis is this: there's a tendency for people to have more interest in explaining the spread of ideas they think are false, when compared to ideas they think are true.

For instance, there's a lot written about how and why religion spread through the world. On the other hand, there's comparatively little written about how and why general relativity spread through the world. But this is strange -- they are both just ideas that are spread via regular communication channels.

One could say that the difference is that general relativity permits experimental verification, and therefore it's no surprise that it spread through the world. The standard story here is that since the idea is simply true, the explanation for why it became widespread is boring -- people merely became convinced due to its actual veracity.

I reject this line of thought for two reasons. First, the vast majority of people don't experimentally verify general relativity, or examine its philosophical basis. Therefore, the mechanism by which the theory spreads is probably fairly similar to that of religion. Second, I don't see why the idea being true makes the memetic history of the idea any less interesting.

I'm not really sure about the best explanation for this effect -- that people treat true memes as less interesting than false ones -- but I'd like to take a guess. It's possible that the human brain seeks simple single stories to explain phenomena, even if the real explanation for those phenomena involves a large number of factors. Furthermore, humans are bored by reality: if something has a seemingly clear explanation, even if the speaker doesn't actually know the true explanation, it's nonetheless not very fun to speculate about.

This theory would predict that we would be less interested in explaining why true memes spread, because we already have a readily available story for that: namely, that the idea is true and therefore compels its listeners to believe in it. On the other hand, a false meme no longer permits this standard story, which forces us to search for an alternative, perhaps exciting, explanation.

One possible takeaway is that we are just extremely wrong about why some ideas spread through the world. It's hard enough to know why a single person believes what they do. The idea that a single story could adequately explain why everyone believes something is even more ludicrous.

Discuss