LessWrong.com News

A community blog devoted to refining the art of rationality

IRL 4/8: Maximum Entropy IRL and Bayesian IRL

March 26, 2019 - 01:07
Published on March 25, 2019 10:07 PM UTC

Every Monday for 8 weeks, we will be posting lessons about Inverse Reinforcement Learning. This is lesson 4.

Note that access to the lessons requires creating an account here.

This lesson comes with the following supplementary material:

Have a nice day!


Please take the LW/SSC meetups survey!

March 26, 2019 - 00:48
Published on March 25, 2019 9:48 PM UTC

I've put together a survey to gather information on the state of meetups around the world, and in particular to figure out what kinds of actions it might be useful for people interested in global meetup coordination to take. It's branded as being for SlateStarCodex meetups, but please don't be put off by that if your group isn't affiliated with SSC - the branding is just an artifact of previous decisions, and I'm just as interested in getting data on LW and EA groups.

You can take the survey here.

Context: I've been organizing and thinking about meetups for a couple years now. I coordinated the SSC Meetups Everywhere 2018 and I received a grant from the Centre for Effective Altruism to coordinate SSC meetups.


Please let me know either in the survey or in the comments below if you have any feedback or questions! It's very unlikely that I'll make changes to the survey questions now since that would mess up the data, but this is my first time doing something like this and I will definitely take feedback into account for the future.

Data will not be released publicly because it would be too easy to identify individuals and I neglected to include a question about releasing people's answers, but I am planning to share aggregate statistics and lessons learned publicly. I will also probably reach out to individual meetup organizers if there's significant data on what people want to see from their groups.


To perform best at work, look at Time & Energy account balance

March 25, 2019 - 23:14
Published on March 25, 2019 7:37 PM UTC

Several weeks ago, I got a chance to attend a talk featuring one of the very few female regional heads at Google.

Despite not having any business background, she climbed the ranks from entry-level employee to regional head, surpassing colleagues with prestigious business degrees and extensive experience.

One success driver she mentioned got my attention. Despite lagging far behind at the beginning, the core of her success is that she always aims for a 120% result on any task in front of her.

This interests me, but not because the idea was new to me.

In fact, this is not the first time I have heard of this concept. Not the first time I have been inspired to give my all to whatever is in front of me. Not the first time I have tried…and not the first time I have failed.

Did I not put in enough effort?

No…in fact, I put in so much effort to make this concept come to life, not realising that while effort is highly important, it is critically inadequate on its own.

As I listened to this amazing regional head talk about different aspects of her life, I came to realise what I had been missing all along.

To make each task yield 120%, apart from effort, we should also look at our time and energy balance.

Contributing our best to a task means giving it the amount of time and energy required to make the result the best it can be.

We cannot contribute what we don’t have.

No matter how hard we try to give each task the time it needs, we only have 24 hours a day.

No matter how much energy we try to put into each task, we only have a limited supply each day.

Therefore, giving our best does not start from the moment we begin working…but from the moment we plan our schedule and project pipelines.

When having "Enough Time" is Not Enough

When my boss asked if I had enough time to take on one additional project, I would look at how much time was required to finish all the tasks on my desk and then, most of the time, say "Yes", thinking I had enough time to finish it all.

However, there is a difference between having enough time to finish it all and having enough time to make it the best.

Coming back to evaluate all the projects in my pipeline, I realized that the time I have is only enough to finish everything, but not to go above and beyond.

I have two choices:

  • Finish a lot of tasks with average results, OR

  • Complete a major task with the best impact, going beyond expectations

There is no right answer here, but for my situation, the second works better.

Even having Time is Sometimes Not Enough

Having time is good. But having time without full energy...hmm...unlikely to be productive.

Another good lesson I learned from this talk is that ample time to do our best should always come with ample energy.

It's normal to plan business projects with the right balance between high- and low-energy requirements. However...our energy pool is drawn on not only during working hours, but also in our personal life.

One thing I learned is that when looking at the high- and low-energy requirements in my list of activities, I should include all activities, both at the office and at home.

Even if she says "I only have one major project going on during working hours", if this lady also has to train for a marathon at night at high intensity, how would she have enough energy to do both well, even though the marathon is unrelated to work?

To summarize: with one key success driver in a career being to do our best at the tasks at hand (e.g. the concept of delivering 120%), many ambitious people put in a great deal of effort to ensure the best results. However, the best results actually begin before we start each task: they begin during project planning, where our time and energy balance determines how our project results turn out.


Subagents, akrasia, and coherence in humans

March 25, 2019 - 17:24
Published on March 25, 2019 2:24 PM UTC

In my previous posts, I have been building up a model of mind as a collection of subagents with different goals, and no straightforward hierarchy. This then raises the question of how that collection of subagents can exhibit coherent behavior: after all, many ways of aggregating the preferences of a number of agents fail to create consistent preference orderings.
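The claim about aggregation can be made concrete with the classic Condorcet cycle. The sketch below is only an illustration: the three subagents, their options, and their rankings are hypothetical and not part of the model in these posts. Each subagent has a perfectly consistent ranking, yet pairwise majority voting among them produces a cycle.

```python
from itertools import combinations

# Hypothetical subagents, each with a consistent ranking (best option first).
rankings = {
    "hunger": ["eat", "rest", "work"],
    "fatigue": ["rest", "work", "eat"],
    "ambition": ["work", "eat", "rest"],
}

def majority_prefers(a, b):
    """Return True if a majority of subagents rank option a above option b."""
    votes = sum(r.index(a) < r.index(b) for r in rankings.values())
    return votes > len(rankings) / 2

options = ["eat", "rest", "work"]
for a, b in combinations(options, 2):
    winner, loser = (a, b) if majority_prefers(a, b) else (b, a)
    print(f"majority prefers {winner} over {loser}")
```

With these assumed rankings, the majority prefers eating to resting, resting to working, and working to eating, so no consistent group ordering exists even though every individual ordering is consistent.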

We can roughly describe coherence as the property that, if you become aware that there exists a more optimal strategy for achieving your goals than the one that you are currently executing, then you will switch to that better strategy. If an agent is not coherent in this way, then bad things are likely to happen to them.

Now, we all know that humans sometimes express incoherent behavior. But on the whole, people still do okay: the median person in a developed country still manages to survive until their body starts giving up on them, and typically also manages to have and raise some number of initially-helpless children until they are old enough to take care of themselves.

For a subagent theory of mind, we would like to have some explanation of when exactly the subagents manage to be collectively coherent (that is, change their behavior to some better one), and what are the situations in which they fail to do so. The conclusion of this post will be:

We are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS-style protector) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.

(Those of you who read my previous post might remember that I said this post would be about “unification of mind” - that is, about how to make subagents agree with each other better. Turns out that I spent so many words explaining when subagents disagree, that I had to put off the post on how to get them to agree. Maybe my next post will manage to be about that…)

Correcting your behavior as a default

There are many situations in which we exhibit incoherent behavior simply because we’re not aware of it. For instance, suppose that I do my daily chores in a particular order, when doing them in some other order would save more time. If you point this out to me, I’m likely to just say “oh”, and then adopt the better system.

Similarly, several of the experiments which get people to exhibit incoherent behavior rely on showing different groups of people different formulations of the same question, and then indicating that different framings of the same question get different answers from people. It doesn’t work quite as well if you show the different formulations to the same people, because then many of them will realize that differing answers would be inconsistent.

But there are also situations in which someone realizes that they are behaving in a nonsensical way, yet will continue behaving in that way. Since people usually can change suboptimal behaviors, we need an explanation for why they sometimes can’t.

Towers of protectors as a method for coherence

In my post about Internal Family Systems, I discussed a model of mind composed of several different kinds of subagents. One of them, the default planning subagent, is a module just trying to straightforwardly find the best thing to do and then execute that. On the other hand, protector subagents exist to prevent the system from getting into situations which were catastrophic before. If they think that the default planning subagent is doing something which seems dangerous, they will override it and do something else instead. (Previous versions of the IFS post called the default planning agent, “a reinforcement learning subagent”, but this was potentially misleading since several other subagents were reinforcement learning ones too, so I’ve changed the name.)

Thus, your behavior can still be coherent even if you feel that you are failing to act in a coherent way. You simply don’t realize that a protector is carrying out a routine intended to avoid dangerous outcomes - and this might actually be a very successful way of keeping you out of danger. Some subagents in your mind think that doing X would be a superior strategy, but the protector thinks that it would be a horrible idea - so from the point of view of the system as a whole, doing X is not a better strategy, so not switching to it is actually better.

On the other hand, it may also be the case that the protector’s behavior, while keeping you out of situations which the protector considers unacceptable, is causing other outcomes which are also unacceptable. The default planning subagent may realize this - but as already established, any protector can overrule it, so this doesn’t help.

Evolution’s answer here seems to be spaghetti towers. The default planning subagent might eventually figure out the better strategy, which avoids both the thing that the protector is trying to block and the new bad outcome. But it could be dangerous to wait that long, especially since the default planning agent doesn't have direct access to the protector's goals. So for the same reasons why a separate protector subagent was created to avoid the first catastrophe, the mind will create or recruit a protector to avoid the second catastrophe - the one that the first protector keeps causing.

With permission, I’ll borrow the illustrations from eukaryote’s spaghetti tower post to illustrate this.

Example: Eric grows up in an environment where he learns that disagreeing with other people is unsafe, and that he should always agree to do things that other people ask of him. So Eric develops a protector subagent running a pleasing, submissive behavior.

Unfortunately, while this tactic worked in Eric’s childhood home, once he becomes an adult he starts saying “yes” to too many things, without leaving any time for his own needs. But saying “no” to anything still feels unsafe, so he can’t just stop saying “yes”. Instead he develops a protector which tries to keep him out of situations where people would ask him to do anything. This way, he doesn’t need to say “no”, and also won’t get overwhelmed by all the things that he has promised to do. The two protectors together form a composite strategy.

While this helps, it still doesn’t entirely solve the issue. After all, there are plenty of reasons that might push Eric into situations where someone would ask something of him. He still ends up agreeing to do lots of things, to the point of neglecting his own needs. Eventually, his brain creates another protector subagent. This one causes exhaustion and depression, so that he now has a socially-acceptable reason for being unable to do all the things that he has promised to do. He continues saying “yes” to things, but also keeps apologizing for being unable to do things that he (honestly) intended to do as promised, and eventually people realize that you probably shouldn’t ask him to do anything that’s really important to get done.

And while this kind of a process of stacking protector on top of a protector is not perfect, for most people it mostly works out okay. Almost everyone ends up having their unique set of minor neuroses and situations where they don’t quite behave rationally, but as they learn to understand themselves better, their default planning subagent gets better at working around those issues. This might also make the various protectors relax a bit, since the various threats are generally avoided and there isn’t a need to keep avoiding them.

Gradually, as negative consequences to different behaviors become apparent, behavior gets adjusted - either by the default planning subagents or by spawning more protectors - and remains coherent overall.
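As a rough sketch of how stacked protectors reshape behavior, the toy code below treats each protector as a veto over the default planner's options. Everything here is an assumption made for illustration: the action names, the numeric values, and the idea that a protector is a simple yes/no veto are a heavy simplification of Eric's story, not the IFS model itself.

```python
def choose_action(planner_values, protectors):
    """Return the planner's best action that no protector vetoes, or None."""
    allowed = {
        action: value
        for action, value in planner_values.items()
        if not any(vetoes(action) for vetoes in protectors)
    }
    return max(allowed, key=allowed.get) if allowed else None

# Hypothetical values the default planning subagent assigns to Eric's options.
planner_values = {
    "say no sometimes": 5,
    "say yes to everything": 3,
    "avoid people": 1,
}

def pleaser(action):
    # First protector: disagreeing feels unsafe, so block any refusal.
    return action == "say no sometimes"

def avoider(action):
    # Second protector: block the overcommitment the first protector causes.
    return action == "say yes to everything"

print(choose_action(planner_values, []))
print(choose_action(planner_values, [pleaser]))
print(choose_action(planner_values, [pleaser, avoider]))
```

Each added protector pushes the system to a lower-valued but still permitted option. If protectors veto every option, the function returns `None`, which loosely mirrors the tangled state with no remaining flexibility.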

But sometimes, especially for people in highly stressful environments where almost any mistake may get them punished, or when they end up in an environment that their old tower of protectors is no longer well-suited for (distributional shift), things don’t go as well. In that situation, their minds may end up looking like a hopelessly tangled web, where they have almost no flexibility. Something happens in their environment, which sets off one protector, which sets off another, which sets off another - leaving them with no room for flexibility or rational planning, but rather forcing them to act in a way which is almost bound to only make matters worse.

This kind of an outcome is obviously bad. So besides building spaghetti towers, the second strategy which the mind has evolved to employ for keeping its behavior coherent while piling up protectors, is the ability to re-process memories of past painful events.

As I discussed in my original IFS post, the mind has methods for bringing up the original memories which caused a protector to emerge, in order to re-analyze them. If ending up in some situation is actually no longer catastrophic (for instance, you are no longer in your childhood home where you get punished simply for not wanting to do something), then the protectors which were focused on avoiding that outcome can relax and take a less extreme role.

For this purpose, there seems to be a built-in tension. Exiles (the IFS term for subagents containing memories of past trauma) “want” to be healed and will do things like occasionally sending painful memories or feelings into consciousness so as to become the center of attention, especially if there is something about the current situation which resembles the past trauma. This also acts as what my IFS post called a fear model - something that warns of situations which resemble the past trauma enough to be considered dangerous in their own right. At the same time, protectors “want” to keep the exiles hidden and inactive, doing anything that they can for keeping them so. Various schools of therapy - IFS one of them - seek to tap into this existing tension so as to reveal the trauma, trace it back to its original source, and heal it.

Coherence and conditioned responses

Besides the presence of protectors, another possibility for why we might fail to change our behavior is strongly conditioned habits. Most human behavior involves automatic habits: behavioral routines which are triggered by some sort of a cue in the environment, and lead to or have once led to a reward. (Previous discussion; see also.)

The problem with this is that people might end up with habits that they wouldn’t want to have. For instance, I might develop a habit of checking social media on my phone when I’m bored, creating a loop of boredom (cue) -> looking at social media (behavior) -> seeing something interesting on social media (reward).

Reflecting on this behavior, I notice that back when I didn’t do it, my mind was more free to wander when I was bored, generating motivation and ideas. I think that my old behavior was more valuable than my new one. But even so, my new behavior still delivers enough momentary satisfaction to keep reinforcing the habit.

Subjectively, this feels like an increasing compulsion to check my phone, which I try to resist since I know that long-term it would be a better idea to not be checking my phone all the time. But as the compulsion keeps growing stronger and stronger, eventually I give up and look at the phone anyway.

The exact neuroscience of what is happening at such a moment remains only partially understood (Simpson & Balsam 2016). However, we know that whenever different subsystems in the brain produce conflicting motor commands, that conflict needs to be resolved, with only one at a time being granted access to the “final common motor path”. This is thought to happen in the basal ganglia, a part of the brain closely involved in action selection and connected to the global neuronal workspace.

One model (e.g. Redgrave 2007, McHaffie 2005) is that the basal ganglia receives inputs from many different brain systems; each of those systems can send different “bids” supporting or opposing a specific course of action to the basal ganglia. A bid submitted by one subsystem may, through looped connections going back from the basal ganglia, inhibit other subsystems, until one of the proposed actions becomes sufficiently dominant to be taken.

The above image from Redgrave 2007 gives a conceptual picture of the model, with two example subsystems shown. Suppose that you are eating at a restaurant in Jurassic Park when two velociraptors charge in through the window. Previously, your hunger system was submitting successful bids for the “let’s keep eating” action, which then caused inhibitory impulses to be sent to the threat system. This inhibition prevented the threat system from making bids for silly things like jumping up from the table and running away in a panic. However, as your brain registers the new situation, the threat system gets significantly more strongly activated, sending a strong bid for the “let’s run away” action. As a result of the basal ganglia receiving that bid, an inhibitory impulse is routed from the basal ganglia to the subsystem which was previously submitting bids for the “let’s keep eating” actions. This makes the threat system’s bids even stronger relative to the (inhibited) eating system’s bids.

Soon the basal ganglia, which was previously inhibiting the threat subsystem’s access to the motor system while allowing the eating system access, withdraws that inhibition and starts inhibiting the eating system’s access instead. The result is that you jump up from your chair and begin to run away. Unfortunately, this is hopeless since the velociraptor is faster than you. A few moments later, the velociraptor’s basal ganglia gives the raptor’s “eating” subsystem access to the raptor’s motor system, letting it happily munch down its latest meal.
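The bidding-with-inhibition idea can be sketched as a toy winner-take-all loop. This is not a neuroscience model and not the mechanism from Redgrave 2007: the function name, the update rule, and every parameter value below are assumptions chosen only to show how a stronger bid can suppress its competitors until one action dominates.

```python
def select_action(drives, inhibition=0.5, threshold=0.9, rate=0.2, max_steps=1000):
    """Run a toy winner-take-all competition between bids.

    drives maps an action name to the strength of the bid for it.
    Each step, a bid's activity moves toward its drive minus the inhibition
    it receives from all competing bids. The first action whose share of
    total activity reaches `threshold` is selected.
    """
    activity = {name: 0.0 for name in drives}
    for _ in range(max_steps):
        total = sum(activity.values())
        updated = {}
        for name, drive in drives.items():
            others = total - activity[name]
            target = max(0.0, drive - inhibition * others)
            updated[name] = activity[name] + rate * (target - activity[name])
        activity = updated
        total = sum(activity.values())
        if total > 0:
            leader = max(activity, key=activity.get)
            if activity[leader] / total >= threshold:
                return leader
    return None  # no bid became dominant within max_steps

# Before the raptors arrive, the threat system's bid is weak.
print(select_action({"keep eating": 1.0, "run away": 0.2}))
# After they arrive, the threat system's bid is much stronger.
print(select_action({"keep eating": 1.0, "run away": 3.0}))
```

In this sketch the weaker bid is not just outvoted: the inhibition it receives drives its activity toward zero, which is the qualitative behavior the restaurant example describes.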

But let’s leave velociraptors behind and go back to our original example with the phone. Suppose that you have been trying to replace the habit of looking at your phone when bored with that of smiling and directing your attention to pleasant sensations in your body, and then letting your mind wander.

Until the new habit establishes itself, the two habits will compete for control. Frequently, the old habit will be stronger, and you will just automatically check your phone without even remembering that you were supposed to do something different. For this reason, behavioral change programs may first spend several weeks just practicing noticing the situations in which you engage in the old habit. When you do notice what you are about to do, then more goal-directed subsystems may send bids towards the “smile and look for nice sensations” action. If this happens and you pay attention to your experience, you may notice that long-term it actually feels more pleasant than looking at the phone, reinforcing the new habit until it becomes prevalent.

To put this in terms of the subagent model, we might drastically simplify things by saying that the neural pattern corresponding to the old habit is a subagent reacting to a specific sensation (boredom) in the consciousness workspace: its reaction is to generate an intention to look at the phone. At first, you might train the subagent responsible for monitoring the contents of your consciousness, to output moments of introspective awareness highlighting when that intention appears. That introspective awareness helps alert a goal-directed subagent to try to trigger the new habit instead. Gradually, a neural circuit corresponding to the new habit gets trained up, which starts sending its own bids when it detects boredom. Over time, reinforcement learning in the basal ganglia starts giving that subagent’s bids more weight relative to the old habit’s, until it no longer needs the goal-directed subagent’s support in order to win.
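A minimal sketch of this training process follows, under loudly stated assumptions: the habit names, starting weights, reward values, and the simple "move the weight toward the experienced reward" update are all hypothetical stand-ins, not a claim about how the basal ganglia actually learn. The point is only to show the new habit's bid growing until it wins without help.

```python
def episodes_until_unaided(goal_support=0.6, learning_rate=0.1, max_episodes=200):
    """Count episodes until the new habit's own weight beats the old habit's.

    Returns None if that never happens within max_episodes.
    """
    # Learned bid weights (hypothetical): the old habit is well established.
    weights = {"check phone": 0.5, "let mind wander": 0.0}
    # Assumed reward actually experienced after each behavior.
    rewards = {"check phone": 0.5, "let mind wander": 0.8}

    for episode in range(1, max_episodes + 1):
        bids = dict(weights)
        # The goal-directed subagent adds its support to the new habit.
        bids["let mind wander"] += goal_support
        chosen = max(bids, key=bids.get)
        # Reinforcement: move the chosen habit's weight toward its reward.
        weights[chosen] += learning_rate * (rewards[chosen] - weights[chosen])
        if weights["let mind wander"] > weights["check phone"]:
            return episode
    return None

print(episodes_until_unaided())                  # with goal-directed support
print(episodes_until_unaided(goal_support=0.0))  # without any support
```

Under these assumed numbers, the new habit overtakes the old one after 10 supported episodes. Without support, the old habit wins every time, the new habit never gets reinforced, and the function returns `None`.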

Now this model helps incorporate things like the role of having a vivid emotional motivation, a sense of hope, or psyching yourself up when trying to achieve habit change. Doing things like imagining an outcome that you wish the habit to lead to, may activate additional subsystems which care about those kinds of outcomes, causing them to submit additional bids in favor of the new habit. The extent to which you succeed at doing so, depends on the extent to which your mind-system considers it plausible that the new habit leads to the new outcome. For instance, if you imagine your exercise habit making you strong and healthy, then subagents which care about strength and health might activate to the extent that you believe this to be a likely outcome, sending bids in favor of the exercise action.

On this view, one way for the mind to maintain coherence and readjust its behaviors, is its ability to re-evaluate old habits in light of which subsystems get activated when reflecting on the possible consequences of new habits. An old habit having been strongly reinforced reflects that a great deal of evidence has accumulated in favor of it being beneficial, but the behavior in question can still be overridden if enough influential subsystems weigh in with their evaluation that a new behavior would be more beneficial in expectation.

Some subsystems having concerns (e.g. immediate survival) which are ranked more highly than others (e.g. creative exploration) means that the decision-making process ends up carrying out an implicit expected utility calculation. The strengths of bids submitted by different systems do not just reflect the probability that those subsystems put on an action being the most beneficial. There are also different mechanisms giving the bids from different subsystems varying amounts of weight, depending on how important the concerns represented by that subsystem happen to be in that situation. This ends up doing something like weighting the probabilities by utility, with the kinds of utility calculations that are chosen by evolution and culture in a way to maximize genetic fitness on average. Protectors, of course, are subsystems whose bids are weighted particularly strongly, since the system puts high utility on avoiding the kinds of outcomes they are trying to avoid.
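To illustrate the "probability weighted by importance" idea, here is a small sketch. The subsystem names, probabilities, and importance weights are invented for the example, and summing weighted bids per action is an assumed aggregation rule rather than anything the post specifies.

```python
def weighted_bids(beliefs, importance):
    """Sum each subsystem's bids, scaled by how much weight its concern gets.

    beliefs maps a subsystem to its probability that each action is best.
    importance maps a subsystem to the weight given to its concerns.
    """
    totals = {}
    for subsystem, action_probs in beliefs.items():
        for action, prob in action_probs.items():
            totals[action] = totals.get(action, 0.0) + prob * importance[subsystem]
    return totals

# Hypothetical beliefs about which action is best.
beliefs = {
    "planner": {"give talk": 0.9, "stay home": 0.1},
    "protector": {"give talk": 0.2, "stay home": 0.8},
}

equal = weighted_bids(beliefs, {"planner": 1.0, "protector": 1.0})
weighted = weighted_bids(beliefs, {"planner": 1.0, "protector": 3.0})

print(max(equal, key=equal.get))        # planner's preference wins
print(max(weighted, key=weighted.get))  # heavily weighted protector wins
```

The beliefs are identical in both runs; only the weight on the protector's concern changes, and that alone flips the chosen action.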

The original question which motivated this section was: why are we sometimes incapable of adopting a new habit or abandoning an old one, despite knowing that to be a good idea? And the answer is: because we don’t know that such a change would be a good idea. Rather, some subsystems think that it would be a good idea, but other subsystems remain unconvinced. Thus the system’s overall judgment is that the old behavior should be maintained.

Interlude: Minsky on mutually bidding subagents

I was trying to concentrate on a certain problem but was getting bored and sleepy. Then I imagined that one of my competitors, Professor Challenger, was about to solve the same problem. An angry wish to frustrate Challenger then kept me working on the problem for a while. The strange thing was, this problem was not of the sort that ever interested Challenger.

What makes us use such roundabout techniques to influence ourselves? Why be so indirect, inventing misrepresentations, fantasies, and outright lies? Why can't we simply tell ourselves to do the things we want to do? [...]

Apparently, what happened was that my agency for Work exploited Anger to stop Sleep. But why should Work use such a devious trick?

To see why we have to be so indirect, consider some alternatives. If Work could simply turn off Sleep, we'd quickly wear our bodies out. If Work could simply switch Anger on, we'd be fighting all the time. Directness is too dangerous. We'd die.

Extinction would be swift for a species that could simply switch off hunger or pain. Instead, there must be checks and balances. We'd never get through one full day if any agency could seize and hold control over all the rest. This must be why our agencies, in order to exploit each other's skills, have to discover such roundabout pathways. All direct connections must have been removed in the course of our evolution.

This must be one reason why we use fantasies: to provide the missing paths. You may not be able to make yourself angry simply by deciding to be angry, but you can still imagine objects or situations that make you angry. In the scenario about Professor Challenger, my agency Work exploited a particular memory to arouse my Anger's tendency to counter Sleep. This is typical of the tricks we use for self-control.

Most of our self-control methods proceed unconsciously, but we sometimes resort to conscious schemes in which we offer rewards to ourselves: "If I can get this project done, I'll have more time for other things." However, it is not such a simple thing to be able to bribe yourself. To do it successfully, you have to discover which mental incentives will actually work on yourself. This means that you - or rather, your agencies - have to learn something about one another's dispositions. In this respect the schemes we use to influence ourselves don't seem to differ much from those we use to exploit other people - and, similarly, they often fail. When we try to induce ourselves to work by offering ourselves rewards, we don't always keep our bargains; we then proceed to raise the price or even deceive ourselves, much as one person may try to conceal an unattractive bargain from another person.

Human self-control is no simple skill, but an ever-growing world of expertise that reaches into everything we do. Why is it that, in the end, so few of our self-incentive tricks work well? Because, as we have seen, directness is too dangerous. If self-control were easy to obtain, we'd end up accomplishing nothing at all.

-- Marvin Minsky, The Society of Mind

Akrasia is subagent disagreement

You might feel that the above discussion still doesn’t entirely resolve the original question. After all, sometimes we do manage to change even strongly conditioned habits pretty quickly. Why is it sometimes hard and sometimes easier?

Redgrave et al. (2010) discuss two modes of behavioral control: goal-directed versus habitual. Goal-directed control is a relatively slow mode of decision-making, where “action selection is determined primarily by the relative utility of predicted outcomes”, whereas habitual control involves more directly conditioned stimulus-response behavior. Which kind of subsystem is in control is complicated, and depends on a variety of factors (the following quote has been edited to remove footnotes to references; see the original for those):

Experimentally, several factors have been shown to determine whether the agent (animal or human) operates in goal-directed or habitual mode. The first is over-training: here, initial control is largely goal-directed, but with consistent and repeated training there is a gradual shift to stimulus–response, habitual control. Once habits are established, habitual responding tends to dominate, especially in stressful situations in which quick reactions are required. The second related factor is task predictability: in the example of driving, talking on a mobile phone is fine so long as everything proceeds predictably. However, if something unexpected occurs, such as someone stepping out into the road, there is an immediate switch from habitual to goal-directed control. Making this switch takes time and this is one of the reasons why several countries have banned the use of mobile phones while driving. The third factor is the type of reinforcement schedule: here, fixed-ratio schedules promote goal-directed control as the outcome is contingent on responding (for example, a food pellet is delivered after every n responses). By contrast, interval schedules (for example, schedules in which the first response following a specified period is rewarded) facilitate habitual responding because contingencies between action and outcome are variable. Finally, stress, often in the form of urgency, has a powerful influence over which mode of control is used. The fast, low computational requirements of stimulus–response processing ensure that habitual control predominates when circumstances demand rapid reactions (for example, pulling the wrong way in an emergency when driving on the opposite side of the road). Chronic stress also favours stimulus–response, habitual control. For example, rats exposed to chronic stress become, in terms of their behavioural responses, insensitive to changes in outcome value and resistant to changes in action–outcome contingency. [...]

Although these factors can be seen as promoting one form of instrumental control over the other, real-world tasks often have multiple components that must be performed simultaneously or in rapid sequences. Taking again the example of driving, a driver is required to continue steering while changing gear or braking. During the first few driving lessons, when steering is not yet under automatic stimulus–response control, things can go horribly awry when the new driver attempts to change gears. By contrast, an experienced (that is, ‘over-trained’) driver can steer, brake and change gear automatically, while holding a conversation, with only fleeting contributions from the goal-directed control system. This suggests that many skills can be deconstructed into sequenced combinations of both goal-directed and habitual control working in concert. [...]

Nevertheless, a fundamental problem remains: at any point in time, which mode should be allowed to control which component of a task? Daw et al. have used a computational approach to address this problem. Their analysis was based on the recognition that goal-directed responding is flexible but slow and carries comparatively high computational costs as opposed to the fast but inflexible habitual mode. They proposed a model in which the relative uncertainty of predictions made by each control system is tracked. In any situation, the control system with the most accurate predictions comes to direct behavioural output.

Note those last sentences: besides the subsystems making their own predictions, there might also be a meta-learning system keeping track of which other subsystems tend to make the most accurate predictions in each situation, giving extra weight to the bids of the subsystem which has tended to perform the best in that situation. We’ll come back to that in future posts.
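To make that kind of meta-level bookkeeping concrete, here is a minimal Python sketch. It is not Daw et al.'s actual model, only a toy illustration under stated assumptions: each subsystem makes a numeric prediction, reliability is tracked as a running average of squared prediction error per situation, and control simply goes to whichever subsystem currently has the lowest average error. The names `ReliabilityTracker`, `record`, and `controller_for`, as well as the learning rate and initial error, are hypothetical choices made for illustration.

```python
from collections import defaultdict


class ReliabilityTracker:
    """Toy meta-learner: tracks how well each subsystem predicts in each situation."""

    def __init__(self, learning_rate=0.2, initial_error=1.0):
        # Both parameters are illustrative assumptions, not values from the paper.
        self.learning_rate = learning_rate
        # Running average of squared prediction error, keyed by (situation, subsystem).
        self.avg_error = defaultdict(lambda: initial_error)

    def record(self, situation, subsystem, predicted, actual):
        """Nudge the running error average toward the latest squared error."""
        key = (situation, subsystem)
        error = (predicted - actual) ** 2
        self.avg_error[key] += self.learning_rate * (error - self.avg_error[key])

    def controller_for(self, situation, subsystems):
        """Hand control to the subsystem with the lowest tracked error here."""
        return min(subsystems, key=lambda s: self.avg_error[(situation, s)])


tracker = ReliabilityTracker()
subsystems = ["habitual", "goal-directed"]
for _ in range(10):
    # On a predictable drive the habit predicts well; planning is a bit off.
    tracker.record("routine drive", "habitual", predicted=1.0, actual=1.0)
    tracker.record("routine drive", "goal-directed", predicted=0.5, actual=1.0)
    # When something unexpected happens, the habit's prediction fails.
    tracker.record("pedestrian steps out", "habitual", predicted=0.0, actual=1.0)
    tracker.record("pedestrian steps out", "goal-directed", predicted=1.0, actual=1.0)

print(tracker.controller_for("routine drive", subsystems))         # habitual
print(tracker.controller_for("pedestrian steps out", subsystems))  # goal-directed
```

A winner-take-all rule is the simplest possible choice; a version closer to the "extra weight to the bids" framing would instead scale each subsystem's bid by its tracked reliability.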

This seems compatible with my experience, in that I feel like it’s possible for me to change even entrenched habits relatively quickly - assuming that the new habit really is unambiguously better. In that case, while I might forget and lapse to the old habit a few times, there’s still a rapid feedback loop which quickly indicates that the goal-directed system is simply right about the new habit being better.

Or, the behavior in question might be sufficiently complex and I might be sufficiently inexperienced at it, that the goal-directed (default planning) subagent has always mostly remained in control of it. In that case change is again easy, since there is no strong habitual pattern to override.

In contrast, in cases where it’s hard to establish a new behavior, there tends to be some kind of genuine uncertainty:

  • The benefits of the old behavior have been validated in the form of direct experience (e.g. unhealthy food that tastes good, has in fact tasted good each time), whereas the benefits of the new behavior come from a less trusted information source which is harder to validate (e.g. I’ve read scientific studies about the long-term health risks of this food).
  • Immediate vs. long-term rewards: the more remote the rewards, the larger the risk that they will for some reason never materialize.
  • High vs. low variance: sometimes when I’m bored, looking at my phone produces genuinely better results than letting my thoughts wander. E.g. I might see an interesting article or discussion, which gives me novel ideas or insights that I would not otherwise have had. Basically looking at my phone usually produces worse results than not looking at it - but sometimes it also produces much better ones than the alternative.
  • Situational variables affecting the value of the behaviors: looking at my phone can be a way to escape uncomfortable thoughts or sensations, for which purpose it’s often excellent. This then also tends to reinforce the behavior of looking at the phone when I’m in the same situation otherwise, but without uncomfortable sensations that I’d like to escape.

When there is significant uncertainty, the brain seems to fall back to those responses which have worked the best in the past - which seems like a reasonable approach, given that intelligence involves hitting tiny targets in a huge search space, so most novel responses are likely to be wrong.

As the above excerpt noted, the tendency to fall back to old habits is exacerbated during times of stress. The authors attribute it to the need to act quickly in stressful situations, which seems correct - but I would also emphasize the fact that negative emotions in general tend to be signs of something being wrong. E.g. Eldar et al. (2016) note that positive or negative moods tend to be related to whether things are going better or worse than expected, and suggest that mood is a computational representation of momentum, acting as a sort of global update to our reward expectations.

For instance, if an animal finds more fruit than it had been expecting, that may indicate that spring is coming. A shift to a good mood and being “irrationally optimistic” about finding fruit even in places where the animal hasn’t seen fruit in a while, may actually serve as a rational pre-emptive update to its expectations. In a similar way, things going less well than expected may be a sign of some more general problem, necessitating fewer exploratory behaviors and less risk-taking, and thus a fall back into behaviors which are more certain to work out.
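The fruit example can be sketched as a toy simulation. This is not the model from Eldar et al. (2016), only a minimal illustration under assumptions made here: mood is a running average of recent surprises (reward minus the stored expectation), and it is added as a global bias to the expectation for every place, including places that have not been visited recently. The class name `MoodyForager` and all parameter values are hypothetical.

```python
class MoodyForager:
    """Toy agent whose mood acts as a global shift to its reward expectations."""

    def __init__(self, learning_rate=0.3, mood_rate=0.3, mood_weight=1.0):
        # All three parameters are illustrative assumptions.
        self.learning_rate = learning_rate
        self.mood_rate = mood_rate
        self.mood_weight = mood_weight
        self.expected = {}  # stored per-place reward expectations
        self.mood = 0.0     # running average of recent surprises

    def expectation(self, place):
        """Stored expectation for this place, shifted by the current mood."""
        return self.expected.get(place, 0.0) + self.mood_weight * self.mood

    def observe(self, place, reward):
        """Update the local expectation and the global mood from one outcome."""
        stored = self.expected.get(place, 0.0)
        surprise = reward - stored
        self.expected[place] = stored + self.learning_rate * surprise
        self.mood += self.mood_rate * (surprise - self.mood)


forager = MoodyForager()
forager.expected["old grove"] = 1.0
before = forager.expectation("old grove")

# Finding more fruit than expected at several other places lifts the mood...
for place in ["riverbank", "hillside", "meadow"]:
    forager.observe(place, reward=3.0)

# ...which raises the expectation for the old grove without visiting it.
after = forager.expectation("old grove")
print(before < after)  # True
```

Running the same loop with outcomes below expectation drives the mood negative and lowers expectations everywhere, which corresponds to the more cautious mode described above.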

So to repeat the summary that I had in the beginning: we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS protector whose bids get a lot of weight) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.


The Amish, and Strategic Norms around Technology

25 March, 2019 - 01:16
Published on March 24, 2019 10:16 PM UTC

I was reading Legal Systems Very Different From Ours by David Friedman. The chapter on the Amish made a couple interesting claims, which changed my conception of that culture (although I'm not very confident that the Amish would endorse these claims as fair descriptions).

Strategic Norms Around Technology

The Amish relationship to technology is not "stick to technology from the 1800s", but rather "carefully think about how technology will affect your culture, and only include technology that does what you want."

So, electric heaters are fine. Central heating in a building is not. This is because if there's a space-heater in the living room, this encourages the family to congregate together. Whereas if everyone has heating in their room, they're more likely to spend time apart from each other.

Some communities allow tractors, but only if they don't have rubber tires. This makes them good for tilling fields but bad for driving around.

Cars and telephones are particularly important not to allow, because easy transportation and communication create a slippery slope to full connection to the outside world. And a lot of the Amish lifestyle depends on cutting themselves off from the various pressures and incentives present in the rest of the world.

Some Amish communities allow people to borrow telephones or cars from non-Amish neighbors. I might have considered this hypocritical. But in the context of "strategic norms of technology", it need not be. The important bit is to add friction to transportation and communication.

Competitive Dictatorship

Officially, most Amish congregations operate via something-like-consensus (I'm not sure I understood this). But Friedman's claim is that in practice, most people tend to go with what the local bishop says. This makes a bishop something like a dictator.

But, there are lots of Amish communities, and if you don't like the direction a bishop is pushing people in, or how they are resolving disputes, you can leave. There is a spectrum of communities ranging in how strict they are about various rules, and they make decisions mostly independently.

So there are not only strategic norms around technology, but also a fairly interesting, semi-systematic exploration of those norms.

Other Applications

I wouldn't want to be Amish-in-particular, but the setup here is very interesting to me.

I know some people who went to MAPLE, a monastery program. During their stay, limits on technology meant that, after 9pm, you basically had two choices: read, or go to bed. The choices were strongly reinforced by the social and physical environment. And this made it much easier to make choices they endorsed.

Contrast this with my current house, where a) you face basically infinite choices about how to spend your time, and b) in practice, the nightly choices often end up being something like "stay up till 1am playing Minecraft with housemates" or "stay up till 2am playing Minecraft with housemates."

I'm interested in the question "okay, so... my goals are not the Amish goals. But, what are my goals exactly, and is there enough consensus around particular goals to make valid choices around norms and technology other than 'anything goes?'"

There are issues you face that make this hard, though:

Competition with the Outside World – The Amish system works because it cuts itself off from the outside world, and its most important technological choices directly cause that. Your business can't get outcompeted by someone else who opens up their shop on Sundays because there is nobody who opens their shop on Sundays.

You also might have goals that directly involve the outside world.

(The Amish also have good relationships with the government such that they can get away with implementing their own legal systems and get exceptions for things like school-laws. If you want to do something on their scale, you both would need to not attract the ire of the government, and be good enough at rolling your own legal system to not screw things up and drive people away)

Lack of Mid-Scale-Coordination – I've tried to implement 10pm bedtimes. It fails, horribly, because I frequently attend events that last till midnight or later. Everyone could shift their entire sleep schedule forward, maybe. But also...

People Are Different – Some of people's needs are cultural. But some are biological, and some needs are maybe due to environmental factors that happened over decades and can't be changed on a dime.

Some people do better with rules and structure. Some people flourish more with flexibility. Some people need rules and structure but different rules and structure than other people.

This all makes it fairly hard to coordinate on norms.

Contenders for Change

Given the above, I think it makes most sense to:

  • Look for opportunities to explore norms and technology-use at the level of individuals, households, and small organizations (these seem like natural clusters with small numbers of stakeholders, where you can either get consensus or have a dictator).
  • While doing so, choose norms that are locally stable, that don't require additional cooperation outside yourself, your household or your org.

For example, I could imagine an entire household trying out a rule, like "the household internet turns off at 10pm", or "all the lights turn reddish at night so it's easier to get to sleep".


Did the recent blackmail discussion change your beliefs?

24 March, 2019 - 19:06
Published on March 24, 2019 4:06 PM UTC

Various rationalist blogs and Less Wrong have recently posted on and discussed blackmail, and specifically its legality and acceptability. I found the discussion unsatisfying, and I'm trying to understand why that is, and whether I'm alone in that.

As it was happening, it didn't feel like a particularly political topic - nobody seemed personally invested in the outcome. But it did seem like everyone (including myself, sometimes) was presenting examples or (over)generalizing to support their beliefs, and very few were seeking counterexamples or cruxes or lines of demarcation between different intuitions.

So - was this politics in disguise? Was some other bias interfering with the discussion? Was it useful and I just missed it? Did any sort of consensus emerge?


The Politics of Age (the Young vs. the Old)

24 March, 2019 - 09:40
Published on March 24, 2019 6:40 AM UTC

A few days ago I read an article in the local newspaper about Switzerland considering lowering the voting age to 16.

The reason I found it interesting was that it was not one of the old tired political discussions supported by the same old tired arguments that you typically encounter. In fact, it's a question that I have never thought of before.

Apparently, the discussion was triggered by the recent school strike for climate that went quite big in Switzerland. I attended the demonstration in Zurich and it was not only big, it was really a kids' event. You could spot a grown-up here and there but they were pretty rare. (Btw, I think this movement is worth watching. Here, for the first time, I see coordination on a truly global level. It spans beyond western countries, with events being hosted in Asia, Pacific Islands, South America or Africa.)

Anyway, the main argument for lowering the voting age is to counter-balance the greying of the electorate.

Once again, this stems from what the climate strikers say: "The politicians who decide on these issues will be dead by the time the shit hits the fan. It will be us who'll have to deal with it. We should have a say in the matter."

But the question is broader: As the demographics change, with birth rates dropping at crazy speed (China's population will start shrinking not that far in the future; Sub-Saharan fertility rates plummeted from 6.8 in the 1970s to 4.85 in 2015), the age pyramid is going to look less like a pyramid and more like a column or even a funnel. In such a case the old will hold a much larger amount of political power than they do today.

While that may seem like a minor thing (everyone is young at some point and old later on) just consider how it would affect the politics of, say, pensions or health-care.

Or, for that matter, I hear that Brexit wouldn't have happened if 16- and 17-year-olds had been allowed to vote.

More questions:

With old people being generally more conservative, are we going to see a slowing or even a reversal of the seemingly unstoppable move to the political left that has been going on for decades?

With a high percentage of young males being often blamed for social unrest and wars, is the changing shape of the age pyramid going to result in even more political stability? And how is giving teenagers a vote going to affect that?

I have no answers but the topic is definitely worth thinking about.

(Btw, the voting age was lowered to 16 in canton Glarus in 2007, so there's more than a decade of data to analyse the impact of the measure.)

March 24th, 2019

by martin_sustrik


Why the AI Alignment Problem is Unsolvable

24 March, 2019 - 07:10
Published on March 24, 2019 4:10 AM UTC

The following is a chapter from the story I've been writing which contains a proof I came up with that the value alignment problem is unsolvable. I know it sounds crazy, but as far as I can tell the proof is completely correct. There are further supporting technical details which I can explain if anyone asks, but I didn't want to overload you guys with too much information at once, since a lot of those additional supporting details would require articles of their own to explain.

I am not the first person to make a correct proof that the Value Alignment problem is unsolvable. The credit for that goes to my friend Exceph, who came up with a longer and more technical proof which involves content from the Sequence on Instrumental Rationality we've been working on. His proof has not been published yet.

I haven't had time yet to extract my own less technical proof from the narrative dialogue of my story, but I thought it was really important that I share it here as soon as possible, since the more time is wasted on AI research, the less time we have to come up with strategies and solutions that could more effectively prevent x-risk long term.

Also, HEAVY SPOILERS for the story I've been writing, Earthlings: People of the Dawn. This chapter is literally the last chapter of part 5, after which the remaining parts are basically extended epilogues. You have been warned.



There were guards standing outside the entrance to the Rationality Institute. They saluted Bertie as he approached. Bertie nodded to them as he walked past. He reached the front doors and turned the handle, then pulled the door open.

He stepped inside. There was no one at the front desk. All the lights were on, but he didn’t hear anyone in the rooms he passed as he walked down the hallway, approaching the door at the end.

He finally stood before it. It was the door to Thato’s office.

Bertie knocked.

“Come in,” he heard Thato say from the other side.

Bertie turned the knob with a sweaty hand and pushed inwards. He stepped inside, hoping that whatever Thato wanted to talk to him about, it wasn’t an imminent existential threat.

“Hello Bertie,” said Thato, somberly. He looked sweaty and tired, with bags under his puffy red eyes. Had he been crying?

“Hi Thato,” said Bertie, gently shutting the door behind him. He pulled up a chair across from Thato’s desk. “What did you want to talk to me about?”

“We finished analyzing the research notes on the chip you gave us two years ago,” said Thato, dully.

“And?” asked Bertie. “What did you find?”

“It was complicated, it took us a long time to understand it,” said Thato. “But there was a proof in there that the value alignment problem is unsolvable.”

There was a pause, as Bertie’s brain tried not to process what it had just heard. Then…

“WHAT!?” Bertie shouted.

“We should have realized it earlier,” said Thato. Then in an accusatory tone, “In fact, I think you should have realized it earlier.”

“What!?” demanded Bertie. “How? Explain!”

“The research notes contained a reference to a children's story you wrote: A Tale of Four Moralities,” Thato continued, his voice rising. “It explained what you clearly already knew when you wrote it, that there are actually FOUR types of morality, each of which has a different game-theoretic function in human society: Eye for an Eye, the Golden Rule, Maximize Flourishing and Minimize Suffering.”

“Yes,” said Bertie. “And how does one go from that to ‘the Value Alignment problem is unsolvable’?”

“Do you not see it!?” Thato demanded.

Bertie shook his head.

Thato stared at Bertie, dumbfounded. Then he spoke slowly, as if to an idiot.

“Game theory describes how agents with competing goals or values interact with each other. If morality is game-theoretic by nature, that means it is inherently designed for conflict resolution and either maintaining or achieving the universal conditions which help facilitate conflict resolution for all agents. In other words, the whole purpose of morality is to make it so that agents with competing goals or values can coexist peacefully! It is somewhat more complicated than that, but that is the gist.”

“I see,” said Bertie, his brows furrowed in thought. “Which means that human values, or at least the individual non-morality-based values don’t converge, which means that you can’t design an artificial superintelligence that contains a term for all human values, just the moral values.”

Then Bertie had a sinking, horrified feeling accompanied by a frightening intuition. He didn’t want to believe it.

“Not quite,” said Thato cuttingly. “Have you still not realized? Do you need me to spell it out?”

“Hold on a moment,” said Bertie, trying to calm his racing anxiety.

What is true is already so, Bertie thought.

Owning up to it doesn’t make it worse.

Not being open about it doesn’t make it go away.

And because it’s true, it is what is there to be interacted with.

People can stand what is true, for they are already enduring it.

Bertie took a deep breath as he continued to recite in his mind…

If something is true, then I want to believe it is true.

If something is not true, then I want not to believe it is true.

Let me not become attached to beliefs I may not want.

Bertie exhaled, still overwhelmingly anxious. But he knew that putting off the revelations any longer would make it even harder to have them. He knew the thought he could not think would control him more than the thought he could. And so he turned his mind in the direction it was afraid to look.

And the epiphanies came pouring out. It was a stream of consciousness, no--a waterfall of consciousness that wouldn’t stop. Bertie went from one logical step to the next, a nearly perfect dance of rigorously trained self-honesty and common sense--imperfect only in that he had waited so long to start it, to notice.

“So you can’t program an intelligence to be compatible with all human values, only human moral values,” Bertie said in a rush. “Except even if you programmed it to only be compatible with human moral values, there are four types of morality, so you’d have four separate and competing utility functions to program into it. And even if somehow you could program an intelligence to optimize for those four competing utility functions at the same time, that would just cause it to optimize for conflict resolution, and then it would just tile the universe with tiny artificial conflicts between artificial agents for it to resolve as quickly and efficiently as possible without letting those agents do anything themselves.”

“Right in one,” said Thato with a grimace. “And as I am sure you already know, turning a human into a superintelligence would not work either. Human values are not sufficiently stable. If you instruct a superintelligent human to protect other humans from death or grievous injury without infringing on their self-determination, that human would by definition have to stay out of human affairs under most circumstances, only intervening to prevent atrocities like murder, torture or rape, or to deal with the occasional existential threat. It would eventually go mad with boredom and loneliness, and it would snap.”

“So, to summarize,” Bertie began, slowly. “The very concept of an omnibenevolent god is a contradiction in terms. It doesn’t correspond to anything that could exist in any self-consistent universe. It is logically impossible.”

“Hindsight is twenty-twenty, is it not?” asked Thato rhetorically.


“So what now?” asked Bertie.

“What now?” repeated Thato. “Why, now I am going to spend all of my money on frivolous things, consume copious amounts of alcohol, say anything I like to anyone without regard for their feelings or even safety or common sense, and wait for the end. Eventually, likely soon, some twit is going to build a God, or blow up the world in any number of other ways. That is all. It is over. We lost.”

Bertie stared at Thato. Then in a quiet, dangerous voice he asked, “Is that all? Is that why you sent me a message saying that you urgently wanted to meet with me in private?”

“Surely you see the benefit of doing so?” asked Thato. “Now you no longer will waste any more time on this fruitless endeavor. You too may relax, drink, be merry and wait for the end.”

At this point Bertie was seething. In a deceptively mild tone he asked, “Thato?”

“Yes?” asked Thato.

“May I have permission to slap you?”

“Go ahead,” said Thato. “It does not matter anymore. Nothing does.”

Bertie leaned over the desk and slapped Thato across the face, hard.

Thato seized Bertie’s wrist and twisted it painfully.

“That bloody hurt, you git!”

“I thought you said nothing matters!?” Bertie demanded. “Yet it clearly matters to you whether you’re slapped.”

Thato released Bertie’s wrist and looked away. Bertie massaged his wrist, trying to make the lingering sting go away.

"Are you done being an idiot?" he asked.

"Define 'idiot'," said Thato scathingly, still not looking at him.

"You know perfectly well what I mean," said Bertie.

Thato ignored him.


Bertie clenched his fists.

“In the letter Yuuto gave me before he died, he told me that the knowledge contained in that chip could spell Humanity’s victory or its defeat,” he said angrily, eyes blazing with determination. “Do you get it? Yuuto thought his research could either destroy or save humankind. He wouldn’t have given it to me if he didn’t think it could help. So I suggest you and your staff get back to analyzing it. We can figure this out, and we will.”

Bertie turned around and stormed out of the office.

He did not look back.


A Tale of Four Moralities

24 March, 2019 - 06:46
Published on March 24, 2019 3:46 AM UTC

Author's note: This is a children's story I wrote a while back, which teaches a very important life lesson that none of us got to learn as kids. That lesson is extremely important, so all the adults here should pay attention too. I'll explain more of the technical details of the underlying theory behind it later.


Ivan was very angry.
His teddy was stolen.

Ivan decided.
He would catch the thief and steal from them.

"This will pay them back," said Ivan. "Serves them right."

Goldie was very happy.
It was her birthday.
Her papa gave her a teddy.

Goldie decided.
She would give a gift to her papa in return.

"It was nice of him to give me a teddy," said Goldie.
"This is the least I can do."

The next day, her teddy was gone.

Minnie was very sad. Someone was stealing teddies from her friends.
She looked at her teddy.
Would she be next?

Minnie decided. She would find the stolen teddies.
And she would return them.

"It's the right thing to do," said Minnie.
"This way, no one will be missing their teddies. Not anymore."

The next day, her teddy was gone.

Maxie felt guilty, but hopeful.
Earlier, his mama told him something sad.

"The other neighborhood is poor.
Kids there don't have teddies."

So Maxie decided.
He would steal teddies from his friends. He would give them to the other neighborhood.

"It's the best thing I can do," said Maxie. "My friends can afford new teddies. But the poor kids can't."

So Maxie stole teddies from his friends,
and gave them to the other neighborhood.

This made the kids there happy.
But his friends were sad, because now THEY had no teddies.

The next day,
the sad kids went with their parents to the teddy store,
to buy them new teddies.
But the store was all sold out of teddies.

"It's been hard to sell teddies in this town," said the store clerk. "Many poor people can't afford them. And many rich people already have teddies."

"Why not give teddies to the poor?
For free?" asked Maxie.

"We tried that before," said the clerk.
"It didn't work.
A long line of people came for teddies.
Many poor people can't afford cars.
When they got here, they were last in line. Then they got to the front of the line.
But by then, we were out of teddies."

"Then why give teddies to the rich?" asked Minnie.
"Can't you tell them no?"

"Other rich people paid us to give teddies for free.
They can't do that all the time.

"We have to sell to the rich, too.
Otherwise, we can't afford to make teddies.

"At all."

"Why not?" asked Goldie.

"We have to pay for the stuff to make the teddy," said the clerk.

"Why can't you just get that stuff for free?" asked Maxie.
"Then you could give teddies, without being paid."

"Maxie," said Maxie's mama. "There aren't enough teddies for everyone.
There isn't enough stuff to make that many."

Maxie began to cry.
"I wanted to make more people happier," he said.
"I thought by giving teddies to poor kids, I could make more of the town happier. There are more kids in the poor neighborhood.
And they had no teddies."

"YOU stole our teddies!" Ivan accused. "You should be punished.
Someone should steal a teddy from you."

"I'm sorry!" said Maxie.
"I don't have any teddies.
I gave them to the kids in the other neighborhood."

"Maybe if you asked nicely, they would return our teddies?" asked Goldie.

"No," said Minnie.
"They would feel the same way we did, when the teddies were stolen from us.
They don't know the teddies were stolen.
If we tell them, they won't know we're telling the truth."

No one was sure what to do.

Finally, Maxie said,
"We need to find a way to make more stuff.
That way, there will be enough to make teddies for everyone."

"And if we can't do that?" asked Minnie.

"I don't know," said Maxie.
"But we have to try!"

"Why should we help everyone?
The poor kids have never helped us," said Goldie.

"What else can we do?" asked Minnie. "We can't steal the teddies back."

"The poor kids didn't do anything wrong!" said Ivan. "We shouldn't punish them!"

"Maybe if we find a way to make more stuff," said Maxie.
"the poor kids will have enough to give you something, in return."

"Okay," said Goldie. "I'll help."

The kids talked.

The parents looked at each other.

"Do you think they can do it?" asked Goldie's papa.

Ivan's mama laughed.
She thought it was a joke.

Minnie's papa sighed sadly.

And Maxie's mama turned to the kids and said:

"If you're kind and just,
understanding and giving.
If you listen to each other, and to others.
If you work hard and do your best.
If you learn, grow and become stronger.
If you are brave, and never give up.
Then, maybe, you will find a way."

They would find a way to make more stuff. Someday.

And so they began their quest.


800 scientists call out against statistical significance

23 March, 2019 - 15:57

Willing to share some words that changed your beliefs/behavior?

23 March, 2019 - 05:08
Published on March 23, 2019 2:08 AM UTC

I'm collecting data on powerfully persuasive speech acts; it's part of a dangling thread of curiosity after GPT-2 (a new and fairly powerful text generation algorithm). I'm skeptical of the danger of mind-warping sentences as sometimes presented in fiction, or AI scenarios, and trying to get a sense of what the territory is like.

I've made a form to collect personal examples of things-someone-said that caused you to seriously change some belief or behavior. An easy example would be if someone declared that they love you, and this caused you to suddenly devote a lot more (or a lot less!) time and attention to them as a person.

If you have five minutes, my goal for this form is 1000+ responses and your own response(s) will help with that. All replies are anonymous, and there's a place for you to restrict how the information is used/state confidentiality desires. You can also fill it out more than once if you want.

https://goo.gl/forms/39x3vJqNomAome382 is the link to the form, if you want to share with anyone else; I'm happy to have this spread around wherever.