# LessWrong.com News

A community blog devoted to refining the art of rationality


### Intuitions about goal-directed behavior

Published on December 1, 2018 5:46 AM UTC

One broad argument for AI risk is the Misspecified Goal argument:

> The Misspecified Goal Argument for AI Risk: Very intelligent AI systems will be able to make long-term plans in order to achieve their goals, and if their goals are even slightly misspecified then the AI system will become adversarial and work against us.

My main goal in this post is to make conceptual clarifications and suggest how they affect the Misspecified Goal argument, without making any recommendations about what we should actually do. Future posts will argue more directly for a particular position. As a result, I will not be considering other arguments for focusing on AI risk even though I find some of them more compelling.

I think of this as a concern about long-term goal-directed behavior. Unfortunately, it’s not clear how to categorize behavior as goal-directed vs. not. Intuitively, any agent that searches over actions and chooses the one that best achieves some measure of “goodness” is goal-directed (though there are exceptions, such as the agent that selects actions that begin with the letter “A”). However, this is not a necessary condition: many humans are goal-directed, but there is no goal baked into the brain that they are using to choose actions.

This is related to the concept of optimization, though with intuitions around optimization we typically assume that we know the agent’s preference ordering, which I don’t want to assume here. (In fact, I don’t want to assume that the agent even has a preference ordering.)

One potential formalization is to say that goal-directed behavior is any behavior that can be modelled as maximizing expected utility for some utility function; in the next post I will argue that this does not properly capture the behaviors we are worried about. In this post I’ll give some intuitions about what “goal-directed behavior” means, and how these intuitions relate to the Misspecified Goal argument.

Generalization to novel circumstances

Consider two possible agents for playing some game, let’s say TicTacToe. The first agent looks at the state and the rules of the game, and uses the minimax algorithm to find the optimal move to play. The second agent has a giant lookup table that tells it what move to play given any state. Intuitively, the first one is more “agentic” or “goal-driven”, while the second one is not. But both of these agents play the game in exactly the same way!

The difference is in how the two agents generalize to new situations. Let’s suppose that we suddenly change the rules of TicTacToe -- perhaps now the win condition is reversed, so that anyone who gets three in a row loses. The minimax agent is still going to be optimal at this game, whereas the lookup-table agent will lose against any opponent with half a brain. The minimax agent looks like it is “trying to win”, while the lookup-table agent does not. (You could say that the lookup-table agent is “trying to take actions according to <policy>”, but this is a weird complicated goal so maybe it doesn’t count.)
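To make the contrast concrete, here is a toy sketch in Python (the mid-game position at the end is a hypothetical example of mine, and the "misère" rule flip stands in for the reversed win condition described above). The minimax agent re-derives its move from the current rules, while the "lookup table" just replays the move it computed under the old ones:

```python
# Toy sketch: a minimax TicTacToe agent whose win condition can be swapped,
# vs. a "lookup table" that replays moves computed under the original rules.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def completed_line(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)

def outcome(board, three_in_a_row_wins):
    # +1 if X wins, -1 if O wins, None if the game isn't decided yet.
    for player, value in (("X", 1), ("O", -1)):
        if completed_line(board, player):
            return value if three_in_a_row_wins else -value
    return None

def minimax(board, player, three_in_a_row_wins=True):
    """Return (value, best_move) for `player`; X maximizes, O minimizes."""
    result = outcome(board, three_in_a_row_wins)
    if result is not None:
        return result, None
    moves = [i for i, cell in enumerate(board) if cell == "."]
    if not moves:
        return 0, None  # draw
    best = None
    for move in moves:
        child = board[:move] + (player,) + board[move + 1:]
        value, _ = minimax(child, "O" if player == "X" else "X",
                           three_in_a_row_wins)
        if best is None:
            best = (value, move)
        elif player == "X" and value > best[0]:
            best = (value, move)
        elif player == "O" and value < best[0]:
            best = (value, move)
    return best

# A hypothetical mid-game position, X to move: X holds 0 and 1, O holds 3 and 4.
pos = ("X", "X", ".", "O", "O", ".", ".", ".", ".")

# The lookup table stores the move computed under the ORIGINAL rules...
_, lookup_move = minimax(pos, "X", three_in_a_row_wins=True)
# ...while the minimax agent re-derives its move under the NEW (misère) rules.
_, adapted_move = minimax(pos, "X", three_in_a_row_wins=False)

print(lookup_move)   # 2 -- completes three in a row, a win under the old rules
print(adapted_move)  # not 2: under the new rules, completing the row would lose
```

The lookup-table agent's behavior is frozen at training time, so when the rules flip it blindly plays square 2 and loses; only the agent that searches against an objective "notices" the change.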

In general, when we say that an agent is pursuing some goal, this is meant to allow us to predict how the agent will generalize to some novel circumstance. This sort of reasoning is critical for the Goal-Directed argument for AI risk. For example, we worry that an AI agent will prevent us from turning it off, because that would prevent it from achieving its goal: “You can't fetch the coffee if you're dead.” This is a prediction about what an AI agent would do in the novel circumstance where a human is trying to turn the agent off.

This suggests a way to characterize these sorts of goal-directed agents: there is some goal such that the agent’s behavior in new circumstances can be predicted by figuring out which behavior best achieves the goal. There's a lot of complexity in the space of goals we consider: something like "human well-being" should count, but "the particular policy <x>" and “pick actions that start with the letter A” should not. When I use the word goal I mean to include the first kind but not the second, even though I currently don’t know theoretically how to distinguish between the two cases.

Note that this is in stark contrast to existing AI systems, which are particularly bad at generalizing to new situations.

Honestly, I’m surprised it’s only 90%. [1]

Empowerment

We could also look at whether or not the agent acquires more power and resources. It seems likely that an agent that is optimizing for some goal over the long term would want more power and resources in order to more easily achieve that goal. In addition, the agent would probably try to improve its own algorithms in order to become more intelligent.

This feels like a consequence of goal-directed behavior, and not its defining characteristic, because it is about being able to achieve a wide variety of goals, instead of a particular one. Nonetheless, it seems crucial to the broad argument for AI risk presented above, since an AI system will probably need to first accumulate power, resources, intelligence, etc. in order to cause catastrophic outcomes.

I find this concept most useful when thinking about the problem of inner optimizers, where in the course of optimization through a rich space you stumble across a member of the space that is itself doing optimization, but for a related but still misspecified metric. Since the inner optimizer is being “controlled” by the outer optimization process, it is probably not going to cause major harm unless it is able to “take over” the outer optimization process, which sounds a lot like accumulating power. (This discussion is extremely imprecise and vague; see the upcoming MIRI paper on “The Inner Alignment Problem” for a more thorough discussion.)

Our understanding of the behavior

There is a general pattern in which as soon as we understand something, it becomes something lesser. As soon as we understand rainbows, they are relegated to the “dull catalogue of common things”. This suggests a somewhat cynical explanation of our concept of “intelligence”: an agent is considered intelligent if we do not know how to achieve the outcomes it does using the resources that it has (in which case our best model for that agent may be that it is pursuing some goal, reflecting our tendency to anthropomorphize). That is, our evaluation about intelligence is a statement about our epistemic state. Some examples that follow this pattern are:

• As soon as we understand how some AI technique solves a challenging problem, it is no longer considered AI. Before we’ve solved the problem, we imagine that we need some sort of “intelligence” that is pointed towards the goal and solves it: the only method we have of predicting what this AI system will do is to think about what a system that tries to achieve the goal would do. Once we understand how the AI technique works, we have more insight into what it is doing and can make more detailed predictions about where it will work well, where it tends to make mistakes, etc. and so it no longer seems like “intelligence”. Once you know that OpenAI Five is trained by self-play, you can predict that it hasn’t encountered certain behaviors during training, like standing still to turn invisible, and probably won’t play well against them.
• Before we understood the idea of natural selection and evolution, we would look at the complexity of nature and ascribe it to intelligent design; once we had the mathematics (and even just the qualitative insight), we could make much more detailed predictions, and nature no longer seemed like it required intelligence. For example, we can predict the timescales on which we can expect evolutionary changes, which we couldn’t do if we just modeled evolution as optimizing reproductive fitness.
• Many phenomena (eg. rain, wind) that we now have scientific explanations for were previously explained to be the result of some anthropomorphic deity.
• When someone performs a feat of mental math, or can tell you instantly what day of the week a random date falls on, you might be impressed and think them very intelligent. But if they explain to you how they did it, you may find it much less impressive. (Though of course these feats are selected to seem more impressive than they are.)

Note that an alternative hypothesis is that humans equate intelligence with mystery; as we learn more and remove mystery around eg. evolution, we automatically think of it as less intelligent.

To the extent that the Misspecified Goal argument relies on this intuition, the argument feels a lot weaker to me. If the Misspecified Goal argument rested entirely upon this intuition, then it would be asserting that because we are ignorant about what an intelligent agent would do, we should assume that it is optimizing a goal, which means that it is going to accumulate power and resources and lead to catastrophe. In other words, it is arguing that assuming that an agent is intelligent definitionally means that it will accumulate power and resources. This seems clearly wrong; it is possible in principle to have an intelligent agent that nonetheless does not accumulate power and resources.

Also, the argument is not saying that in practice most intelligent agents accumulate power and resources. It says that we have no better model to go off of other than “goal-directed”, and then pushes this model to extreme scenarios where we should have a lot more uncertainty.

To be clear, I do not think that anyone would endorse the argument as stated. I am suggesting as a possibility that the Misspecified Goal argument relies on us incorrectly equating superintelligence with “pursuing a goal” because we use “pursuing a goal” as a default model for anything that can do interesting things, even if that is not the best model to be using.

Summary

Intuitively, goal-directed behavior can lead to catastrophic outcomes with a sufficiently intelligent agent, because the optimal behavior for even a slightly misspecified goal can be very bad according to the true goal. However, it’s not clear exactly what we mean by goal-directed behavior. A sufficient condition is that the algorithm searches over possible actions and chooses the one with the highest “goodness”, but this is not a necessary condition.

“From the outside”, it seems like a goal-directed agent is characterized by the fact that we can predict the agent’s behavior in new situations by assuming that it is pursuing some goal, and as a result it acquires power and resources. This can be interpreted either as a statement about our epistemic state (we know so little about the agent that our best model is that it pursues a goal, even though this model is not very accurate or precise) or as a statement about the agent (predicting the behavior of the agent in new situations based on pursuit of a goal actually has very high precision and accuracy). These two views have very different implications on the validity of the Misspecified Goal argument for AI risk.

[1] This is an entirely made-up number.

Tomorrow's AI Alignment Forum sequences post will be 'Benign model-free RL' by Paul Christiano in the sequence on iterated amplification.

The next post in this sequence will be 'Coherence arguments do not imply goal-directed behavior' by Rohin Shah, on Sunday 2nd December.


### Making sense of Vitamin D

Published on November 30, 2018 7:00 PM UTC

Epistemic status: I studied bioinformatics, but I'm no domain expert; I lay out my current view on the issue based on the facts I found in the literature. This is not health advice.

tl;dr:

There’s strong evolutionary selection for producing enough Vitamin D. Vitamin D works together with Vitamin K2, and as a result high Vitamin D supplementation should come with Vitamin K2 supplementation.

Personally, I decided to consume every morning 10 000 IU of Vitamin D3 and 200 µg of K2 (all-trans MK7) and believe that it’s likely beneficial for many fellow rationalists who don’t spend a lot of time exposing a lot of skin to the sun to take the same supplements. It’s important to take Vitamin D3 early in the day given that calcitriol has a short half-life and we are likely adapted to have higher calcitriol levels in the day than at night.

Taking vitamin D3 with K2 in the morning

Before reading further, I want you to look at the graphic below and assess to what extent you believe the two maps are the same or differ:

It's interesting how similar those two maps happen to be. The map is from the chapter Human Skin Pigmentation as an Adaptation to UV Radiation, written by Nina G. Jablonski and George Chaplin. A is a map of UVB intensity over the world, while B shows the skin color of each area's residents, measured at a wavelength of 305 nm.

There's been a lot of human migration. Given that humans likely arrived in Australia only around 65,000 years ago, evolution had relatively little time to adapt the skin color of the different Australian natives to the local UVB radiation. That the maps nevertheless match so well illustrates that there's huge selection pressure on skin color, and this is likely selection pressure driven by the need to produce Vitamin D3 via UVB radiation.

Take a look at the two maps again and ponder how well they are aligned.

The importance of Vitamin D

Given that Vitamin D3 seems important enough for evolution to adjust skin color to match UVB radiation, it's reasonable to assume that reducing the amount of UVB radiation humans receive produces a Vitamin D3 deficiency with negative health consequences. Gwern's meta-analysis on Vitamin D suggests that, according to the existing academic publications, this costs 0.33 years of life. Later in this article I will make the case for why I believe the literature likely underrates the effect for systematic reasons.

Which Vitamin D should be supplemented?

Humans produce Vitamin D3. While Vitamin D2, which is produced by mushrooms, can also be used by humans, it makes sense to supplement Vitamin D3, as it's the form of Vitamin D around which evolution optimized us, given that it's what's available in the natural environment. Besides the evolutionary argument, The case against ergocalciferol (vitamin D2) as a vitamin supplement lists a few other reasons as well.

How much Vitamin D3 should be taken?

> For clinical care, it appears that all current methodologies are adequate if one targets a 25(OH)D value higher than current cut points; for example, a value of 40 ng/ml (100 nmol/L) is without toxicity and virtually ensures that the individual's “true” value is greater than 30 ng/ml (75 nmol/L). A clinical approach of targeting a higher 25(OH)D value seems prudent in that improving vitamin D status should reduce multiple adverse consequences of vitamin D deficiency at extremely low cost with minimal toxicity risk.

A German study found median 25(OH)D levels of 19.8 ng/ml. While this isn't a meta-review, it illustrates that plenty of people fall well below the recommendation of the Endocrine Society. I'm German; I used to take 5,000 IU of Vitamin D3 per day, got tested by my doctor, and got a value of 33 ng/ml. That's a bit over the recommended minimum of 30 ng/ml, but not enough to get over the inherent error of the assay. As a result I upped my daily D3 intake to 10,000 IU, which is the tolerable upper limit for maintenance recommended by the Endocrine Society guideline.

How high is 10,000 IU? According to the Endocrine Society guideline, when an adult wearing a bathing suit is exposed to one minimal erythemal dose of UV radiation (a slight pinkness to the skin 24 h after exposure), the amount of vitamin D produced is equivalent to ingesting between 10,000 and 25,000 IU. At the same time, 10,000 IU is much higher than the recommended daily intake of 600 IU by the IOM (US), 600 IU by the EFSA (EU), and 800 IU (20 µg Vitamin D) by the DGE in Germany.
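For reference, the two unit conversions that recur in this section (1 µg of vitamin D3 = 40 IU, and 1 ng/ml of 25(OH)D ≈ 2.5 nmol/L) can be sketched as:

```python
# Standard vitamin D unit conversions:
# 40 IU = 1 µg of vitamin D3; 1 ng/ml of 25(OH)D is about 2.5 nmol/L.

def iu_to_micrograms(iu):
    return iu / 40

def ng_per_ml_to_nmol_per_l(ng_ml):
    return ng_ml * 2.5

print(iu_to_micrograms(800))         # 20.0 µg -- the DGE recommendation
print(iu_to_micrograms(10_000))      # 250.0 µg
print(ng_per_ml_to_nmol_per_l(40))   # 100.0 nmol/L -- the quoted 40 ng/ml target
```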

Dimitrios T. Papadimitriou argues in The Big Vitamin D Mistake that 10,000 IU should be consumed daily to reach the 40 ng/ml 25(OH)D blood level, and that due to statistical errors the official guidelines assume less Vitamin D3 supplementation is required to reach that level.

Above I spoke about my belief that the existing studies underrate the benefits of Vitamin D3 supplementation. I believe that for two reasons: the first is about the timing of Vitamin D3 supplementation, and the second is about K2.

The effect of timing of Vitamin D supplementation

The studies we have generally assume that timing doesn't matter and that blood 25(OH)D levels are the only interesting variable. Given that we don't have a routine clinical assay to measure vitamin D3 or vitamin D2 serum concentrations, we can't focus on their serum levels.

Another name for the 25(OH)D level is calcifediol (or 25-hydroxyvitamin D / calcidiol), while the substance we supplement, or that our skin produces in response to UVB light exposure, is cholecalciferol (or Vitamin D3). Calcifediol is produced in our liver from cholecalciferol. Our kidneys then turn calcifediol into calcitriol (1,25-dihydroxyvitamin D). Both calcifediol and calcitriol are used in many different pathways. Calcifediol has a biological half-life of 2–3 weeks, while calcitriol has a half-life of 4–15 h.
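Assuming simple first-order elimination (a modeling simplification on my part), the gap between those half-lives is what makes within-day timing plausibly matter for calcitriol but not for calcifediol:

```python
# First-order decay: fraction of a substance remaining after `hours`,
# given its half-life in hours.
def fraction_remaining(hours, half_life_hours):
    return 0.5 ** (hours / half_life_hours)

# Calcitriol (half-life ~4-15 h): mostly gone within a day,
# so its level can plausibly follow a daily rhythm.
print(round(fraction_remaining(24, 4), 3))    # 0.016
print(round(fraction_remaining(24, 15), 3))   # 0.33
# Calcifediol (half-life ~2-3 weeks): essentially flat over a single day.
print(round(fraction_remaining(24, 2 * 7 * 24), 3))  # 0.952
```

In other words, a morning dose's calcitriol peak has largely decayed by night, while the 25(OH)D level that studies measure barely moves within a day, hiding any timing effect.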

The mainstream belief that timing is irrelevant is supported by the fact that calcitriol levels aren't directly affected by calcifediol levels. At the same time, Seth Roberts gathered examples of multiple people whose sleep improved when they took Vitamin D3 in the morning and whose sleep got worse when they took it in the evening. Multiple folks in the Quantified Self community replicated that effect for themselves, but unfortunately there aren't studies that have investigated it more deeply.

Given that there’s no harm in taking it in the morning, I personally take my Vitamin D3 in the morning, even though there’s no high-quality evidence for this.

The role of K2

The second important variable is K2. Atli Arnarson makes the case in Is Vitamin D Harmful Without Vitamin K? that Vitamin D toxicity at high doses is usually about K2 deficiency, because both are needed in the same pathways and more K2 is needed when there's more Vitamin D. Vitamin D toxicity leads to hypercalcemia, where there's too much calcium in the blood. Calcifediol moves some calcium from the bones into the blood, and K2 is needed to deposit the calcium back into the bones. Hypercalcemia is bad because it leads to blood vessel calcification. Observational studies link low Vitamin K2 levels to blood vessel calcification, with K2 being more important than K1.

Why might we have a K2 deficiency compared to the ancestral environment? K2 is found in animal liver and in fermented foods, i.e. food in which bacteria grew. In the ancestral environment it was hard to prevent bacteria from growing in the food that was consumed, and thus the related consumption was higher. Seth Roberts makes the point that we value the taste of umami because it primarily came from eating microbe-rich food in the ancestral environment. Today, most westerners also don't eat animal liver, while ancestral humans consumed it.

Which K2?

There are multiple forms of K2 (menaquinone) that can theoretically be used as supplements. The most commonly discussed are MK-4 and MK-7. According to Sato et al., MK-4 from supplements has no direct bioavailability, leading them to conclude that “MK-7 is a better supplier for MK-4 in vivo than MK-4 itself.” Schurgers et al. write, however, that they have unpublished data suggesting MK-4 bioavailability. It would be good to have more research to get clearer about MK-4 bioavailability. There are also MK-5, MK-8, and MK-9; however, they aren't widely used as supplements, and more research on them is needed.

Given the current research it seems sensible to me to go with pure MK-7 supplements.

MK-7 exists in a trans- and a cis-form, where only the trans-form is used by humans. Given that some supplements contain a mix of both forms, it's desirable to buy an MK-7 supplement that specifies that it's all-trans (or 99% all-trans).

Conclusion

On that basis I have decided, for now, to consume 10,000 IU of Vitamin D3 and 200 µg of Vitamin K2 (all-trans MK-7)* per day. I take both every morning. I don't have strong reasons for 200 µg, but it's the default size of supplements and there's no known toxicity at that dose. While going out into the sun would also be a way to acquire Vitamin D3, it causes wrinkles in the face, so while I don't try to minimize sun exposure, I won't maximize it when I can get my Vitamin D3 through a cheap supplement.

* I bought my current K2 supplement before doing this research, and it doesn't specify trans vs. cis, but I will buy all-trans MK-7 next time.


### Overconfident talking down, humble or hostile talking up

Published on November 30, 2018 12:41 PM UTC

A Distribution of Knowledge

If one were to make a distribution of the amount of knowledge different people have about, say, macroeconomics, I would expect the distribution to be roughly lognormal; it would have tails at both ends, but be heavily skewed to the right. Most people have almost no knowledge of macroeconomics, some have a bit, and then there is a long tail of fewer and fewer people who make up the experts.

The above graph doesn’t exactly resemble what I’d expect for macroeconomics but acts as a rough heuristic. The large numbers represent halvings of the remaining percentiles (3/4th, 7/8th, 15/16th, etc).[1]
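As a sketch of the scale described here (my reading of the footnote's convention, where each +1 level halves the remaining percentile, is an assumption), here is what those levels map to on a sampled lognormal "knowledge" distribution with arbitrary parameters:

```python
import random

# Sample a lognormal "knowledge" distribution (mu=0, sigma=1 are arbitrary).
rng = random.Random(0)
sample = sorted(rng.lognormvariate(0.0, 1.0) for _ in range(100_000))

# One reading of the post's scale: level n sits at the 1 - 2**-n quantile,
# i.e. +2 is the 3/4th mark, +3 the 7/8th, +4 the 15/16th, and so on.
levels = {}
for level in (1, 2, 3, 4, 5):
    quantile = 1 - 0.5 ** level
    levels[level] = sample[int(quantile * len(sample))]
    print(level, round(levels[level], 2))
```

Each step up the scale cuts the fraction of more-knowledgeable people in half, which is why the gap in knowledge between adjacent high levels keeps growing.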

I’m going to posit the following claims:[2]

Claim 1: It’s easy to judge where on the curve people are who are lower than you.

Claim 2: It’s difficult to judge where on the curve people are who are higher than you, absent sophisticated systems to support this.

Given these, let’s imagine a few situations:

1. Say you’re the local economist for a state government. You have some actions you really would like the government to take, even though your colleagues wouldn’t typically approve. You’re around a +4 on the macroeconomic scale, and your most knowledgeable colleagues are around a +2. Could you get away with pretending that macroeconomics has a very confident stance that happens to align with what you want to see happen? How would you do so?

2. You’re a local radio intellectual who’s a +3 on the macroeconomic scale. Almost all of your listeners are below a +1.5. You’ve been essentially lying to them for some time about macroeconomic theory because it helps your political message. A professor who’s a +5 starts writing a few articles that call you out on your lies. How do you respond?

3. You’re a college student who’s a +3 on the macroeconomic scale. Your professors are all a good deal higher and will be the primary ones evaluating your studies. You want to legitimately learn macroeconomics. How do you treat your professors?

Overconfident talking down, humble or hostile talking up

I think the answers I’d expect from these questions can be summarized in the phrase “Overconfident talking down, humble or hostile talking up.”

When you’re communicating with people who know less than you, and you have little accountability from people who know more, then you generally have the option of claiming to be more knowledgeable than you are, and lying in ways that are useful to you.

When you’re communicating with people who know more than you, you have two options. You can accept their greater state of knowledge, causing you to speak more honestly about the pertinent topics. Or, you can reject their credibility, claiming that they really don’t know more than you. Many of the people who know less than both of you may then believe you over them.

There are many examples of this. One particularly good one may be the history of schisms in religious organizations. Religious authorities generally know a lot more about their respective religions than the majority of citizens. Each authority has a choice: they can either accept the knowledge of the higher authorities, or reject it. If they reject the higher authority, they are incentivized to discredit that authority and express overconfidence in their own new beliefs. If they succeed, some followers will believe them, giving them both the assumption of expertise and the flexibility of not being accountable to other knowledgeable groups. If they defect from their previous authorities and fail, they may wind up in a very poor position, so after defecting it's very important for them to ensure that their existing audience gives them full support.

The Economics of Knowledge Signaling

In slightly more economic terms, one could say that there are strong signals going up the chain of knowledge (from the nonexperts to the experts), and weak signals going down it. The market for knowledgeable expertise is one with relatively low transparency and typical incentives to lie and deceive, similar to the markets for lemons.

I'm not claiming with this that all of the overconfidence and discrediting is knowingly dishonest.[3] I'm also not claiming that this is original; much is quite obvious and parts are definitely studied. That said, I do get the impression that the science of signaling is still pretty overlooked (much of this is from Robin Hanson), and this is one area I think may not be well understood as a holistic economic system.

Finally, I'm reminded of the old joke:

"Once I saw this guy on a bridge about to jump. I said, “Don’t do it!” He said, “Nobody loves me.” I said, “God loves you. Do you believe in God?” He said, “Yes.” I said, “Are you a Christian or a Jew?” He said, “A Christian.” I said, “Me, too! What franchise?” He said, “Protestant.” I said, “Me, too! Northern Baptist or Southern Baptist?” He said, “Northern Baptist.” I said, “Me, too! Northern Conservative Baptist or Northern Liberal Baptist?” He said, “Northern Conservative Baptist.” I said, “Me, too! Northern Conservative Baptist Great Lakes Region, or Northern Conservative Baptist Eastern Region?” He said, “Northern Conservative Baptist Great Lakes Region.” I said, “Me, too! Northern Conservative Baptist Great Lakes Region Council of 1879 or Northern Conservative Baptist Great Lakes Region Council of 1912?” He said, “Northern Conservative Baptist Great Lakes Region Council of 1912.” I said, “Die, heretic!” And I pushed him over."

One may wonder what incentives seem to lead to such heartfelt but predictably frequent divisions.

1. This is similar to the log-odds scale. Standard deviation could also be used, but I find it a bit unintuitive, especially for non-normal distributions.
2. These mostly come from lots of anecdotal evidence, some general reasoning, and my memories of a few research studies. I’ve spent around 40 minutes attempting to locate useful studies for this post, but haven’t found any, though I’m quite sure I remember reading about related ones several years ago. If you have any recommended links, please post them in the comments.
3. The first few chapters of The Elephant in the Brain go into this.


### Tentatively considering emotional stories (IFS and “getting into Self”)

Published on November 30, 2018 7:40 AM UTC

I’ve recently been getting a lot out of the psychotherapy model of Internal Family Systems, as described in this book. I just wrote a comment on Slate Star Codex describing some of its basics and what I’ve gotten out of it, and thought that I might as well repost it here:

I recommend this book, though with the note that I often don’t need to follow the full process outlined there. Sometimes it’s definitely necessary, but what I’ve found even more commonly useful is something that it discusses at the beginning of the book, which it calls “getting into self”.

Here’s the basic idea. Suppose that a part of your mind is really angry at someone, and telling a story (which might not be true) about how that person is a horrible person with no redeeming qualities. Internal Family Systems says that there are three modes in which you might react to that part:

First, you may be entirely blended with it (for those familiar, this corresponds to what Acceptance and Commitment Therapy calls cognitive fusion). This means that you are experiencing everything in terms of the story that it is telling, and have forgotten that this is an emotional reaction. So you feel that it’s just objectively true that the other person is horrible and with no redeeming qualities.

Or you might be partially blended with it. In this case, you realize that you are experiencing an emotional reaction, and that your thoughts and feelings might not be entirely justified, but you still feel them and might not be able to stop yourself from behaving according to them anyway.

Finally, you might be “in Self”, meaning entirely unblended. Here you are still aware of the emotions and thoughts, but your subjective experience is that they’re not your emotions, they’re someone else’s – they’re coming from a part of your mind which is experienced as separate from “you”. In this mode, you do not feel threatened or overwhelmed by them, and you can maintain a state of open curiosity towards whether or not they are actually true.

My experience is that usually if I have an unpleasant emotion, I will try to do one of two things: either reject it entirely and push it out of my mind, or buy into the story that it’s telling and act accordingly. Once I learned the techniques for getting into Self, I got the ability to sort of… just hang out with the emotion, neither believing it to be absolutely true nor needing to show it to be false. And then if I e.g. had feelings of social anxiety, I could keep those feelings around and go into a social situation anyway, making a kind of mental move that I might describe as “yes, it’s possible that these people all secretly hate me; I’m going to accept that as a possibility without trying to add any caveats, but also without doing anything else than accepting its possibility”.

The consequence has been that this seems to make the parts of my mind with beliefs like “doing this perfectly innocuous thing will make other people upset” actually update their beliefs. I do the thing, the parts with this belief get to hang around and observe what happens, notice that nobody seems upset at me, and then they are somewhat less likely to bring up similar concerns in the future.

In terms of global workspace theory, my model here is that there’s a part of the mind that’s bringing up a concern that should be taken into account in decision-making. The concern may or may not be justified, so the correct thing to do is to consider its possibility, but not necessarily give it too much weight. Going into Self and letting the message stay in consciousness this way seems to make it available for decision-making, and often the module that’s bringing it up is happy to just have its message received and evaluated; you don’t have to do anything more than that, if it’s just holding it up as a tentative consideration to be evaluated.

The book has a few different techniques that you can use for getting into Self. One that I often use is to try to get a sense of where in my body the emotional sensations are coming from, and then let my mind create a visualization based on those. Once I have a visualization and a physical location of the part, it’s easier to experience it as “not me”. Another thing that I do is to just make that mental move that I described – “okay, this is a possibility, so I’m just going to test it out”. I find it useful to first stay blended with the part for a while, to get a sense of what exactly is the story that it’s trying to tell, before unblending and getting into Self.

E.g. a while back I was having a sense of loneliness as I lay down for a nap. I stepped into the part’s perspective to experience it for a while, then unblended; now I felt it as a black ice hockey puck levitating around my lower back. I didn’t really do anything other than let it be there, and maintained a connection with it. Gradually it started generating a pleasant warmth, and then the visualization transformed into a happy napping cartoon fox, curled up inside a fireball that it was using as its blanket. And then I was no longer feeling lonely.

That said, sometimes a part is not content to just raise a tentative possibility; sometimes it feels like something is an emergency, so you must act right away. Obviously, sometimes you really are in an emergency, so this is justified! But oftentimes it’s based on the part having an unrealistic fear, which in the IFS model tends to be a result of some past trauma which it is reliving, not realizing that the circumstances of your life have changed and you’re now capable of dealing with it. In that case, you need to do the full process described in the book, where you basically get in proper contact with the part in question and address its concerns. (Actually it’s a bit more complicated than this, since the IFS model holds that there are many different kinds of parts that may have relationships with each other – so the “beer-drinking part” may be drinking beer in order to keep the traumatized part numb and safely out of consciousness, and you may actually need to deal with two different parts separately. The book goes into a lot more detail.)

Discuss

### Iterated Distillation and Amplification

November 30, 2018 - 07:47
Published on November 30, 2018 4:47 AM UTC

This is a guest post summarizing Paul Christiano’s proposed scheme for training machine learning systems that can be robustly aligned to complex and fuzzy values, which I call Iterated Distillation and Amplification (IDA) here. IDA is notably similar to AlphaGoZero and expert iteration.

The hope is that if we use IDA to train each learned component of an AI then the overall AI will remain aligned with the user’s interests while achieving state of the art performance at runtime — provided that any non-learned components such as search or logic are also built to preserve alignment and maintain runtime performance. This document gives a high-level outline of IDA.

Assume that we want to train a learner A to perform some complex fuzzy task, e.g. “Be a good personal assistant.” Assume that A is capable of learning to perform the task at a superhuman level — that is, if we could perfectly specify a “personal assistant” objective function and trained A to maximize it, then A would become a far better personal assistant than any human.

There is a spectrum of possibilities for how we might train A to do this task. On one end, there are techniques which allow the learner to discover powerful, novel policies that improve upon human capabilities:

• Broad reinforcement learning: As A takes actions in the world, we give it a relatively sparse reward signal based on how satisfied or dissatisfied we are with the eventual consequences. We then allow A to optimize for the expected sum of its future rewards.
• Broad inverse reinforcement learning: A attempts to infer our deep long-term values from our actions, perhaps using a sophisticated model of human psychology and irrationality to select which of many possible extrapolations is correct.

However, it is difficult to specify a broad objective that captures everything we care about, so in practice A will be optimizing for some proxy that is not completely aligned with our interests. Even if this proxy objective is “almost” right, its optimum could be disastrous according to our true values.

On the other end, there are techniques that try to narrowly emulate human judgments:

• Imitation learning: We could train A to exactly mimic how an expert would do the task, e.g. by training it to fool a discriminative model trying to tell apart A’s actions from the human expert’s actions.
• Narrow inverse reinforcement learning: We could train A to infer our near-term instrumental values from our actions, with the presumption that our actions are roughly optimal according to those values.
• Narrow reinforcement learning: As A takes actions in the world, we give it a dense reward signal based on how reasonable we judge its choices are (perhaps we directly reward state-action pairs themselves rather than outcomes in the world, as in TAMER). A optimizes for the expected sum of its future rewards.

Using these techniques, the risk of misalignment is reduced significantly (though not eliminated) by restricting agents to the range of known human behavior — but this introduces severe limitations on capability. This tradeoff between allowing for novel capabilities and reducing misalignment risk applies across different learning schemes (with imitation learning generally being narrowest and lowest risk) as well as within a single scheme.

The motivating problem that IDA attempts to solve: if we are only able to align agents that narrowly replicate human behavior, how can we build an AGI that is both aligned and ultimately much more capable than the best humans?

Core concept: Analogy to AlphaGoZero

The core idea of Paul’s scheme is similar to AlphaGoZero (AGZ): We use a learned model many times as a subroutine in a more powerful decision-making process, and then re-train the model to imitate those better decisions.

AGZ’s policy network p is the learned model. At each iteration, AGZ selects moves by an expensive Monte Carlo Tree Search (MCTS) which uses policy p as its prior; p is then trained to directly predict the distribution of moves that MCTS ultimately settles on. In the next iteration, MCTS is run using the new more accurate p, and p is trained to predict the eventual outcome of that process, and so on. After enough iterations, a fixed point is reached — p is unable to learn how running MCTS will change its current probabilities.

MCTS is an amplification of p — it uses p as a subroutine in a larger process that ultimately makes better moves than p alone could. In turn, p is a distillation of MCTS: it learns to directly guess the results of running MCTS, achieving comparable performance while short-cutting the expensive computation. The idea of IDA is to use the basic iterated distillation and amplification procedure in a much more general domain.

The IDA Scheme

IDA involves repeatedly improving a learned model through an amplification and distillation process over multiple iterations.

Amplification is interactive and human-directed in IDA

In AGZ, the amplification procedure is Monte Carlo Tree Search — it’s a simple and well-understood algorithm, and there’s a clear mechanism for how it improves on the policy network’s original choices (it traverses the game tree more deeply). But in IDA, amplification is not necessarily a fixed algorithm that can be written down once and repeatedly applied; it’s an interactive process directed by human decisions.

In most domains, humans are capable of improving their native capabilities by delegating to assistants (e.g. because CEOs can delegate tasks to a large team, they can produce orders of magnitude more output per day than they could on their own). This means if our learning procedure can create an adequate helper for the human, the human can use the AI to amplify their ability — this human/AI system may be capable of doing things that the human couldn’t manage on their own.

Below I consider the example of using IDA to build a superhuman personal assistant. Let A[t] refer to the state of the learned model after the end of iteration t; the initial agent A[0] is trained by a human overseer H.

Example: Building a superhuman personal assistant

H trains A[0] using a technique from the narrow end of the spectrum, such as imitation learning. Here we are imagining a much more powerful version of “imitation learning” than current systems are actually capable of — we assume that A[0] can acquire nearly human-level capabilities through this process. That is, the trained A[0] model executes all the tasks of a personal assistant as H would (including comprehending English instructions, writing emails, putting together a meeting schedule, etc).

Even though A[0] cannot discover any novel capabilities, it has two key advantages over H: it can run much faster, and many copies or versions of it can be run at once. We hope to leverage these advantages to construct a larger system — involving H and many copies of A[0] — that will substantially improve on H’s capabilities while preserving alignment with H’s values.

H can use calls to A[0] (along with other tools such as external memory) to become a better personal assistant. For example, H could assign one copy of A[0] to figuring out the best time to schedule the client’s recurring team meetings, another copy to figure out what to order the client for lunch, another copy to balance the client’s personal budget, etc. H now has the ability to get very quick solutions to sub-problems that are roughly as good as the ones H would have come up with on their own over a longer time period, and can combine these results to make much better decisions than an unaided human.

Let Amplify(H, A[0]) refer to the larger system of H + many copies of A[0] + aids. Compared to A[0] alone, the Amplify(H, A[0]) system has much higher time and resource costs but its eventual decisions are much better. Moreover, because in each of its individual decisions each copy of A[0] continues to act just as a human personal assistant would act, we can hope that Amplify(H, A[0]) preserves alignment.

In the next iteration of training, the Amplify(H, A[0]) system takes over the role of H as the overseer. A[1] is trained with narrow and safe techniques to quickly reproduce the results of Amplify(H, A[0]). Because we assumed Amplify(H, A[0]) was aligned, we can hope that A[1] is also aligned if it is trained using sufficiently narrow techniques which introduce no new behaviors. A[1] is then used in Amplify(H, A[1]), which serves as an overseer to train A[2], and so on.

Pseudocode

```
def IDA(H):
    A <- random initialization
    repeat:
        A <- Distill(Amplify(H, A))

def Distill(overseer):
    """
    Returns an AI trained using narrow, robust techniques to
    perform the task as the overseer would.
    """

def Amplify(human, AI):
    """
    Interactive process in which human uses many calls to AI to
    improve on human's native performance at relevant task(s).
    """
```
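To make the loop concrete, here is a minimal runnable sketch. Every concrete choice in it — numeric “tasks,” amplification by subdividing a task across copies of the model, distillation by fitting a single multiplier — is an illustrative stand-in of mine, not part of Paul’s actual proposal:

```python
# Toy, runnable sketch of the IDA loop. All concrete choices here
# (numeric tasks, delegation by subdividing a task, distillation by
# fitting a single multiplier) are illustrative stand-ins only.

def amplify(model, n_copies=10):
    """The overseer improved by delegating subtasks to copies of model."""
    def amplified(task):
        # Split the task, delegate each piece to a copy, combine results.
        return sum(model(task / n_copies) for _ in range(n_copies))
    return amplified

def distill(overseer, tasks):
    """Train a fast model to imitate the overseer's input/output behavior."""
    scale = sum(overseer(t) / t for t in tasks) / len(tasks)
    return lambda task: scale * task

def ida(H, tasks, iterations=3):
    A = distill(H, tasks)                # A[0] imitates the human H
    for _ in range(iterations):
        A = distill(amplify(A), tasks)   # A[t+1] imitates Amplify(H, A[t])
    return A

model = ida(lambda t: t, tasks=[1.0, 2.0, 4.0])
print(model(8.0))  # a linear model is a fixed point of this toy loop
```

As in the AGZ analogy, the interesting behavior is the fixed point: once distillation exactly reproduces the amplified system, further iterations change nothing.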

What properties must hold for IDA to work?

The IDA scheme is a template with “slots” for Amplify and Distill procedures that have not been fully specified yet — in fact, they rely on capabilities we don’t yet have. Because IDA itself is not fully specified, it’s not clear what minimal set of properties is necessary for it to succeed.

Achieving alignment and high capability

That said, here are some general properties which seem necessary — though likely not sufficient — for IDA agents to achieve robust alignment and high capability:

1. The Distill procedure robustly preserves alignment: Given an aligned agent H we can use narrow safe learning techniques to train a much faster agent A which behaves as H would have behaved, without introducing any misaligned optimization or losing important aspects of what H values.
2. The Amplify procedure robustly preserves alignment: Given an aligned agent A, it is possible to specify an amplification scheme which calls A multiple times as a subroutine in a way that reliably avoids introducing misaligned optimization.
3. At least some human experts are able to iteratively apply amplification to achieve arbitrarily high capabilities at the relevant task: a) there is some threshold of general capability such that if someone is above this threshold, they can eventually solve any problem that an arbitrarily intelligent system could solve, provided they can delegate tasks to similarly-intelligent assistants and are given arbitrary amounts of memory and time; b) at least some human experts are above this threshold of generality — given enough time and resources, they can figure out how to use AI assistants and tools to improve their capabilities arbitrarily far.

The non-profit Ought is working on gathering more evidence about assumptions 2 and 3.

Achieving competitive performance and efficiency

Paul aims for IDA agents to be competitive with traditional RL agents in time and resource costs at runtime — this is a reasonable expectation because an IDA agent is ultimately just another learned model whose weights were tuned with an unusual training procedure.

Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects.

This was originally posted here.

Discuss

### Moving Factward

November 29, 2018 - 08:54
Published on November 29, 2018 5:54 AM UTC

In the legendarium of J.R.R. Tolkien, the land of the gods is known as "The Uttermost West." For the world was originally created flat, and the gods took the westernmost region of this flat world for their dwelling place.

On our globe, of course, there is no westernmost point. And yet it is still the case that, at each position on the equator, some direction is objectively "westward". The objectivity of "westward" doesn't assume that there is some ultimate West by which the west-ness of all other positions is measured.

Analogously, there is no such thing as a "bare uninterpreted fact". "Just the facts" is not a realizable ideal.

And yet we can still recognize when one account of a situation is more "factish" than another. We can see that the second account is more of an interpretation compared to the relatively factish features given in the first account. The more-factish account is never the ultimate and unvarnished truth. Likely no coherent sense could be made of that ideal. Nonetheless, from wherever we stand, we can always "move factward".

Discuss

### Formal Open Problem in Decision Theory

November 29, 2018 - 06:25
Published on November 29, 2018 3:25 AM UTC

(This post was originally published on March 31st 2017, and has been brought forward as part of the AI Alignment Forum launch sequence on fixed points.)

In this post, I present a new formal open problem. A positive answer would be valuable for decision theory research. A negative answer would be helpful, mostly for figuring out what is the closest we can get to a positive answer. I also give some motivation for the problem, and some partial progress.

Open Problem: Does there exist a topological space X (in some convenient category of topological spaces) such that there exists a continuous surjection from X to the space [0,1]X (of continuous functions from X to [0,1])?

Motivation:

Topological Naturalized Agents: Consider an agent who makes some observations and then takes an action. For simplicity, we assume there are only two possible actions, A and B. We also assume that the agent can randomize, so we can think of this agent as outputting a real number in [0,1], representing its probability of taking action A.

Thus, we can think of an agent as having a policy which is a function from the space Y of possible observations to [0,1]. We will require that our agent behaves continuously as a function of its observations, so we will think of the space of all possible policies as the space of continuous functions from Y to [0,1], denoted [0,1]Y.

We will let X denote the space of all possible agents, and we will have a function f:X→[0,1]Y which takes in an agent, and outputs that agent's policy.

Now, consider what happens when there are other agents in the environment. For simplicity, we will assume that our agent observes one other agent, and makes no other observations. Thus, we want to consider the case where Y=X, so f:X→[0,1]X.

We want f to be continuous, because we want a small change in an agent to correspond to a small change in the agent's policy. This is particularly important since other agents will be implementing continuous functions on agents, and we would like any continuous function on policies to be able to be considered valid continuous function on agents.

We also want f to be surjective. This means that our space of agents is sufficiently rich that for any possible continuous policy, there is an agent in our space that implements that policy.

In order to meet all these criteria simultaneously, we need a space X of agents, and a continuous surjection f:X→[0,1]X.

Unifying Fixed Point Theorems: While we are primarily interested in the above motivation, there is another secondary motivation, which may be more compelling for those less interested in agent foundations.

There are (at least) two main clusters of fixed point theorems that have come up many times in decision theory, and mathematics in general.

First, there is the Lawvere cluster of theorems. This includes the Lawvere fixed point theorem, the diagonal lemma, and the existence of Quines and fixed point combinators. These are used to prove Gödel's incompleteness Theorem, Cantor's Theorem, Löb's Theorem, and achieve robust cooperation in the Prisoner's Dilemma in modal framework and bounded variants. All of these can be seen as corollaries of Lawvere's fixed point theorem, which states that in a cartesian closed category, if there is a point-surjective map f:X→YX, then every morphism g:Y→Y has a fixed point.
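Since so much hangs on this theorem, it may help to see how short its proof is. A sketch of the standard argument:

```latex
% Lawvere's fixed point theorem, proof sketch.
% Let f : X -> Y^X be point-surjective and g : Y -> Y be any morphism.
% Define q : X -> Y by
\[
  q(x) = g\bigl(f(x)(x)\bigr).
\]
% By point-surjectivity there is some x_0 with f(x_0) = q. Then
\[
  q(x_0) = g\bigl(f(x_0)(x_0)\bigr) = g\bigl(q(x_0)\bigr),
\]
% so q(x_0) is a fixed point of g.
```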

Second, there is the Brouwer cluster of theorems. This includes Brouwer's fixed point theorem, The Kakutani fixed point theorem, Poincaré–Miranda, and the intermediate value theorem. These are used to prove the existence of Nash Equilibria, Logical Inductors, and Reflective Oracles.

If we had a topological space and a continuous surjection X→[0,1]X, this would allow us to prove the one-dimensional Brouwer fixed point theorem directly using the Lawvere fixed point theorem, and thus unify these two important clusters.

Thanks to Qiaochu Yuan for pointing out the connection to Lawvere's fixed point theorem (and actually asking this question three years ago).

Partial Progress:

Most Diagonalization Intuitions Do Not Apply: A common initial reaction to this question is to conjecture that such an X does not exist, due to cardinality or diagonalization intuitions. However, note that all of the diagonalization theorems pass through (some modification of) the same lemma: Lawvere's fixed point theorem. However, this lemma does not apply here!

For example, in the category of sets, the reason that there is no surjection from any set X to the power set, {T,F}X, is because if there were such a surjection, Lawvere's fixed point theorem would imply that every function from {T,F} to itself has a fixed point (which is clearly not the case, since there is a function that swaps T and F).
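The diagonal construction behind that argument can be checked concretely in the finite case; the particular set X and map f below are arbitrary illustrative choices:

```python
# Cantor's diagonal argument, concretely: for any f : X -> P(X),
# the set D = {x in X : x not in f(x)} is not in the image of f.
# The particular X and f below are arbitrary illustrative choices.

def diagonal_set(X, f):
    return {x for x in X if x not in f(x)}

X = {0, 1, 2}
f = {0: {0, 1}, 1: set(), 2: {2}}   # some map X -> P(X)
D = diagonal_set(X, f.get)

# If D were f(x) for some x, then x in D iff x not in f(x) = D: contradiction.
assert all(f[x] != D for x in X)
print(D)
```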

However, we already know by Brouwer's fixed point theorem that every continuous function from the interval [0,1] to itself has a fixed point, so the standard diagonalization intuitions do not work here.

Impossible if You Replace [0,1] with e.g. S1: This also provides a quick sanity check on attempts to construct an X. Any construction that would not be meaningfully different if the interval [0,1] is replaced with the circle S1 is doomed from the start. This is because a continuous surjection X→(S1)X would violate Lawvere's fixed point theorem, since there is a continuous map from S1 to itself without fixed points.

Impossible if you Require a Homeomorphism: When I first asked this question I asked for a homeomorphism between X and [0,1]X. Sam Eisenstat has given a very clever argument why this is impossible. You can read it here. In short, using a homeomorphism, you would be able to use Lawvere to construct a continuous map that send a function from [0,1] to itself to a fixed point of that function. However, no such continuous map exists.

Notes:

If you prefer not to think about the topology of [0,1]X, you can instead find a space X, and a continuous map h:X×X→[0,1], such that for every continuous function f:X→[0,1], there exists an xf∈X, such that for all x∈X, h(xf,x)=f(x).

Many of the details in the motivation could be different. I would like to see progress on similar questions. For example, you could add some computability condition to the space of functions. However, I am also very curious which way this specific question will go.

This post came out of many conversations, with many people, including: Sam, Qiaochu, Tsvi, Jessica, Patrick, Nate, Ryan, Marcello, Alex Mennen, Jack Gallagher, and James Cook.

This post was originally published on March 31st 2017, and has been brought forward as part of the AI Alignment Forum launch sequences.

Tomorrow's AIAF sequences post will be 'Iterated Distillation and Amplification' by Ajeya Cotra, in the sequence on iterated amplification.

Discuss

### Reflective oracles as a solution to the converse Lawvere problem

November 29, 2018 - 06:24
Published on November 29, 2018 3:23 AM UTC

(This post was originally published on Nov 30th 2017, and is 1 of 4 posts brought forward today as part of the AI Alignment Forum launch sequence on fixed points.)

1 Introduction

Before the work of Turing, one could justifiably be skeptical of the idea of a universal computable function. After all, there is no computable function f:N×N→N such that for all computable g:N→N there is some index ig such that f(ig,n)=g(n) for all n. If there were, we could pick g(n)=f(n,n)+1, and then g(ig)=f(ig,ig)+1=g(ig)+1, a contradiction. Of course, universal Turing machines don't run into this obstacle; as Gödel put it, "By a kind of miracle it is not necessary to distinguish orders, and the diagonal procedure does not lead outside the defined notion." [1]

The miracle of Turing machines is that there is a partial computable function f:N×N→N∪{⊥} such that for all partial computable g:N→N∪{⊥} there is an index i such that f(i,n)=g(n) for all n. Here, we look at a different "miracle", that of reflective oracles [2,3]. As we will see in Theorem 1, given a reflective oracle O, there is a (stochastic) O-computable function f:N×N→{0,1} such that for any (stochastic) O-computable function g:N→{0,1}, there is some index i such that f(i,n) and g(n) have the same distribution for all n. This existence theorem seems to skirt even closer to the contradiction mentioned above.

We use this idea to answer "in spirit" the converse Lawvere problem posed in [4]. These methods also generalize to prove a similar analogue of the ubiquitous converse Lawvere problem from [5]. The original questions, stated in terms of topology, remain open, but I find that the model proposed here, using computability, is equally satisfying from the point of view of studying reflective agents. Those references can be consulted for more motivation on these problems from the perspective of reflective agency.

Section 3 proves the main lemma, and proves the converse Lawvere theorem for reflective oracles. In section 4, we use that to give a (circular) proof of Brouwer's fixed point theorem, as mentioned in [4]. In section 5, we prove the ubiquitous converse Lawvere theorem for reflective oracles.

2 Definitions

For any measurable space X, the set of probability measures on X is denoted ΔX.

A probabilistic oracle is a map N→[0,1]. A probabilistic oracle machine is a probabilistic Turing machine that can query a probabilistic oracle O. On a given query n∈N, it gets the answer 1 with probability O(n) and the answer 0 otherwise. Different queries are independent events, so this completely determines the behavior of such a machine.
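As a concreteness check, the query semantics can be simulated directly; the constant-0.75 oracle below is an arbitrary example of mine:

```python
import random

# Simulating queries to a probabilistic oracle O : N -> [0,1].
# Each query on n is an independent Bernoulli draw with parameter O(n).

def make_query(oracle, rng):
    def query(n):
        return 1 if rng.random() < oracle(n) else 0
    return query

rng = random.Random(0)
query = make_query(lambda n: 0.75, rng)   # an arbitrary example oracle
freq = sum(query(5) for _ in range(10000)) / 10000
print(freq)  # close to 0.75; repeated queries are independent draws
```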

We denote by ϕOi the stochastic partial O-computable function N→Δ(N∪{⊥}), where ⊥ represents nonhalting, computed by the probabilistic Turing machine with index i. The notation ϕOi(n)↓ indicates the event that ϕOi halts on input n, and ϕOi(n)↓=m is the event that ϕOi(n) halts and outputs m. Finally, ϕOi(n)↑ is the event that ϕOi does not halt on input n.

We use ⟨⋅⟩ to represent a computable coding function, in order to biject arbitrary countable sets to the naturals for the purpose of computability.

A reflective oracle is a probabilistic oracle O such that for all i,n∈N and p∈[0,1]Q:

P(ϕOi(n)↓=1) > p ⟹ O(⟨i,n,p⟩) = 1
P(¬ϕOi(n)↓=0) < p ⟹ O(⟨i,n,p⟩) = 0

For more on reflective oracles, see Fallenstein et al., 2015 [2].

A function f:N→[0,1] is O-computable if there is an index i such that for all n∈N, we have P(ϕOi(n)↓∈{0,1})=1 and P(ϕOi(n)↓=1)=f(n). That is, ϕOi represents f by probabilistically outputting either 0 or 1.

For any m∈N, a function f:N→[0,1]m is O-computable if each coordinate fj:N→[0,1] for 1≤j≤m is O-computable.

A function f:N→[0,1]N is O-computable if the corresponding function N→[0,1] given by ⟨n,m⟩↦f(n)(m) is O-computable.

For any point p∈[0,1], a probabilistic oracle O is compatible with p if for all r∈[0,1]Q, we have O(⟨r⟩)=0 if p<r and O(⟨r⟩)=1 if p>r. More generally, for any m∈N and any p∈[0,1]m, a probabilistic oracle O is compatible with p if for all j such that 1≤j≤m and all r∈[0,1]Q, we have O(⟨j,r⟩)=0 if pj<r and O(⟨j,r⟩)=1 if pj>r.

A function f:[0,1]m→[0,1]m is O-computable if for each coordinate fj:[0,1]m→[0,1], there is an index i such that whenever P is compatible with p, we have P(ϕO,Pi(0)↓∈{0,1})=1 and P(ϕO,Pi(0)↓=1)=fj(p).

A map f:N→[0,1]N is O-computably ubiquitous if for every O-computable map e:N→[0,1]N, there is some n∈N such that f(n)=e(n).

3 Converse Lawvere property

We first need a lemma that tells us that we can replace certain partial O-computable functions with total ones when working relative to a reflective oracle. As discussed in the introduction, this contrasts strongly with the situation for computable functions. All of our theorems will make essential use of this lemma.

Lemma 1 (totalizing): There is a computable map τ:N→N such that for all i,n∈N and any reflective oracle O, we have P(ϕOτ(i)(n)↓∈{0,1})=1 and for b∈{0,1}, P(ϕOτ(i)(n)↓=b)≥P(ϕOi(n)↓=b).

Proof: We construct τ using a recursive procedure that ensures that P(ϕOτ(i)(n)↓=b) upper-bounds P(ϕOi(n)↓=b) using what may be thought of as repeated subdivision or binary search. This is essentially the same as the function flip in [3], but we handle computability issues differently. Let S⊆[0,1]×[0,1] be the set of pairs of dyadic rationals (p,q) such that p<q. We recursively define a stochastic O-computable function t:N×N×S→Δ{0,1}; the intent is to have P(t(i,n,p,q)=1) equal (P(ϕOτ(i)(n)↓=1)−p)/(q−p) if that quantity is in the interval [0,1], and take the closest possible value, either 0 or 1, otherwise. Then, we will be able to define ϕOτ(i)(n) to be t(i,n,0,1).

Construct t so that a call t(i,n,p,q) first queries O(⟨i,n,r⟩), where r=(p+q)/2, and also flips a fair coin C. Then, it either outputs 0, 1, or the result of a recursive call, as follows:

t(i,n,p,q) = 0 if O(⟨i,n,r⟩)=0 and C=0
t(i,n,p,q) = t(i,n,p,r) if O(⟨i,n,r⟩)=0 and C=1
t(i,n,p,q) = 1 if O(⟨i,n,r⟩)=1 and C=0
t(i,n,p,q) = t(i,n,r,q) if O(⟨i,n,r⟩)=1 and C=1

We can now choose τ so that ϕOτ(i)(n)=t(i,n,0,1).

This algorithm upper bounds the probabilities P(ϕOi(n)↓=0) and P(ϕOi(n)↓=1) by binary search. Once the initial call t(i,n,0,1) has recursed to t(i,n,p,q), it has already halted outputting 1 with probability p, confirming that this is an upper bound since it received answer 1 from a call to O(⟨i,n,p⟩). Similarly, it has output 0 with probability 1−q, confirming this bound since it received the answer 0 from a call to O(⟨i,n,q⟩). Further, since each call to t halts without recursing with probability 1/2, t halts almost surely. Thus, we get the totality property P(ϕOτ(i)(n)↓∈{0,1})=1 and the bounds P(ϕOτ(i)(n)↓=b)≥P(ϕOi(n)↓=b) for b∈{0,1}. □
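The subdivision procedure is easy to simulate. In the sketch below, the answer to O(⟨i,n,r⟩) is replaced by an exact comparison against a known target probability — a simplifying assumption of mine (a real reflective oracle may answer arbitrarily at the exact threshold) — and the output frequency approaches the target, as the lemma predicts:

```python
import random

# Simulation of the totalizing procedure t(i, n, p, q) from Lemma 1.
# The query O(<i, n, r>) is replaced by an exact comparison with a
# known target probability -- a simplifying stand-in for the oracle.

def t(target, p, q, rng):
    while True:
        r = (p + q) / 2
        answer = 1 if target > r else 0   # stand-in for O(<i, n, r>)
        coin = rng.random() < 0.5         # the fair coin C
        if answer == 0 and not coin:
            return 0
        elif answer == 0:
            q = r                         # recurse on [p, r]
        elif not coin:
            return 1
        else:
            p = r                         # recurse on [r, q]

rng = random.Random(1)
freq = sum(t(0.3, 0.0, 1.0, rng) for _ in range(20000)) / 20000
print(freq)  # approaches the target probability 0.3
```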

Now we can prove our main theorem.

Theorem 1 (converse Lawvere for reflective oracles): Let O be a reflective oracle. Then, there is an O-computable map f:N→[0,1]N such that for all O-computable g:N→[0,1], there is some index i such that g=f(i).

Proof: Using τ from the totalizing lemma, let f(i)(n)=P(ϕOτ(i)(n)↓=1). Given any O-computable g:N→[0,1], there is some i such that P(ϕOi(n)↓=1)=g(n) and P(ϕOi(n)↓=0)=1−g(n). Then, f(i)(n)=P(ϕOτ(i)(n)↓=1)≥P(ϕOi(n)↓=1)=g(n) and similarly 1−f(i)(n)≥1−g(n), so f(i)=g. □

This theorem gives a computable analogue to the problem posed in [4]. The analogy would be strengthened if we worked in a cartesian closed category where the present notion of O-computability gave the morphisms, and where [0,1]N is an exponential object. In addition, the totalizing lemma would have a nice analogue in this setting. I expect that all this can be done using something like an effective topos [6], but I leave this for future work.

4 Recovering Brouwer's fixed point theorem

As mentioned in [4], the intermediate value theorem would follow from the existence of a space with the converse Lawvere property, that is, a space X that surjects onto the mapping space [0,1]X. Further, Brouwer's fixed point theorem on the n-ball, Bn, would follow from the existence of a topological space X with a surjection X→(Bn)X. We can do something similar to conclude Brouwer's fixed point theorem from the converse Lawvere theorem for reflective oracles. Of course, this is circular; Kakutani's fixed point theorem, a generalization of Brouwer's fixed point theorem, is used to prove the existence of reflective oracles. Still, it is interesting to see how this can be done.

We start with two technical lemmas telling us some basic facts about reflective-oracle-computable functions [0,1]m→[0,1]m. Using these, it will be easy to derive Brouwer's fixed point theorem.

Lemma 2 (continuous implies relatively computable): Take m∈N and let h:[0,1]m→[0,1]m be continuous. Then, there is a (deterministic) oracle O such that h is O-computable.

Proof: For each coordinate hj of h, each rectangle R=[ℓ1,u1]×⋯×[ℓm,um]⊆[0,1]m with rational endpoints, and each pair of rationals ℓ0,u0∈[0,1]Q with ℓ0<u0, let O(⟨j,ℓ0,u0,…,ℓm,um⟩) be 1 if hj(R)⊆[ℓ0,u0] and 0 otherwise. To see that h is O-computable, we compute any hj(p) for any j with 1≤j≤m, and any p∈[0,1]m, making use of O and any oracle P compatible with p.

We proceed by a search process similar to the argument in the totalizing lemma. Start with R0=[0,1]m, ℓ00=0 and u00=1. At each step s, perform an exhaustive search for a rectangle Rs+1=[ℓs+11,us+11]×⋯×[ℓs+1m,us+1m]⊆Rs and points ℓs+10,us+10 such that a query to P(⟨k,ℓs+1k⟩) returns 1 for all k, a query to any P(⟨k,us+1k⟩) returns 0, a query to O(⟨j,ℓs+10,us+10,Rs+1⟩) returns 1, and where either ℓs+10=ℓs0 and us+10=(1/3)ℓs0+(2/3)us0, or ℓs+10=(2/3)ℓs0+(1/3)us0 and us+10=us0. In the first case, output 0 with probability 1/3 and continue with probability 2/3; in the second case, output 1 with probability 1/3 and continue with probability 2/3.

By construction, p∈Rs and hj(Rs)⊆[ℓs0,us0] at each stage s. Since hj is continuous, there is some neighbourhood of p on which its image is contained in either [ℓs0,(1/3)ℓs0+(2/3)us0] or [(2/3)ℓs0+(1/3)us0,us0]. There are two possibilities to consider. If p∈int Rs, then there is some rectangle R contained in such a neighbourhood of p, and with p∈int R. This rectangle would be chosen if considered, so the algorithm will move beyond step s.

The remaining possibility is that p is on the boundary of Rs; say, pk=ℓsk. Since Rs was chosen previously though, we know that querying P(⟨k,ℓs+1k⟩) has returned 1 at least once, so P(⟨k,ℓs+1k⟩)≠0. Thus, the algorithm will almost surely eventually accept Rs or another rectangle.

Putting this together, this algorithm almost surely halts, outputting either 0 or 1. By stage s, it has halted outputting 1 with probability ℓs0 and outputting 0 with probability 1−us0, so overall it outputs 1 with probability hj(p). Thus, h is O-computable. □

Lemma 3 (composition): Let O be a reflective oracle and let g:N→[0,1]m and h:[0,1]m→[0,1]m be O-computable. Then, h∘g:N→[0,1]m is O-computable.

Proof: For each j with 1≤j≤m, take ij∈N such that ϕOij witnesses the computability of gj. Then, O(⟨ij,n,r⟩)=0 if gj(n)<r and O(⟨ij,n,r⟩)=1 if gj(n)>r, so O lets us simulate a probabilistic oracle compatible with g(n). Hence, by the O-computability of h, for each k with 1≤k≤m, we have a probabilistic O-machine that always halts, outputting either 0 or 1, and that on input n outputs 1 with probability hk∘g(n). □

Theorem 2 (Brouwer's fixed point theorem): Take m∈N and a continuous h:[0,1]m→[0,1]m. Then, h has a fixed point.

Proof: By Lemma 2, we have an oracle O such that h is O-computable. By relativizing the construction of a reflective oracle [2,3] to O, we get a reflective oracle ˜O above O. Notice that h is ˜O-computable. Letting f be the ˜O-computable function f(i)(n)=P(ϕ˜Oτ(i)(n)↓=1) constructed in the converse Lawvere theorem for reflective oracles, define fm:N→([0,1]m)N by fm(⟨i1,…,im⟩)(n)=(f(i1)(n),…,f(im)(n)). The rest will now follow the proof of Lawvere's fixed point theorem [7].

Define g:N→[0,1]m by g(n)=h(fm(n)(n)); this is ˜O-computable by Lemma 3. Now, by the converse Lawvere theorem, for each coordinate 1≤j≤m of g, there is some ij∈N such that gj=f(ij). Letting i=⟨i1,…,im⟩, we have gj(i)=hj(fm(i)(i))=hj(g(i)), so g(i)=h(g(i)), and so g(i) is a fixed point of h. □

5 Ubiquitous converse Lawvere property

Theorem 3 (ubiquitous converse Lawvere): Let O be a reflective oracle. Then, there is an O-computable, O-computably ubiquitous map f:N→[0,1]N.

Proof: This follows by a combination of the recursion theorem and the totalizing lemma. We use the same map f from Theorem 1. Let e:N→[0,1]N be any O-computable map. There is a computable map s:N→N such that for all i,n∈N, we have P(ϕOs(i)(n)↓∈{0,1})=1, and e(i)(n)=P(ϕOs(i)(n)↓=1). By the recursion theorem, there is some i such that ϕOs(i)=ϕOi. Then, e(i)(n)=P(ϕOs(i)(n)↓=1)=P(ϕOi(n)↓=1)=P(ϕOτ(i)(n)↓=1)=f(i)(n), so e(i)=f(i). □

References

[1] Kurt Gödel. 1946. "Remarks before the Princeton bicentennial conference of problems in mathematics." Reprinted in: Martin Davis. 1965. "The Undecidable. Basic Papers on Undecidable Propositions, Unsolvable Problems, and Computable Functions." Raven Press.

[2] Benja Fallenstein, Jessica Taylor, and Paul F. Christiano. 2015. "Reflective Oracles: A Foundation for Classical Game Theory." arXiv: 1508.04145.

[3] Benja Fallenstein, Nate Soares, and Jessica Taylor. 2015. "Reflective variants of Solomonoff induction and AIXI." In Artificial General Intelligence. Springer.

[4] Scott Garrabrant. 2017. "Formal Open Problem in Decision Theory." https://agentfoundations.org/item?id=1356.

[5] Scott Garrabrant. 2017. "The Ubiquitous Converse Lawvere Problem." https://agentfoundations.org/item?id=1372

[6] Jaap van Oosten. 2008. "Realizability: an introduction to its categorical side." Studies in Logic and the Foundations of Mathematics, vol. 152. Elsevier.

[7] F. William Lawvere. 1969. "Diagonal arguments and cartesian closed categories." In Category Theory, Homology Theory and their Applications, II (Battelle Institute Conference, Seattle, Wash., 1968, Vol. Two), pages 134–145. Springer.

This post was originally published on Nov 30th 2017, and has been brought forward as part of the AI Alignment Forum launch sequences.

Tomorrow's AIAF sequences post will be 'Iterated Amplification and Distillation' by Ajeya Cotra, in the sequence on iterated amplification.

Discuss

### The Ubiquitous Converse Lawvere Problem

November 29, 2018 - 06:16
Published on Thu Nov 29 2018 03:16:16 GMT+0000 (UTC)

(This post was originally published on Oct 20th 2017, and is 1 of 4 posts brought forward today as part of the AI Alignment Forum launch sequence on fixed points.)

In this post, I give a stronger version of the open question presented here, and give a motivation for this stronger property. This came out of conversations with Marcello, Sam, and Tsvi.

Definition: A continuous function f:X→Y is called ubiquitous if for every continuous function g:X→Y, there exists a point x∈X such that f(x)=g(x).

Open Problem: Does there exist a topological space X with a ubiquitous function f:X→[0,1]X?

I will refer to the original problem as the Converse Lawvere Problem, and the new version as the Ubiquitous Converse Lawvere Problem. I will call a space satisfying the conditions of the (Ubiquitous) Converse Lawvere Problem a (Ubiquitous) Converse Lawvere Space, abbreviated (U)CLS. Note that a UCLS is also a CLS, since a ubiquitous function is always surjective (g can be any constant function).

Motivation: True FairBot

Let X be a Converse Lawvere Space. Note that since such an X might not exist, the following claims might be vacuous. Let f:X→[0,1]X be a continuous surjection.

We will view X as a space of possible agents in an open source prisoner's dilemma game. Given two agents A,B∈X, we will interpret fA(B) as the probability with which A cooperates when playing against B. We will define UA(B):=2fB(A)−fA(B), and interpret this as the utility of agent A when playing in the prisoner's dilemma with B.
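As a quick concrete check of this payoff convention, here is a sketch with two hypothetical hard-coded agents (constant policies, not drawn from a Converse Lawvere Space):

```python
# U_A(B) = 2*f_B(A) - f_A(B): gain 2 when the opponent cooperates, pay 1 to cooperate.
def utility(f, A, B):
    return 2 * f[B](A) - f[A](B)

# two hypothetical constant-policy agents; f[A](B) = P(A cooperates against B)
f = {
    "CooperateBot": lambda opp: 1.0,
    "DefectBot":    lambda opp: 0.0,
}

assert utility(f, "CooperateBot", "CooperateBot") == 1.0   # mutual cooperation
assert utility(f, "DefectBot", "DefectBot") == 0.0         # mutual defection
assert utility(f, "DefectBot", "CooperateBot") == 2.0      # temptation payoff
assert utility(f, "CooperateBot", "DefectBot") == -1.0     # sucker's payoff
```

The payoffs order as in a standard prisoner's dilemma: 2 > 1 > 0 > −1.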

Since f is surjective, every continuous policy is implemented by some agent. In particular, this gives:

Claim: For any agent A∈X, there exists another agent A′∈X such that fA′(B)=fB(A). i.e. A′ responds to B the way that B responds to A.

Proof: The function B↦fB(A) is a continuous function, since B↦fB is continuous, and evaluation is continuous. Thus, there is a policy B↦fB(A) in [0,1]X. Since f is surjective, this policy must be the image under f of some agent A′, so fA′(B)=fB(A).

Thus, for any fixed agent A, we have some other agent A′ that responds to any B the way B responds to A. However, it would be nice if A′=A, to create a FairBot that responds to any opponent the way that that opponent responds to it. Unfortunately, to construct such a FairBot, we need the extra assumption that f is ubiquitous.

Claim: If f is ubiquitous, then there exists a true FairBot in X: an agent FB∈X such that fFB(A)=fA(FB) for all agents A∈X.

Proof: Given an agent B∈X, there exists a policy gB∈[0,1]X such that gB(A)=fA(B) for all A, since A↦fA(B) is continuous. Further, the function B↦gB is continuous, since the function (A,B)↦fA(B) is continuous, by the definition of the exponential topology. Since f is ubiquitous, there must be some FB∈X such that fFB=gFB. But then, for all A, we have fFB(A)=gFB(A)=fA(FB).

Note that we may not need the full power of ubiquity here, but it is the simplest property I see that gets the result.

Note that this FairBot is fair in a stronger sense than the FairBot of modal combat, in that it always has the same output as its opponent. This may make you suspicious, since you can also construct an UnfairBot, UB, such that fUB(A)=1−fA(UB) for all A. This would have caused a problem in the modal combat framework, since you can put a FairBot and an UnfairBot together to form a paradox. However, we do not have this problem, since we deal with probabilities, and simply have fUB(FB)=fFB(UB)=1/2. Note that the exact phenomenon that allows this to possibly work is the fixed point property of the interval [0,1], which is the only reason that we cannot use diagonalization to show that no CLS exists.
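To see concretely how diagonalization bites when the codomain lacks the fixed point property, here is a toy sketch over a finite set with the discrete two-point codomain {0,1} (the finite space and the candidate map are hypothetical stand-ins for the topological setting):

```python
# Finite toy: no f : X -> ({0,1}^X) can be ubiquitous, by diagonalization.
X = [0, 1, 2]                     # hypothetical finite stand-in for the agent space

def f(x):                         # an arbitrary candidate map X -> ({0,1}^X)
    return lambda y: (x + y) % 2

def g(x):                         # the diagonal opponent: flip f(x) at the point x
    fx = f(x)
    return lambda y: 1 - fx(x) if y == x else fx(y)

# g witnesses non-ubiquity: f(x) and g(x) disagree (at y = x) for every x.
assert all(f(x)(x) != g(x)(x) for x in X)
```

Over [0,1] the flip p↦1−p instead has the fixed point 1/2, which is exactly why fUB(FB)=fFB(UB)=1/2 is consistent.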

Finally, note that we already have a combat framework that has a true FairBot: the reflective oracle framework. In fact, the reflective oracle framework may have all the benefits we would hope to get out of a UCLS (other than the simplicity of not having to deal with computability and hemicontinuity).

This post was originally published on Oct 20th 2017, and has been brought forward as part of the AI Alignment Forum launch sequences.

Tomorrow's AIAF sequences post will be 'Iterated Amplification and Distillation' by Ajeya Cotra, in the sequence on iterated amplification.

Discuss

### Hyperreal Brouwer

November 29, 2018 - 06:15
Published on Thu Nov 29 2018 03:15:23 GMT+0000 (UTC)

(This post was originally published on Oct 6th 2017, and has been temporarily brought forward as part of the AI Alignment Forum launch sequence on fixed points.)

This post explains how to view Kakutani's fixed point theorem as a special case of Brouwer's fixed point theorem with hyperreal numbers. This post is just math intuitions, but I found them useful in thinking about Kakutani's fixed point theorem and many things in agent foundations. This came out of conversations with Sam Eisenstat.

Brouwer's fixed point theorem says that a continuous function from a compact convex subset of Rn to itself has a fixed point. Kakutani's fixed point theorem is similar, but instead of continuous functions, it uses Kakutani functions, which are set-valued functions with closed graph whose values are pointwise nonempty and convex.

When I think about Kakutani functions, I usually think about them as limits of continuous functions. For example, consider the Kakutani function f from [−1,1] to itself which sends negative inputs to 1, positive inputs to −1, and sends 0 to the entire interval. You can view f as the limit of the sequence of functions fn sending x to min(max(−n⋅x,−1),1). This is not a pointwise limit, since if it were, 0 would be sent to 0 rather than to the entire interval. Instead, it is a limit in the Hausdorff metric between the graphs of the functions.
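A minimal numerical sketch of these approximants (nothing here beyond the formula in the text):

```python
# f_n(x) = min(max(-n*x, -1), 1): continuous approximants to the sign-flip function.
def f_n(n, x):
    return min(max(-n * x, -1.0), 1.0)

# away from 0 the values settle to the step function:
assert f_n(100, 0.1) == -1.0 and f_n(100, -0.1) == 1.0
# but pointwise at 0 every f_n gives 0, not the whole interval [-1, 1]:
assert f_n(100, 0.0) == 0.0
```

Note that 0 is a fixed point of every f_n, and 0 does lie in f(0)=[−1,1], as the fixed point theorem below requires.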

Given two compact nonempty subsets X and Y of Rn, the Hausdorff distance between X and Y is the maximum over all points in X or Y of the Euclidean distance between that point and the closest point in the other set. Since X and Y are compact, this maximum is achieved.
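For finite point sets this definition is directly computable; here is a small sketch (the sets and points are invented for illustration):

```python
import math

# Hausdorff distance between finite nonempty point sets in R^n.
def hausdorff(X, Y):
    sup_x = max(min(math.dist(x, y) for y in Y) for x in X)
    sup_y = max(min(math.dist(x, y) for x in X) for y in Y)
    return max(sup_x, sup_y)

assert hausdorff([(0, 0)], [(3, 4)]) == 5.0
assert hausdorff([(0, 0), (1, 0)], [(0, 0), (2, 0)]) == 1.0
```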

Given compact convex subsets X and Y of Rn, we say that a sequence of continuous functions fn from X to Y converges in graph Hausdorff distance to the closed graph set-valued function f if the graphs of fn, viewed as subsets of X×Y, converge to the graph of f in Hausdorff distance.

We say that a closed graph function f from X to Y is a continuous graph limit if there exists a sequence of continuous functions which converges to f in graph Hausdorff distance.

Theorem 1: Every continuous graph limit f from a compact convex subset of Rn to itself has a fixed point (a point x contained in its image f(x)).

Proof: f is a limit of continuous functions fn each of which has a fixed point by Brouwer. Choose one fixed point from each fn to get a sequence of points which has a convergent subsequence by Bolzano–Weierstrass. Let x be the limit of this convergent subsequence. If f(x) did not contain x, then (x,x) would not be in the graph of f. Since f is closed graph, a ball around (x,x) would not be in the graph of f, which contradicts the fact that {fn} must contain functions with graphs arbitrarily close to the graph of f with fixed points arbitrarily close to x and thus points in their graphs arbitrarily close to (x,x). □

This theorem is not equivalent to Kakutani's fixed point theorem. There exist continuous graph limits which are not pointwise convex (but only in more than one dimension). For example, the function from [−1,1]2 to [−1,1]2 which sends every point to the circle of points at distance 1 from 0 is not Kakutani, but is a continuous graph limit. It is the limit of the functions fn given by (x,y)↦(cos(n⋅x),sin(n⋅x)).

However, this theorem is strictly stronger than Kakutani's fixed point theorem (although sometimes harder to use, since showing a function is Kakutani might be easier than showing it is a continuous graph limit).

Theorem 2: Given compact convex subsets X and Y of Rn, every Kakutani function f from X to Y is a continuous graph limit.

Proof: We define a function fn as follows. Take a finite set S of radius 1/n open balls in X×Y such that each ball intersects the graph of f, the X coordinates of the centers of all the balls are distinct, and the balls cover the entire graph of f. This induces a covering SX of X by radius 1/n open balls by taking balls centered at the X coordinates of the centers of S. We continuously map each point in X to a weighted average of balls in SX as follows. If a point is the center of some ball, it is sent to 100% that ball. Otherwise, it is sent to a combination of all the balls in which it is contained with weight proportional to (the reciprocal of the distance to the center of that ball) minus n. This gives a function fn from X to Y by mapping each point to the weighted average of the Y coordinates of the centers of the balls in S with weights equal to the weights of the corresponding balls in SX above. One can verify that fn is continuous.

Observe that the graph of fn contains the centers of all balls in S. Thus, the graph of fn contains points within 1/n of every point in the graph of f. Thus, if fn did not converge to f, it must be because infinitely many fn contain points at a distance from the graph of f bounded away from 0. Consider a convergent subsequence of these points. This gives a point (x,y) not in the graph of f and a subsequence of the fn with points in their graphs converging to (x,y).

Let d be half the distance between y and f(x), and consider the set T of all points in Y at most d from the nearest point in f(x). Note that T is convex. Note that all (x′,y′) in the graph of f with x′ sufficiently close to x must have y′ in T, since otherwise there would be (x′,y′) with x′ converging to x which must have a convergent subsequence with y′ converging to a point not in f(x), contradicting the fact that f has closed graph.

However, fn must send all points within distance ε of x to a point in the convex hull of the images under f of points within ε+1/n of x. But we showed that for ε sufficiently small and n sufficiently large, all of these points must be in T. Therefore, for all sufficiently large n, fn must send all points within ε of x to points in T, which are bounded away from y, contradicting the assumption that points in the graphs of fn converge to (x,y). □

Corollary: Kakutani's fixed point theorem

We have proven (a strengthening of) Kakutani's fixed point theorem from Brouwer's fixed point theorem, given a way to think about Kakutani functions as just limits of graphs of continuous functions, and thus gained better intuitions about what (a superset of) Kakutani functions look like. We will now take this further and think about Kakutani's theorem as a consequence of an analogue of Brouwer's theorem using hyperreal infinitesimal numbers. (I am going to be informal now. I am not going to use standard notation here. I am not going to make sense to people who don't already know something about non-standard analysis. Sorry.)

Given a compact convex subset X of Rn, we can define ∗X to be the set of all equivalence classes of infinite sequences of elements of X, where two sequences are equivalent if they agree on a set that matters according to some ultrafilter U on N. A function ∗f from ∗X to itself is defined by a sequence of functions from X to itself, {fn}, applied pointwise. I will call a function hyper-continuous if each of the component functions is continuous. (I am not sure what this is actually called.) Each point in ∗X has a standard part, which is a point in X: the unique point that the components converge to on a set that matters.

Claim: Every hyper-continuous function ∗f from ∗X to itself has a fixed point.

Proof: Just take the sequence of the fixed points of the individual component functions.□

Claim: Continuous graph limits f from X to Y are exactly those closed graph set-valued functions such that there exists a hyper-continuous function ∗f:∗X→∗Y for which y∈f(x) if and only if there exist points ∗x∈∗X and ∗y∈∗Y with standard parts x and y respectively such that ∗f(∗x)=∗y.

Proof: Use the same sequence of functions with graphs converging to that of f as the sequence of functions defining ∗f.□

Thus, we can view continuous graph limits (and thus Kakutani functions) as something you get when looking at just the standard part of a hyper-continuous function from the hyper version of X to itself. The fixed point will fix everything, including the infinitesimal parts, and we do not have to deal with any set-valued functions.

For example, consider our original function f from [−1,1] to itself which sends negative inputs to 1, positive inputs to −1, and sends 0 to the entire interval. We can view this as a function ∗f involving infinitesimals where everything with positive real part is sent to something with real part −1, everything with negative real part is sent to something with real part 1, and the infinitesimal numbers very close to 0 are sent to something in-between. If we use the sequence of functions from above and let the infinitesimal ε be {1/n}, then zooming in on the inputs between −ε and ε, ∗f will just be a steep linear function with slope −1/ε.

Now to be even more vague and connect things back up with agent foundations, perhaps this can give some good intuitions about what is happening with reflective oracles and probabilistic truth predicates. The oracle/truth predicate is effectively "zooming in" on the area around a specific probability, and when you stack oracle calls or truth predicates within each other, you can zoom in further. The fact that the probabilistic truth predicate does not know that it is reflectively consistent can be viewed as it not believing a sentence akin to "If I assign probability less than ε to ϕ, then I also assign probability less than ε2 to ϕ," which seems very reasonable. It also makes reflective oracles and probabilistic truth predicates look more similar to other, more hierarchy-forming approaches to the same problem, like normal halting oracles. Here the hierarchy comes from zooming in further and further on the infinitesimal in the Kakutani function.

This post was originally published on Oct 6th 2017, and has been brought forward as part of the AI Alignment Forum launch sequences.

Tomorrow's AIAF sequences post will be 'Iterated Amplification and Distillation' by Ajeya Cotra, in the sequence on iterated amplification.

Discuss

### Sorry, we broke the "All Posts" filter. Will be back in a few minutes

November 29, 2018 - 06:15
Published on Thu Nov 29 2018 03:15:06 GMT+0000 (UTC)

Programming is hard. All posts will be back for your enjoyment within a few minutes.

Discuss

November 28, 2018 - 23:33
Published on Wed Nov 28 2018 20:33:30 GMT+0000 (UTC)

This has been sitting in my drafts folder since 2011. Decided to post it today given the recent post about Dunning-Kruger and related discussions.

The standard rationalist answer when someone asks for career advice is "find your comparative advantage." I don't have any really good suggestions about how to make this easier, but it seems like a good topic to bring up for discussion.

If 15 years ago (when I was still in college and my initial career choice hadn't been finalized yet) someone had told me that perhaps I ought to consider a career in philosophy, I would have laughed. "You must be joking. Obviously, I'll be really bad at doing philosophy," I would have answered. I thought of myself as a natural born programmer, and that's the career direction I ended up choosing.

As it turns out, I am a pretty good programmer, and a terrible philosopher, but it also happens to be the case that just about everyone else is even worse at doing philosophy, and getting some philosophical questions right might be really important.

The usual (instinctive) way for someone to choose a career is probably to pick a field that they think they will be particularly good at, using a single standard of goodness across all of the candidate fields. For example, the implicit reasoning behind my own career choice could be something like "Given a typical programming problem, I can solve it in a few hours with high probability. Whereas, given a typical philosophical problem, I can at best solve it after many years with low probability."

On the other hand, comparative advantage says that in addition to your own abilities, you should also consider how good other people are (or will be) at various fields, and how valuable the outputs of those fields are (will be). Unless you're only interested in maximizing income, and the fields you're considering are likely to remain stable over your lifetime (in which case you can just compare current salaries, although apparently many people don't even do that), this can be pretty tricky.

(There doesn't appear to be any previous OB/LW posts on comparative advantage. The closest I could find is Eliezer's Money: The Unit of Caring. Most discussions elsewhere seem to focus on simple static examples where finding comparative advantage is relatively trivial.)
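For what it's worth, the "simple static" case really is just a division or two (all numbers below are invented for illustration):

```python
# Static comparative-advantage arithmetic with invented outputs per year.
output_per_year = {
    "me":     {"programming": 10.0, "philosophy": 2.0},
    "others": {"programming": 8.0,  "philosophy": 0.5},
}

def opportunity_cost(person, field, other):
    """Units of `other` forgone per unit of `field` produced."""
    o = output_per_year[person]
    return o[other] / o[field]

# Absolute advantage in both fields, yet comparative advantage in philosophy:
# a paper costs me 5 programs, but costs everyone else 16.
assert opportunity_cost("me", "philosophy", "programming") == 5.0
assert opportunity_cost("others", "philosophy", "programming") == 16.0
```

The dynamic case is hard precisely because these numbers, and the value of each field's output, shift over a lifetime.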

Today (in 2018) there's an 80,000 Hours article about comparative advantage, but it is more about how to find one's comparative advantage in a community of people who share a cause, like in EA, rather than in the wider economy.

I would also add (in 2018) that besides everyone else lacking skill or talent at something, an even bigger source of comparative advantage is being one of the first people to realize that a problem is a problem, or to realize an important new variant or subproblem of an existing problem. In that case, everyone else is really bad at solving that problem just because they have no idea the problem even exists.

Discuss

### Stabilize-Reflect-Execute

November 28, 2018 - 20:26
Published on Wed Nov 28 2018 17:26:39 GMT+0000 (UTC)

You've recently joined a major organization in a senior management role. How can you organize your plans?

One simple way to think about them is with what can be called the "Stabilize-Reflect-Execute" cycle.

Stabilize

You first check if there are any urgent issues and address them immediately. Are there burning problems or opportunities that need to be dealt with? Second, you do anything you need to do to best prepare yourself for reflection. If there are people you need to talk to in order to get necessary information, you set that up upfront.

Reflect

Once urgent issues are dealt with and you are able to properly assess the situation, you work to do so. For executives this can mean a lengthy period of discussions with all of the relevant people and thinking about strategy before making formal announcements. This could take a few weeks or months.

Execute

Now is the time to begin working on non-urgent important problems, which should be the main ones. You follow through with your reflection. Execution may involve deciding on pursuing future larger stabilize-reflect-execute loops.

Let’s summarize. “Stabilize” refers to handling urgent issues and preparing for reflection. This is similar to the notion of getting one’s “house in order.” “Reflect” refers to deciding how to best deal with the important non-urgent issues. “Execute” refers to working on the important issues. This is basically a subset of the Eisenhower Method for situations where these three steps make up the majority of the work.

Examples

I think this cycle plays out in many important situations, so may be worth some independent study. Some examples of these cycles include:

Necessary Conditions

The stabilize-reflect-execute cycle is good for some specific situations. I think it may require the following:

1. There are some tasks that are both urgent and important.

If this is not true, the "stabilize" step isn't necessary.

2. There are some tasks that are both non-urgent and important.

If this is not true, the "reflect" and "execute" steps aren't necessary.

3. There is significant expected value for spending time figuring out how to do the non-urgent and important tasks.

If this is not true, the "reflect" step isn't necessary.

4. Figuring out how to do the important tasks can be done in one batch per full cycle.

If this is not true, then the work may be a bit of a mess of stabilizations, reflections, and executions, as opposed to one serial process.

Implications

We could use this model to compare use cases

I think that different use cases (like the examples above) share a good amount of similarities. I hope that they could be seen as such, and used to help understand each other. One may think that we don't have many similar reference classes for things like AI takeoffs and global governance. While this may be generally true, I hope that this similarity could provide a bit of a counter.

We should acknowledge uncertainty of the “Execute” stage

One trying to understand the stabilize-reflect-execute cycle of an actor may try to predict the actions of the execution stage, but I think this should be regarded with skepticism. The point of the reflect stage is to better understand how to enact the execute stage; its presence suggests that the execution stage could go in different ways. This means that a lot of attention on stabilize-reflect-execute processes should be on the first two parts, rather than the third.

Comparison to the Eisenhower Method

Image from James Clear

The urgent / important distinction comes from the Eisenhower Matrix. That matrix is supposed to apply to more situations, so it is more generic. Work around the matrix typically recommends "quickly doing" the "important & urgent" tasks, then "planning" the "important & non-urgent" tasks.

Comparison to Decision Cycles

Wikipedia lists several examples of interesting decision cycles. These are sequences with specific steps for decision making; for instance, in quality control, PDCA (Plan-Do-Check-Act) is used. I believe decision cycles typically include some kind of learning component, which is absent in the stabilize-reflect-execute cycle.

Many thanks to Toby Ord for discussion & inspiration, and to Owen Cotton-Barratt and Max Daniel for feedback.

Discuss

### Genetically Modified Humans Born (Allegedly)

November 28, 2018 - 19:14
Published on Wed Nov 28 2018 16:14:05 GMT+0000 (UTC)

I realize normally we don't talk about the news or hot-button issues, but this is of sufficiently high importance I am posting anyway.

There is a link from Nature News on Monday here. There is a link from MIT Technology Review discussing the documents uploaded by the team behind the effort here.

Summarizing the report I heard on the radio this morning:

• The lead scientist is He Jiankui, with a team at Southern University of Science and Technology, in Shenzhen.
• 7 couples in the experiment, each with HIV+ fathers and HIV- mothers.
• CRISPR was used to genetically edit embryos to eliminate the CCR5 gene, providing HIV resistance.
• Allegedly twin girls have been born with these edits.

I don't think DNA testing of the twins has taken place yet. If anyone has a line on good, nuanced sources, I'd be interested in hearing about it.

Discuss

### Oracle Induction Proofs

November 28, 2018 - 11:12
Published on Wed Nov 28 2018 08:12:38 GMT+0000 (UTC)

Notation:

{0,1}L and {0,1}∗ are the sets of all bitstrings of length L, and the set of all finite bitstrings, respectively. Elements of these sets are denoted by x. The length i prefix of x is denoted by x:i.

BL is the set of all satisfiable booleans with all variables having an index ≤L, and B is the set of all satisfiable booleans. Similarly, B∗L and B∗ are the set of all booleans with all variables having an index ≤L, and the set of all booleans. Elements of these sets are denoted by B. B∧B′ is also a boolean. Bitstrings x may be interpreted as boolean constraints saying that the prefix of the bitstring must equal x.

f is a function N+→N+ that is time-constructible, monotonically increasing, and satisfies f(t)>t, which gives the runtime bound for traders.

l is some function N+→N+ upper-bounded by \(2^{f(t)}\) that is time-constructible, monotonically increasing, and satisfies l(t)>t. It gives the most distant bit the oracle inductor thinks about on turn t.

d is the function N+→N+ given by d(t)=l(t)+⌈log2l(t)⌉+2t+7, giving the number of iterations of binary-search deployed on turn t.

δ is the function N+→[0,1]∩Q given by δ(t)=2−(t+3). It gives the proportion of the inductor distribution that is made up by the uniform distribution.

The operations query(A,p) and flip(p) take unit time to execute in our model of computation, though the input still has to be written down in the usual amount of time.

Given some arbitrary distribution Δ over {0,1}L, and a B∈B∗L, Δ(B) is the probability mass assigned to bitstrings that fulfill B. Given a B∈BL, ΔB is the conditional distribution given by \(\Delta_B(x)=\frac{\Delta(B\wedge x)}{\Delta(B)}\). Δ∗B is the distribution induced by Approx(Δ,L,D,B). There is an implicit dependence on D which will be suppressed in the notation.

Lemma 1 (n-Lemma): Let ϵ=2−D. Given some distribution Δ over {0,1}L for which ∀x∈{0,1}L:Δ(x)≥nϵ, then

\[
\forall B\in\mathbb{B}_L,\ x\in\{0,1\}^L:\quad \left(\frac{n-1}{n+1}\right)^{L}\Delta_B(x)\ \le\ \Delta^*_B(x)\ \le\ \left(\frac{n+1}{n-1}\right)^{L}\Delta_B(x).
\]

To begin with, if x is incompatible with B, Δ∗B(x)=0 by the construction of Approx, and the inequality is trivial. So we can consider the case where x is compatible with B. Given σ as a strict prefix of x, abbreviate NextBit(Δ,L,D,|σ|+1,B,σ) as NB(σ).

Because we're using a binary-search depth of D on each bit, and taking the average of the interval, the probabilities are approximated to within ϵ/2.

\[
\frac{\Delta(B\wedge\sigma 1)-\epsilon}{\Delta(B\wedge\sigma)+\epsilon} < \frac{\Delta(B\wedge\sigma 1)-\epsilon/2}{\Delta(B\wedge\sigma)+\epsilon/2} \le P(NB(\sigma)=1) \le \frac{\Delta(B\wedge\sigma 1)+\epsilon/2}{\Delta(B\wedge\sigma)-\epsilon/2} < \frac{\Delta(B\wedge\sigma 1)+\epsilon}{\Delta(B\wedge\sigma)-\epsilon}
\]

Taking the middle three terms and subtracting each of them from 1 (which reverses the inequalities), rewriting 1−P(NB(σ)=1) as P(NB(σ)=0), and rewriting 1 as \(\frac{\Delta(B\wedge\sigma)\pm\epsilon/2}{\Delta(B\wedge\sigma)\pm\epsilon/2}\), we get

\[
\frac{\Delta(B\wedge\sigma 0)-\epsilon}{\Delta(B\wedge\sigma)+\epsilon} < \frac{\Delta(B\wedge\sigma)-\epsilon/2}{\Delta(B\wedge\sigma)-\epsilon/2}-\frac{\Delta(B\wedge\sigma 1)+\epsilon/2}{\Delta(B\wedge\sigma)-\epsilon/2} \le P(NB(\sigma)=0) \le \frac{\Delta(B\wedge\sigma)+\epsilon/2}{\Delta(B\wedge\sigma)+\epsilon/2}-\frac{\Delta(B\wedge\sigma 1)-\epsilon/2}{\Delta(B\wedge\sigma)+\epsilon/2} < \frac{\Delta(B\wedge\sigma 0)+\epsilon}{\Delta(B\wedge\sigma)-\epsilon}
\]

Now, because \(\Delta^*_B(x)=\prod_{i=1}^{L}P(NB(x_{:i-1})=x_i)\), using the previous inequalities, we can establish

\[
\prod_{i=1}^{L}\frac{\Delta(B\wedge x_{:i})-\epsilon}{\Delta(B\wedge x_{:i-1})+\epsilon} \;<\; \Delta^*_B(x) \;<\; \prod_{i=1}^{L}\frac{\Delta(B\wedge x_{:i})+\epsilon}{\Delta(B\wedge x_{:i-1})-\epsilon}
\]

For an individual term in the product of the lower/upper bound (± and ∓ will be used, the upper sign is used for the lower bound, the lower sign is used for the upper bound, and ⋛ will be abused to mean ≥ for the lower bound and ≤ for the upper bound), we can get the following equality.

\[
\frac{\Delta(B\wedge x_{:i})\mp\epsilon}{\Delta(B\wedge x_{:i-1})\pm\epsilon} = \frac{\Delta(B\wedge x_{:i})}{\Delta(B\wedge x_{:i-1})\pm\epsilon} \mp \frac{\epsilon}{\Delta(B\wedge x_{:i-1})\pm\epsilon} = \frac{\Delta(B\wedge x_{:i})}{\Delta(B\wedge x_{:i-1})}\cdot\frac{\Delta(B\wedge x_{:i-1})}{\Delta(B\wedge x_{:i-1})\pm\epsilon}\left(1\mp\frac{\epsilon}{\Delta(B\wedge x_{:i})}\right)
\]

Because, by assumption, all bitstrings with nonzero probability (and therefore all of their prefixes) have probability bounded below by nϵ, we get

\[
\frac{\Delta(B\wedge x_{:i-1})}{\Delta(B\wedge x_{:i-1})\pm\epsilon} \gtrless \frac{n\epsilon}{n\epsilon\pm\epsilon} = \frac{n}{n\pm 1} \qquad\text{and}\qquad 1\mp\frac{\epsilon}{\Delta(B\wedge x_{:i})} \gtrless 1\mp\frac{\epsilon}{n\epsilon} = 1\mp\frac{1}{n} = \frac{n\mp 1}{n}
\]

which can be applied to conclude

\[
\frac{\Delta(B\wedge x_{:i})}{\Delta(B\wedge x_{:i-1})}\cdot\frac{\Delta(B\wedge x_{:i-1})}{\Delta(B\wedge x_{:i-1})\pm\epsilon}\left(1\mp\frac{\epsilon}{\Delta(B\wedge x_{:i})}\right) \gtrless \frac{\Delta(B\wedge x_{:i})}{\Delta(B\wedge x_{:i-1})}\cdot\frac{n}{n\pm 1}\cdot\frac{n\mp 1}{n} = \frac{\Delta(B\wedge x_{:i})}{\Delta(B\wedge x_{:i-1})}\cdot\frac{n\mp 1}{n\pm 1}
\]

Therefore, since we have lower-bounded or upper-bounded every term in the product, and Δ(B∧x:L)=Δ(B∧x)=Δ(x), we can show

\[
\Delta^*_B(x) \gtrless \prod_{i=1}^{L}\frac{\Delta(B\wedge x_{:i})\mp\epsilon}{\Delta(B\wedge x_{:i-1})\pm\epsilon} \gtrless \prod_{i=1}^{L}\frac{\Delta(B\wedge x_{:i})}{\Delta(B\wedge x_{:i-1})}\cdot\frac{n\mp 1}{n\pm 1} = \frac{\Delta(x)}{\Delta(B)}\left(\frac{n\mp 1}{n\pm 1}\right)^{L} = \Delta_B(x)\left(\frac{n\mp 1}{n\pm 1}\right)^{L}
\]

The n-Lemma is thus proven.
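The bound can be sanity-checked numerically. The sketch below quantizes each conditional bit probability to within ϵ/2 (a stand-in for the D rounds of binary search) and checks the n-Lemma envelope with B trivial; the distribution and parameters are invented for the test:

```python
from itertools import product

L, D = 3, 12
eps = 2.0 ** -D
probs = [0.10, 0.15, 0.05, 0.20, 0.10, 0.12, 0.08, 0.20]       # sums to 1
delta = dict(zip(product([0, 1], repeat=L), probs))
n = min(delta.values()) / eps                                   # so every Delta(x) >= n*eps

def quantize(p):                                                # |quantize(p) - p| <= eps/2
    return (int(p / eps) + 0.5) * eps

def approx_prob(x):                                             # Delta*(x), built bit by bit
    prob, prefix_mass = 1.0, 1.0
    for i in range(L):
        mass_1 = sum(p for y, p in delta.items() if y[:i] == x[:i] and y[i] == 1)
        q = quantize(mass_1 / prefix_mass)
        prob *= q if x[i] == 1 else 1.0 - q
        prefix_mass = sum(p for y, p in delta.items() if y[:i + 1] == x[:i + 1])
    return prob

lo, hi = ((n - 1) / (n + 1)) ** L, ((n + 1) / (n - 1)) ** L
assert all(lo * p <= approx_prob(x) <= hi * p for x, p in delta.items())
```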

Lemma 2: The distribution induced by OI(t) has the probability of all bitstrings bounded below by nϵ.

Identify D with \(d(t)=l(t)+\lceil\log_2 l(t)\rceil+2t+7\), identify L with \(l(t)\), and identify n with \(l(t)\cdot 2^{t+4}\).

Because \(\epsilon=2^{-D}\), we have \(\epsilon=2^{-(l(t)+\lceil\log_2 l(t)\rceil+2t+7)}\le\frac{1}{l(t)}2^{-(l(t)+2t+7)}\). Because a δ(t) share of the distribution is made up of the uniform distribution, the probability of every x is bounded below by

\[
\delta(t)\,2^{-L} = 2^{-(t+3)}\,2^{-l(t)} = l(t)\,2^{t+4}\cdot\frac{1}{l(t)}\,2^{-(l(t)+2t+7)} \ge n\epsilon
\]

And Lemma 2 is proved.
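As a sanity check, this inequality can be verified in exact arithmetic for small t, under a hypothetical admissible choice of l (any time-constructible l with l(t)>t works; l(t)=2t+1 is invented for the test):

```python
from fractions import Fraction
from math import ceil, log2

def l(t): return 2 * t + 1                        # hypothetical admissible choice, l(t) > t
def d(t): return l(t) + ceil(log2(l(t))) + 2 * t + 7
def delta(t): return Fraction(1, 2 ** (t + 3))    # share given to the uniform distribution

def lemma2_holds(t):
    eps = Fraction(1, 2 ** d(t))                  # eps = 2^-D with D = d(t)
    n = l(t) * 2 ** (t + 4)
    # the uniform component alone puts delta(t)*2^-l(t) mass on every bitstring:
    return delta(t) * Fraction(1, 2 ** l(t)) >= n * eps

assert all(lemma2_holds(t) for t in range(1, 10))
```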

From here on, we will frequently be going from the approximation of a conditional distribution to the conditional distribution itself via the n-Lemma and Lemma 2, so let \(\epsilon_t=2^{-d(t)}\), \(n_t=l(t)\cdot 2^{t+4}\), and \(L_t=l(t)\). \(\Delta^i\) and \(\Delta^i_B\) will be used to refer to the probability distribution and conditional probability distribution over bitstrings that OI(i) produces, and \(\Delta^{*i}\) and \(\Delta^{*i}_B\) refer to the approximations of those distributions with D=d(t).

Lemma 3: The assessed value at time t in world x of a trader T with a runtime and oracle-call filter attached is lower-bounded by

\[
-t+\sum_{1\le i\le t}\frac{\max\left(\left(\frac{n_t-1}{n_t+1}\right)^{L_t}\sum_{B\in\mathbb{B}_{L_t}}P(T_i=B)\,\Delta^i_B(x)-\epsilon_t,\ 0\right)}{\Delta^i(x)+\epsilon_t}
\]
and upper-bounded by
\[
-t+\left(\frac{n_t+1}{n_t-1}\right)^{L_t}\sum_{1\le i\le t}\frac{\sum_{B\in\mathbb{B}_{L_t}}P(T_i=B)\,\Delta^i_B(x)}{\Delta^i(x)}.
\]

To begin, the assessed value at time t in world x of a trader T by OverBudget is
\[
-t+\sum_{1\le i\le t}\frac{\left(\sum_{B\in\mathbb{B}_{L_t}}P(T_i=B)\,\Delta^{*i}_B(x)\right)^{*}}{\Delta^{*i}(x)}.
\]

Δ∗i(x) has d(t) rounds of binary search run on it, which yields an interval with a width of ϵt, and an upper-bound estimate is taken, so Δi(x)≤Δ∗i(x)≤Δi(x)+ϵt. Similarly, the net value of a trade has a lower-bound estimate run on it, so

\[
\sum_{B\in\mathbb{B}_{L_t}}P(T_i=B)\,\Delta^{*i}_B(x)-\epsilon_t \;\le\; \left(\sum_{B\in\mathbb{B}_{L_t}}P(T_i=B)\,\Delta^{*i}_B(x)\right)^{*} \;\le\; \sum_{B\in\mathbb{B}_{L_t}}P(T_i=B)\,\Delta^{*i}_B(x)
\]

Further, by the n-Lemma and Lemma 2 (the probability of x is either 0 or at least \(n_t\epsilon_t\)), we can replace \(\Delta^{*i}_B(x)\) by \(\left(\frac{n_t+1}{n_t-1}\right)^{L_i}\Delta^i_B(x)\) or \(\left(\frac{n_t-1}{n_t+1}\right)^{L_i}\Delta^i_B(x)\) respectively, which are in turn upper- and lower-bounded by \(\left(\frac{n_t+1}{n_t-1}\right)^{L_t}\Delta^i_B(x)\) and \(\left(\frac{n_t-1}{n_t+1}\right)^{L_t}\Delta^i_B(x)\). Because these fractions don't depend on i, they can be pulled out of the sum, yielding the stated bounds.

In the next theorem, we will be abbreviating ∑B∈Bl(i)P(Ti=B)ΔiB(x) as ai and abbreviating Δi(x) as ci, so the value of a trader at timestep t in world x against the sequence of OI distributions is −t+∑1≤i≤taici.

Theorem 1: If a trader exploits the market, there is a budgeted version of that trader that exploits the market.

As before, the net value of a trader at timestep t in world x against the sequence of OI distributions is −t+∑1≤i≤taici. Because the trader exploits the market, for all t and all x plausible at t, the net value is ≥−b for some integer constant b, so for all t and all x plausible at t, ∑1≤i≤taici≥t−b. Call this quantity the gain of the trader. Separate the sum into "good" days where ai≥tϵt, and "bad" days where this is false; Lemma 2 will supply the fact that ci≥ntϵt below.

Abbreviate (nt−1nt+1)Lt as yt, and abbreviate (nt−1nt+1)Lt+1 as zt. Using Lemma 3, the worst-case assessed gain of the trader is ≥∑1≤i≤tmax(ytai−ϵt,0)ci+ϵt≥∑goodytai−ϵtci+ϵt. Focusing on an individual day, and using the fact that ci≥ntϵt, and it's a good day so ai≥tϵt, we get

ytai−ϵtci+ϵt=cici+ϵtaici(yt−ϵtai)>ntnt+1(yt−1t)aici>(nt−1nt+1yt−1t)aici=(zt−1t)aici

Substituting this into the sum and pulling out terms that don't depend on i, we get

∑goodytai−ϵtci+ϵt>(zt−1t)∑goodaici>(zt−1t)(t−b−564)>tzt−b−564−1

The last step was done by zt<1 and ignoring some positive terms. Now we will bound tzt.

(Lt+1)∫nt+1nt−1dxx<2(Lt+1)nt−1≤8Ltnt=2−(t+1)≤14t<∫tt−1/4dxx

Multiplying both sides by −1, which flips the upper and lower bounds of integration, we get

ln((nt−1nt+1)Lt+1)=(Lt+1)(ln(nt−1)−ln(nt+1))=(Lt+1)∫nt−1nt+1dxx>∫t−1/4tdxx=ln(t−1/4)−ln(t)=ln(t−1/4t)=ln(1−1/4t)

By putting both sides in an exponent, this inequality is equivalent to (nt−1nt+1)Lt+1>1−1/4t. Multiplying both sides by t and using the definition of zt, we get tzt>t−14.

Using the previous inequalities, we get that the worst-case assessed gain of the trader is greater than t−b−14−564−1=t−b−8564. Therefore, the worst-case assessed value of the trader is greater than −b−8564, so when the budget is b+3, OverBudget will at worst evaluate the value as −b−8564>−(b+3)+1, so it will never return 1 and interfere with the trade; thus the budgeted trader makes the same trades as the original trader and exploits the market.

Lemma 4: The worst-case value of a trader with a budget of b is ≥−b−1/4.

Assume the value of a trader equals −b for some b∈R≥0 on some turn t. Then the gain of the trader, ∑1≤i≤taici is ≤t−b. By Lemma 3, the assessed gain is less than

(nt+1nt−1)Lt∑1≤i≤taici=(nt+1nt−1)Lt(t−b)<t(nt+1nt−1)Lt−b

Now we will bound the last term.

ln((nt+1nt−1)Lt)=Lt(ln(nt+1)−ln(nt−1))=Lt∫nt+1nt−1dxx<2Ltnt−1≤4Ltnt=2−(t+2)≤1/4t+1<1/4t+1/4<∫t+1/4tdxx=ln(t+1/4)−ln(t)=ln(t+1/4t)=ln(1+1/4t)

By putting both sides in an exponent, we get the equivalent inequality (nt+1nt−1)Lt<1+1/4t and multiplying both sides by t yields t(nt+1nt−1)Lt<t+1/4, so the assessed gain is less than t−b+1/4 and the assessed value is ≤−b+1/4.

Then, if the trader has a budget of b (an integer), the worst-case scenario is that on some turn t it has a true value of −b+3/4 and it is assessed to have a value of −b+1 (maximum possible), so it barely passes the budgeting filter, and then loses 1 dollar on the next turn, getting a final value of −b−1/4 (after which it only outputs B⊤ which has a value of 0).
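The integral bound at the heart of this lemma, ((nt+1)/(nt−1))^Lt<1+1/(4t), can be spot-checked numerically in log form. As before, l(t)=t² is an illustrative assumption; any admissible l works.

```python
import math

# Check L_t * (ln(n_t + 1) - ln(n_t - 1)) < ln(1 + 1/(4t)), the log form of
# ((n_t+1)/(n_t-1))^{L_t} < 1 + 1/(4t), with n_t = l(t) * 2^{t+4}, L_t = l(t).
# l(t) = t^2 is an illustrative assumption, not fixed by the post.
def lemma4_bound_holds(t):
    l_t = t * t
    L_t, n_t = l_t, l_t * 2 ** (t + 4)
    lhs = L_t * (math.log(n_t + 1) - math.log(n_t - 1))
    rhs = math.log(1 + 1 / (4 * t))
    return lhs < rhs

assert all(lemma4_bound_holds(t) for t in range(1, 30))
```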

Theorem 2: If a budgeted trader exploits the market, the supertrader exploits the market.

Pi(A∧b)=Pi(A)Pi(b) is the probability of the supertrader drawing a bitstring a which encodes the algorithm A and budget b on timestep i. P(Aib=B) is the probability of A(i) with a runtime and oracle call filter and a budget filter of b outputting the boolean B. P(A) is the probability of a randomly selected infinite string encoding A, and P(b)=2−b. For sufficiently large i, Pi(A)=P(A) and Pi(b)=P(b).

A useful thing to note is that, because the bitstring a that is chosen is of length f(t), and it is only run on the UTM for f(t) steps, its behavior is identical to that of all algorithms that are encoded by a bitstring that has a as a prefix. Therefore, even though the probability of some A being drawn may be 0 on a timestep, there is some amount of probability mass in the supertrader that might as well have come from A, in the sense that its behavior is indistinguishable from A after the runtime and oracle call filter is applied.

Similarly, because the maximum selectable budget is f(t), and a trader can lose at most 1 dollar each turn, for all A and positive integers c, Aif(i) has the same exact behavior as Aif(i)+c, so we can pretend we are dealing with the full distribution over all budgets.

Therefore, for all i, the distribution over satisfiable booleans given by ∑A,bPi(A∧b)P(Aib=B) equals the distribution over satisfiable booleans given by ∑A,bP(A)P(b)P(Aib=B), and because of this,

∀i∀x∈{0,1}Li:∑A,bPi(A∧b)∑B∈BLiP(Aib=B)ΔiB(x)=∑A,bP(A)2−b∑B∈BLiP(Aib=B)ΔiB(x)

Also, let Vi(Ab) be the value of A(i)'s trade on day i according to some fixed x. V≤i(Ab) is the value that Ab accumulates over all days up to t.

The value of the supertrader at time t according to a world x that is plausible at that time is

∑1≤i≤t(−1+∑A,bPi(A∧b)∑B∈BLiP(Aib=B)ΔiB(x)Δi(x))=∑1≤i≤t∑A,bP(A)2−b(−1+∑B∈BLiP(Aib=B)ΔiB(x)Δi(x))=∑1≤i≤t∑A,bP(A)2−bVi(Ab)=∑A,bP(A)2−bV≤t(Ab)

Now the worst-case value of V≤t(Ab) is −b−14 by Lemma 4, so we get that the value of the supertrader is bounded below by

∑AP(A)∑b2−b(−b−1/4)=−94∑AP(A)=−94.
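The closed form used here follows from ∑b≥1 2−b⋅b=2 and ∑b≥1 2−b=1, giving −2−1/4=−9/4; it can be confirmed by exact partial sums:

```python
from fractions import Fraction

# Partial sums of sum_{b>=1} 2^{-b} * (-b - 1/4); the geometric tail is
# tiny, so 200 terms pin the value down to within 2^{-100} of -9/4.
partial = sum(Fraction(1, 2 ** b) * (-b - Fraction(1, 4))
              for b in range(1, 200))
assert abs(partial - Fraction(-9, 4)) < Fraction(1, 2 ** 100)
```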

Also, if our exploiting trader and the budget which ensures its trade is never altered are T and b′, we get that the value of the supertrader equals

∑A,bP(A)2−bV≤t(Ab)=P(A)2−b′V≤t(Tb′)+∑A,b≠T,b′P(A)2−bV≤t(Ab)>P(A)2−b′V≤t(Tb′)−94

Because V≤t(Tb′) is unbounded above for appropriate choices of t and x by assumption, it continues to be unbounded above when multiplied by P(A)2−b′ and has 94 subtracted from it. Therefore, the supertrader has plausible value unbounded above as well.

Theorem 3: The supertrader doesn't exploit OI.

We will show that the value the supertrader can gain on turn t, in worlds plausible at that turn, is upper-bounded by 2−t, so the total value of the supertrader is upper-bounded by ∑t2−t=1.

To begin with, the probability mass the supertrader places on world x at timestep t is ∑A,bP(A)2−b∑B∈BP(Atb=B)ΔB(x). This sum can be rewritten as ∑B∈BΔB(x)∑A,bP(A)2−bP(Atb=B), and for brevity, from here on, we will abbreviate ∑A,bP(A)2−bP(Atb=B) as Pt(B). With this abbreviation, we can express the value of the supertrader's trade on day t and world x as ∑B∈BPt(B)ΔB(x)δ(t)2−Lt+(1−δ(t))∑B∈BPt(B)Δ∗B(x)−1. Note that the numerator of the fraction uses the supertrader's true conditional distributions, while the denominator is what the actual market distribution is composed of: a small fragment of the uniform distribution, with the rest a mixture of approximations to the appropriate conditional distributions. To begin the bounding argument, apply the n-Lemma to yield

∑B∈BPt(B)ΔB(x)δ(t)2−Lt+(1−δ(t))∑B∈BPt(B)Δ∗B(x)−1<∑B∈BPt(B)ΔB(x)(1−δ(t))(nt−1nt+1)Lt∑B∈BPt(B)ΔB(x)−1=(nt+1nt−1)Lt1−δ(t)−1

Also,

ln((nt+1nt−1)Lt)−ln(1−δ(t))=Lt(ln(nt+1)−ln(nt−1))−ln(1−δ(t))+ln(1)=Lt∫nt+1nt−1dxx+∫11−δ(t)dxx<2Ltnt−1+δ(t)1−δ(t)<4Ltnt+2δ(t)=2−(t+2)+2−(t+2)=2−(t+1)<2−t1+2−t<∫1+2−t1dxx=ln(1+2−t)−ln(1)=ln(1+2−t)

By putting both sides in an exponent, this inequality is equivalent to (nt+1nt−1)Lt1−δ(t)<1+2−t, so (nt+1nt−1)Lt1−δ(t)−1<2−t, so the value of the supertrader gained on any given day is bounded above by 2−t, so the total value of the supertrader at any time is bounded above by 1.

Runtime Analysis:

The strength of the bounded reflective oracle needed is controlled by the longest runtime of the algorithms that the oracle is queried about. The relevant oracle calls are the invocations of SAT in NextBit, eval(B',OI(t)) in the binary search portion of NextBit, OverBudget, the SAT call in TradeToBool, the eval(B,OI(i)) calls that the trader may make, and the oracle calls in the binary search portion of OverBudget.

To begin with, the runtime of eval(B,x) is the time needed to flip enough coins to get to the most distant variable index in B (call it n), and then the time to run the boolean circuit on the padded x itself, which would be O(n⋅|B|): checking each bit of the padded x, doing a pass over the boolean to substitute ⊤ or ⊥ for that variable, and then evaluating the resulting boolean with its variables filled in.

Similarly, the runtime of SAT is the time to write down B and the binary string corresponding to the probability bound, which has a length of approximately n, for a runtime of O(n+|B|).

Now we can look at the runtime of BinSearch. It is always called with D=d(t)=O(l(t)), and there are that many iterations of writing down Δ (which, everywhere it's invoked, takes less than d(t) bits) and the average of the two bounds, which, by a suitable binary encoding, takes at most about d(t) bits, so we get a runtime of about d(t)2, which is O(l(t)2).

As for the runtime of NextBit, the length of the boolean at the start is at most f(t)+O(l(t)) (the first part is because B was returned by an algorithm that ran for at most f(t) steps, and the second part is the bound on the length of x). A boolean of this length is written down 3 times, and SAT is invoked twice, and BinSearch is run twice, for a runtime of O(3(f(t)+l(t))+2(l(t)+f(t)+l(t))+2l(t)2)=O(f(t)+l(t)2) . Adding in the time to write in/read B and x from the input just adds another f(t)+l(t) term which doesn't affect runtime that much.

Now for Approx. There are at most l(t) iterations of NextBit, and we already accounted for the time to write the input to NextBit, so the runtime is O(f(t)l(t)+l(t)3).

TradeToBool reads the input, which takes O(f(t)) time, emulates f(t) steps of a turing machine, which takes O(f(t)logf(t)) time, clips the input which takes O(f(t)+l(t)) time (the latter is an upper bound on the time to compute l(t) because l is time-constructible), calls SAT on a boolean of length at most f(t) with a maximum index of l(t), and writes the boolean, for a runtime of O(f(t)logf(t)+l(t)).

BTtoBool reads the input and writes parts of it into the query, which takes O(f(t)) time, writes down the probability bound in O(l(t)) time, and runs TradeToBool, for a runtime of O(f(t)logf(t)+l(t)).

OI either writes down δ(t) in O(t) time and generates a string in O(l(t)) time, or computes f(t), l(t), and d(t) in O(f(t)+l(t)) time, generates the trader and budget string in O(f(t)) time, and invokes TradeToBool and Approx for a runtime of O(l(t)3+f(t)l(t)+f(t)logf(t)) .

This just leaves establishing the runtime of OverBudget to ensure that the oracle is strong enough for our needs.

Generating n takes O(t) time. Generating the world x takes O(l(t)) time. Running the deductive process takes O(f(t)+l(t)) time, and so does writing down the resulting boolean, and the maximum index is l(t) so the SAT invocation takes O(f(t)+l(t)) time as well, for a runtime so far of O(f(t)+l(t)). That leaves the sum (computing the < isn't harder than computing the sum in the first place). We are carrying out at most t fraction-additions. Each fraction-addition involves three multiplications, one addition, and two iterations of binary search. The length of the numbers is l(t), but with each multiplication, they get longer by l(t), so the maximum length of the numbers is tl(t). Using the standard inefficient multiplication algorithm for O(t2l(t)2) runtime per multiplication, and with addition being faster than multiplication, we get O(t(t2l(t)2+l(t)2))=O(t3l(t)2) for the whole sum, to get a runtime of O(t3l(t)2+f(t)), which again is sufficiently low to permit oracle calls.

Thus, an oracle strength of O(l(t)3+t3l(t)2+f(t)l(t)+f(t)logf(t)) suffices to make all oracle calls directed to algorithms with permissibly low runtime.


### Bounded Oracle Induction

November 28, 2018 - 11:11
Published on Wed Nov 28 2018 08:11:28 GMT+0000 (UTC)

Introduction:

Because logical induction relies on the Brouwer fixed-point theorem, and reflective oracles rely on the Kakutani fixed-point theorem which Brouwer is a special case of, it's possible that logical induction could have been derived for the first time from reflective oracles. Attempting to do this in the most obvious way by having traders output a circuit that takes a binary-search approximation of the market as input doesn't produce any insights in particular. However, by attempting to redevelop the logical induction algorithm from scratch with the aid of a bounded reflective oracle, we arrive at a new way of looking at logical induction, with the following interesting features.

1: It collapses the distinction between the algorithm outputting the trading circuit, and the trading circuit itself.

2: All trades can be naturally interpreted as probability measures over bitstrings, with the reward given by a simple betting game, instead of shares that pay off if some boolean combination is true. However, the betting game doesn't incentivize true belief reporting, just moving the probability distribution of the market in the right direction. This turns out to be isomorphic to the original formulation of the value of a trade, subject to one extra restriction on worst-case trade value.

3: The market prices are also a probability measure over bitstrings/worlds, exactly like a universal inductor.

4: It does the reflective-solomonoff thing of "draw a turing machine with probability 2−K-complexity, use it to predict future bits"

First, notation will be established, and the algorithm will be discussed. Then the connection between OI-trader scoring and LI-trader scoring will be established, and questions of how many of the nice properties of LI carry over will be discussed. We will finish with basic discussion of what this result means, and then there will be an appendix which contains the definitions of auxiliary algorithms, and the next post will contain the proof that the algorithm is an Oracle Inductor.

As a quick refresher, a bounded reflective oracle takes as input a query of the form (A,p). It returns 1 if A stays within the runtime bound, only makes oracle calls to algorithms with the same runtime bound, and outputs 1 with probability greater than p; it returns 0 if A stays within the runtime bound, only makes oracle calls to algorithms with the same runtime bound, and outputs 1 with probability less than p. If the probability of a 1 is exactly p, or if A exceeds the runtime bounds, the oracle is allowed to randomize arbitrarily.

Notation:

{0,1}L and {0,1}∗ are the sets of all bitstrings of length L, and the set of all finite bitstrings, respectively. Elements of these sets are denoted by x. λ refers to the empty string.

BL is the set of all satisfiable booleans with all variables having an index ≤L, and B is the set of all satisfiable booleans. B∗ is the set of all booleans (including unsatisfiable ones), and B∗L is the set of all booleans with all variables having an index ≤L. Elements of these sets are denoted by B. B⊤ is the trivial boolean which is satisfied by all bitstrings. Note that a boolean B∈BL induces a function {0,1}L→{0,1} .

f is a function N+→N+ that is time-constructible, monotonically increasing, and satisfies f(t)>t. It will give the runtime bound for the traders.

l is some function N+→N+ upper-bounded by 2f(t), that is time-constructible, monotonically increasing, and satisfies l(t)>t. It gives the most distant bit the oracle inductor thinks about on turn t.

d is the function N+→N+ given by d(t)=l(t)+⌈log2l(t)⌉+2t+7 , giving the number of iterations of binary-search deployed on turn t.

δ is the function N+→[0,1]∩Q given by δ(t)=2−(t+3). It gives the proportion of the inductor distribution that is made up by the uniform distribution, to ensure that all bitstrings have a probability high enough for binary-search to accurately approximate their probability.

query(A,p) consults the bounded reflective oracle about whether A returns 1 with probability greater than p.

flip(p) rounds p down to 1 if it is greater than 1, and then flips a coin that returns 1 with probability p. In our model of computation, this operation takes unit time.

The OI Algorithm:

Algorithm 1: OI(t), N+→Δ{0,1}l(t)

    IF flip(δ(t)) = 1
        return Unif(l(t))
    ELSE
        a ← Unif(f(t))
        b ← 1
        FOR i from 1 to f(t)
            IF flip(1/2) = 0
                break
            ELSE
                b ← b+1
        return Approx(OI(t), l(t), d(t), BTtoBool(a,t,b))

In short, the distribution induced by the oracle induction algorithm has a small portion of the probability mass composed of the uniform distribution, and otherwise the algorithm selects a turing machine according to the universal distribution, and a "budget", with 2−b of the probability mass on a budget of b, much like the logical inductor algorithm. In this setting, all trades can be interpreted as a mixture of probability distributions of the form "condition the oracle induction distribution on some boolean being true", so the budgeted trader is consulted to get a boolean (note that the trader may be randomized!), and Approx is used to approximately replicate the probability distribution produced by conditioning the oracle induction distribution on the resulting boolean.
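The budget-drawing loop can be modelled exactly: each fair coin flip either stops with the current b or increments it, so a budget of k is drawn with probability 2−k for k≤f(t), with the leftover 2−f(t) mass landing on b=f(t)+1. A minimal sketch:

```python
from fractions import Fraction

# Exact distribution of the budget b drawn by the OI loop: b starts at 1
# and is incremented while a fair coin keeps coming up 1, for at most
# f(t) flips.
def budget_distribution(f_t):
    probs = {}
    surviving = Fraction(1)        # probability the loop is still running
    for k in range(1, f_t + 1):
        probs[k] = surviving / 2   # coin = 0: break with current b = k
        surviving /= 2             # coin = 1: increment and continue
    probs[f_t + 1] = surviving     # all f(t) flips came up 1
    return probs

probs = budget_distribution(10)
assert probs[1] == Fraction(1, 2)
assert probs[3] == Fraction(1, 8)
assert sum(probs.values()) == 1
```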

Algorithm 2: Approx(Δ,L,D,B), Δ{0,1}L×(N+)2×BL→Δ{0,1}L

    x ← λ
    FOR i from 1 to L
        x ← x + NextBit(Δ, D, i, B, x)
    return x

This starts with the empty bitstring and repeatedly concatenates it with the output of NextBit, which gets the next bit, until a bitstring of length L is produced. Δ, D and B are just passed on to NextBit.

Algorithm 3: NextBit(Δ,D,i,B,x), Δ{0,1}L×(N+)2×BL×{0,1}i−1→Δ{0,1}

    B′ ← ToPref(x) ∧ B
    B′0 ← B′ ∧ ¬xi
    B′1 ← B′ ∧ xi
    IF SAT(B′0) = 0
        return 1
    ELSE IF SAT(B′1) = 0
        return 0
    ELSE
        return flip( BinSearch(1, eval(B′1,Δ), D) / BinSearch(1, eval(B′,Δ), D) )

This defines a boolean constraint that says that B must be true, and the initial prefix of the bitstring must equal x. Then two more boolean constraints are generated, which are the same except they specify that the next bit is a 0 or 1. If only one of the two is a satisfiable boolean, the next bit is forced to be 0 or 1 in compliance with the satisfiability requirement, otherwise, binary search for D iterations on the probability of Δ outputting a bitstring that satisfies B′ or B′1 is used to figure out the probability of the next bit being a 1.
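The chain rule NextBit implements can be illustrated with exact probabilities standing in for the binary-search estimates. The toy distribution (uniform over 3-bit strings) and constraint ("at least two ones") below are assumptions for illustration only, not from the post; the check confirms that the product of next-bit conditional probabilities reproduces direct conditioning on B.

```python
from fractions import Fraction
from itertools import product

L = 3
worlds = list(product([0, 1], repeat=L))
delta = {x: Fraction(1, 2 ** L) for x in worlds}  # toy Delta: uniform
B = lambda x: sum(x) >= 2  # toy satisfiable boolean constraint

def mass(prefix, constraint):
    # Delta-probability of satisfying the constraint with this prefix
    return sum(p for x, p in delta.items()
               if constraint(x) and x[:len(prefix)] == prefix)

def conditional(x):
    # chain of next-bit conditional probabilities, as NextBit computes them
    prob = Fraction(1)
    for i in range(L):
        num = mass(x[:i + 1], B)
        if num == 0:
            return Fraction(0)  # NextBit's SAT check forbids this branch
        prob *= num / mass(x[:i], B)
    return prob

for x in worlds:
    assert conditional(x) == mass(x, B) / mass((), B)
```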

Algorithm 4: BinSearch(n,Δ,D), {0,1,2}×Δ{0,1}×N+→[0,1]∩Q

    hi ← 1
    lo ← 0
    FOR i from 1 to D
        IF query(Δ, (hi+lo)/2) = 1
            lo ← (hi+lo)/2
        ELSE
            hi ← (hi+lo)/2
    IF n = 0
        return lo
    ELSE IF n = 1
        return (hi+lo)/2
    ELSE
        return hi

This uses the oracle to implement binary-search for D rounds on some algorithm, and output either a lower-bound, average, or upper-bound estimate of the probability of the algorithm outputting 1.
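A sketch of this search, with a deterministic stand-in for the oracle (the real reflective oracle may randomize when the probability is exactly the queried threshold; that edge case is ignored here). After D rounds the bracketing interval has width exactly 2−D.

```python
from fractions import Fraction

# BinSearch against a hypothetical oracle: query(p) stands in for the
# reflective-oracle call and answers 1 iff the true probability exceeds p.
def bin_search(n, true_prob, D):
    query = lambda p: 1 if true_prob > p else 0
    hi, lo = Fraction(1), Fraction(0)
    for _ in range(D):
        mid = (hi + lo) / 2
        if query(mid) == 1:
            lo = mid
        else:
            hi = mid
    return {0: lo, 1: (hi + lo) / 2, 2: hi}[n]  # lower / average / upper

p, D = Fraction(5, 13), 20
lo, hi = bin_search(0, p, D), bin_search(2, p, D)
assert lo <= p <= hi and hi - lo == Fraction(1, 2 ** D)
```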

Algorithm 5: BTtoBool(a,t,b), {0,1}f(t)×(N+)2→Bl(t)

    IF query(OverBudget(a,t,b), 2^(−l(t−1))/(2(t−1))) = 1
        return B⊤
    ELSE
        return TradeToBool(a,t)

This uses the oracle to test whether the trader is possibly over-budget, and if so, returns the null boolean, otherwise, it returns the boolean that the trader returns. OverBudget randomly selects a day and a world/bitstring, and returns 1 if the world/bitstring hasn't been ruled out by that day and the trader is over-budget relative to that world/bitstring, so the oracle call is testing whether there's any combination of worlds and days where the trader is over-budget.

Algorithm 6: OverBudget(a,t,b), {0,1}∗×(N+)2→Δ{0,1}

    n ← rand(1, t−1)
    B ← ToPref(Unif(l(n)))
    IF SAT(DP(n) ∧ B) = 0
        return 0
    ELSE IF −n + ∑_{1≤i≤n} BinSearch(0, eval(clip(B,l(i)), Approx(OI(i), l(i), d(t), TradeToBool(a,i))), d(t)) / BinSearch(2, eval(clip(B,l(i)), OI(i)), d(t)) < −b+1
        return 1
    ELSE
        return 0

This randomly picks a day and bitstring, and returns 1 if the bitstring is plausible (hasn't been ruled out by the deductive process) on that day, and the trader might be over-budget. Note that the strength of the approximation rises as a function of t. This means that at later days, more accurate retroactive estimates are used for the value of a trade on previous days. The mess of parentheses in the numerator essentially clips the bitstring to the appropriate length, and uses binary search to find the probability that (an approximation of (the distribution produced by conditioning the OI distribution on (the boolean outputted by the trader))) assigns to the bitstring.

Algorithm 7: TradeToBool(a,i), {0,1}∗×N+→ΔBl(i)

    run (a,i) on a UTM
    IF UTM(a,i) doesn't halt in f(i) steps, or an oracle call to an algorithm
       other than eval(B, OI(i')) for i' ≤ i, B ∈ B∗l(i), is made
        return B⊤
    ELSE
        B ← clip(ToBool(UTM(a,i)), l(i))
        IF SAT(B) = 0
            return B⊤
        ELSE
            return B

This takes a trader, and returns the null constraint if it times out or makes an "out of bounds" oracle call or outputs an unsatisfiable boolean. Otherwise, it takes the bitstring the trader outputs, interprets it as some boolean constraint, clips the constraint to the appropriate length, and outputs that constraint. Note that because algorithms can be randomized, this won't necessarily output the same boolean every time, so the distribution produced by Approx(Δ,L,D,TradeToBool(a,i)) should actually be thought of as a probabilistic mixture of conditional distributions.

To put all this together, the inductor selects a turing machine and a budget with the appropriate probability, and queries the turing machine about what boolean combination it thinks the true sequence of bits fulfills. As the turing machine can only report one boolean combination, mixtures of various booleans are implemented by the turing machine randomizing. If the past history of the turing machine has it possibly going over-budget on the next "trade", or violates conditions by running over time, or making illegal oracle calls, or providing an impossible-to-fulfill boolean, then the market just defaults to asking (an approximation of) itself about its own probabilities. Otherwise, the market outputs (an approximation of) its own probability distribution, conditioned on being in conformance with the trader. A bounded reflective oracle is exploited as an NP-oracle, and to effectively narrow down the probability of itself outputting a specific bitstring.

The interpretation of a trade from the logical induction paper was that you'd lose P(xi) dollars in order to acquire a share that would be worth 1 dollar in worlds x where xi=1, and 0 dollars if xi=0.

The interpretation of a trade in this setting is that a trader and a market each spread 1 dollar amongst various bitstrings/worlds x (which will be lost), giving a probability measure, and if the world is revealed to be x, the trader earns T(x)P(x) dollars in return (where T(x) is the probability the trader assigned to x, and P(x) is the probability the market assigned to x). The value of a trader at time t in world x is then −t+∑1≤i≤tTi(x)Pi(x) .

Surprisingly enough, these two interpretations are actually equivalent! We can take (some) LI trades, and convert them into an OI trade with the same value in all worlds, and also do the reverse.

First, let's say our OI trader spreads its dollar according to P|B (it copies the market probability distribution, conditional on the world fulfilling the constraint B). Then, in all worlds x that don't fulfill the boolean, it will lose 1 dollar because it assigned 0 measure to x, and in all x that fulfill the boolean, because T(x)=P(x)P(B), it gets −1+1P(B) dollars in return. If we multiply these two values by P(B), we get −P(B) dollars when it fails, and 1−P(B) dollars when it succeeds, which is the exact same as a logical-induction trader buying one share in B. So, the value of the OI trader outputting the distribution P|B has the exact same value in all worlds as buying 1P(B) shares of B, which is the same as spending 1 dollar buying shares of B.
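This equivalence can be checked exactly on a toy market. The 2-bit market prices and the constraint "first bit is 1" below are illustrative assumptions; the assertion is that the OI value of playing P|B, scaled by P(B), matches the LI payoff of buying one share of B in every world.

```python
from fractions import Fraction
from itertools import product

# Toy market P over 2-bit worlds (prices sum to 1); B = "first bit is 1".
worlds = list(product([0, 1], repeat=2))
P = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
     (1, 0): Fraction(1, 2), (1, 1): Fraction(1, 4)}
B = lambda x: x[0] == 1
P_B = sum(p for x, p in P.items() if B(x))  # market price of B

for x in worlds:
    T = P[x] / P_B if B(x) else Fraction(0)  # trader's measure: P|B
    oi_value = -1 + T / P[x]                 # lose the dollar, earn T(x)/P(x)
    li_value = (1 if B(x) else 0) - P_B      # one share of B: payoff - price
    assert P_B * oi_value == li_value
```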

We interpret a randomized choice of a boolean as the trader T spending P(T=B) (P is an ordinary probability, not the market price of anything) of the probability mass on the conditional distribution P|B. This sums up to produce a probability measure, where T(x)=∑B∈BP(T=B)⋅(P|B)(x).

Generalizing a bit, all OI trades are equivalent to a LI trade where it spends T(x) dollars buying shares of the boolean corresponding to x, and spends 1 dollar in total.

Also, if the trader outputs the null boolean, T(x)=P(x), so the value of that trade in all worlds is 0 and equivalent to not buying or selling anything. This can be used to have the equivalent LI trade spend less than 1 dollar buying shares, if some portion of T is composed of P .

Going in the reverse direction, from a LI trade to a OI trade, is more difficult, because there is both buying and selling of shares. From before, buying 1 share of B is equivalent to the OI trader distributing P(B) of its probability mass according to the P|B distribution. It's slightly more involved to show, but selling 1 share of B has equivalent value in all worlds as distributing P(¬B) of the OI trader mass on the P|¬B distribution.

Now, if, at the end of translating the LI trade into OI terms, less than all of the probability mass has been spent on various conditional distributions, the rest of the probability mass can be spent on copying P, since it has 0 value in all worlds. But what if the measure of the resulting trade sums to greater than 1? In that case, the extreme-worst-case value of the LI trade (all purchased shares were worth 0, all sold shares were worth 1) is below −1.

Therefore, if there's a LI trader with a circuit that's evaluatable in polynomial time (not just writable in poly-time), and it never makes a trade with extreme worst-case value below −1 (ie, it doesn't blow too much money at any given time), there's a corresponding OI trader! This includes almost all the traders from the proofs of the logical induction paper. However, the traders from 4.7.2 (conditionals on theories) and 4.5.10 (affine unbiasedness from feedback) need to be adapted.

Conjecture 1: There is an LI trader with extreme-worst-case value on each turn of −1 that exploits the market if it violates Affine Unbiasedness from Feedback, and another such trader that exploits the market if the market conditional on ψ is exploitable by a trader with extreme-worst-case value on each turn of −1.

If this conjecture holds, oracle induction would inherit all the nice properties of logical induction.

Exploitation is defined in the standard way, as the set of plausible values according to worlds consistent with the deductive process at time t, over all t, being bounded below and unbounded above.

Note that if we consider the market as composed of all the traders' probability distributions aggregated together, the payoff to everybody corresponds to taking everyone's money, and distributing that money back to everyone according to the fraction of the probability mass in the winning world x that was contributed from their distribution.

Also note that because the OI-traders can implement very long and complex combinations of distributions by randomizing, OI-traders are able to make some trades that LI-traders can't, because they can output trades that are too long and complicated for the LI-trader to write down in polynomial time. An OI-trader, converted to an LI-trade, may even have purchases in every single boolean, which LI-traders definitely can't replicate.

Conjecture 2: There is an LI trader that runs in poly-time, and an Oracle Inductor that is inexploitable by poly-time adversaries, such that the trader exploits the Oracle Inductor, and there is an OI trader that runs in poly-time and a Logical Inductor that is inexploitable by poly-time adversaries, such that the trader exploits the Logical Inductor.

Just like logical inductors, there is no trader that runs in time less than f(t) that exploits the market. This is shown by the following theorems which are analogous to the theorems establishing that the logical induction algorithm is inexploitable.

Theorem 1: If a trader that can be simulated on a UTM in less than f(t) time exploits OI, there is some finite budget b such that the budgeted trader exploits OI.

Theorem 2: If a budgeted trader exploits OI, the supertrader exploits OI.

Theorem 3: The supertrader doesn't exploit OI.

The proofs will be deferred to the next post.

As for the strength of the bounded reflective oracle needed to guarantee that all oracle calls are well-defined, it is O(l(t)3+t3l(t)2+l(t)f(t)+f(t)logf(t)) . Again, the proof will be deferred to the next post.

Future Directions:

The two conjectures would be interesting to settle, although only the first is truly critical to showing that this is as powerful as a logical inductor.

The interpretation of trades as probability measures over bitstrings, with payoff given by the proportion of probability mass in the winning string contributed by a trader is useful, although the lack of an incentive for accurate belief reporting is slightly worrying, and prevents us from truly attributing beliefs to individual traders.

The close parallel between this and Reflective-Oracle Solomonoff Induction may prove to be fruitful, and potentially lead to a variant of AIXI in this setting, which may be ported back to logical induction.

APPENDIX: Auxiliary Algorithms

flip(p), Q≥0→Δ{0,1}

    p ← min(1, p)
    return output of a coin which returns 1 with probability p

In our model of computation, assume that this takes unit time.

Unif(n), N→Δ{0,1}n

    x ← λ
    FOR i from 1 to n
        x ← x + flip(1/2)
    return x

This generates a random bitstring of length n.

rand(m,n), (N+)2→ΔN+

    FOR i from m to n
        IF flip(1/(n−i+1)) = 1
            return i

This randomly selects an integer in [m,n], with equal probability for each integer.
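The uniformity can be verified exactly: the flip at step i succeeds with probability 1/(n−i+1), so every integer in [m,n] is returned with probability 1/(n−m+1).

```python
from fractions import Fraction

# Exact return probabilities of rand(m, n)'s sequential coin-flip scheme.
def return_probs(m, n):
    probs = {}
    surviving = Fraction(1)   # probability no earlier flip succeeded
    for i in range(m, n + 1):
        p_here = Fraction(1, n - i + 1)
        probs[i] = surviving * p_here
        surviving *= 1 - p_here
    return probs

probs = return_probs(3, 10)
assert all(p == Fraction(1, 8) for p in probs.values())
assert sum(probs.values()) == 1
```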

clip(B,n), B∗×N+→B∗

    B ← B with all variables with an index > n removed
    return B

This is used to ensure that the traders output boolean constraints that aren't about unreasonably distant digits.

eval(B, x): B* × {0,1}* → Δ{0,1}
    n ← max index of a variable in B
    x ← x + Unif(max(0, n − |x|))
    return B(x)

This applies a boolean circuit to a bitstring, and pads it with randomly selected bits if the bitstring is too short. Note that since traders are allowed to call the oracle on this function applied to the oracle inductor, this implicitly assigns 50% probability to all bits that are too distant for the oracle inductor to have thought about them yet.

ToPref(x): {0,1}* → B
    B ← ⋀_{1 ≤ i ≤ |x|} (y_i = x_i)
    return B

This converts a bitstring x into a boolean requiring that a bitstring's prefix equal x. Each equality y_i = x_i should be understood as ¬y_i if x_i = 0, and y_i if x_i = 1.

SAT(B): B* → {0,1}
    n ← max index of a variable in B
    return query(eval(B, λ), 2^−(n+1))

This uses the oracle as a SAT-solver, by using eval(B, λ) to randomly generate a bitstring of length n. If there is a bitstring which fulfills the boolean, there is a 2^−n probability of generating that bitstring, so the oracle can perfectly discriminate between satisfiable and unsatisfiable booleans.
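The threshold trick can be illustrated without a reflective oracle by computing the exact acceptance probability by enumeration: a satisfiable boolean over n variables is satisfied by a uniformly random assignment with probability at least 2^−n, an unsatisfiable one with probability 0, so the cutoff 2^−(n+1) separates the two cases. The circuit representation below (Python functions over bit tuples) is purely illustrative:

```python
from itertools import product

def sat_by_threshold(circuit, n):
    """Illustrative stand-in for SAT(B): enumerate all 2^n assignments,
    compute the probability that a uniform assignment satisfies the
    circuit, and compare against 2^-(n+1).  (The real construction asks
    the oracle whether this probability exceeds the threshold instead
    of enumerating.)"""
    hits = sum(bool(circuit(bits)) for bits in product((0, 1), repeat=n))
    prob = hits / 2 ** n
    return 1 if prob > 2 ** -(n + 1) else 0

# hypothetical example circuits over 3 variables
satisfiable = lambda b: b[0] and not b[1] and b[2]   # exactly one witness
unsatisfiable = lambda b: b[0] and not b[0]          # contradiction
```

Even the single-witness circuit clears the threshold, since 2^−3 > 2^−4; this is why the cutoff is placed strictly between 2^−n and 0.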

ToBool(x): {0,1}* → B*

This just turns a bitstring into the boolean it encodes, given some efficient encoding scheme.

DP(n): N+ → B^l(n)

This is the deductive process, a black-box deterministic algorithm which outputs booleans of index at most l(n) in at most max(l(n), f(n)) time, such that ∀n ∀i < n : DP(n) ∧ DP(i) = DP(n) (i.e., the constraints on the true environment get more stringent over time, but always stay satisfiable).

Discuss

### Double-Dipping in Dunning--Kruger

November 28, 2018 - 06:46
Published on Wed Nov 28 2018 03:40:58 GMT+0000 (UTC)

Originally posted at sandymaguire.me.

I have to get you to drop modesty and say to yourself, "Yes, I would like to do first-class work." Our society frowns on people who set out to do really good work. You're not supposed to; luck is supposed to descend on you and you do great things by chance. Well, that's a kind of dumb thing to say. I say, why shouldn't you set out to do something significant. You don't have to tell other people, but shouldn't you say to yourself, "Yes, I would like to do something significant."

Richard Hamming

I want to talk about impostor syndrome today. If you're unfamiliar with the term, it's this phenomenon where people discount their achievements and feel like frauds---like at any moment, others will realize that they don't belong. People with impostor syndrome are convinced that they've somehow pulled the wool over others' eyes, and have conned their way into their jobs or social circles. And it's very, very common.

On the other end of the spectrum, there's another phenomenon known as Dunning--Kruger. This one is widely reported in the media as "the average person believes they are better than 70% of drivers." They can't all be right: by definition, at most 50% of people can be better than the median driver. This too is very common; it seems to hold true in most positive-feeling, every-day domains.

While this is indeed supported by Dunning--Kruger's data, the paper itself focuses on the fact that "the worst performers in a task are often so bad that they have no idea." For example, the least funny people are so tragically unfunny that they wouldn't know humor if it bit them. As a result, unfunny people are entirely blind to their lack of humor:

Participants scoring in the bottom quartile on our humor test not only overestimated their percentile ranking, but they overestimated it by 46 percentile points.

A less well-known finding of Dunning--Kruger is that the best performers will systematically underestimate how good they are, by about 15 percentile points. The proposed reason is that they found the task to be easy, assume others must have also found it easy, and go from there. In other words, top performers are so good that they don't notice many challenges.

It's unfortunate that Dunning--Kruger has been popularized as "most people think they are better than they are." Not only do high performers already underestimate themselves, but those who know about Dunning--Kruger in its popularized form are likely to double-dip, and further adjust down to compensate for it. For example, if you are in fact at the 90th percentile for some skill, due to Dunning--Kruger you will likely estimate yourself to be at the 75th. But if you also know that people routinely overestimate their abilities by some 20 percentile points, you might drop your estimate down to the 55th in compensation, leaving it significantly below your true skill level.

If this is true, it would suggest that some of the world's best people will rate themselves lower than the self-evaluations of the world's worst. The takeaway is this: if you're the kind of person who worries about statistics and Dunning--Kruger in the first place, you're already way above average, and clearly have the necessary meta-cognition not to fall victim to such things. From now on, unless you have evidence that you're particularly bad at something, I want you to assume that you're 15 percentile points better than you would otherwise estimate.
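The double-dipping arithmetic above can be made explicit. The shift sizes are the post's illustrative figures (a ~15-point underestimate for top performers, a further ~20-point "correction" from the popularized result), not measured constants:

```python
def self_estimate(true_percentile, dk_shift=15,
                  knows_popularized_dk=False, popularized_correction=20):
    """Sketch of the post's example: a top performer underestimates
    their percentile by ~15 points, and one who also 'corrects' for
    the popularized Dunning--Kruger result drops a further ~20."""
    estimate = true_percentile - dk_shift
    if knows_popularized_dk:
        estimate -= popularized_correction
    return estimate
```

So a 90th-percentile performer who knows the popularized version ends up rating themselves 55th, below where many bottom-quartile performers place themselves.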

The mathematician Richard Hamming is said to have often ruffled feathers by asking his colleagues "what's the most important problem in your field, and why aren't you working on it?" Most people, I'd suspect, would say that the most important problems are too hard to be tackled by the likes of them. That it would take greater minds than theirs to make progress on such things. They give up before having even tried.

Instead, the smartest people I know join Google or go work at B2B startups and are simultaneously bored in their day jobs, feel like they're frauds while they're there, and don't have enough energy to work on personal projects after hours. But at least they're making wicked-big paychecks for themselves. And for their less-qualified leadership.

The best minds of my generation are thinking about how to make people click ads.

Jeff Hammerbacher

Here's an interesting thought experiment. If you randomly swapped places with someone for a week---you doing their job and they doing yours---how would it turn out? If you're a competent programmer with passable social skills, I suspect you would be a lot more successful in that week than your replacement. Most things just aren't that hard, and a good percentage of the people doing those jobs are phoning it in anyways.

The bar on competency is tragically low. And yet the world revolves.

If you agree with this line of reasoning, it means the world is just oozing with potential, ready for the taking. Most of the world's most competent people are unaware of just how good they are. Most things really aren't as hard as getting through that graph-theory class you took, and don't take nearly as much effort. The world is being run by people who are too incompetent to know it; people who are only in power because they're the ones who showed up, and because showing up is most of the battle.

Which leads us to the inescapable conclusion that this world we live in is particularly amenable to change. That if you're willing to trust in your own instinct and tackle hard-seeming problems, you're going to experience literally unbelievable amounts of success. Everyone else is deferring to better minds. So be the change you want to see in the world, because we've been waiting for you.

Discuss

### How democracy ends: a review and reevaluation

November 27, 2018 - 13:50
Published on Tue Nov 27 2018 10:50:01 GMT+0000 (UTC)

Last month I attended a talk by David Runciman, the author of a recent book called How Democracy Ends. I was prepared for outrage-stirring and pearl-clutching, but was pleasantly surprised by the quality of his arguments, which I’ve summarised below, along with my own thoughts on these issues. Note, however, that I haven’t read the book itself, and so can’t be confident that I’ve portrayed his ideas faithfully.

Which lessons from history?

Many people have compared recent populist movements with the stirrings of fascism a century ago. And indeed, a lot of similar rhetoric is being thrown around. But Runciman argues that this is one of the least interesting comparisons to be made between these two times. Some things that would be much more surprising to a denizen of the early 20th century:
• Massive transformations in societal demographics
• Very few changes in our institutions
The second of those is particularly surprising in light of the first. Parliamentary democracies in the Anglophone world have been governed by the same institutions - and in some cases, even the same parties - for centuries. Continental European democracies were more disrupted by World War 2, but have been very stable since then, despite the world changing in many ways overall. That's true even for institutions that are probably harmful - consider the persistence of the electoral college in the US, the House of Lords in the UK, the monarchy in Australia (despite their ardent republican movement), first-past-the-post voting systems in many countries, and so on. (In fact, Runciman speculates that Americans voted for Trump partly because of how much confidence they had in the durability of their institutions - a confidence which so far seems to have been well-founded.)
So history gives us pretty good evidence for the robustness of democratic institutions to the normal flow of time - but not to exceptional circumstances. In fact, an inability to make necessary changes may well render them more fragile in the face of sharp opposition. If and when pressure mounts, are they going to snap like the democratic institutions of 1930s Europe did? Runciman argues that they won’t, because of the nature of the demographic changes the West has seen. There are three particularly important axes of variation:
• Wealth: the average person is many times wealthier than they were a century ago, and the middle class is much larger.
• Education: we’ve gone from only a few percent of people getting tertiary education (and many of the remainder not finishing high school) to nearly 50% of young people being in university in many western countries.
• Age: in the last century, the median age has risen by over ten years in most western countries.
These three factors are some of the most powerful predictors of behaviour that we have, and so we should take them into account when judging the likelihood of democratic failure. For instance, wealthier and more educated people are much less likely to support populist or extremist groups. But Runciman focuses the most on age, which I think is the correct approach. Wealth is relative - even if people are actually much richer, they can feel poor and angry after a recession (as they did in the 1930s, despite still being many times wealthier than almost all their ancestors). Education may just be correlated with other factors, rather than the actual cause of lasting differences in mindset. But there are pretty clear biological and social reasons to think that the behaviour and priorities of older people are robustly and significantly different from those of younger people. You need only look at the age distribution of violent crime, for example, to see how strong this effect is (although it may have lessened somewhat over recent years, since marriage rates are declining and single men cause more trouble).

In short: the failures of democracy in the 30’s were based on large populations of young men who could be mobilised in anger by militaristic leaders - see for instance the brownshirts in Germany and blackshirts in Italy. But that’s not what the failure of democracy in our time would look like, because that group of people is much smaller now. For better or worse, older populations are less disruptive and more complacent. To see where that might lead, consider Japan: an ageing population which can’t drag itself out of economic torpor, resistant to immigration, dominated for decades by one political party, betting the country's future on using robots to replace the missing workforce.

During a Q&A after the talk, I pointed out that Japan is very different to western countries in its particularly strong culture of social conformity and stability. Age trends notwithstanding, I have much more difficulty imagining the same quiet tolerance of slow decline occurring in the US or UK. So, given that government institutions are very difficult to change, where will people direct their frustration if lacklustre growth continues in the coming decades?

In response, Runciman raised two possibilities. Firstly, that people will “go around their governments”, finding new domains in which politics is less relevant. We could call this the “Wild West” possibility. Of course, there’s no longer an uncolonised West to explore - but there is the internet, which isn’t democratically run and probably never will be. We already see fewer young men working full-time, because the alternative of spending most of their time gaming has become more appealing. As virtual worlds become even more immersive, it seems plausible that people will begin to care much less about political issues.

One problem with the idea of “going around governments”, though, is that governments are just much bigger now than they used to be. And as technology companies profit from the growing role of the internet, there’ll be pressure for governments to intervene even more to fight inequality. So a second option is a more Chinese approach, with increasingly autocratic Western governments exerting heavy pressure on (and perhaps eventually control over) tech companies.

A more optimistic possibility is for the internet to make democracy more accountable. Runciman invites us to consider Plato’s original argument against direct democracy (in which people vote on individual issues) - that it would lead to rule by the poor, the ignorant, and the young, all of whom necessarily outnumber the wealthy, wise and old. This argument turned out not to apply for representative democracy, since elected representatives tend to be wealthy, educated and old despite their constituents being the opposite. But now it’s inapplicable for a different reason - that although our representatives haven’t changed much, the rest of us are starting to look much more like them. So maybe it’ll become feasible to implement a more direct democracy, facilitated by the internet and modern communication technology. (This still seems like a bad idea to me, though.)

Base rates and complacency

The last section was a little speculative, so let’s take a step back and think about how to make predictions about these sorts of events in general. Runciman’s analysis above provides good reasons not to draw a specific parallel between the rise of fascism last century and recent political events. But it would take extraordinary evidence to exempt us from extrapolating broader historical trends, in particular the fact that states always collapse eventually, and that the base rate for coups and other forms of internal strife is fairly high. Are the extraordinary changes we’ve seen since the industrial revolution sufficient to justify belief in our exceptionalism?

It’s true that since World War 2, almost no wealthy democracies have descended into autocracy or chaos (Turkey and Ireland being two edge cases). It’s also true that, despite widespread political disillusionment, norms against violence have held to a remarkably large degree. But drawing judgements from the historical period “since World War 2” is a classic case of the Texan Sharpshooter’s Fallacy (and possibly also anthropic bias?). In fact, this recent lull should make us skeptical about our ability to evaluate the question objectively, because people are in general very bad at anticipating extreme events that haven't occurred in living memory. I think this is true despite these possibilities being discussed in the media. For example, while there’s a lot of talk about Trump being a potential autocrat, few Americans are responding by stockpiling food or investing in foreign currencies or emigrating. This suggests that hostility towards Trump is driven primarily by partisan politics, rather than genuine concern about democratic collapse. An additional data point in favour of this hypothesis is how easily the Republican political establishment has fallen in line.

Another key question which isn’t often discussed is the nature of modern military culture. Historically, this has been a major factor affecting governmental stability. But, apart from vague intuitions about modern militaries being fairly placid, I find myself remarkably ignorant on this subject, and suspect others are as well. What facts do you know about your country's military, about the character of its commanders or the distribution of power within it, that make you confident that it won't launch a coup if, for example, one of its generals is narrowly defeated in a disputed presidential election (as in Gore vs Bush)? Note that military demographics haven’t changed nearly as much as those of our societies overall. They’re still primarily composed of young working-class men without degrees - a group that’s unusually angry about today’s politics. So while I am pretty convinced by Runciman’s arguments, this is one way in which they may not apply. Also consider that warfare is much less hands-on than it used to be, and firepower much more centrally concentrated, both of which make coups easier.

So far I've looked at societal collapse from a political point of view. But many historical transitions were precipitated by natural disasters or diseases. See, for instance, the Mayan collapse, or the Little Ice Age, or the Black Death. Today, we're much safer from natural disasters, both because of our technology and because of the scale of our societies - few people live in countries in which the majority of a population can be struck by a single natural disaster. Similarly, we're also much safer from natural diseases. But we're much more vulnerable to severe man-made disasters, which I think are very likely to occur over the next century. Since this post is focused on political collapse as a distinct phenomenon to technological disaster, I won’t discuss extreme risks from technology here. However, it's worthwhile to look at the ways in which smaller technological harms might exacerbate other trends. AI-caused unemployment and the more general trend towards bimodal outcomes in western countries are likely to cause social unrest. Meanwhile terrorism is going to become much easier - consider being able to 3D-print assassin drones running facial recognition software, for instance. And due to antibiotic overuse, it's likely that our safety from disease will decline over the coming years (even without the additional danger of bioterrorism using engineered diseases). Finally, I think we're much softer than we used to be - it won't take nearly as much danger to disrupt a country. Runciman is probably correct that we’re less susceptible to a collapse into authoritarianism than we were in the past - but the same trends driving that change are also pushing us towards new reasons to worry.

In addition to the talk by Runciman, this post was inspired by discussions with my friends Todor and Julio, and benefited from their feedback.

Discuss

### MIRI’s 2018 Fundraiser

November 27, 2018 - 08:30
Published on Tue Nov 27 2018 05:30:11 GMT+0000 (UTC)

(Crossposted from the MIRI blog)

MIRI is running its 2018 fundraiser through December 31! Our progress:

Donate now

MIRI is a math/CS research nonprofit with a mission of maximizing the potential humanitarian benefit of smarter-than-human artificial intelligence. You can learn more about the kind of work we do in “Ensuring Smarter-Than-Human Intelligence Has A Positive Outcome” and “Embedded Agency.”

Our funding targets this year are based on a goal of raising enough in 2018 to match our “business-as-usual” budget next year. We view “make enough each year to pay for the next year” as a good heuristic for MIRI, given that we’re a quickly growing nonprofit with a healthy level of reserves and a budget dominated by researcher salaries.

We focus on business-as-usual spending in order to factor out the (likely very large) cost of moving to new spaces in the next couple of years as we continue to grow, which introduces a high amount of variance to the model.[1]

My current model for our (business-as-usual, outlier-free) 2019 spending ranges from $4.4M to $5.5M, with a point estimate of $4.8M, up from $3.5M this year and $2.1M in 2017. The model takes as input estimated ranges for all our major spending categories, but the overwhelming majority of the variance comes from the number of staff we’ll add to the team.[2]

In the mainline scenario, our 2019 budget breakdown looks roughly like this:

In this scenario, we currently have ~1.5 years of reserves on hand. Since we’ve raised ~$4.3M between Jan. 1 and Nov. 25,[3] our two targets are:

• Target 1 ($500k), representing the difference between what we’ve raised so far this year and our point estimate for business-as-usual spending next year.
• Target 2 ($1.2M), what’s needed for our funding streams to keep pace with our growth toward the upper end of our projections.[4]
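The "ranges in, point estimate out" budget model described above can be sketched as a small Monte Carlo simulation. The category ranges below are made-up placeholders, not MIRI's actual figures, and drawing each category uniformly and independently is a simplifying assumption:

```python
import random

def budget_samples(category_ranges, n=100_000, seed=0):
    """Monte Carlo sketch of a range-based budget model: draw each
    spending category uniformly from its (low, high) range, sum the
    draws, and report a point estimate (mean) plus a 5th-95th
    percentile range for the total."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.uniform(lo, hi) for lo, hi in category_ranges)
        for _ in range(n)
    )
    point = sum(totals) / n
    return point, totals[int(0.05 * n)], totals[int(0.95 * n)]

# hypothetical category ranges in $M (not MIRI's real breakdown)
ranges = [(2.5, 3.3), (0.9, 1.2), (0.6, 1.0)]
```

With most of the width coming from the first (salary-like) range, the simulated total's spread is dominated by that one input, mirroring the post's observation that headcount drives most of the variance.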

Below, we’ll give an overview of (1) what's new at MIRI, emphasizing our focus on hiring programmers and other technical staff; (2) our room for more funding after the major support we received last year; and (3) donation matching opportunities, including multiple significant opportunities tomorrow (Giving Tuesday)!

We’ve released a string of new posts on our recent activities and strategy:

• 2018 Update: Our New Research Directions discusses the new set of research directions we’ve been ramping up over the last two years, how they relate to our Agent Foundations research and our goal of “deconfusion,” and why we’ve adopted a “nondisclosed-by-default” policy for this research.
• Embedded Agency describes our Agent Foundations research agenda as different angles of attack on a single central difficulty: we don’t know how to characterize good reasoning and decision-making for agents embedded in their environment.
• Summer MIRI Updates discusses new hires, new donations and grants, and new programs we’ve been running to recruit research staff and grow the total pool of AI alignment researchers.

Our 2018 Update also discusses the much wider pool of engineers and computer scientists we’re now trying to recruit, and the much larger total number of people we’re trying to add to the team in the near future:

We’re seeking anyone who can cause our “become less confused about AI alignment” work to go faster.
In practice, this means: people who natively think in math or code, who take seriously the problem of becoming less confused about AI alignment (quickly!), and who are generally capable. In particular, we’re looking for high-end Google programmer levels of capability; you don’t need a 1-in-a-million test score or a halo of destiny. You also don’t need a PhD, explicit ML background, or even prior research experience.

If the above might be you, and the idea of working at MIRI appeals to you, I suggest sending in a job application or shooting your questions at Buck Shlegeris, a researcher at MIRI who’s been helping with our recruiting.

We’re also hiring for Agent Foundations roles, though at a much slower pace. For those roles, we recommend interacting with us and other people hammering on AI alignment problems on Less Wrong and the AI Alignment Forum, or at local MIRIx groups. A great place to start developing intuitions for these problems is Scott Garrabrant’s recently released fixed point exercises, or various resources on the MIRI Research Guide. We then generally hire Agent Foundations researchers from people we’ve gotten to know through the above channels, visits, and events like the AI Summer Fellows program.

2. Room for more funding

As noted above, the biggest sources of uncertainty in our 2018 budget estimates are about how many research staff we hire, and how much we spend on moving to new offices.

In our 2017 fundraiser, we set a goal of hiring 10 new research staff in 2018–2019. So far, we’re up two research staff, with enough promising candidates in the pipeline that I still consider 10 a doable (though ambitious) goal.

Following the amazing show of support we received from donors last year (and continuing into 2018), we had significantly more funds than we anticipated, and we found more ways to usefully spend it than we expected. In particular, we’ve been able to translate the “bonus” support we received in 2017 into broadening the scope of our recruiting efforts. As a consequence, our 2018 spending, which will come in at around $3.5M, actually matches the point estimate I gave in 2017 for our 2019 budget, rather than my prediction for 2018: a large step up from what I predicted, and an even larger step from last year’s budget of $2.1M.[5]

Our two fundraiser goals, Target 1 ($500k) and Target 2 ($1.2M), correspond to the budgetary needs we can easily predict and account for. Our 2018 went much better as a result of donors’ greatly increased support in 2017, and it’s possible that we’re in a similar situation today, though I’m not confident that this is the case.

Concretely, some ways that our decision-making changed as a result of the amazing support we saw were:

• We spent more on running all-expenses-paid AI Risk for Computer Scientists workshops. We ran the first such workshop in February, and saw a lot of value in it as a venue for people with relevant technical experience to start reasoning more about AI risk. Since then, we’ve run another seven events in 2018, with more planned for 2019. As hoped, these workshops have also generated interest in joining MIRI and other AI safety research teams. This has resulted in one full-time MIRI research staff hire, and on the order of ten candidates with good prospects of joining full-time in 2019, including two recent interns.
• We’ve been more consistently willing and able to pay competitive salaries for top technical talent. As a special case, we made a research hire who is more senior than the research staff we normally recruit, and who we’ll be announcing soon.
• We raised salaries for some existing staff members. We have a very committed staff, and some staff at MIRI had previously asked for fairly low salaries in order to help keep MIRI’s organizational costs lower. The inconvenience this caused staff members doesn’t make much sense at our current organizational size, both in terms of our team’s productivity and in terms of their well-being.
• We ran a summer research internship program, on a larger scale than we otherwise would have.
• As we’ve considered options for new office space that can accommodate our expansion, we’ve been able to filter less on price relative to fit. We’ve also been able to spend more on renovations that we expect to produce a working environment where our researchers can do their work with fewer distractions or inconveniences.

2018 brought a positive update about our ability to cost-effectively convert surprise funding increases into (what look from our perspective like) very high-value actions, and the above list hopefully helps clarify what that can look like in practice. We can’t promise to be able to repeat that in 2019, given an overshoot in this fundraiser, but we have reason for optimism.

3. Matching opportunities

Coinciding with this fundraiser, MIRI is included in three matching campaigns, offering leverage on donations in different ways—and at different rates—all focused around Giving Tuesday, November 27.

1. Got ETH? Support MIRI in WeTrust’s Spring ETH matching campaign through Nov 27 (23:59 PST). This campaign is the first-ever implementation of Glen Weyl, Zoë Hitzig, and Vitalik Buterin’s Liberal Radicalism model for non-profit funding matching. Unlike most matching campaigns, which match exclusively based on total amount donated, this campaign matches in a way that heavily factors in the number of unique donors when divvying out the matching pool of 500 ETH (~$53,000). If you have ETH, please consider supporting us at https://spring.wetrust.io/miri. Note that donations equal to or slightly more than the minimum qualifying amount of 0.1 ETH help us by boosting the overall match rate for all other donations made—at time of posting, each donation of 0.1 ETH (~$11) by a new donor is adding 4.5 ETH (~$480) to MIRI’s matching total, a 4300% match!—while larger amounts let us capitalize on the resultant increased match rate.[6]

2. Can you donate online at 8:00am EST tomorrow (Giving Tuesday, November 27)? If so, go to MIRI’s Fundraiser Page and support us as part of Facebook/PayPal’s $7M Giving Tuesday matching event. Donations made before the $7M matching pool is exhausted will be matched 1:1, up to a maximum of $250,000 per organization and a limit of $20,000 per donor. In last year’s event, it took only 86 seconds for the matching pool to be exhausted, and it’s likely to happen even faster this year. Colm has put together an instruction page to maximize the chances of your donation being matched.

3. For everyone, through December 29: Professional poker players Martin Crowley, Tom Crowley, and Dan Smith, in partnership with Raising for Effective Giving, have just announced a $1 million Double-Up Drive and included MIRI among a list of “ten highly impactful charities” to which they will match donations. MIRI is coupled with Raising for Effective Giving in the Long Term Cause Pool, which has its own matching pool of $200,000.
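The Liberal Radicalism rule behind the first campaign can be sketched as follows. Under that model a project's ideal subsidy is (Σ√cᵢ)² − Σcᵢ over its contributions cᵢ, which rewards many small donors over one large one; here the fixed pool is then split pro rata. WeTrust's production formula may differ in detail:

```python
from math import sqrt

def lr_matches(donations_by_project, pool):
    """Sketch of Liberal Radicalism (quadratic funding) matching:
    ideal subsidy per project is (sum of sqrt(contributions))^2 minus
    the total contributed, and the matching pool is divided among
    projects in proportion to their ideal subsidies."""
    ideal = {p: sum(sqrt(c) for c in cs) ** 2 - sum(cs)
             for p, cs in donations_by_project.items()}
    total = sum(ideal.values())
    if total == 0:
        return {p: 0.0 for p in ideal}
    return {p: pool * v / total for p, v in ideal.items()}

# toy example: same total raised, very different donor counts
example = {"many_small": [1.0] * 100, "one_large": [100.0]}
```

In the toy example both projects raise $100, but the single-donor project's ideal subsidy is zero, so the many-donor project captures the whole pool; this is the mechanism's intended bias toward broad support.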
The Drive kicks off at 14:00 PST tomorrow, Giving Tuesday, after which donations can be made either on doubleupdrive.com, or by donating directly on MIRI’s website and sending your donation receipt to receiptsforcharity@gmail.com. We recommend the latter, particularly for US tax residents (MIRI is a 501(c)(3) organization).

Donate now

1. That is, our business-as-usual model tries to remove one-time outlier costs so that it’s easier to see what “the new normal” is in MIRI’s spending and think about our long-term growth curve.
2. This estimation is fairly rough and uncertain. One weakness of this model is that it treats the inputs as though they were independent, which is not always the case. I also didn’t try to account for the fact that we’re likely to spend more in worlds where we see more fundraising success. However, a sensitivity analysis on the final output showed that the overwhelming majority of the uncertainty in this estimate comes from how many research staff we hire, which matches my expectations and suggests that the model is doing a decent job of tracking the intuitively important variables. I also ended up with similar targets when I ran the numbers on our funding status in other ways and when I considered different funding scenarios.
3. This excludes earmarked funding for AI Impacts, an independent research group that’s institutionally housed at MIRI.
4. We could also think in terms of a “Target 0.5” of $100k in order to hit the bottom of the range, $4.4M. However, I worried that a $100k target would be misleading given that we’re thinking in terms of a $4.4–5.5M budget.
5. Quoting our 2017 fundraiser post: “If we succeed, our point estimate is that our 2018 budget will be $2.8M and our 2019 budget will be $3.5M, up from roughly $1.9M in 2017.” The $1.9M figure was an estimate from before 2017 had ended. We’ve now revised this figure to $2.1M, which happens to bring it in line with our 2016 point estimate for how much we’d spend in 2017.
6. WeTrust utilizes a number of tactics to reduce or deter Sybil attacks, including requiring (fairly easily set-up) user accounts, a minimum donation amount, and monitoring the history and activity of ETH addresses.

Discuss