# LessWrong.com News

A community blog devoted to refining the art of rationality

### Approval-directed agents: "implementation" details

24 November 2018 - 02:26
Published on Fri Nov 23 2018 23:26:08 GMT+0000 (UTC)

Follow-up to approval-directed agents: overview.

So far I’ve talked about approval-direction imprecisely. Maybe I’m talking about something incoherent, which has desirable properties only in the same sense as a four-sided triangle—vacuously. I won’t really be able to dispel this concern here, but I’ll at least take some steps.

How do you define approval?

Eventually you would have to actually write code implementing approval-directed behavior. What might that code look like? I want to set aside the problem “what does a sophisticated AI look like?” since I obviously don’t know. So let’s suppose we had some black box that did all of the hard work. I’ll consider a few cases for what the black box does, ranging from “easy to work with” to “very hard to work with.”

(Note: I now believe that we can target AI systems trained (nearly) end-to-end with gradient descent, which is most similar to “learning from examples.”)

Natural language

As an easy case, suppose we have a natural language question-answering system, which can assign a probability to any natural language sentence. In this case, we ask the question:

“Suppose that Hugh understood the current situation, was asked ‘on a scale from 0 to 1, how good is the action a?’ and was given a few hours to determine his answer. What would his answer be?”

We then loop over each action a and take the action with the highest expected answer.
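The loop can be sketched as follows. This is only an illustration: `answer_prob` is a hypothetical wrapper around the black-box question-answering system, the 0.1-spaced grid of candidate answers is an assumption, and the toy numbers are made up.

```python
from typing import Callable, Sequence

def expected_answer(action: str,
                    answer_prob: Callable[[str, float], float],
                    grid: Sequence[float]) -> float:
    """Expected value of Hugh's answer for one action.

    answer_prob(action, v) is a hypothetical wrapper that asks the black box
    for the probability of the sentence "Hugh's answer would be v".
    """
    weights = [answer_prob(action, v) for v in grid]
    return sum(v * w for v, w in zip(grid, weights)) / sum(weights)

def choose_action(actions, answer_prob, grid):
    """Loop over each action and return the one with the highest expected answer."""
    return max(actions, key=lambda a: expected_answer(a, answer_prob, grid))

# Toy stand-in for the question-answering system (illustrative numbers only).
PEAK = {"wait": 0.4, "ask Hugh": 0.8, "act alone": 0.2}

def toy_answer_prob(action: str, v: float) -> float:
    # Unnormalized weight that is highest when v equals the action's peak.
    return max(0.0, 1.0 - abs(v - PEAK[action]))

GRID = [i / 10 for i in range(11)]  # candidate answers 0.0, 0.1, ..., 1.0
best = choose_action(list(PEAK), toy_answer_prob, GRID)
print(best)
```

Replacing Hugh by a more powerful overseer would only change the sentence that the wrapper sends to the black box; the loop itself stays the same.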

In this framework, it is easy to replace Hugh by a more powerful overseer—all you have to do is specify the replacement in natural language.

“Math intuition module”

At an opposite extreme, suppose we have a “math intuition module,” a system which can assign probabilities only to perfectly precise statements—perhaps of the form “algorithm A returns output y on input x.”

I’ve written about defining “approval upon reflection” algorithmically (see here, here). These definitions can be used to define approval-directed behavior completely precisely. I’m pretty hesitant about these definitions, but I do think it is promising that we can get traction even in such an extreme case.

In reality, I expect the situation to be somewhere in between the simple case of natural language and the hard case of mathematical rigor. Natural language is the case where we share all of our concepts with our machines, while mathematics is the case where we share only the most primitive concepts. In reality, I expect we will share some but not all of our concepts, with varying degrees of robustness. To the extent that approval-directed decisions are robust to imprecision, we can safely use some more complicated concepts, rather than trying to define what we care about in terms of logical primitives.

Learning from examples

In an even harder case, suppose we have a function learner which can take some labelled examples f(x) = y and then predict a new value f(x’). In this case we have to define “Hugh’s approval” directly via examples. I feel less comfortable with this case, but I’ll take a shot anyway.

In this case, our approval-directed agent Arthur maintains a probabilistic model over sequences observation[T] and approval[T](a). At each step T, Arthur selects the action a maximizing approval[T](a). Then the timer T is incremented, and Arthur records observation[T+1] from his sensors. Optionally, Hugh might specify a value approval[t](a’) for any time t and any action a’. Then Arthur updates his models, and the process continues.

Like AIXI, if Arthur is clever enough he eventually learns that approval[T](a) refers to whatever Hugh will retroactively input. But unlike AIXI, Arthur will make no effort to manipulate these judgments. Instead he takes the action maximizing his expectation of approval[T], i.e., his prediction about what Hugh will say in the future, if Hugh says anything at all. (This depends on his self-predictions, since what Hugh does in the future depends on what Arthur does now.)
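A minimal sketch of this protocol, under loud assumptions: `MeanLearner` is a toy stand-in for the function learner (it ignores observations and simply averages the labels seen for each action), and `observe` and `hugh_label` are hypothetical callbacks for Arthur's sensors and Hugh's optional retroactive input.

```python
from collections import defaultdict

class MeanLearner:
    """Toy function learner: predicts the mean label seen for an action."""

    def __init__(self):
        self.labels = defaultdict(list)

    def update(self, t, action, approval):
        self.labels[action].append(approval)

    def predict(self, t, action):
        seen = self.labels[action]
        return sum(seen) / len(seen) if seen else 0.5  # 0.5 = no information yet

def run(actions, learner, observe, hugh_label, steps):
    """At each step T: pick the action with the highest predicted approval[T](a),
    record observation[T+1], then fold in any labels Hugh chooses to supply."""
    history = []
    for T in range(steps):
        action = max(actions, key=lambda a: learner.predict(T, a))
        history.append(action)
        observe(T + 1)  # recorded by Arthur; the toy learner does not use it
        for t, labelled_action, value in hugh_label(T, history):
            learner.update(t, labelled_action, value)
    return history

# Hypothetical Hugh: rates the action just taken, approving only of "b".
def toy_hugh(T, history):
    return [(T, history[-1], 1.0 if history[-1] == "b" else 0.0)]

history = run(["a", "b"], MeanLearner(), lambda t: None, toy_hugh, steps=3)
print(history)
```

Note that the agent in this sketch only predicts the labels and picks the action with the best prediction; nothing in the loop scores actions by their effect on what Hugh will enter later.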

At any rate, this is quite a lot better than AIXI, and it might turn out fine if you exercise appropriate caution. I wouldn’t want to use it in a high-stakes situation, but I think that it is a promising idea and that there are many natural directions for improvement. For example, we could provide further facts about approval (beyond example values), interpolating continuously between learning from examples and using an explicit definition of the approval function. More ambitiously, we could implement “approval-directed learning,” preventing it from learning complicated undesired concepts.

How should Hugh rate?

So far I’ve been very vague about what Hugh should actually do when rating an action. But the approval-directed behavior depends on how Hugh decides to administer approval. How should Hugh decide?

If Hugh expects action a to yield better consequences than action b, then he should give action a a higher rating than action b. In simple environments he can simply pick the best action, give it a rating of 1, and give the other options a rating of 0.

If Arthur is so much smarter than Hugh that he knows exactly what Hugh will say, then we might as well stop here. In this case, approval-direction amounts to Arthur doing exactly what Hugh instructs: “the minimum of Arthur’s capabilities and Hugh’s capabilities” is equal to “Hugh’s capabilities.”

But most of the time, Arthur won’t be able to tell exactly what Hugh will say. The numerical scale between 0 and 1 exists to accommodate Arthur’s uncertainty.

To illustrate the possible problems, suppose that Arthur is considering whether to drive across a bridge that may or may not collapse. Arthur thinks the bridge will collapse with 1% probability. But Arthur also thinks that Hugh knows for sure whether or not the bridge will collapse. If Hugh always assigned the optimal action a rating of 1 and every other action a rating of 0, then Arthur would take the action that was most likely to be optimal: driving across the bridge.

Hugh should have done one of two things:

• Give a bad rating for risky behavior. Hugh should give Arthur a high rating only if he drives across the bridge and knows that it is safe. In general, give a rating of 1 to the best action ex ante.
• Assign a very bad rating to incorrectly driving across the bridge, and only a small penalty for being too cautious. In general, give ratings that reflect the utilities of possible outcomes—to the extent you know them.

Probably Hugh should do both. This is easier if Hugh understands what Arthur is thinking and why, and what range of possibilities Arthur is considering.
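The bridge example can be worked through numerically. The 1% collapse probability comes from the example above; the specific utility-style ratings (0.999 for waiting, 0 for driving onto a collapsing bridge) are illustrative assumptions, not values from the post.

```python
P_COLLAPSE = 0.01  # Arthur's belief that the bridge will collapse

def expected_ratings(ratings, p_collapse):
    """ratings maps action -> (rating if bridge is safe, rating if it collapses)."""
    return {action: (1 - p_collapse) * safe + p_collapse * collapse
            for action, (safe, collapse) in ratings.items()}

def best_action(ratings, p_collapse):
    expected = expected_ratings(ratings, p_collapse)
    return max(expected, key=expected.get)

# Scheme 1: rating 1 for whichever action turns out to be optimal, 0 otherwise.
zero_one = {"drive": (1.0, 0.0), "wait": (0.0, 1.0)}

# Scheme 2 (assumed numbers): very bad rating for driving onto a collapsing
# bridge, only a small penalty for being too cautious.
utility_like = {"drive": (1.0, 0.0), "wait": (0.999, 0.999)}

print(best_action(zero_one, P_COLLAPSE))      # drive
print(best_action(utility_like, P_COLLAPSE))  # wait
```

Under the 0/1 scheme driving has expected rating 0.99 against 0.01 for waiting, so Arthur drives; under the assumed utility-style ratings waiting scores about 0.999 against 0.99, so Arthur waits.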

Other details

I am leaving out many other important details in the interest of brevity. For example:

• In order to make these evaluations Hugh might want to understand what Arthur is thinking and why. This might be accomplished by giving Hugh enough time and resources to understand Arthur’s thoughts; or by letting different instances of Hugh “communicate” to keep track of what is going on as Arthur’s thoughts evolve; or by ensuring that Arthur’s thoughts remain comprehensible to Hugh (perhaps by using approval-directed behavior at a lower level, and only approving of internal changes that can be rendered comprehensible).
• It is best if Hugh optimizes his ratings to ensure the system remains robust. For example, in high stakes settings, Hugh should sometimes make Arthur consult the real Hugh to decide how to proceed—even if Arthur correctly knows what Hugh wants. This ensures that Arthur will seek guidance when he incorrectly believes that he knows what Hugh wants.

…and so on. The details I have included should be considered illustrative at best. (I don’t want anyone to come away with a false sense of precision.)

Problems

It would be sloppy to end the post without a sampling of possible pitfalls. For the most part these problems have more severe analogs for goal-directed agents, but it’s still wise to keep them in mind when thinking about approval-directed agents in the context of AI safety.

My biggest concerns

I have three big concerns with approval-directed agents, which are my priorities for follow-up research:

• Is an approval-directed agent generally as useful as a goal-directed agent, or does this require the overseer to be (extremely) powerful? Based on the ideas in this post, I am cautiously optimistic.
• Can we actually define approval-directed agents by examples, or do they already need a shared vocabulary with their programmers? I am again cautiously optimistic.
• Is it realistic to build an intelligent approval-directed agent without introducing goal-directed behavior internally? I think this is probably the most important follow-up question. I would guess that the answer will be “it depends on how AI plays out,” but we can at least get insight by addressing the question in a variety of concrete scenarios.

Motivational changes for the overseer

“What would I say if I thought for a very long time?” might have a surprising answer. The very process of thinking harder, or of finding myself in a thought experiment, might alter my priorities. I may care less about the real world, or may become convinced that I am living in a simulation.

This is a particularly severe problem for my proposed implementation of indirect normativity, which involves a truly outlandish process of reflection. It’s still a possible problem for defining approval-direction, but I think it is much less severe.

“What I would say after a few hours,” is close enough to real life that I wouldn’t expect my thought process to diverge too far from reality, either in values or beliefs. Short time periods are much easier to predict, and give less time to explore completely unanticipated lines of thought. In practice, I suspect we can also define something like “what I would say after a few hours of sitting at my desk under completely normal conditions,” which looks particularly innocuous.

Over time we will build more powerful AI’s with more powerful (and perhaps more exotic) overseers, but making these changes gradually is much easier than making them all at once: small changes are more predictable, and each successive change can be made with the help of increasingly powerful assistants.

Treacherous turn

If Hugh inadvertently specifies the wrong overseer, then the resulting agent might be motivated to deceive him. Any rational overseer will be motivated to approve of actions that look reasonable to Hugh. If they don’t, Hugh will notice the problem and fix the bug, and the original overseer will lose their influence over the world.

This doesn’t seem like a big deal: a failed attempt to specify “Hugh” probably won’t inadvertently specify a different Hugh-level intelligence; it will probably fail innocuously.

There are some possible exceptions, which mostly seem quite obscure but may be worth having in mind. The learning-from-examples protocol seems particularly likely to have problems. For example:

• Someone other than Hugh might be able to enter training data for approval[T](a). Depending on how Arthur is defined, these examples might influence Arthur’s behavior as soon as Arthur expects them to appear. In the most pathological case, these changes in Arthur’s behavior might have been the very reason that someone had the opportunity to enter fraudulent training data.
• Arthur could accept the motivated simulation argument, believing himself to be in a simulation at the whim of a simulator attempting to manipulate his behavior.
• The simplest explanation for Hugh’s judgments may be a simple program motivated to “mimic” the series approval[T] and observation[T] in order to influence Arthur.

Ignorance

An approval-directed agent may not be able to figure out what I approve of.

I’m skeptical that this is a serious problem. It falls under the range of predictive problems I’d expect a sophisticated AI to be good at. So it’s a standard objective for AI research, and AI’s that can’t make such predictions probably have significantly sub-human ability to act in the world. Moreover, even a fairly weak reasoner can learn generalizations like “actions that lead to Hugh getting candy, tend to be approved of” or “actions that take control away from Hugh, tend to be disapproved of.”

If there is a problem, it doesn’t seem like a serious one. Straightforward misunderstandings will lead to an agent that is inert rather than actively malicious (see the “Fail gracefully” section). And deep misunderstandings can be avoided, by Hugh approving of the decision “consult Hugh.”

Conclusion

Making decisions by asking “what action would your owner most approve of?” may be more robust than asking “what outcome would your owner most approve of?” Choosing actions directly has limitations, but these might be overcome by a careful implementation.

More generally, the focus on achieving safe goal-directed behavior may have partially obscured the larger purpose of the AI safety community, which should be achieving safe and useful behavior. It may turn out that goal-directed behavior really is inevitable or irreplaceable, but the case has not yet been settled.

This post was originally posted here.

Tomorrow's AI Alignment Forum sequences post will be 'Fixed Point Discussion' by Scott Garrabrant, in the sequence 'Fixed Points'.

The next posts in this sequence will be 'Approval directed bootstrapping' and 'Humans consulting HCH', two short posts which will come out on Sunday 25th November.

### What if people simply forecasted your future choices?

23 November 2018 - 13:52
Published on Fri Nov 23 2018 10:52:25 GMT+0000 (UTC)

tldr: If you could have a team of smart forecasters predicting your future decisions & actions, they would likely improve them in accordance with your epistemology. This is a very broad method that's less ideal than more reductionist approaches for specific things, but possibly simpler to implement and likelier to be accepted by decision makers with complex motivations.

Background

The standard way of finding questions to forecast involves a lot of work. As Zvi noted, questions should be very well-defined, and coming up with interesting yet specific questions takes considerable thought.

One overarching question is how predictions can be used to drive decision making. One recommendation (one version of which is called "Decision Markets") often comes down to estimating future parameters, conditional on each of a set of choices. Another option is to have expert evaluators probabilistically evaluate each option, and have predictors predict their evaluations (Prediction-Augmented Evaluations).

Proposal

One prediction proposal I suggest is to have predictors simply predict the future actions & decisions of agents. I temporarily call this an "action prediction system." The evaluation process (the choosing process) would need to happen anyway, and the question becomes very simple. This may seem too basic to be useful, but I think it may be a lot better than I, at least, initially expected.

Say I'm trying to decide what laptop I should purchase. I could have some predictors predicting which one I'll decide on. In the beginning, the prediction aggregation shows that I have a 90% chance of choosing one option. While I really would like to be the kind of person who purchases a Lenovo with Linux, I'll probably wind up buying another Macbook. The predictors may realize that I typically check Amazon reviews and the Wirecutter for research, and they have a decent idea of what I'll find when I eventually do.
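As a rough illustration of what the "prediction aggregation" might look like, here is a sketch that averages individual forecasts. The post does not say how forecasts are aggregated, so simple averaging is an assumption, and the three forecasts are made-up numbers chosen to give roughly the 90% figure from the example.

```python
def aggregate(forecasts):
    """Average each option's probability across forecasters (assumed method)."""
    options = forecasts[0].keys()
    return {option: sum(f[option] for f in forecasts) / len(forecasts)
            for option in options}

# Hypothetical forecasts of which laptop I will end up buying.
forecasts = [
    {"Macbook": 0.95, "Lenovo with Linux": 0.05},
    {"Macbook": 0.85, "Lenovo with Linux": 0.15},
    {"Macbook": 0.90, "Lenovo with Linux": 0.10},
]

aggregated = aggregate(forecasts)
most_likely = max(aggregated, key=aggregated.get)
print(most_likely, round(aggregated[most_likely], 2))
```

A real system would likely weight forecasters by track record or use a market mechanism; the point here is only that the output is a probability for each action I might take.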

It's not clear to me how to best focus predictors on specific uncertain actions I may take. It seems like I would want to ask them mostly about specific decisions I am uncertain of.

One important aspect is that I should have a line of communication to the predictors. This means that some clever ones may eventually catch on to practices such as the following:

A forecaster-sales strategy

1. Find good decision options that have been overlooked

2. Make forecasts or bets on them succeeding

3. Provide really good arguments and research as to why they are overlooked

If I, the laptop purchaser, am skeptical, I could ignore the prediction feedback. But if I repeat the process for other decisions, I should eventually develop a sense of trust in the aggregation accuracy, and then in the predictors' ability to understand my desires. I may also be very interested in what that community has to say, as they have developed a model of what my preferences are. If I'm generally a reasonable and intelligent person, I could learn how to best rely on these predictors to speed up and improve my future decisions.

In a way, this solution doesn't solve the problem of "how to decide the best option;" it just moves it into what may be a more manageable place. Over time I imagine that new strategies may emerge for what generally constitutes "good arguments", and those will be adopted. In the meantime, agents will be encouraged to quickly choose options they would generally want, using reasoning techniques they generally prefer. If one agent were really convinced by a decision market, then perhaps some forecasters would set one up in order to prove their point.

Failure Modes

There are a few obvious failure modes to such a setup. I think that it could dilute signal quality, but am not as worried about some of the other obvious ones.

Weak Signals

I think it's fair to say that if one wanted to optimize for expected value, asking forecasters to predict actions instead could lead to weaker signals. Forecasters would be estimating a few things at once (how good an option is, and how likely the agent is to choose it). Whether or not the agent is really intent on optimizing for specific things, the probabilities of chosen decisions may not carry enough signal to be useful. I think this would have to be empirically tested under different conditions.

There could also be complex feedback loops, especially for naive agents. An agent may trust its predictors too much. If the predictors believe the agent is too trusting or trusts the wrong signals, they could amplify those signals and find "easy stable points." I'm really unsure of how this would look or how much competence the agent or predictors would need to have net-beneficial outcomes. I'd be interested in testing and paying attention to this failure mode.

That said, the reference class of groups that would consider and be interested in paying for "action predictions" vs. "decision markets" or similar is a very small one, and one that I expect would be convinced only by pretty good arguments. So pragmatically, in the rare cases where the question of "would our organization be wise enough to get benefit from action predictions" is asked, I'd expect the answer to lean positive. I wouldn't expect obviously sleazy sales strategies to work to convince GiveWell of a new top cause area, for example.

Inevitable Failures

Say the predictors realized that a MacBook wouldn't make any sense for me, but that I was still 90% likely to choose it, even after I heard all of the best arguments. It would be somewhat of an "inevitable failure." The amount of utility I get from each item could be very uncorrelated with my chances of choosing that item, even after hearing about that difference.

While this may be unfortunate, it's not obvious what would work in these conditions. The goal of predictions shouldn't be to predict the future accurately, but instead to help agents make better decisions. If there were a different system that did a great job outlining the negative effect of a bad decision to my life, but I predictably ignored the system, then it just wouldn't be useful, despite being accurate. Value of information would be low. It's really tough for a system of information to be so good as to be useful even when ignored.

I'd also argue that the kinds of agents that would make predictably poor decisions would be ones that really aren't interested in getting accurate and honest information. It could seem pretty brutal to them; basically, it would involve them paying for a system that continuously tells them that they are making mistakes.

This previous discussion has assumed that the agents making the decisions are the same ones paying for the forecasting. This is not always the case, but in the counterexamples, setting up other proposals could easily be seen as hostile. If I set up a system to start evaluating the expected total values of all the actions of my friend George, knowing that George would systematically ignore the main ones, I could imagine George may not be very happy with his subsidized evaluations.

Principal-agent Problems

I think "action predictions" would help agents fulfill their actual goals, while other forecasting systems would more help them fulfill their stated goals. This has obvious costs and benefits.

Let's consider a situation with a CEO who wants their company to be as big as possible, and corporate stakeholders who want instead for the company to be as profitable as possible.

Say the CEO commits to "maximizing shareholder revenue," and commits to making decisions that do so. If there were a decision market set up to tell how much "shareholder value" would be maximized for each of a set of options (different to a decision prediction system), and that information was public to shareholders, then it would be obvious to them when and how often the CEO disobeys that advice. This would be a very transparent set up that would allow the shareholders to police the CEO. It would take away a lot of flexibility and authority of the CEO and place it in the hands of the decision system.

In contrast, say the CEO instead shares a transparent action prediction system. Predictor participants would, in this case, try to understand the specific motivations of the CEO and optimize their arguments accordingly. Even if they were being policed by shareholders, they could know this, and disguise their arguments accordingly. If discussing and correctly predicting the net impact on shareholders would hurt their ability to predict the CEO's actions and to persuade the CEO, they could simply ignore it, or better yet find convincing arguments not to take that action. I expect that an action prediction system would essentially act to amplify the abilities of the decider, even if at the cost of other caring third parties.

Salesperson Melees

One argument against this is a gut reaction that it sounds very "salesy", so probably won't work. While I agree there are some cases where it may not work too well (stated above in the weak signal section), I think that smart people should be positively augmented by good salesmanship under reasonable incentives.

In many circumstances, salespeople are really useful in practice. The industry is huge, and I'm under the impression that at least a significant fraction (>10%) is net-beneficial. Specific kinds of technical and corporate sales come to mind, where the "sales" professionals are some of the most useful people with whom to discuss technical questions. There simply aren't other services willing to have lengthy discussions about some topics.

Externalities

Predictions used in this way would help the goals of the agents using them, but these agents may be self-interested, leading to additional negative externalities on others. I think this prediction process doesn't at all help in making people more altruistic. It simply would help agents better satisfy their own preferences. This is a common aspect to almost all intelligence-amplification proposals. I think it's important to consider, but I'm really recommending this proposal more as a "possible powerful tool", and not as a "tool that is expected to be highly globally beneficial if used." That would be a very separate discussion.

### Oversight of Unsafe Systems via Dynamic Safety Envelopes

23 November 2018 - 11:37
Published on Fri Nov 23 2018 08:37:30 GMT+0000 (UTC)

Idea

I had an idea for short-term, non-superhuman AI safety that I recently wrote up and will be posting on arXiv. This post serves to introduce the idea, and request feedback from a more safety-oriented group than those that I would otherwise present the ideas to.

In short, the paper tries to adapt a paradigm that Mobileye has presented for autonomous vehicle safety to a much more general setting. The paradigm is to have a "safety envelope" that is dictated by an algorithm separate from the policy algorithm for driving, setting speed and distance limits for the vehicle based on the position of vehicles around it.

For self-driving cars, this works well because there is a physics-based model of the system that can be used to find an algorithmic envelope. In arbitrary other systems, it works less well, because we don't have good fundamental models for what safe behavior means. For example, in financial markets there are "circuit breakers" that function as an opportunity for the system to take a break when something unexpected happens. The values for the circuit breakers are set via a simple heuristic that doesn't relate to the dynamics of the system in question. I propose taking a middle path: dynamically learning a safety envelope.

In building separate models for safety and for policy, I think the system can address a different problem being discussed in military and other AI contexts, which is that "Human-in-the-Loop" is impossible for normal ML systems, since it slows the reaction time down to the level of human reactions. The proposed paradigm of a safety-envelope learning system can be meaningfully controlled by humans, because the adaptive time needed for the system can be slower than the policy system that makes the lower level decisions.
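To make the separation concrete, here is a minimal sketch under stated assumptions. The post does not specify how the envelope is learned; the scalar action, the `SafetyEnvelope` class, the moving-average update rule, and the placeholder `policy` are all hypothetical choices made for illustration.

```python
class SafetyEnvelope:
    """Bounds on a scalar action, kept separate from the policy."""

    def __init__(self, low: float, high: float, rate: float = 0.1):
        self.low, self.high, self.rate = low, high, rate

    def clamp(self, proposed: float) -> float:
        """Fast path: restrict the policy's proposed action to the envelope."""
        return min(max(proposed, self.low), self.high)

    def update(self, target_low: float, target_high: float) -> None:
        """Slow path: move the bounds a fraction `rate` toward reviewed targets,
        which a human overseer or regulator could inspect before applying."""
        self.low += self.rate * (target_low - self.low)
        self.high += self.rate * (target_high - self.high)

def policy(observation: float) -> float:
    """Placeholder policy; it never reads or writes the envelope."""
    return 1.5 * observation

envelope = SafetyEnvelope(low=0.0, high=10.0, rate=0.5)
before = envelope.clamp(policy(8.0))               # policy proposes 12.0
envelope.update(target_low=0.0, target_high=6.0)   # slower, human-reviewable step
after = envelope.clamp(policy(8.0))
print(before, after)
```

The design point is that `clamp` runs at the policy's speed while `update` can run on a much slower schedule, which is where human control would fit.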

Quick Q&A

1) How do we build heuristic safety envelopes in practice?

This depends on the system in question. I would be very interested in identifying domains where this class of solution could be implemented, either in toy models, or in full systems.

2) Why is this better than a system that optimizes for safety?

The issues with balancing optimization for goals versus optimization for safety can lead to perverse effects. If the system optimizing for safety is segregated, and the policy-engine is not given access to it, this should not occur.

This also allows the safety system to be built and monitored by a regulator, instead of by the owners of the system. In the case of Mobileye's proposed system, a self-driving car could have the parameters of the safety envelope dictated by traffic authorities, instead of needing to rely on the car manufacturers to implement systems that drive safely as determined by those manufacturers.

3) Are there any obvious shortcomings to this approach?

Yes. This does not scale to human-level or superhuman general intelligence, because a system aware of the constraints can attempt to design policies for avoiding them. It is primarily intended to serve as a stop-gap measure to marginally improve the safety of near-term Machine Learning systems.

### 2018 strategy update from MIRI

23 November 2018 - 02:42
https://intelligence.org/files/mirilogofb.jpg

### Approval-directed agents: overview

23 November 2018 - 00:15
Published on Thu Nov 22 2018 21:15:28 GMT+0000 (UTC)

Note: This is the first post in part two: basic intuitions of the sequence on iterated amplification. The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.

Research in AI is steadily progressing towards more flexible, powerful, and autonomous goal-directed behavior. This progress is likely to have significant economic and humanitarian benefits: it helps make automation faster, cheaper, and more effective, and it allows us to automate deciding what to do.

Many researchers expect goal-directed machines to predominate, and so have considered the long-term implications of this kind of automation. Some of these implications are worrying: if sophisticated artificial agents pursue their own objectives and are as smart as we are, then the future may be shaped as much by their goals as by ours.

Most thinking about “AI safety” has focused on the possibility of goal-directed machines, and asked how we might ensure that their goals are agreeable to humans. But there are other possibilities.

In this post I will flesh out one alternative to goal-directed behavior. I think this idea is particularly important from the perspective of AI safety.

Approval-directed agents

Consider a human Hugh, and an agent Arthur who uses the following procedure to choose each action:

Estimate the expected rating Hugh would give each action if he considered it at length. Take the action with the highest expected rating.

I’ll call this “approval-directed” behavior throughout this post, in contrast with goal-directed behavior. In this context I’ll call Hugh an “overseer.”

Arthur’s actions are rated more highly than those produced by any alternative procedure. That’s comforting, but it doesn’t mean that Arthur is optimal. An optimal agent may make decisions that have consequences Hugh would approve of, even if Hugh can’t anticipate those consequences himself. For example, if Arthur is playing chess he should make moves that are actually good—not moves that Hugh thinks are good.

The quality of approval-directed decisions is limited by the minimum of Arthur’s ability and Hugh’s ability: Arthur makes a decision only if it looks good to both Arthur and Hugh. So why would Hugh be interested in this proposal, rather than doing things himself?

• Hugh doesn’t actually rate actions, he just participates in a hypothetical rating process. So Hugh can oversee many agents like Arthur at once (and spend his actual time relaxing on the beach). In many cases, this is the whole point of automation.
• Hugh can (hypothetically) think for a very long time about each decision—longer than would be practical or cost-effective if he had to actually make the decision himself.
• Similarly, Hugh can think about Arthur’s decisions at a very low level of detail. For example, Hugh might rate a chess-playing AI’s choices about how to explore the game tree, rather than rating its final choice of moves. If Arthur is making billions of small decisions each second, then Hugh can think in depth about each of them, and the resulting system can be much smarter than Hugh.
• Hugh can (hypothetically) use additional resources in order to make his rating: powerful computers, the benefit of hindsight, many assistants, very long time periods.
• Hugh’s capabilities can be gradually escalated as needed, and one approval-directed system can be used to bootstrap to a more effective successor. For example, Arthur could advise Hugh on how to define a better overseer; Arthur could offer advice in real-time to help Hugh be a better overseer; or Arthur could directly act as an overseer for his more powerful successor.

In most situations, I would expect approval-directed behavior to capture the benefits of goal-directed behavior, while being easier to define and more robust to errors.

Advantages

Facilitate indirect normativity

Approval-direction is closely related to what Nick Bostrom calls “indirect normativity” — describing what is good indirectly, by describing how to tell what is good. I think this idea encompasses the most credible proposals for defining a powerful agent’s goals, but has some practical difficulties.

Asking an overseer to evaluate outcomes directly requires defining an extremely intelligent overseer, one who is equipped (at least in principle) to evaluate the entire future of the universe. This is probably impractical overkill for the kinds of agents we will be building in the near future, who don’t have to think about the entire future of the universe.

Approval-directed behavior provides a more realistic alternative: start with simple approval-directed agents and simple overseers, and scale up the overseer and the agent in parallel. I expect the approval-directed dynamic to converge to the desired limit; this requires only that the simple overseers approve of scaling up to more powerful overseers, and that they are able to recognize appropriate improvements.

Avoid lock-in

Some approaches to AI require “locking in” design decisions. For example, if we build a goal-directed AI with the wrong goals then the AI might never correct the mistake on its own. For sufficiently sophisticated AIs, such mistakes may be very expensive to fix. There are also more subtle forms of lock-in: an AI may not be able to fix a bad choice of decision theory, sufficiently bad priors, or a bad attitude towards infinity. It’s hard to know what other properties we might inadvertently lock in.

Approval-direction involves only extremely minimal commitments. If an approval-directed AI encounters an unforeseen situation, it will respond in the way that we most approve of. We don’t need to make a decision until the situation actually arises.

Perhaps most importantly, an approval-directed agent can correct flaws in its own design, and will search for flaws if we want it to. It can change its own decision-making procedure, its own reasoning process, and its own overseer.

Fail gracefully

Approval-direction seems to “fail gracefully:” if we slightly mess up the specification, the approval-directed agent probably won’t be actively malicious. For example, suppose that Hugh was feeling extremely apathetic and so evaluated proposed actions only superficially. The resulting agent would not aggressively pursue a flawed realization of Hugh’s values; it would just behave lackadaisically. The mistake would be quickly noticed, unless Hugh deliberately approved of actions that concealed the mistake.

This looks like an improvement over misspecifying goals, which leads to systems that are actively opposed to their users. Such systems are motivated to conceal possible problems and to behave maliciously.

The same principle sometimes applies if you define the right overseer but the agent reasons incorrectly about it, if you misspecify the entire rating process, or if your system doesn’t work quite like you expect. Any of these mistakes could be serious for a goal-directed agent, but are probably handled gracefully by an approval-directed agent.

Similarly, if Arthur is smarter than Hugh expects, the only problem is that Arthur won’t be able to use all of his intelligence to devise excellent plans. This is a serious problem, but it can be fixed by trial and error—rather than leading to surprising failure modes.

Is it plausible?

I’ve already mentioned the practical demand for goal-directed behavior and why I think that approval-directed behavior satisfies that demand. There are other reasons to think that agents might be goal-directed. These are all variations on the same theme, so I apologize if my responses become repetitive.

Internal decision-making

We assumed that Arthur can predict what actions Hugh will rate highly. But in order to make these predictions, Arthur might use goal-directed behavior. For example, Arthur might perform a calculation because he believes it will help him predict what actions Hugh will rate highly. Our apparently approval-directed decision-maker may have goals after all, on the inside. Can we avoid this?

I think so: Arthur’s internal decisions could also be approval-directed. Rather than performing a calculation because it will help make a good prediction, Arthur can perform that calculation because Hugh would rate this decision highly. If Hugh is coherent, then taking individual steps that Hugh rates highly leads to overall behavior that Hugh would approve of, just like taking individual steps that maximize X leads to behavior that maximizes X.

In fact the result may be more desirable, from Hugh’s perspective, than maximizing Hugh’s approval. For example, Hugh might incorrectly rate some actions highly, because he doesn’t understand them. An agent maximizing Hugh’s approval might find those actions and take them. But if the agent were internally approval-directed, then it wouldn’t try to exploit errors in Hugh’s ratings. Actions that lead to reported approval but not real approval don’t lead to approval for approved reasons.

Turtles all the way down?

Approval-direction stops making sense for low-level decisions. A program moves data from register A into register B because that’s what the next instruction says, not because that’s what Hugh would approve of. After all, deciding whether Hugh would approve itself requires moving data from one register to another, and we would be left with an infinite regress.

The same thing is true for goal-directed behavior. Low-level actions are taken because the programmer chose them. The programmer may have chosen them because she thought they would help the system achieve its goal, but the actions themselves are performed because that’s what’s in the code, not because of an explicit belief that they will lead to the goal. Similarly, actions might be performed because a simple heuristic suggests they will contribute to the goal — the heuristic was chosen or learned because it was expected to be useful for the goal, but the action is motivated by the heuristic. Taking the action doesn’t involve thinking about the heuristic, just following it.

Similarly, an approval-directed agent might perform an action because it’s the next instruction in the program, or because it’s recommended by a simple heuristic. The program or heuristic might have been chosen to result in approved actions, but taking the action doesn’t involve reasoning about approval. The aggregate effect of using and refining such heuristics is to effectively do what the user approves of.

In many cases, perhaps a majority, the heuristics for goal-directed and approval-directed behavior will coincide. To answer “what do I want this function to do next?” I very often ask “what do I want the end result to be?” In these cases the difference is in how we think about the behavior of the overall system, and what invariants we try to maintain as we design it.

Relative difficulty?

Approval-directed subsystems might be harder to build than goal-directed subsystems. For example, there is much more data of the form “X leads to Y” than of the form “the user approves of X.” This is a typical AI problem, though, and can be approached using typical techniques.

Approval-directed subsystems might also be easier to build, and I think this is the case today. For example, I recently wrote a function to decide which of two methods to use for the next step of an optimization. Right now it uses a simple heuristic with mediocre performance. But I could also have labeled some examples as “use method A” or “use method B,” and trained a model to predict what I would say. This model could then be used to decide when to use A, when to use B, and when to ask me for more training data.
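As a rough illustration of the labeled-examples idea above, here is a minimal sketch, assuming a two-dimensional feature vector describing the optimization state and a handful of hand-labeled examples. The features, the data, the nearest-neighbour vote, and the name `choose_method` are all hypothetical stand-ins chosen for brevity; the post does not say what model or features would actually be used.

```python
from math import dist

# Hypothetical labeled examples: (features of the optimization state, label I gave).
# The features and values are placeholders, not taken from the original post.
EXAMPLES = [
    ((0.1, 0.2), "A"),
    ((0.2, 0.1), "A"),
    ((0.15, 0.3), "A"),
    ((0.9, 0.8), "B"),
    ((0.8, 0.9), "B"),
    ((0.85, 0.7), "B"),
]


def choose_method(features, examples=EXAMPLES, k=3, min_agreement=1.0):
    """Predict which label the overseer would give, or ask for more data.

    Uses a k-nearest-neighbour vote over the labeled examples. Returns "A"
    or "B" when at least `min_agreement` of the k nearest examples share
    that label, and "ask" otherwise (a request for more training data).
    """
    nearest = sorted(examples, key=lambda ex: dist(ex[0], features))[:k]
    votes_a = sum(1 for _, label in nearest if label == "A")
    share_a = votes_a / len(nearest)
    if share_a >= min_agreement:
        return "A"
    if 1 - share_a >= min_agreement:
        return "B"
    return "ask"


if __name__ == "__main__":
    print(choose_method((0.1, 0.1)))  # near the "A" examples
    print(choose_method((0.9, 0.9)))  # near the "B" examples
    print(choose_method((0.5, 0.5)))  # between the clusters, neighbours disagree
```

The "ask" branch is the part that makes this approval-flavored rather than a plain classifier: when the labeled examples do not clearly settle what I would say, the function defers to me instead of guessing.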

Reflective stability

Rational goal-directed behavior is reflectively stable: if you want X, you generally want to continue wanting X. Can approval-directed behavior have the same property?

Approval-directed systems inherit reflective stability (or instability) from their overseers. Hugh can determine whether Arthur “wants” to remain approval-directed, by approving or disapproving of actions that would change Arthur’s decision-making process.

Goal-directed agents want to be wiser and know more, though their goals are stable. Approval-directed agents also want to be wiser and know more, but they also want their overseers to be wiser and know more. The overseer is not stable, but the overseer’s values are. This is a feature, not a bug.

Similarly, an agent composed of approval-directed subsystems overseen by Hugh is not the same as an approval-directed agent overseen by Hugh. For example, the composite may make decisions too subtle for Hugh to understand. Again, this is a feature, not a bug.

Black box search

(Note: I no longer agree with the conclusions of this section. I now feel that approval-directed agents can probably be constructed out of powerful black-box search (or stochastic gradient descent); my main priority is now either handling this setting or else understanding exactly what the obstruction is. Ongoing work in this direction is collected at ai-control, and will hopefully be published in a clear format by the end of 2016.)

Some approaches to AI probably can’t yield approval-directed agents. For example, we could perform a search which treats possible agents as black boxes and measures their behavior for signs of intelligence. Such a search could (eventually) find a human-level intelligence, but would give us very crude control over how that intelligence was applied. We could get some kind of goal-directed behavior by selecting for it, but selecting for approval-directed behavior would be difficult:

1. The paucity of data on approval is a huge problem in this setting. (Note: semi-supervised reinforcement learning is an approach to this problem.)
2. You have no control over the internal behavior of the agent, which you would expect to be optimized for pursuing a particular goal: maximizing whatever measure of “approval” that you used to guide your search. (Note: I no longer endorse this argument as written; reward engineering is a response to the substance of this concern.)
3. Agents who maximized your reported approval in test cases need not do so in general, any more than humans are reliable reproductive-fitness-maximizers. (Note: red teaming is an approach to this problem.)

But [1] and especially [3] are also problems when designing a goal-directed agent with agreeable goals, or indeed any particular goals at all. Though approval-direction can’t deal with these problems, they aren’t new problems.

Such a black-box search—with little insight into the internal structure of the agents—seems worrying no matter how we approach AI safety. Fortunately, it also seems unlikely (though not out of the question).

A similar search is more likely to be used to produce internal components of a larger system (for example, you might train a neural network to identify objects, as a component of a system for navigating an unknown environment). This presents similar challenges, concerning robustness and unintended behaviors, whether we are designing a goal-directed or approval-directed agent.

This essay was originally posted here. The second half of it can be found in the next post in this sequence.

Tomorrow's AI Alignment Forum sequences post will be 'Approval-directed agents: "implementation" details', by Paul Christiano.

### Speculative Evopsych, Ep. 1

November 22, 2018 - 22:00
Published on Thu Nov 22 2018 19:00:04 GMT+0000 (UTC)

(cw death, religion, suicide, evolutionary psychology, shameless tongue-in-cheek meta-contrarianism)

I have a passing interest in biology, so I recently bought some fruit flies to run experiments on. I did two things to them. First, I bred them for intelligence. The details are kinda boring, so let’s fast-forward: after a few tens of millions of generations, they were respectably intelligent, with language and culture and technology and so on.

In parallel with that, and more interestingly, whenever a fly was about to die of injury, I immediately plucked it out of the box, healed it, and put it in a different box (“Box Two”), a magnificent paradise where it blissfully lived out the rest of its days. Evolutionarily, of course, relocation was equivalent to death, and so the flies evolved to treat them the same: you could still make somebody stop affecting your world by stabbing them, and their kin would still grieve and seek revenge – the only difference was the lack of a corpse.

It didn’t really matter that the two boxes were separated only by a pane of glass, and that the flies in Box One could clearly see their “deceased” fellows living fantastic lives in Box Two. They “knew” on an abstract, intellectual level that getting “fatally” wounded wouldn’t actually make them stop having conscious experiences like death would. But evolution doesn’t care about that distinction; so it doesn’t select for organisms that care about that distinction; so the flies generally disregarded Box Two.

A small subculture in Box One claimed that “if anybody actually believed in Box Two and all its wonders, they’d stab themself through the heart in order to get there faster. Everybody’s literally-mortal fear of relocation proves that they don’t truly believe in Box Two, they only – at best – believe they believe.”

Strangely, nobody found this argument convincing.

### Perspective Reasoning and the Sleeping Beauty Problem

November 22, 2018 - 19:22
Published on Thu Nov 22 2018 11:55:22 GMT+0000 (UTC)

I want to present a new argument for the double-halving position in the Sleeping Beauty Problem. It is my contention that paradoxes related to the anthropic principle, such as the Sleeping Beauty Problem and the Doomsday Argument, are caused by mixing logic from different perspectives. To solve these paradoxes we only need to reason from a single consistent perspective.

When thinking about anthropic paradoxes, one perspective we can employ is an observer’s first-person perspective. The center of a perspective is primitively unique in logic and reasoning. E.g. I am fundamentally special to myself. I do not need to know any objective differences between me and every other human being to tell “this is me”. Similarly, the present moment is primitively meaningful. I can inherently tell that “now” is not any other time. From the first-person perspective it is possible to distinguish the center, e.g. myself and now, from other people and times without any additional information.

We can also reason as an outside observer, or employ a God’s-eye view; I call this the third-person perspective. The main point is that the center of that perspective is irrelevant to the topic of interest. From this perspective there is no obvious “me”. Every observer in question is as ordinary as everybody else. In order to specify someone from a group, one has to know the differences between him and the rest. Similarly there is no “now” that inherently stands out from any other moment. From the third-person perspective every observer and every day are treated as equals.

I argue that reasoning from these two perspectives should not be mixed. So I am either inherently special (such that no information is needed to specify me) or just an ordinary person like everybody else. Similarly, “today” is either a meaningful moment that uniquely stands out, or every day in the experiment is equal. We should always stick to one perspective and not use both in the same logical framework. Mix them up and paradoxes ensue.

From this starting point I derive that the answer to the Sleeping Beauty Problem should be double halving. Several points are worth mentioning:

1. No matter which perspective one chooses to reason from, there is no new information when Beauty wakes up.

2. Questions such as “what is the probability that today is Monday?” are invalid. Such probabilities do not exist, since they require specifying today from the first-person perspective while treating both days in the experiment as equals, as the third-person perspective does.

3. This means that after being told it is Monday, Beauty could keep her answer unchanged at 1/2.

4. When the experiment is repeated from any single perspective, the relative frequency of heads can be shown to be 1/2.

5. This provides new arguments for the double-halfer position when bets and rewards are involved (such as Dutch book arguments).

6. It provides a perfect explanation for the perspective disagreement troubling halfers, as pointed out by Pittard (2015).

7. It does not result in the embarrassment that other double-halfer positions do, as pointed out by Titelbaum (2012).

8. It disproves the Doomsday Argument and the Presumptuous Philosopher based on the same principle.

My complete argument in pdf can be found here.

https://www.sleepingbeautyproblem.com/wp-content/uploads/2018/11/Perspective-Reasoning-and-the-Solution-to-the-Sleeping-Beauty-Problem-Xianda-Gao-19-11-2018.pdf

I apologize in advance for my language skills and possible misuse of terminology. English is not my native language and philosophy is not my field. I desperately need feedback, especially counterarguments. Thank you.

### If You Want to Win, Stop Conceding

November 22, 2018 - 18:47
Published on Thu Nov 22 2018 15:46:59 GMT+0000 (UTC)

Author's note: This is the first in what I suppose might be a series of posts with respect to things I learned from playing traditional games competitively that I think might have broader applications.

Traditional games - card games, board games, miniatures, etc. - are a lot of fun, and I've played several of them at quite a competitive level. [1]

The #1 piece of advice that I can give if you want to get better at these games - a piece of advice that applies across essentially every game or sport I've played and a lot of "real world" stuff as well - is "if you want to win, stop conceding."

On the surface that doesn't sound super deep or interesting, but there's more to it than the obvious meaning - not all concedes are formal resignations, and indeed the ones that aren't are often more important.

Some time ago I read a book - either "The Inner Game of Tennis" or "Bonds that Make Us Free" or maybe both - that taught me that very many people concede games well before they need to be over, either because they incorrectly estimate their chances or because they make a mental motion away from trying to win and towards trying to make excuses for losing.

Here are some examples of what excuse-making thoughts might sound like:

• "The dice are against me, there's nothing I can do."
• "I didn't get a good night's sleep or eat breakfast this morning, otherwise I would be winning - I just can't focus."
• "This guy's bad but he got lucky." [2]
• "This guy's way better than me, I shouldn't even be matched up against him." [3]
• "I don't know how she even got this far ahead, there must be some bug."
• "This is pointless, why play it out?"
• "This is just for fun anyway, I've basically lost, may as well wind things down."

Once you have moved from trying to win and towards trying to excuse losing, you have more or less already lost. Sometimes an opponent might snatch defeat from the jaws of victory, but that's rare. Most of the time, assuming you've lost is the same as actually losing, it just takes longer.

If you want to get better, stop it. Force yourself to keep playing and keep looking for outs. Learn to recognize these mental patterns and suppress them. Don't concede games unless there's nothing you can do anymore. Fight until the bitter end.

That's a real 'if', because if you don't want to get better I certainly don't recommend doing this! There is a sense in which it is viscerally unpleasant to force yourself to be in a losing position, desperately scrabbling for anything that can get you out.

When you get in that position and escape, it feels great - but there are also going to be times when you don't escape, you struggle for ten or fifteen or thirty minutes but still lose, the whole thing feels bad, and you might wish you had conceded in the first place. If you aren't prepared to face that, maybe don't bother.

But what I've found is that a practice of facing the negative thoughts and pushing through seems to have been quite beneficial to me across a wide range of games and areas, so I would give serious thought to the notion that - at least in areas that you care about and want to be better at - you should consider this approach.

A few days before writing this, I played a card game where around two to five times throughout the course of an ~hour-long game I was struck by thoughts along the lines of "Wow, my position is horrible. I've been really unlucky. I should concede."

I didn't concede, and I won the game. This is not a particularly unusual experience to have once you acquire the inclination and ability to push through.

[1] For calibration:

At various times I have been world #1 by Elo rating in a few different games I've played. I came in third at the World Championships of L5R this year despite being out of practice (though to be fair I had some good luck).

I have flown to a card game tournament in another state in large part because I calculated I was likely to win enough in prizes that I would net gain money from the trip; I haven't eBayed all the promotional items I won yet but I believe I indeed made hundreds of dollars from that venture.

I don't say this to boast but rather to give an indication of where I am coming from. In order to "deflate the sails" a bit, I should say that I do not make a career out of gaming and I don't play poker or Magic: the Gathering competitively, which have a significantly higher level of play than most games I play; there are many people who are better at games than I am.

[2] This thought pattern is especially bad because it prevents you from learning from the game after the fact. Sure, some games do come down to luck in the end, but probably there were decisions you could have made prior to that that would influence the odds.

[3] A joke saying goes: "Anyone worse than me at this game is casual n00b trash. Anyone better than me at this game is a no-life tryhard." Neither of the thoughts in that dichotomy is very useful to have, even in their less straw forms.

### Review: Artifact

November 22, 2018 - 18:00
Published on Thu Nov 22 2018 15:00:01 GMT+0000 (UTC)

Epistemic Status: Alpha tester

Bottom Line: If you are willing to devote the time and attention to a deep strategic game, Artifact will reward you handsomely. I highly recommend that those who like such experiences make the time. If you are not willing to devote the time and attention, you will likely be frustrated and bounce off, and what time and attention you do have to game with is better spent elsewhere.

Artifact is an amazing game. Artifact is gorgeous, immersive and flavorful, hilarious, innovative, exciting, suspenseful, skill testing, strategically complex and rewarding. The execution is bug-free and flawless. It is the most fun I have had playing a game in a long time.

It streams well and is an excellent spectator sport for those who know the game and cards, and will be supported by an inaugural tournament with a one million dollar first prize. It is a Valve game, so you know it will get the level of support and attention it deserves in every aspect.

The economic model is the right one. Rather than addicting players to daily rewards and grinds, Artifact charges money for a game worth playing. You own your cards and will soon be able to buy and sell them. For your initial $20 you get 20 packs, each containing at least one card of the highest rarity and often two or even three. Additional packs are$2. Playing events costs only a single event ticket ($1), and you turn a large profit if you can get three wins before your second loss. I will say more on this later, but the model presented is extraordinarily generous, and those who are comparing it unfavorably to Magic Online should be ashamed of themselves. The catch is that Artifact is complex. Very complex. Complex enough that I have had multiple Magic professionals try the game only to have them report back that they bounced off the game because they did not understand what was going on. Artifact makes the most of its complexity, and uses its roots in DOTA 2 to justify much of it, but the complexity is still there and complexity is bad. Your first hour is likely to be overwhelming and confusing, as lots of cool things are happening all around you but it’s impossible to fully know why or what they mean, or what is likely to happen next. Gameplay Artifact matches up two teams of five heroes each, who do battle across three distinct lanes. This mimics the structure of games like DOTA 2, where there are five heroes on each side and three lanes in which to do battle, which we will call left, center and right. In addition to lanes and teams of five heroes that return when they die, Artifact also takes its concepts of towers, ancients, creeps, bounties and items directly from DOTA 2. Things that would seem needlessly complex or arbitrary in another context… still feel somewhat that way on occasion, but it helps a lot that they’re carrying the concepts over from another very complex game. This helped me even though I never learned how to play DOTA 2 or any similar game. 
The basic jist of Artifact is that players have five powerful heroes that let you do things of their color while they are in the active lane, and that fight hard, and they also summon a variety of other things to fight. If heroes die, they come back with a one turn delay. Players start with 3 mana on turn one in each tower, which increases by one each turn, and use this to pay for stuff. After players are done doing stuff, each unit does damage to the unit opposite it. If there’s nothing there, it damages the enemy tower or ancient. Do 40 damage each to destroy two of the enemies’ three towers to win, or do 40 and then on future turns 80 damage to one of them to kill the ancient, which also is a win. At the start of the game, each player sends one of their three starting heroes into each of the three lanes at random, and randomly is given three 2/4 (meaning two power and four health) ‘melee creeps.’ Each player starts with a tower in each lane that has 40 health. The turn order is: 1. Players are shown where new ‘creeps’ will enter the battle. By default, each turn each player gets two 2/4 creeps that are each put in a random lane. 2. Players simultaneously choose which lane to deploy each newly available hero to. Your fourth hero becomes available on turn two, your fifth on turn three. Any heroes that are killed ‘return to the fountain’ and become available two turns later, and any heroes that return to the fountain without being killed become available the next turn. 3. All new units are deployed to each lane. First, the game attempts to place new units in empty spaces across from enemy units. If that can’t be done, then a new pair of open spaces is created (randomly on the left or right side of the existing units) until no more such pairs are needed. If that leaves empty spots on one or both sides of the battle, those slots are filled by a straight arrow (50%), left arrow (25%) or right arrow (25%). 
Then all new units and all arrows are shuffled and distributed to the lane at random. Or, in English, first you fill in any empty space, then you deploy to the side, and any empty spaces randomly have arrows half the time. 4. Players draw two cards and the mana capacity of all towers increases by one. 5. Play proceeds to the left lane. The player with initiative can either take an action or pass. Actions include casting a spell, deploying a piece of equipment, deploying an improvement, deploying a creep card or using an activated ability of a unit or improvement. If you play an improvement, you can play it in any lane you like. Anything else must go in or impact the active lane unless it explicitly says otherwise. A new creep must first go in the empty space of your choice. If there are no empty spaces, you can choose to put it on the left or right side. If a card deploys other things rather than itself, the game will choose for you. 6. If take an action rather than passing, and you had initiative, you lose it. 7. This continues until both players pass. 8. Each unit does damage to the unit across from it. Damage persists. If a unit runs out of health it dies. Units that are not opposed damage the enemy tower directly. Armor reduces damage taken from any source, negative armor increases it. There are also a bunch of other abilities. 9. If a tower has taken 40 damage, it is destroyed and replaced by the ancient, which has 80 damage. If the ancient has taken 80, it dies. If you destroy two towers or one ancient, game is over and you win. 10. Each enemy you kill gives you one gold, or if it is a hero it gives you five gold. At end of turn, you can shop to buy items, which cost 0 mana and are played on your heroes to make them better. You come with a deck of items, and can choose between buying those (in a random order each turn), buying a fully random item, and/or buying a consumable. Gold can be kept for future turns. There are a lot more details, but that’s Artifact. 
There are often complex tactical questions that can play out over many turns with a lot of bluffing and uncertainty, and you must divide your resources between the lanes while anticipating how your opponent will do the same. If you invest the wrong amount, you can have more than enough to win a tower without enough to kill the ancient in time to matter, and/or find that you’re a turn behind on the tower that actually matters. Often a lot of power effectively goes to waste in a side fight unlikely to change the outcome, and one player figures this out well in advance of the other. Initiative on most turns is a liability, as you must commit to and show what you are doing before your opponent. But on key turns, initiative is everything, because your first action kills or stuns their heroes, preventing them from doing anything. Keeping the right color hero alive in the right lane is a frequent focal point. Often games come down to who can have initiative in the key moment, and how much players can afford to give up to make sure they get it. On the first turn, players face a random set of matchups between heroes. Cards that turn unfavorable or neutral matchups into more favorable ones (e.g. they were going to kill you and live, and now at least you both die, or before you both die and now you win) are usually the most valuable things in the first two turns. If you win those early fights, you get five gold, which you can use to buy an item that threatens to win you the next fight, and meanwhile you’re doing early damage and can cast spells while your enemy sits on the sidelines. Some decks try to do serious damage to towers right at the start, which shifts focus to that. 
As the game develops, it becomes more and more about deciding which towers to fight for, and deploying power to those towers, and shifts from building up long term resources to winning fights on the spot, saving your tower or damaging theirs, or unleashing your powerful high-cost spells and not letting them cast or capitalize on their versions. Common endgame scenarios include each player winning one tower and a big fight over the third one, one player going for an ancient and trying to stall in one or both of the other two towers, and a race to take down his ancient before the enemy takes down yours. Sacrificing one tower is often a wise move, either because you can’t reasonably fight back, or because you can strand a lot of resources there, often including multiple heroes, without a practical path to killing the ancient before the game ends. A key experience is that all the things are happening all over three boards. It is up to you to figure out which of those things are worth fighting for and spending resources to protect or attack. Due to the optionality and randomness of deployment and arrows, and not knowing your opponents’ hand, assuming the game will go a certain way has a habit of backfiring, but I also often have the bad habit of trying to win everything at once when that is not necessary. A very important rare is Annihilation, which destroys every unit in a lane. When you play against a blue deck, it likely has this card and that forces you to play the entire game differently to avoid an over-commitment, and to not let them have an opening to use it later. Another option is At Any Cost, which does six damage to every unit in the lane. Another important rare is Time of Triumph. When red decks reach eight mana, they can give all heroes in a lane a huge boost in effectiveness. As the hero Axe puts it, your ancients’ days are numbered. Green decks have Emissary of the Quorum. This costs eight mana, and is a creep with high health. 
A key item is Blink Dagger, which allows heroes to shift between lanes. You choose your five heroes and the order they will enter the game. Each also adds three copies of its associated card to your deck, which accounts for 15 of your 45 cards. You also build a nine (or more) card item deck.

As a result of these and a few additional similar cards, games where players have access to rare cards feel very different than games where they have access to only common cards. Games of limited, or where players lack key rares, are about figuring out where to fight and how hard, lining up key resources and accumulating board advantage. Games with higher level constructed decks force players to worry much more about over-commitment and be on the lookout for sweepers and powerful knockout blows that threaten to render those early battles irrelevant.

Ways to Play

Current options are to play constructed, phantom draft or keeper draft, and do so in social, casual or expert mode. Casual mode is like expert except with no entry fees and no prizes. I generally dislike playing on casual mode, since players won’t take it seriously when there is nothing at stake, so I have only experimented with the expert queues. A new player would presumably start out in casual for a while, to avoid hemorrhaging cash and/or getting consistently crushed while learning the game.

You can also play against the AI. The AI is not good at Artifact, and lacks strategic planning, but it is good enough to allow one to get a sense of whether a deck has been built in a reasonable fashion and get a handle on how it plays. It is also very good for players starting out.

Expert queues in constructed and phantom draft cost one event ticket ($1) to enter. You play until you get five wins or two losses. For three wins you get your ticket back, and the fourth and fifth wins each grant a booster pack ($2).
If we value boosters at their retail price, we get an expected return of about $0.906 per $1 ticket at a 50% win rate, and even a little better than that puts you ahead of the game. The catch, of course, is that packs cannot be efficiently converted into event tickets, and there are transaction costs to using the Steam Marketplace, so even if packs trade at the full $2 you’re going to take a haircut. Keeper drafts require you to bring the packs and allow you to keep what you draft, and cost two event tickets rather than one. I expect the default draft mode to be phantom draft.
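The 0.906 figure can be checked with a short calculation. The sketch below assumes each game is won independently with a fixed probability, values a ticket at $1 and a pack at $2 as stated above, and uses a hypothetical function name `expert_run_ev`; it is an illustration of the arithmetic, not anything taken from the game itself.

```python
from math import comb

def expert_run_ev(p_win, ticket=1.0, pack=2.0):
    """Expected prize value of one expert run (play to 5 wins or 2 losses).

    Assumes independent games with a fixed win probability p_win.
    """
    # Prize value by final win count: 3 wins returns the ticket,
    # wins 4 and 5 each add one booster pack.
    prizes = {3: ticket, 4: ticket + pack, 5: ticket + 2 * pack}
    q = 1.0 - p_win
    ev = 0.0
    # Runs that end on the second loss with w wins (w = 0..4):
    # the first w + 1 games hold w wins and 1 loss, then a final loss.
    for w in range(5):
        prob = comb(w + 1, w) * p_win**w * q**2
        ev += prob * prizes.get(w, 0.0)
    # Runs that end on the fifth win with l losses (l = 0 or 1):
    # the first 4 + l games hold 4 wins and l losses, then a final win.
    for l in range(2):
        prob = comb(4 + l, l) * p_win**5 * q**l
        ev += prob * prizes[5]
    return ev

print(round(expert_run_ev(0.5), 3))  # 0.906
```

Calling the function with win rates slightly above 0.5 shows how quickly the run moves past break-even under these assumptions, before any marketplace haircut on the packs.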

Artifact is a complicated structure as it is, so it is understandable that players are limited to a few proven play modes and a single constructed format. Hopefully in the long term this will expand as we get more players, better collections and more sets.

In the future, we have been promised a tournament with a $1 million first prize, doubtless with more to follow if the game does well. Players who are good enough to consider competing for that should not be distracted by small expenses or grinding opportunities, and should focus on improving their play as rapidly as possible.

Graphics, Sound and User Interface

This is a beautiful and supremely well-polished game. I have not yet encountered a bug or glitch of any kind in the beta, and there were almost none even in the alpha. Game play is as smooth as can be and everything just works. Given the complexity of the underlying game, it does a great job showing you what is happening and why.

While the card art is only about on par with other games, the rest of the interface is full of great touches. Each player gets a highly emotive imp that illustrates what is going on, which is mostly great fun, even if it is kind of annoying when it is emphatically pointing out that your tower is about to die (since I assure you I am already aware of that). There is a ton of information on the screen, or on other screens one can scroll to, and it is all easy to navigate and view once you are used to it. The only thing I dislike in this area is that it is currently a bit annoying to see what the associated hero cards are for the enemy heroes, but that will doubtless be fixed soon.

Rather than the quick repetitive one liners of Hearthstone and Eternal, the heroes and many of the creeps have rich personalities and lots to say. There are many lines that depend on the interaction of multiple specific cards, or cards getting to do a particular thing. It actively makes me want to play with a variety of cards and heroes, and unlike every other card game, I never play with the sound off.

Economic Analysis

Players have come to expect a free to play experience from many of their games.
Not only do they expect to not pay, they expect to be paid for playing, in the sense of having a better and more expensive collection after playing than they did before playing. This is a toxic, no good, very bad thing. I shared some of my thoughts in my write-ups of Eternal. I fully echo and endorse what Richard Garfield said in A Game Player’s Manifesto. Instead of playing games, players are trapped in Skinner boxes, or playing in a way that is effectively working for a tiny wage. Huge swaths of games are never seen and experienced, while the most efficient are used over and over, and I believe the economic model of free to play and card creation is to blame.

Artifact returns us to the model of buying packs and then buying, selling and trading cards to get what we want. The game costs $20, packs cost $2, and entering expert-level events costs $1. Playing casually is free once you’ve paid the $20, including casual drafting. So if all you want to do is draft, you can do so endlessly for free, forever, and never have to worry about building a collection.

If you want to build a collection, several things work in your favor that are easy to not fully appreciate. Packs always contain at least one rare card, but often contain two or even three. This adds up to a substantial discount over time. Much more importantly, Artifact doesn’t have a fourth rarity. There are no ‘Legendary’ or ‘Mythic’ cards, at all. If you are getting your cards from packs, this cuts the number of packs you need to get a full collection by a factor of three or more. If you are buying singles to get what you need, it cuts the cost of the best cards by that amount or more. We don’t know what the price of top cards is going to be, but there isn’t that much room before opening packs becomes a better solution, and traders start busting open packs to find cards to sell. Another bonus is that you can liquidate any 20 cards to get an event ticket.
Packs contain 12 cards, so if you liquidate the bulk of your surplus commons, you can get back a ticket about once every three packs, cutting the cost by a full sixth. This also should help hold up the market value of commons, preventing the market from becoming flooded.

The market itself is not yet operational, as the game remains in beta. I worry that the lack of a working market is giving players the wrong idea of the game’s economics, but this will be remedied soon. If the market runs well and Valve takes only a reasonable cut, it will considerably reduce the cost of getting the cards you need and allow the bulk of the value to be reclaimed from packs.

Magic Online has long had the issue that packs trade for well below the cost to buy packs from the store. Either a lot of players who don’t know any better must be buying packs from the store anyway, tournaments must be giving out packs faster than the players need cards, or some combination of the two, for this to happen. Artifact also has a tournament system that eats event tickets and spits out packs, which in turn only spit out a fraction of their value in event tickets. So there is definitely at least some risk of there being a flood of packs causing them to drop below store value. If that does happen, packs get cheaper and so do singles, making the game cheaper to buy, but making event payouts worse.

Magic Online charged a much higher amount of money to enter events, with drafts usually at least $8 or so and constructed events at least $5, and constructed leagues now over $10 for five matches, so increasing the rake was a big deal there. At $1 for an average of about four games, even a large rake means that the hourly cost remains low, with casual being backed by Elo-style matchmaking. That makes me far less concerned about the drop in value, provided the marketplace fees remain reasonable.
It would still need to be fixed, of course, if it became a large enough effect, but that is also easy enough: just replace the second pack prize with event tickets.

Add all these effects together, and the effective cost of play for Artifact ends up being on the extreme low end of collectible card game costs that aren’t free to play. Free casual phantom drafts are an extreme contrast with Magic Online, as is the far less greedy pack structure.

The catch is that those free to play Skinner boxes are powerful stuff. When one plays Arena, Eternal or Hearthstone one feels like one is making money, being paid to play. You’re not. You can’t sell the cards, so you’re being given a discount on buying the game, and a pathetic hourly rate of discount at that even when you are maximizing your grind. But that is not how it feels.

Overall

Artifact is a unique and amazing game, with tons of interesting decisions and exciting games and super rich in flavor, but it asks a lot of its players. One must pay attention and be comfortable with a lot of complexity. When players of different skill levels play, if the decks are similar in power level, the sufficiently superior player will win almost all the time. The experience is not for everyone.

You need to know what experience you are aiming for. Are you here to be competitive and try out for the million bucks against the likes of prohibitive early favorite Stanislav Cifka? Are you here to do a bunch of fun drafts? Are you here for the stories, flavor and lore? All are valid choices.

I would have loved to be an Artifact streamer and competitor, but between my family and various job opportunities, I do not have the time to treat that with the seriousness it deserves, and I can’t pretend that at my age I haven’t lost a step. I likely will stream from time to time when I have the chance, and I will of course throw my hat in the ring when the time comes, but I am under no illusions that I am taking home the million dollars.
Two choices that are not available are the ability to profitably grind out a dollar or two an hour in game assets, or to fully understand the game and/or do well without giving the game the attention it deserves. I encourage players to try out the game, but do so with your eyes open. At a minimum, if you have the gumption to get through the first hour or two of learning, the basic $20 product, with its free casual phantom drafts, is an amazing bargain. I am a big fan of Artifact.

### Believing others' priors

22 November, 2018 - 17:51
Published on Thu Nov 22 2018 14:51:47 GMT+0000 (UTC)

Meet the Bayesians

In one way of looking at Bayesian reasoners, there are a bunch of possible worlds and a bunch of people, who start out with some guesses about what possible world we're in. Everyone knows everyone else's initial guesses. As evidence comes in, agents change their guesses about which world they're in via Bayesian updating.

The Bayesians can share information just by sharing how their beliefs have changed.

"Bob initially thought that last Monday would be sunny with probability 0.8, but now he thinks it was sunny with probability 0.9, so he must have seen evidence that he judges to be 4/9ths as likely if it wasn't sunny as if it was."

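The 4/9 figure in Bob's example follows from Bayes' rule in odds form: posterior odds equal prior odds times the likelihood ratio. A minimal sketch of that arithmetic, with a hypothetical helper name `implied_likelihood_ratio` and exact fractions to avoid rounding:

```python
from fractions import Fraction

def implied_likelihood_ratio(prior, posterior):
    """Return P(evidence | not H) / P(evidence | H) implied by an update on H.

    Uses the odds form of Bayes' rule:
    posterior_odds = prior_odds * P(E | H) / P(E | not H).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return prior_odds / posterior_odds

# Bob moves from 0.8 to 0.9 on "last Monday was sunny".
ratio = implied_likelihood_ratio(Fraction(8, 10), Fraction(9, 10))
print(ratio)  # 4/9
```

Anyone who knows Bob's prior and posterior can run this calculation, which is why sharing how beliefs changed is enough to share the evidence's strength.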
If they have the same priors, they'll converge to the same beliefs. But if they don't, it seems they can agree to disagree. This is a bit frustrating, because we don't want people to ignore our very convincing evidence just because they've gotten away with having a stupid weird prior.

What can we say about which priors are permissible? Robin Hanson offers an argument that we must either (a) believe our prior was created by a special process that correlated it with the truth more than everyone else's or (b) have the same prior as everyone else.

Meet the pre-Bayesians

How does that argument go? Roughly, Hanson describes a slightly more nuanced set of reasoners: the pre-Bayesians. The pre-Bayesians are not only uncertain about what world they're in, but also about what everyone's priors are.

These uncertainties can be tangled together (the joint distribution doesn't have to factorise into their beliefs about everyone's priors and their beliefs about worlds). Facts about the world can change their opinions about what prior assignments people have.
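As a toy illustration of what "tangled together" means, the sketch below builds a small joint distribution over worlds and prior assignments that does not factorise, and shows that learning the world shifts beliefs about the prior assignment. The labels ("sunny", "optimist", and so on) and all the numbers are made up for illustration; they are not from Hanson's paper.

```python
# Hypothetical joint distribution: joint[(world, prior_assignment)] = probability.
joint = {
    ("sunny", "optimist"): 0.4,
    ("sunny", "pessimist"): 0.1,
    ("rainy", "optimist"): 0.1,
    ("rainy", "pessimist"): 0.4,
}

def marginal(joint, index):
    """Marginal distribution over one coordinate of the joint (0 = world, 1 = assignment)."""
    out = {}
    for key, prob in joint.items():
        out[key[index]] = out.get(key[index], 0.0) + prob
    return out

def assignments_given_world(joint, world):
    """Distribution over prior assignments after conditioning on a world."""
    total = sum(prob for (w, _), prob in joint.items() if w == world)
    return {a: prob / total for (w, a), prob in joint.items() if w == world}

worlds = marginal(joint, 0)
assignments = marginal(joint, 1)

# The joint factorises only if every cell equals the product of its marginals.
factorises = all(
    abs(joint[(w, a)] - worlds[w] * assignments[a]) < 1e-12 for (w, a) in joint
)

print(factorises)                               # False
print(assignments)                              # marginal over prior assignments
print(assignments_given_world(joint, "sunny"))  # shifted toward "optimist"
```

Here the marginal belief about the prior assignment is an even split, but conditioning on the world being sunny moves most of the weight to one assignment, which is the sense in which facts about the world can change opinions about priors.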

Hanson then imposes a pre-rationality condition: if you find out what priors everyone has, you should agree with your prior about how likely different worlds are. In other words, you should trust your prior in the future. Once you have this condition, it seems that it's impossible to both (a) believe that some other people's priors were generated in a way that makes them as likely to be good as yours and (b) have different priors from those people.

Let's dig into the sort of things this pre-rationality condition commits you to.

Consider the class of worlds where you are generated by a machine that randomly generates a prior and sticks it in your head. The pre-rationality rule says that worlds where this randomly-generated prior describes the world well are more likely than worlds where it is a poor description.

So if I pop out with a very certain belief that I have eleven toes, such that no amount of visual evidence that I have ten toes can shake my faith, the pre-prior should indeed place more weight on those worlds where I have eleven toes and various optical trickery conspires to make it look like I have ten.

If this seems worrying to you, consider that you may be asking too much of this pre-rationality condition. After all, if you have a weird prior, you have a weird prior. In the machine-generating-random-priors world, you already believe that your prior is a good fit for the world. That's what it is to have a prior. Yes, according to our actual posteriors it seems like there should be no correlation between these random priors and the world they're in, but asking the pre-rationality condition to make our actual beliefs win out seems like a pretty illicit move.

Another worry is that it seems there's some spooky action-at-a-distance going on between the pre-rationality condition and the assignment of priors. Once everyone has their priors, the pre-rationality condition is powerless to change them. So how is the pre-rationality condition making it so that everyone has the same prior?

I claim that actually, this presentation of the pre-Bayesian proof is not quite right. According to me, if I'm a Bayesian and believe our priors are equally good, then we must have the same priors. If I'm a pre-Bayesian and believe our priors are equally good, then I must believe that your prior averages out to mine. This latter move is open to the pre-Bayesian (who has uncertainty about priors) but not to the Bayesian (who knows the priors).

I'll make an argument purely within Bayesianism that goes from believing in equally good priors to having the same prior, and then we'll see how belief in priors comes in for a pre-Bayesian.

Bayesian prior equality

To get this off the ground, I want to make precise the claim of believing someone's priors are as good as yours. I'm going to look at 3 ways of doing this. Note that Hanson doesn't suggest a particular one, so he doesn't have to accept any of these as what he means, and that might change how well my argument works.

Let's suppose my prior is p and yours is q. Note, these are fixed functions, not references pointing at my prior and your prior. In the Bayesian framework, we just have our priors, end of story. We don't reason about cases where our priors were different.

Let's suppose score is a strictly proper scoring rule (if you don't know what that means, I'll explain in a moment). score takes in a probability distribution over a random variable and an actual value for that random variable. It gives more points the more of the probability distribution's mass is near the actual value. For it to be strictly proper, I uniquely maximise my expected score by reporting my true probability distribution. That is, E_p[score(f, X)] is uniquely maximised when f = p.
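As a small numerical illustration of strict propriety, here is a sketch using the logarithmic scoring rule, which is one standard strictly proper rule. The choice of rule, the two-valued variable and the numbers are all assumptions made for this example, not anything the argument depends on.

```python
import math

def log_score(report, outcome):
    """Logarithmic score: log of the probability the report gave to what actually happened."""
    return math.log(report[outcome])

def expected_score(belief, report):
    """E_belief[score(report, X)]: average score of `report` if X is distributed as `belief`."""
    return sum(belief[x] * log_score(report, x) for x in belief)

# Hypothetical prior p over a two-valued random variable X.
p = {"rain": 0.7, "dry": 0.3}

# Candidate reports f on a grid of 0.01 steps.
candidates = [{"rain": r / 100, "dry": 1 - r / 100} for r in range(1, 100)]

# The report with the highest expected score, judged by p.
best = max(candidates, key=lambda f: expected_score(p, f))
print(best["rain"])
```

On this grid, the report that maximises my expected score is the one that matches p itself, which is what "uniquely maximised when f = p" is saying.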

Let's also suppose my posterior is p|B, that is (using notation a bit loosely) my prior probability conditioned on some background information B.

Here are some attempts to precisely claim someone's prior is as good as mine:

1. For all X, E_p[score(p, X)] = E_p[score(q, X)].
2. For all X, E_{p|B}[score(p|B, X)] = E_{p|B}[score(q|B, X)].
3. For all X, E_{p|B}[score(p, X)] = E_{p|B}[score(q, X)].

(1) says that, according to my prior, your prior is as good as mine. Because score is strictly proper, p is the unique maximiser of my expected score, so this means that your prior is the same as mine.

(2) says that, according to my posterior, the posterior you'd have with my current information is as good as the posterior I have. By the definition of the proper scoring rule, this means that your posterior is equal to my posterior. This is a bit broader than (1), and allows your prior to have already "priced in" some information that I now have.

(3) says that given what we know now, your prior was as good as mine.

That rules out q = p|B. That would be a prior that's better than mine: it's just what you get from mine when you're already certain you'll observe some evidence (like an apple falling in 1663). Observing that evidence doesn't change your beliefs.

In general, it can't be the case that you predicted B as more likely than me, which can be seen by taking X = B.
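To see the X = B step concretely, here is a sketch that again assumes the logarithmic score and some made-up probabilities for B. Once I have conditioned on B, the expectation collapses to the score each prior gets for B having happened.

```python
import math

def log_score(prob_of_B, b_happened):
    """Logarithmic score for a forecast of the single event B."""
    return math.log(prob_of_B if b_happened else 1.0 - prob_of_B)

def expected_score_given_B(prob_of_B):
    """E_{p|B}[score(., B)]: after conditioning on B, B is certain to have happened."""
    return 1.0 * log_score(prob_of_B, True)

p_B = 0.2            # hypothetical: my prior probability of B
q_same = 0.2         # your prior agrees with mine about B
q_higher = 0.6       # your prior predicted B as more likely than mine did
q_conditioned = 1.0  # q = p|B, a prior that was already certain of B

mine = expected_score_given_B(p_B)
for name, q_B in [("same", q_same), ("higher", q_higher), ("p|B", q_conditioned)]:
    # A positive difference means your prior scores strictly better than mine on X = B.
    print(name, expected_score_given_B(q_B) - mine)
```

Only the prior that gave B the same probability as mine ties with me. Any prior that predicted B as more likely, including q = p|B, scores strictly better and so violates the equality in (3).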

On future events, your prior can match my prior, or it can diverge from my posterior just as far as my prior does, but in the opposite direction.

I don't really like (3), because while it accepts that your prior was as good as mine in the past, it allows me to think that after you update your prior you'll still be worse than me.

That leaves us with (1) and (2). If either is our precise notion, then it follows quickly that we have common priors.

This is just a notion of logical consistency though; I don't have room for believing that our prior-generating processes make yours as likely to be true as mine. It's just that if the probability distribution that happens to be your prior appears to me as good as the probability distribution that happens to be my prior, they are the same probability distribution.

Pre-Bayesian prior equality

How can a pre-Bayesian make the claim that your prior is as good as mine?

Here, let pᵢ be my prior as a reference, rather than as a concrete probability distribution. Claims about pᵢ are claims about my prior, no matter what function that actually ends up being. So for example, claiming that pᵢ scores well is claiming that as we look at different worlds, we see it is likely that my prior is a well-adapted prior for that specific world. In contrast, a claim that p scores well would be a claim that the actual world looks a lot like p.

Similarly, pⱼ is your prior as a reference. Let p be a vector assigning a prior to each agent.

Let f be my pre-prior. That is, my initial beliefs over combinations of worlds and prior assignments. Similarly to above, let f|B be my pre-posterior (a bit of an awkward term, I admit).

For ease of exposition (and I don't think entirely unreasonably), I'm going to imagine that I know my prior precisely. That is f(w, p) = 0 if pᵢ ≠ p.

Here are some ways of making the belief that your prior is as good as mine precise in the pre-Bayesian framework.

1. For all X, E_p[score(p, X)] = E_f[score(pⱼ, X)].
2. For all X, E_{p|B}[score(p|B, X)] = E_{f|B}[score(pⱼ|B, X)].
3. For all X, E_{p|B}[score(p, X)] = E_{f|B}[score(pⱼ, X)].

On the LHS, the expectation uses p rather than f, because of the pre-rationality condition. Knowing my prior, my updated pre-prior agrees with it about the probability of the ground events. But I still don't know your prior, so I have to use f on the RHS to "expect" over the event and your prior itself.

(1) says that, according to my pre-prior, your prior is as good as mine in expectation. The proper scoring rule says that my prior is the unique maximum for a fixed function. But I could, in principle, believe that your prior is better adapted to each world than my prior, but I'm still not certain which world we're in (or what your prior is), so I can't update my beliefs.

Given the equality, I can't want to switch priors with you in general, but I could think you have a prior that's more correlated with truth than mine in some cases and less so in others.
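A toy calculation may help show how equality in expectation can coexist with uncertainty about your prior. It is only a sketch under loud assumptions: a single binary X, the logarithmic score, my prior fixed at 0.5, and a hypothetical pre-prior under which your prior puts 0.9 on the truth with weight a and 0.1 on the truth otherwise.

```python
import math

def expected_log_score(cases):
    """Average log score, given (weight, probability assigned to what actually happened) pairs."""
    return sum(w * math.log(q) for w, q in cases)

# My known prior puts 0.5 on X = 1, so it puts 0.5 on whatever turns out to be true.
mine = expected_log_score([(1.0, 0.5)])

def yours(a):
    """Expected score of your prior under the hypothetical pre-prior described above."""
    return expected_log_score([(a, 0.9), (1.0 - a, 0.1)])

# Weight a at which the two expectations coincide (the equation is linear in a).
a_equal = (math.log(0.5) - math.log(0.1)) / (math.log(0.9) - math.log(0.1))
print(a_equal, yours(a_equal) - mine)
```

At that weight your prior is better adapted than mine in some worlds and worse in others, yet the expected scores match, so I have no reason to want to swap priors with you.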

(2) says that, according to my pre-posterior, your prior conditioned on my info is, in expectation, as good as my prior conditioned on my info.

I like this better than (1). Evidence in the real world leads me to beliefs about the prior production mechanisms (like genes, nurture and so on). These don't seem to give a good reason for my innate beliefs to be better than anyone else's. Therefore, I believe your prior is probably as good as mine on average.

But note, I don't actually know what your prior is. It's just that I believe we probably share similar priors. The spooky action-at-a-distance is eliminated. This is just (again) a claim about consistent beliefs: if I believe that your prior got generated in a way that made it as good as mine, then I must believe it's not too divergent from mine.

(3) says that, given what we now know, I think your prior is no better or worse than mine in expectation. This is about as unpalatable in the pre-Bayesian as in the Bayesian case.

So, on either (1) or (2), I believe that your prior will, on average, do as well as mine. I may not be sure what your prior is, but cases where it's far better will be matched by cases where it's far worse. Even knowing that your prior performs exactly as well as mine, I might not know exactly which prior you have. I know that all the places it does worse will be matched by an equal weight of places where it does better, so I can't appeal to my prior as a good reason for us to diverge.


### The Semantic Man

22 November 2018 - 11:38
Published on Thu Nov 22 2018 08:38:21 GMT+0000 (UTC)

Irving Lee discusses the goals of General Semantics, one of the historical movements that underlies intellectual currents expressed here now on LessWrong.

I want to speak very briefly to you. There is one small notion that I should like to talk about rather briefly. It was in 1946, I remember almost the time of year. I had just taken off that Air Force uniform and had managed to persuade Alfred Korzybski to let me pose some questions to him. I had a number of things that bothered me. I had read that “blue peril” and there were paragraphs in it that made no sense even after the fifteenth reading, and I wanted the opportunity to confront him with these paragraphs. I wanted to say: “Now, Alfred, what did you mean when you said this?” And he very kindly agreed to submit to some such questioning, over a period of several afternoons, and I think Miss Kendig may remember some of them. And at one of these sessions, I said, “Now, Alfred, you have been thinking about this stuff for a very long time. Can you tell me, in a nutshell, what are you trying to do? What is the objective of all this reading and studying and talking and sweating that you go through day after day, year after year? What are you after?” And, you know, I never could call on him in those sessions without being forced to take notes. If I came without a pencil and paper, he invariably found a pad and pencil, and “take some notes” was the continuous refrain. Well, I have gone over those notes many times and in answer to that question, this is almost a verbatim account of what he said when I asked him, “Alfred, what are you trying to do, in a nutshell?”


### Jesus Made Me Rational (An Introduction)

22 November 2018 - 08:09
Published on Thu Nov 22 2018 05:09:43 GMT+0000 (UTC)

Writer's note: what follows is a descriptive narrative of my epistemology, not a statement of universal fact (though some facts are contained therein).

In the beginning was Rationality, and Rationality was with God, and Rationality was God. He was in the beginning with God. All things were made through Him, and without Him nothing was made that was made. In Him was life, and the life was the light of men. And the light shines in the darkness, and the darkness did not comprehend it.

In a very new university (for all universities were new) as the 12th Century drew to a close a grand experiment was proposed: so grand that its conclusion may never be reached (though of course those proposing it made that fundamental error of optimism, believing it could be completed in their lifetimes), and the likes of which had never - indeed could never - have been attempted before.

For a little over one thousand years before, someone had come into this world who changed our understanding of it forever. Instead of an irrational universe created and ruled over by fickle and oft-competing gods - where mathematics that held true in Egypt had no reason to be true in Greece - this person had said that not only was the universe created by rational laws but that he was rationality himself.

So in this (very new) university this group of men set this grand experiment in motion. If the universe was, as this person claimed, made by rationality then surely it ought to follow rational laws. And if, as this person claimed, rationality was the same yesterday, today, and tomorrow, then these rational laws must be the same no matter who tests them, and no matter where that person is doing the testing. This grand experiment would be to test the rationality of the universe with the expectation that the universe would be consistent, would be common, and would be rational.

In the century that was to follow the grand experiment would in turn motivate men like Thomas Aquinas who definitively showed that Aristotle was wrong - and if he could be wrong about one thing, why he might be wrong about many things.

This grand experiment would be tested, time and time again over the millennia which were to follow. It would lead Nicole d'Oresme to liken the universe to a clock that had been made and set to run its own course. It would lead Rene Descartes in his quest for the laws of nature. It would cause Roger Bacon to create the scientific method to ensure the results of the experiment were valid. It would be the inspiration for the oft-misused William of Ockham to codify rational thought. It would be the foundation of Gregor Mendel's discovery of genetics.

Eventually it would cause philosopher-mathematician Alfred North Whitehead to declare in front of a crowd of unbelieving sceptics that faith in science was a derivative of medieval theology. And so this great experiment would end up impacting the life of a young man who had been raised a Christian and hated that he was incapable of disbelieving in Jesus no matter how much he tried.

This young man had already (though perhaps unknowingly) decided to dedicate his life to being as rational as he could be - and with all the presumption and thoughtless energy of youth proceeded to make as many unthinking, irrational decisions as his brain, drunk on self-importance as it was, was capable of making. That was until he started studying mathematics.

Here ends the story.

Mathematics has changed my life. It is the reason that I have pursued rationality. And it is the reason that I no longer hate that I believe in the resurrection of Jesus, but rather test the implications of it. HPMoR is the reason I have ended up at this particular site, but mathematics is the reason I read HPMoR in the first place.

Thank you for having me here. Sorry my introduction was so long; I didn't know quite how to write what I wanted to write, and I am not a good enough writer to do a series of posts on it. I look forward to becoming more rational by being here - even as I stand fully aware that my unshakeable belief is the definition of irrationality - it is, however, evidence of the truth of a prediction made nearly two thousand years before my birth. I would love you to ask me about that, but understand this website is not about religion.


### Iteration Fixed Point Exercises

22 November 2018 - 03:35
Published on Thu Nov 22 2018 00:35:09 GMT+0000 (UTC)

This is the second of three sets of fixed point exercises. The first post in this sequence is here, giving context.

Note: Questions 1-5 form a coherent sequence and questions 6-10 form a separate coherent sequence. You can jump between the sequences.

1. Let (X,d) be a complete metric space. A function f:X→X is called a contraction if there exists a q<1 such that for all x,y∈X, d(f(x),f(y))≤q⋅d(x,y). Show that if f is a contraction, then for any x_0∈X, the sequence {x_n=f^n(x_0)} converges. Show further that it converges exponentially quickly (i.e. the distance between the nth term and the limit point is bounded above by c⋅a^n for some a<1).

2. (Banach contraction mapping theorem) Show that if (X,d) is a complete metric space and f is a contraction, then f has a unique fixed point.
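For intuition only (this is not a proof), the iteration can be watched numerically. The sketch below assumes the particular example f = cos on the complete metric space [0,1], where |f'(x)| = |sin x| ≤ sin(1) < 1, so q = sin(1) works as a contraction constant. The example and the helper name `iterate` are choices made for this illustration.

```python
import math

def iterate(f, x0, steps):
    """Return the list [x0, f(x0), f(f(x0)), ...] with `steps` applications of f."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

q = math.sin(1.0)                 # contraction constant for cos on [0, 1]
xs = iterate(math.cos, 0.0, 100)  # orbit starting at 0
ys = iterate(math.cos, 1.0, 100)  # orbit starting at 1
fixed = xs[-1]

# Distance between the two orbits after n steps, to compare against q**n.
gaps = [abs(a - b) for a, b in zip(xs, ys)]
print(fixed, gaps[10], q ** 10)
```

Both orbits end up at the same point, and the gap between them after n steps stays below q^n times the initial gap, which is the exponential bound from the first exercise.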

3. If we only require that d(f(x),f(y))<d(x,y) for all x≠y, then we say f is a weak contraction. Find a complete metric space (X,d) and a weak contraction f:X→X with no fixed points.

4. A function $f:\mathbb{R}^n\to\mathbb{R}$ is convex if $f(tx+(1-t)y)\le tf(x)+(1-t)f(y)$ for all $t\in[0,1]$ and $x,y\in\mathbb{R}^n$. A function $f$ is strongly convex if you can subtract a positive paraboloid from it and it is still convex (i.e. $f$ is strongly convex if $x\mapsto f(x)-\varepsilon\|x\|^2$ is convex for some $\varepsilon>0$). Let $f$ be a strongly convex smooth function from $\mathbb{R}^n$ to $\mathbb{R}$, and suppose that the magnitude of the second derivative $\|\nabla^2 f\|$ is bounded. Show that there exists an $\varepsilon>0$ such that the function $g:\mathbb{R}^n\to\mathbb{R}^n$ given by $x\mapsto x-\varepsilon(\nabla f)(x)$ is a contraction. Conclude that gradient descent with a sufficiently small constant step size converges exponentially quickly on a strongly convex smooth function.
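To make the map $g$ concrete, here is a minimal sketch on one hand-picked strongly convex quadratic. The function, the step size `eps`, and the eigenvalue constants are assumptions chosen for illustration; the code only checks the contraction numerically along a single trajectory and does not replace the argument the exercise asks for.

```python
import math

# Assumed example: f(x, y) = 0.5*(x - 1)**2 + 5*(y + 2)**2, minimized at (1, -2).
# Its Hessian is diagonal with eigenvalues 1 and 10.
HESSIAN_MIN, HESSIAN_MAX = 1.0, 10.0
MINIMIZER = (1.0, -2.0)

def grad_f(p):
    """Gradient of the assumed quadratic at the point p = (x, y)."""
    x, y = p
    return (x - 1.0, 10.0 * (y + 2.0))

def g(p, eps):
    """One gradient descent step: p -> p - eps * grad f(p)."""
    gx, gy = grad_f(p)
    return (p[0] - eps * gx, p[1] - eps * gy)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

eps = 0.1  # hypothetical step size
# For this quadratic, g scales each coordinate's offset from the minimizer
# by (1 - eps * eigenvalue), so the contraction factor is the larger magnitude.
q = max(abs(1 - eps * HESSIAN_MIN), abs(1 - eps * HESSIAN_MAX))

p = (5.0, 5.0)
errors = []
for _ in range(50):
    errors.append(dist(p, MINIMIZER))
    p = g(p, eps)

print("contraction factor:", q)
print("first errors:", errors[:4])
```

Each entry of `errors` should be at most `q` times the previous one, which is the exponential convergence the exercise refers to.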

5. A finite stationary Markov chain is a finite set $S$ of states, along with a probabilistic rule $A:S\to\Delta S$ for transitioning between the states, where $\Delta S$ represents the space of probability distributions on $S$. Note that the transition rule has no memory, and depends only on the previous state. If for any pair of states $s,t\in S$, the probability of passing from $s$ to $t$ in one step is positive, then the Markov chain $(S,A)$ is ergodic. Given an ergodic finite stationary Markov chain, use the Banach contraction mapping theorem to show that there is a unique distribution over states which is fixed under application of the transition rule. Show that, starting from any state $s$, the limit distribution $\lim_{n\to\infty}A^n(s)$ exists and is equal to the stationary distribution.
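The claim can be watched numerically on a small chain. The sketch below uses a made-up 3-state transition matrix `P` with all entries positive (so the chain is ergodic in the sense above) and pushes a point-mass distribution forward from each starting state. The matrix, the function names, and the step count are assumptions for illustration only.

```python
def step(dist, P):
    """Apply the transition rule once: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical transition matrix: row i is the distribution A(i) over next states.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
]

def limit_from(start, n_steps=200):
    """Start with all mass on one state and apply the transition rule n_steps times."""
    dist = [1.0 if i == start else 0.0 for i in range(len(P))]
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist

limits = [limit_from(s) for s in range(len(P))]
for s, d in enumerate(limits):
    print("start state", s, "->", [round(v, 6) for v in d])
```

All three starting states should print the same distribution, and applying `step` to it once more should leave it unchanged.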

6. A function $f$ from a partially ordered set to another partially ordered set is called monotonic if $x\le y$ implies that $f(x)\le f(y)$. Given a partially ordered set $(P,\le)$ with finitely many elements, and a monotonic function from $P$ to itself, with the property that $x\le f(x)$ for all $x$, show that if $f(x)\ge x$ or $f(x)\le x$, then $f^n(x)$ is a fixed point of $f$ for all $n>|P|$.
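A small concrete instance may help fix ideas. In the sketch below the poset is the set of subsets of {0, 1, 2, 3, 4} ordered by inclusion, and `f` adds a seed element plus the direct successors of current members in a made-up graph. The graph `EDGES`, the seed, and the helper names are hypothetical; the code only checks one example and is not a solution to the exercise.

```python
# Hypothetical directed graph on the elements 0..4.
EDGES = {0: {1}, 1: {2}, 2: {3}, 3: set(), 4: {0}}

def f(s):
    """Monotonic map on subsets: keep s, add the seed 0 and all direct successors."""
    out = set(s) | {0}
    for i in s:
        out |= EDGES[i]
    return frozenset(out)

def iterate(func, x, n):
    """Apply func to x exactly n times."""
    for _ in range(n):
        x = func(x)
    return x

size_of_poset = 2 ** len(EDGES)  # number of subsets of a 5-element set
x = frozenset()
y = iterate(f, x, size_of_poset + 1)

print(sorted(y))
print("fixed point:", f(y) == y)
```

Here `x <= f(x)` holds for every subset by construction, and the iteration stabilizes long before `n` exceeds the size of the poset.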

7. A complete lattice $(L,\le)$ is a partially ordered set in which each subset of elements has a least upper bound and greatest lower bound. Under the same hypotheses as the previous exercise, extend the notion of $f^n(x)$ for natural numbers $n$ to $f^\alpha(x)$ for ordinals $\alpha$, and show that $f^\alpha(x)$ is a fixed point of $f$ for all $x\in L$ with $f(x)\ge x$ or $f(x)\le x$ and all $|\alpha|>|L|$ ($|A|\le|B|$ means there is an injection from $A$ to $B$, and $|A|>|B|$ means there is no such injection).

8. (Knaster-Tarski fixed point theorem) Show that the set of fixed points of a monotonic function on a complete lattice themselves form a complete lattice. (Note that since the empty set is always a subset, a complete lattice must be nonempty.)

9. Show that for any set $A$, $(\mathcal{P}(A),\subseteq)$ forms a complete lattice, and that any injective function from $A$ to $B$ defines a monotonic function from $(\mathcal{P}(A),\subseteq)$ to $(\mathcal{P}(B),\subseteq)$. Given injections $f:A\to B$ and $g:B\to A$, construct a subset $A'$ of $A$ and a subset $B'$ of $B$ such that $B'=f(A')$ and $A-A'=g(B-B')$.

10. (Cantor–Schröder–Bernstein theorem) Given sets $A$ and $B$, show that if $|A|\le|B|$ and $|A|\ge|B|$, then $|A|=|B|$. ($|A|\le|B|$ means there is an injection from $A$ to $B$, and $|A|=|B|$ means there is a bijection.)

Please use the spoilers feature (the symbol '>' followed by '!' followed by a space) in your comments to hide all solutions, partial solutions, and other discussions of the math. The comments will be moderated strictly to hide spoilers!

I recommend putting all the object level points in spoilers and including metadata outside of the spoilers, like so: "I think I've solved problem #5, here's my solution <spoilers>" or "I'd like help with problem #3, here's what I understand <spoilers>" so that people can choose what to read.

Tomorrow's AI Alignment Forum Sequences post will be "Approval-directed agents: overview" by Paul Christiano in the sequence Iterated Amplification.

The next post in this sequence will be released on Saturday 24th November, and will be 'Fixed Point Discussion'.

Discuss

### EA Funds: Long-Term Future fund is open to applications until November 24th (this Saturday)

November 21, 2018 - 06:39
Published on Wed Nov 21 2018 03:39:15 GMT+0000 (UTC)

I am one of the Fund Managers for the Long-Term Future fund and we are doing a very short round of applications to help donors get a sense of what projects we are likely to fund before giving season:

Have an x-risk or far-future focused project you’ve been thinking about, but don’t have funding for? The CEA Long-Term Future Fund is accepting applications until this Saturday (11/24) at this form: https://docs.google.com/forms/d/e/1FAIpQLSf46ZTOIlv6puMxkEGm6G1FADe5w5fCO3ro-RK6xFJWt7SfaQ/viewform

This round of applications has a much shorter deadline than usual, to give the Fund’s new management team a chance to make some grants to help donors get a sense of what kinds of projects we are likely to fund before the end of the giving season in December. Future rounds will have longer timelines, and the Fund may invite applicants from this round to re-submit in future rounds. In other words: if you can’t hit this deadline, no need to worry - we know it’s very tight! Just keep an eye out for future rounds.

Feel free to ask any questions in the comments. I will be happy to answer any that I know the answer to (and will ping other fund managers if they seem better suited to answer a question).

Some more context from the form:

For this round and this form, we are particularly interested in small teams and individuals that are trying to get projects off the ground, or that need less money than existing grant-making institutions are likely to give out (i.e. less than ~$40k). Here are a few examples of project types that we're open to funding an individual or group for (note that this list is not exhaustive):

+ To spend a few months (perhaps during the summer) to research an open problem in AI alignment or AI strategy and produce a few blog posts or videos on their ideas
+ To spend a few months building a web app with the potential to solve an operations bottleneck at x-risk organisations
+ To spend a few months up-skilling in a field to prepare for future work (e.g. microeconomics, functional programming, etc).
+ To spend a year testing an idea that has the potential to be built into an org.

You can find more details on the kind of project we are likely to fund on the fund page: https://app.effectivealtruism.org/funds/far-future

Discuss

### Incorrect hypotheses point to correct observations

November 21, 2018 - 00:10
Published on Tue Nov 20 2018 21:10:02 GMT+0000 (UTC)

1. The Consciousness Researcher and Out-Of-Body Experiences

In his book Consciousness and the Brain, cognitive neuroscientist Stanislas Dehaene writes about scientifically investigating people’s reports of their out-of-body experiences:

… the Swiss neurologist Olaf Blanke [did a] beautiful series of experiments on out-of-body experiences. Surgery patients occasionally report leaving their bodies during anesthesia. They describe an irrepressible feeling of hovering at the ceiling and even looking down at their inert body from up there. [...]

What kind of brain representation, Blanke asked, underlies our adoption of a specific point of view on the external world? How does the brain assess the body’s location? After investigating many neurological and surgery patients, Blanke discovered that a cortical region in the right temporoparietal junction, when impaired or electrically perturbed, repeatedly caused a sensation of out-of-body transportation. This region is situated in a high-level zone where multiple signals converge: those arising from vision; from the somatosensory and kinesthetic systems (our brain’s map of bodily touch, muscular, and action signals); and from the vestibular system (the biological inertial platform, located in our inner ear, which monitors our head movements). By piecing together these various clues, the brain generates an integrated representation of the body’s location relative to its environment. However, this process can go awry if the signals disagree or become ambiguous as a result of brain damage. Out-of-body flight “really” happens, then: it is a real physical event, but only in the patient’s brain and, as a result, in his subjective experience. The out-of-body state is, by and large, an exacerbated form of the dizziness that we all experience when our vision disagrees with our vestibular system, as on a rocking boat.

Blanke went on to show that any human can leave her body: he created just the right amount of stimulation, via synchronized but delocalized visual and touch signals, to elicit an out-of-body experience in the normal brain. Using a clever robot, he even managed to re-create the illusion in a magnetic resonance imager. And while the scanned person experienced the illusion, her brain lit up in the temporoparietal junction, very close to where the patient’s lesions were located.

We still do not know exactly how this region works to generate a feeling of self-location. Still, the amazing story of how the out-of-body state moved from parapsychological curiosity to mainstream neuroscience gives a message of hope. Even outlandish subjective phenomena can be traced back to their neural origins. The key is to treat such introspections with just the right amount of seriousness. They do not give direct insights into our brain’s inner mechanisms; rather, they constitute the raw material on which a solid science of consciousness can be properly founded.

The naive hypotheses that out-of-body experiences represented the spirit genuinely leaving the body were incorrect. But they were still pointing to a real observation, namely that there are conditions which create a subjective experience of leaving the body. That observation could then be investigated through scientific means.

2. The Artist and the Criticism

In art circles, there’s a common piece of advice that goes along the lines of:

When people say that they don’t like something about your work, you should treat that as valid information.

When people say why they don’t like it or what you could do to fix it, you should treat that with some skepticism.

Outside the art context, if someone tells you that they're pissed off with you as a person (or that you make them feel good), then that's likely to be true; but the reason that they give you may not be the true reason.

People have poor introspective access to the reasons why they like or dislike something; when they are asked for an explanation, they often literally fabricate their reasons. Their explanation is likely false, even though it’s still pointing to something in the work having made them dislike it.

3. The Traditionalist and the Anthropologist

The Scholar’s Stage blog post “Tradition is Smarter Than You Are” quotes Joseph Henrich’s The Secret of Our Success, which reports that many folk traditions, such as not eating particular fish during pregnancy, are adaptive: not eating that fish during pregnancy is good for the child, mother, or both. But the people in question often do not know why they follow that tradition:

We looked for a shared underlying mental model of why one would not eat these marine species during pregnancy or breastfeeding—a causal model or set of reasoned principles. Unlike the highly consistent answers on what not to eat and when, women’s responses to our why questions were all over the map. Many women simply said they did not know and clearly thought it was an odd question. Others said it was “custom.” Some did suggest that the consumption of at least some of the species might result in harmful effects to the fetus, but what precisely would happen to the fetus varied greatly, though a nontrivial segment of the women explained that babies would be born with rough skin if sharks were eaten and smelly joints if morays were eaten. Unlike most of our interview questions on this topic, the answers here had the flavor of post-hoc rationalization: “Since I’m being asked for a reason, there must be a reason, so I’ll think one up now.” This is extremely common in ethnographic fieldwork, and I’ve personally experienced it in the Peruvian Amazon with the Matsigenka and with the Mapuche in southern Chile.

The people’s hypotheses for why they do something are wrong. But their behavior is still pointing to the fish in question being bad to eat during pregnancy.

4. The Martial Artist and the Ki

In Types of Knowing, Valentine writes:

Another example is the “unbendable arm” in martial arts. I learned this as a matter of “extending ki”: if you let magical life-energy blast out your fingertips, then your arm becomes hard to bend much like it’s hard to bend a hose with water blasting out of it. This is obviously not what’s really happening, but thinking this way often gets people to be able to do it after a few cumulative hours of practice.

But you know what helps better?

Knowing the physics.

Turns out that the unbendable arm is a leverage trick: if you treat the upward pressure on the wrist as a fulcrum and you push your hand down (or rather, raise your elbow a bit), you can redirect that force and the force that’s downward on your elbow into each other. Then you don’t need to be strong relative to how hard your partner is pushing on your elbow; you just need to be strong enough to redirect the forces into each other.

Knowing this, I can teach someone to pretty reliably do the unbendable arm in under ten minutes. No mystical philosophy needed.

The explanation about magical life energy was false, but it was still pointing to a useful trick that could be learned and put to good use.

Observations and the hypotheses developed to explain them often get wrapped up, causing us to evaluate both as a whole. In some cases, we only hear the hypothesis rather than the observation which prompted it. But people usually don’t pull their hypotheses out of entirely thin air; even an incorrect hypothesis is usually entangled with some correct observations. If we can isolate the observation that prompted the hypothesis, then we can treat the hypothesis as a burdensome detail to be evaluated on its own merits, separate from the original observation. At the very least, the existence of an incorrect but common hypothesis suggests to us that there’s something going on that needs to be explained.

Discuss

### Preschool: Much Less Than You Wanted To Know

November 20, 2018 - 22:30
Published on Tue Nov 20 2018 19:30:01 GMT+0000 (UTC)

Response to (Scott Alexander): Preschool: Much More Than You Wanted to Know

I see Scott’s analysis of preschool as burying the lead.

I see his analysis as assuming there exists a black box called ‘preschool’ one can choose whether to send children to. Then, we have to decide whether or not this thing has value. Since studies are the way one figures out if things are true, we look at a wide variety of studies, slog through their problems and often seemingly contradictory results, and see if anything good emerges.

The result of that analysis, to me, was that it was possible preschool had positive long term effects on things like high school graduation rates. It was also possible that it did not have such an effect if you properly controlled for things, or that the active ingredient was effectively mostly ‘give poor families time and money’ via a place to park their kids, rather than any benefits from preschool itself. Scott puts it at 60% that preschool has a small positive effect, whether or not it is worth it and whether or not it’s mainly giving families money, and 40% it is useless even though it is giving them money. Which would kind of be an epic fail.

There was one clear consistent result, however: Preschool gives an academic boost, then that academic boost fades away within a few years. Everyone agrees this occurs.

Let us think about what this means.

This means that preschool is (presumably) spending substantial resources teaching children ‘academics,’ and even as measured by future achievement in those same academics, this has zero long term effect. Zippo. Zilch. Not a thing.

Maybe you should stop doing that, then?

This seems to be saying something important: when you force four-year-olds to learn to read or add, you don’t achieve any permanent benefits to their math or reading ability, which strongly implies you’re not helping them in other ways either. That’s not a result about preschool. That’s a result about developing brains and how they learn, and it suggests we should focus on other skills and on letting them be kids. Spending early time you will never get back on ‘academic’ skills is a waste, presumably because it’s so horribly inefficient and we’ll end up re-teaching the same stuff anyway.

This seems unlikely to be something that stops happening on a birthday. If there is actual zero effect at four years old, what does that imply about doing it at five years old? What about six? How much of our early child educational system is doing it all wrong?

Going back to preschool, we do not have a black box. We have adults in a room with children. They can do a variety of things, and different locations indeed do choose different buckets of activity. One would hope that learning one of your main categories of activity isn’t accomplishing anything would at least shift advocates to support different types of activity. It seems kind of crazy to instead find different outcomes and then advocate for doing the same thing anyway. If time were spent learning in non-academic ways, and gaining experience socializing in various ways, that would at least be a non-falsified theory of something that might help.

Discuss

### New safety research agenda: scalable agent alignment via reward modeling

November 20, 2018 - 20:29
https://cdn-images-1.medium.com/max/1200/0*gyLZOrKtnKJhACPA

### Prosaic AI alignment

November 20, 2018 - 16:56
Published on Tue Nov 20 2018 13:56:39 GMT+0000 (UTC)

(Related: a possible stance for AI control.)

It’s conceivable that we will build “prosaic” AGI, which doesn’t reveal any fundamentally new ideas about the nature of intelligence or turn up any “unknown unknowns.” I think we wouldn’t know how to align such an AGI; moreover, in the process of building it, we wouldn’t necessarily learn anything that would make the alignment problem more approachable. So I think that understanding this case is a natural priority for research on AI alignment.

In particular, I don’t think it is reasonable to say “we’ll know how to cross that bridge when we come to it,” or “it’s impossible to do meaningful work without knowing more about what powerful AI will look like.” If you think that prosaic AGI is plausible, then we may already know what the bridge will look like when we get to it: if we can’t do meaningful work now, then we have a problem.

1. Prosaic AGI

It now seems possible that we could build “prosaic” AGI, which can replicate human behavior but doesn’t involve qualitatively new ideas about “how intelligence works:”

• It’s plausible that a large neural network can replicate “fast” human cognition, and that by coupling it to simple computational mechanisms — short and long-term memory, attention, etc. — we could obtain a human-level computational architecture.
• It’s plausible that a variant of RL can train this architecture to actually implement human-level cognition. This would likely involve some combination of ingredients like model-based RL, imitation learning, or hierarchical RL. There are a whole bunch of ideas currently on the table and being explored; if you can’t imagine any of these ideas working out, then I feel that’s a failure of imagination (unless you see something I don’t).

We will certainly learn something by developing prosaic AGI. The very fact that there were no qualitatively new ideas is itself surprising. And beyond that, we’ll get a few more bits of information about which particular approach works, fill in a whole bunch of extra details about how to design and train powerful models, and actually get some experimental data.

But none of these developments seem to fundamentally change the alignment problem, and existing approaches to AI alignment are not bottlenecked on this kind of information. Actually having the AI in front of us may let us work several times more efficiently, but it’s not going to move us from “we have no idea how to proceed” to “now we get it.”

2. Our current state

2a. The concern

If we build prosaic superhuman AGI, it seems most likely that it will be trained by reinforcement learning (extending other frameworks to superhuman performance would require new ideas). It’s easy to imagine a prosaic RL system learning to play games with superhuman levels of competence and flexibility. But we don’t have any shovel-ready approach to training an RL system to autonomously pursue our values.

To illustrate how this can go wrong, imagine using RL to implement a decentralized autonomous organization (DAO) which maximizes its profit. If we had very powerful RL systems, such a DAO might be able to outcompete human organizations at a wide range of tasks — producing and selling cheaper widgets, but also influencing government policy, extorting/manipulating other actors, and so on.

The shareholders of such a DAO may be able to capture the value it creates as long as they are able to retain effective control over its computing hardware / reward signal. Similarly, as long as such DAOs are weak enough to be effectively governed by existing laws and institutions, they are likely to benefit humanity even if they reinvest all of their profits.

But as AI improves, these DAOs would become much more powerful than their human owners or law enforcement. And we have no ready way to use a prosaic AGI to actually represent the shareholders’ interests, or to govern a world dominated by superhuman DAOs. In general, we have no way to use RL to actually interpret and implement human wishes, rather than to optimize some concrete and easily-calculated reward signal.

I feel pessimistic about human prospects in such a world.

2b. Behaving cautiously

We could respond by not letting powerful RL systems act autonomously, or handicapping them enough that we can maintain effective control.

This leads us to a potentially precarious situation: everyone agrees to deploy handicapped systems over which they can maintain meaningful control. But any actor can gain an economic advantage by skimping on such an agreement, and some people would prefer a world dominated by RL agents to one dominated by humans. So there are incentives for defection; if RL systems are very powerful, then these incentives may be large, and even a small number of defectors may be able to rapidly overtake the honest majority which uses handicapped AI systems.

This makes AI a “destructive technology” with similar characteristics to e.g. nuclear weapons, a situation I described in my last post. Over the long run I think we will need to reliably cope with this kind of situation, but I don’t think we are there yet. I think we could probably handle this situation, but there would definitely be a significant risk of trouble.

The situation is especially risky if AI progress is surprisingly rapid, if the alignment problem proves to be surprisingly difficult, if the political situation is tense or dysfunctional, if other things are going wrong at the same time, if AI development is fragmented, if there is a large “hardware overhang,” and so on.

I think that there are relatively few plausible ways that humanity could permanently and irreversibly disfigure its legacy. So I am extremely unhappy with “a significant risk of trouble.”

2c. The current state of AI alignment

We know many approaches to alignment; it’s just that none of these are at the stage of something you could actually implement (“shovel-ready”). Instead, they are at the stage of research projects with an unpredictable and potentially long timetable.

For concreteness, consider two intuitively appealing approaches to AI alignment:

• IRL: AI systems could infer human preferences from human behavior, and then try to satisfy those preferences.
• Natural language: AI systems could have an understanding of natural language, and then execute instructions described in natural language.

Neither of these approaches is shovel-ready, in the sense that we have no idea how to actually write code that implements either of them; you would need to have some good ideas before you even knew what experiments to run.

We might hope that this situation will change automatically as we build more sophisticated AI systems. But I don’t think that’s necessarily the case. “Prosaic AGI” is at the point where we can actually write down some code and say “maybe this would do superhuman RL, if you ran it with enough computing power and you fiddled with the knobs a whole bunch.” But these alignment proposals are nowhere near that point, and I don’t see any “known unknowns” that would let us quickly close the gap. (By construction, prosaic AGI doesn’t involve unknown unknowns.)

So if we found ourselves with prosaic AGI tomorrow, we’d be in the situation described in the last section, for as long as it took us to complete one of these research agendas (or to develop and then execute a new one). Like I said, I think this would probably be OK, but it opens up an unreasonably high chance of really bad outcomes.

3. Priorities

I think that prosaic AGI should probably be the largest focus of current research on alignment. In this section I’ll argue for that claim.

3a. Easy to start now

Prosaic AI alignment is especially interesting because the problem is nearly as tractable today as it would be if prosaic AGI were actually available.

Existing alignment proposals have only weak dependencies on most of the details we would learn while building prosaic AGI (e.g. model architectures, optimization strategies, variance reduction tricks, auxiliary objectives…). As a result, ignorance about those details isn’t a huge problem for alignment work. We may eventually reach the point where those details are critically important, but we aren’t there yet.

For now, finding any plausible approach to alignment that works for any setting of unknown details would be a big accomplishment. With such an approach in hand we could start to ask how sensitive it is to the unknown details, but it seems premature to be pessimistic before even taking that first step.

Note that even in the extreme case where our approach to AI alignment would be completely different for different values of some unknown details, the speedup from knowing them in advance is at most 1/(probability of most likely possibility). The most plausibly critical details are large-scale architectural decisions, for which there is a much smaller space of possibilities.
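One way to read this bound, under the simplifying assumption (mine, not stated in the post) that work aimed at one setting of the details is worthless under every other setting: write $p_i$ for the probability of setting $i$ and $V$ for the value of work done with the details known. Working today on the most likely setting then has expected value $p_{\max}V$, so:

```latex
\[
  \text{speedup from knowing the details}
  \;\le\; \frac{V}{p_{\max}\,V}
  \;=\; \frac{1}{p_{\max}},
  \qquad p_{\max} = \max_i p_i .
\]
```

For instance, with four hypothetical architectures at probabilities 0.4, 0.3, 0.2 and 0.1, the bound is $1/0.4 = 2.5$, so knowing the architecture in advance would make the work at most 2.5 times as productive.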

3b. Importance

If we do develop prosaic AGI without learning a lot more about AI alignment, then I think it would be bad news (see section 2). Addressing alignment earlier, or having a clear understanding of why it is intractable, would make the situation a lot better.

I think the main way that an understanding of alignment could fail to be valuable is if it turns out that alignment is very easy. But in that case, we should also be able to solve it quickly now (or at least have some candidate solution), and then we can move on to other things. So I don’t think “alignment is very easy” is a possibility that should keep us up at night.

Alignment for prosaic AGI in particular will be less important if we don’t actually develop prosaic AGI, but I don’t think that this is a very big problem:

First, I think there is a reasonable chance (>10%) that we will build prosaic AGI. At this point there don’t seem to be convincing arguments against the possibility, and one of the lessons of the last 30 years is that learning algorithms and lots of computation/data can do surprisingly well compared to approaches that require understanding “how to think.”

Indeed, I think that if you had forced someone in 1990 to write down a concrete way that an AGI might work, they could easily have put 10–20% of their mass on the same cluster of possibilities that I’m currently calling “prosaic AGI.” And if you’d asked them to guess what prosaic AGI would look like, I think that they could have given more like 20–40%.

Second, even if we don’t develop prosaic AGI, I think it is very likely that there will be important similarities between alignment for prosaic AGI and alignment for whatever kind of AGI we actually build. For example, whatever AGI we actually build is likely to exploit many of the same techniques that a prosaic AGI would, and to the extent that those techniques pose challenges for alignment we will probably have to deal with them one way or another.

I think that working with a concrete model that we have available now is one of the best ways to make progress on alignment, even in cases where we are sure that there will be at least one qualitative change in how we think about AI.

Third, I think that research on alignment is significantly more important in cases where powerful AI is developed relatively soon. And in these cases, the probability of prosaic AGI seems to be much higher. If prosaic AGI is possible, then I think there is a significant chance of building broadly human-level AGI over the next 10–20 years. I’d guess that hours of work on alignment are perhaps 10x more important if AI is developed in the next 15 years than if it is developed later, based on simple heuristics about diminishing marginal returns.

3c. Feasibility

Some researchers (especially at MIRI) believe that aligning prosaic AGI is probably infeasible — that the most likely approach to building an aligned AI is to understand intelligence in a much deeper way than we currently do, and that if we manage to build AGI before achieving such an understanding then we are in deep trouble.

I think that this shouldn’t make us much less enthusiastic about prosaic AI alignment:

First, I don’t think it’s reasonable to have a confident position on this question. Claims of the form “problem X can’t be solved” are really hard to get right, because you are fighting against the universal quantifier of all possible ways that someone could solve this problem. (This is very similar to the difficulty of saying “system X can’t be compromised.”) To the extent that there is any argument that aligning prosaic AGI is infeasible, that argument is nowhere near the level of rigor which would be compelling.

This implies on the one hand that it would be unwise to assign a high probability to the infeasibility of this problem. It implies on the other hand that even if the problem is infeasible, then we might expect to develop a substantially more complete understanding of why exactly it is so difficult.

Second, if this problem is actually infeasible, that is an extremely important fact with direct consequences for what we ought to do. It implies we will be unable to quickly play “catch up” on alignment after developing prosaic AGI, and so we would need to rely on coordination to prevent catastrophe. As a result:

• We should start preparing for such coordination immediately.
• It would be worthwhile for the AI community to substantially change its research direction in order to avoid catastrophe, even though this would involve large social costs.

I think we don’t yet have very strong evidence for the intractability of this problem.

If we could get very strong evidence, I expect it would have a significant effect on changing researchers’ priorities and on the research community’s attitude towards AI development. Realistically, it’s probably also a precondition for getting AI researchers to make a serious move towards an alternative approach to AI development, or to start talking seriously about the kind of coordination that would be needed to cope with hard-to-align AI.

Conclusion

I’ve claimed that prosaic AGI is conceivable, that it is a very appealing target for research on AI alignment, and that this gives us more reason to be enthusiastic about the overall tractability of alignment. For now, these arguments motivate me to focus on prosaic AGI.

This post was originally published here on 19th Nov 2016.

The next post in this sequence will be "Approval-directed agents: overview" by Paul Christiano, and will be released on Thursday 22nd November.

Tomorrow's AI Alignment Forum sequences post will be "Iterated Fixed Point Exercises" by Scott Garrabrant and Sam Eisenstat, in the sequence "Fixed Points".

### [Insert clever intro here]

20 November 2018 - 13:47
Published on Tue Nov 20 2018 03:26:32 GMT+0000 (UTC)

Hello, everyone. I'm not entirely sure where to start, or exactly how to say it, but here goes.

I only stumbled across Less Wrong this year, after reading HPMOR (I'm sure you get this all the time). Since then, I have gone from an evangelical Christian to a passionate rationalist with an insatiable hunger for challenging what I think I know. It helps that I already had a rationalist foundation, having originally started my adulthood as an atheist.

My training is in IT, but nowhere near enough to help with AI research. I am also quite intelligent, but not enough to open my own rationality dojo. Instead, I am channeling my extroverted nature to help spread the influence of rationality as far and passionately as I can. I am currently spearheading the Kansas City LW/SSC group, and plan to also begin practicing Street Epistemology.

At the risk of sounding like I am trying to garner empathy, I feel rather intimidated by the conversations that are had here; so my posting and commenting will be limited if not entirely non-existent. Please don't ever be discouraged by my silence. You may count yourselves among my friends.

### Alignment Newsletter #33

19 November 2018 - 20:20
Published on Mon Nov 19 2018 17:20:03 GMT+0000 (UTC)

Learning from both demos and preferences, and building a well-motivated AI instead of an AI with the right utility function

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through the database of all summaries.

One correction to last week's newsletter: the title Is Robustness at the Cost of Accuracy should have been Is Robustness the Cost of Accuracy.

Highlights

Reward learning from human preferences and demonstrations in Atari (Borja Ibarz et al): We have had lots of work on learning from preferences, demonstrations, proxy rewards, natural language, rankings etc. However, most such work focuses on one of these modes of learning, sometimes combined with an explicit reward function. This work learns to play Atari games using both preference and demonstration information. They start out with a set of expert demonstrations which are used to initialize a policy using behavioral cloning. They also use the demonstrations to train a reward model using the DQfD algorithm. They then continue training the reward and policy simultaneously, where the policy is trained on rewards from the reward model, while the reward model is trained using preference information (collected and used in the same way as Deep RL from Human Preferences) and the expert demonstrations. They then present a lot of experimental results. The main thing I got out of the experiments is that when demonstrations are good (near optimal), they convey a lot of information about how to perform the task, leading to high reward, but when they are not good, they will actively hurt performance, since the algorithm assumes that the demonstrations are high quality and the demonstrations "override" the more accurate information collected via preferences. They also show results on efficiency, the quality of the reward model, and the reward hacking that can occur if you don't continue training the reward model alongside the policy.

Rohin's opinion: I'm excited to see work that combines information from multiple sources! In general with multiple sources you have the problem of figuring out what to do when the sources of information conflict, and this is no exception. Their approach tends to prioritize demonstrations over preferences when the two conflict, and so in cases where the preferences are better (as in Enduro) their approach performs poorly. I'm somewhat surprised that they prioritize demos over preferences, since it seems humans would be more reliable at providing preferences than demos, but perhaps they needed to give demos more influence over the policy in order to have the policy learn reasonably quickly. I'd be interested in seeing work that tries to use the demos as much as possible, but detect when conflicts happen and prioritize the preferences in that situation -- my guess is that this would let you get good performance across most Atari games.
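To make the preference half of the training loop summarized above more concrete, here is a minimal sketch of fitting a reward model to pairwise comparisons with a Bradley-Terry style logistic loss, the form used in Deep RL from Human Preferences. It leaves out the demonstrations, behavioral cloning and DQfD entirely. The linear reward model, the two-feature toy data and all function names are assumptions made for illustration, not the paper's architecture.

```python
import math

def segment_return(weights, segment):
    """Sum of predicted rewards over a segment (a list of feature vectors)."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)

def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred to seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    return 1.0 / (1.0 + math.exp(-diff))

def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Gradient descent on cross-entropy; comparisons are (preferred, rejected) pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            p = preference_prob(weights, preferred, rejected)
            # Summed feature difference between the two segments.
            feat_diff = [
                sum(s[i] for s in preferred) - sum(s[i] for s in rejected)
                for i in range(n_features)
            ]
            # The gradient of -log(p) is -(1 - p) * feat_diff, so step the other way.
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * feat_diff[i]
    return weights

# Toy data (hypothetical): feature 0 marks "good" steps, feature 1 marks "bad" steps.
good = [[1.0, 0.0], [1.0, 0.0]]
bad = [[0.0, 1.0], [0.0, 1.0]]
weights = train_reward_model([(good, bad)], n_features=2)
```

After training, the model assigns positive weight to the feature that appears in the preferred segment and negative weight to the other. In the setup described above, a policy would then be trained against this learned reward while new comparisons keep arriving.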

Technical AI alignment

Embedded agency sequence

Embedded Agency (full-text version) (Scott Garrabrant and Abram Demski): This is the text version of all of the previous posts in the sequence.

Iterated amplification sequence

The Steering Problem (Paul Christiano): The steering problem refers to the problem of writing a program that uses black-box human-level cognitive abilities to be as useful as a well-motivated human Hugh (that is, a human who is "trying" to be helpful). This is a conceptual problem -- we don't have black-box access to human-level cognitive abilities yet. However, we can build suitable formalizations and solve the steering problem within those formalizations, from which we can learn generalizable insights that we can apply to the problem we will actually face once we have strong AI capabilities. For example, we could formalize "human-level cognitive abilities" as Hugh-level performance on question-answering (yes-no questions in natural language), online learning (given a sequence of labeled data points, predict the label of the next data point), or embodied reinforcement learning. A program P is more useful than Hugh for X if, for every project using a simulation of Hugh to accomplish X, we can efficiently transform it into a new project which uses P to accomplish X.

Rohin's opinion: This is an interesting perspective on the AI safety problem. I really like the ethos of this post, where there isn't a huge opposition between AI capabilities and AI safety, but instead we are simply trying to figure out how to use the (helpful!) capabilities developed by AI researchers to do useful things.

If I think about this from the perspective of reducing existential risk, it seems like you also need to make the argument that AI systems are unlikely to pose an existential threat before they are human-level (a claim I mostly agree with), or that the solutions will generalize to sub-human-level AI systems.

Clarifying "AI Alignment" (Paul Christiano): I previously summarized this in AN #2, but I'll consider it in more detail now. As Paul uses the term, "AI alignment" refers only to the problem of figuring out how to build an AI that is trying to do what humans want. In particular, an AI can be aligned but still make mistakes because of incompetence. This is not a formal definition, since we don't have a good way of talking about the "motivation" of an AI system, or about "what humans want", but Paul expects that it will correspond to some precise notion after we make more progress.

Rohin's opinion: Ultimately, our goal is to build AI systems that reliably do what we want them to do. One way of decomposing this is first to define the behavior that we want from an AI system, and then to figure out how to obtain that behavior, which we might call the definition-optimization decomposition. Ambitious value learning aims to solve the definition subproblem. I interpret this post as proposing a different decomposition of the overall problem. One subproblem is how to build an AI system that is trying to do what we want, and the second subproblem is how to make the AI competent enough that it actually does what we want. I like this motivation-competence decomposition for a few reasons, which I've laid out in a long comment that I strongly encourage you to read. The summary of that comment is: motivation-competence isolates the urgent part in a single subproblem (motivation), humans are an existence proof that the motivation subproblem can be solved, it is possible to apply the motivation framework to systems with lower capabilities, the safety guarantees degrade slowly and smoothly, the definition-optimization decomposition as exemplified by expected utility maximizers has generated primarily negative results, and motivation-competence allows for interaction between the AI system and humans. The major con is that the motivation-competence decomposition is informal, imprecise, and may be intractable to work on.

An unaligned benchmark (Paul Christiano): I previously summarized this in Recon #5, but I'll consider it in more detail now. The post argues that we could get a very powerful AI system using model-based RL with MCTS. Specifically, we learn a generative model of dynamics (sample a sequence of observations given actions), a reward model, and a policy. The policy is trained using MCTS, which uses the dynamics model and reward model to create and score rollouts. The dynamics model is trained using the actual observations and actions from the environment. The reward is trained using preferences or rankings (think something like Deep RL from Human Preferences). This is a system we could program now, and with sufficiently powerful neural nets, it could outperform humans.

However, this system would not be aligned. There could be specification failures: the AI system would be optimizing for making humans think that good outcomes are happening, which may or may not happen by actually having good outcomes. (There are a few arguments suggesting that this is likely to happen.) There could also be robustness failures: as the AI exerts more control over the environment, there is a distributional shift. This may lead to the MCTS finding previously unexplored states where the reward model accidentally assigns high reward, even though it would be a bad outcome, causing a failure. This may push the environment even more out of distribution, triggering other AI systems to fail as well.

Paul uses this and other potential AI algorithms as benchmarks to beat -- we need to build aligned AI algorithms that achieve results similar to these benchmarks. The further we are from hitting the same metrics, the larger the incentive to use the unaligned AI algorithm.

Iterated amplification could potentially solve the issues with this algorithm. The key idea is to always be able to cash out the learned dynamics and reward models as the result of (a large number of) human decisions. In addition, the models need to be made robust to worst case inputs, possibly by using these techniques. In order to make this work, we need to make progress on robustness, amplification, and an understanding of what bad behavior is (so that we can argue that it is easy to avoid, and iterated amplification does avoid it).

Rohin's opinion: I often think that the hard part of AI alignment is actually the strategic side of it -- even if we figure out how to build an aligned AI system, it doesn't help us unless the actors who actually build powerful AI systems use our proposal. From that perspective, it's very important for any aligned systems we build to be competitive with unaligned ones, and so keeping these sorts of benchmarks in mind seems like a really good idea. This particular benchmark seems good -- it's essentially the AlphaGo algorithm, except with learned dynamics (since we don't know the dynamics of the real world) and rewards (since we want to be able to specify arbitrary tasks), which seems like a good contender for "powerful AI system".

Fixed point sequence

Fixed Point Exercises (Scott Garrabrant): Scott's advice to people who want to learn math in order to work on agent foundations is to learn all of the fixed-point theorems across the different areas of math. This sequence will present a series of exercises designed to teach fixed-point theorems, and will then talk about core ideas in the theorems and how the theorems relate to alignment research.

Rohin's opinion: I'm not an expert on agent foundations, so I don't have an opinion worth sharing here. I'm not going to cover the posts with exercises in the newsletter -- visit the Alignment Forum for that. I probably will cover the posts about how the theorems relate to agent foundations research.

Agent foundations

Dimensional regret without resets (Vadim Kosoy)

Learning human intent

Reward learning from human preferences and demonstrations in Atari (Borja Ibarz et al): Summarized in the highlights!

Acknowledging Human Preference Types to Support Value Learning (Nandi, Sabrina, and Erin): Humans often have multiple "types" of preferences, which any value learning algorithm will need to handle. This post concentrates on one particular framework -- liking, wanting and approving. Liking corresponds to the experience of pleasure, wanting corresponds to the motivation that causes you to take action, and approving corresponds to your conscious evaluation of how good the particular action is. These correspond to different data sources, such as facial expressions, demonstrations, and rankings respectively. Now suppose we extract three different reward functions and need to use them to choose actions -- how should we aggregate the reward functions? They choose some desiderata on the aggregation mechanism, inspired by social choice theory, and develop a few aggregation rules that meet some of the desiderata.

Rohin's opinion: I'm excited to see work on dealing with conflicting preference information, particularly from multiple data sources. To my knowledge, there isn't any work on this -- while there is work on multimodal input, usually those inputs don't conflict, whereas this post explicitly has several examples of conflicting preferences, which seems like an important problem to solve. However, I would aim for a solution that is less fixed (i.e. not one specific aggregation rule), for example by an active approach that presents the conflict to the human and asks how it should be resolved, and learning an aggregation rule based on that. I'd be surprised if we ended up using a particular mathematical equation presented here as an aggregation mechanism -- I'm much more interested in what problems arise when we try to aggregate things, what criteria we might want to satisfy, etc.

Interpretability

Verification

Evaluating Robustness of Neural Networks with Mixed Integer Programming (Anonymous): I've only read the abstract so far, but this paper claims to find the exact adversarial accuracy of an MNIST classifier within an L infinity norm ball of radius 0.1, which would be a big step forward in the state of the art for verification.

On a Formal Model of Safe and Scalable Self-driving Cars (Shai Shalev-Shwartz et al)

Robustness

ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness (Anonymous) (summarized by Dan H): This paper empirically demonstrates the outsized influence of textures in classification. To address this, they apply style transfer to ImageNet images and train with this dataset. Although training networks on a specific corruption tends to provide robustness only to that specific corruption, stylized ImageNet images supposedly lead to generalization to new corruption types such as uniform noise and high-pass filters (but not blurs).

AI strategy and policy

AI development incentive gradients are not uniformly terrible (rk): This post considers a model of AI development somewhat similar to the one in the Racing to the precipice paper. It notes that under this model, assuming perfect information, the utility curves for each player are discontinuous. Specifically, the models predict deterministically that the player that spent the most on something (typically AI capabilities) is the one that "wins" the race (i.e. builds AGI), and so there is a discontinuity at the point where the players are spending equal amounts of money. This results in players fighting as hard as possible to be on the right side of the discontinuity, which suggests that they will skimp on safety. However, in practice, there will be some uncertainty about which player wins, even if you know exactly how much each is spending, and this removes the discontinuity. The resulting model predicts more investment in safety, since buying expected utility through safety now looks better than increasing the probability of winning the race (whereas before, it was compared against changing from definitely losing the race to definitely winning the race).

Rohin's opinion: The model in Racing to the precipice had the unintuitive conclusion that if teams have more information (i.e. they know their own or others’ capabilities), then we become less safe, which puzzled me for a while. Their explanation is that with maximal information, the top team takes as much risk as necessary in order to guarantee that they beat the second team, which can be quite a lot of risk if the two teams are close. While this is true, the explanation from this post is more satisfying -- since the model has a discontinuity that rewards taking on risk, anything that removes the discontinuity and makes it more continuous will likely improve the prospects for safety, such as not having full information. I claim that in reality these discontinuities mostly don't exist, since (1) we're uncertain about who will win and (2) we will probably have a multipolar scenario where even if you aren't first-to-market you can still capture a lot of value. This suggests that it likely isn't a problem for teams to have more information about each other on the margin.

That said, these models are still very simplistic, and I mainly try to derive qualitative conclusions from them that my intuition agrees with in hindsight.
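The discontinuity argument can be illustrated with a toy two-player calculation. The sketch below compares the win probability when the bigger spender wins for certain against a version where the outcome is uncertain. The logistic form, the `noise` scale and the spending numbers are assumptions chosen for illustration and are not taken from the post or from Racing to the precipice.

```python
import math

def win_prob_deterministic(my_spend, rival_spend):
    """Perfect information: the bigger spender wins for certain (ties split)."""
    if my_spend > rival_spend:
        return 1.0
    if my_spend < rival_spend:
        return 0.0
    return 0.5

def win_prob_noisy(my_spend, rival_spend, noise=1.0):
    """Uncertain outcome: win probability rises smoothly with the spending gap."""
    return 1.0 / (1.0 + math.exp(-(my_spend - rival_spend) / noise))

def marginal_gain(win_prob, my_spend, rival_spend, step=0.01):
    """Change in win probability from spending `step` more on capabilities."""
    return win_prob(my_spend + step, rival_spend) - win_prob(my_spend, rival_spend)

rival = 5.0
# Both players start out spending the same amount.
jump = marginal_gain(win_prob_deterministic, rival, rival)
smooth = marginal_gain(win_prob_noisy, rival, rival)
```

In the deterministic version a tiny extra outlay at the tie moves the win probability by 0.5, while in the noisy version the same outlay moves it by roughly a quarter of a percentage point. Under these assumptions, a unit of spending on safety competes against a much smaller prize once the outcome is uncertain.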

Other progress in AI

Reinforcement learning

Learning Latent Dynamics for Planning from Pixels (Danijar Hafner et al) (summarized by Richard): The authors introduce PlaNet, an agent that learns an environment's dynamics from pixels and then chooses actions by planning in latent space. At each step, it searches for the best action sequence under its Recurrent State Space dynamics model, then executes the first action and replans. The authors note that having a model with both deterministic and stochastic transitions is critical to learning a good policy. They also use a technique called variational overshooting to train the model on multi-step predictions, by generalising the standard variational bound for one-step predictions. PlaNet approaches the performance of top model-free algorithms even when trained on 50x fewer episodes.

Richard's opinion: This paper seems like a step forward in addressing the instability of using learned models in RL. However, the extent to which it's introducing new contributions, as opposed to combining existing ideas, is a little unclear.
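The plan, execute the first action, replan loop described in the summary can be sketched as follows. This is a toy illustration only: `toy_model` is a hand-written one-dimensional stand-in for a learned latent dynamics and reward model, and the search is a small cross-entropy-method optimizer, one common choice for this kind of planner. All names, the horizon and the population sizes are assumptions.

```python
import random
import statistics

def toy_model(state, action):
    """Stand-in for a learned dynamics and reward model (hypothetical)."""
    next_state = state + action
    reward = -abs(next_state - 10.0)  # reward is highest at state 10
    return next_state, reward

def rollout_return(state, actions):
    """Total predicted reward of an action sequence under the model."""
    total = 0.0
    for action in actions:
        state, reward = toy_model(state, action)
        total += reward
    return total

def plan(state, rng, horizon=5, candidates=200, elites=20, iters=5):
    """Cross-entropy method: refit a Gaussian over action sequences to the best samples."""
    means = [0.0] * horizon
    stds = [1.0] * horizon
    for _ in range(iters):
        # Sample candidate sequences, clipping each action to [-1, 1].
        samples = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(means, stds)]
            for _ in range(candidates)
        ]
        samples.sort(key=lambda seq: rollout_return(state, seq), reverse=True)
        best = samples[:elites]
        means = [statistics.mean([seq[t] for seq in best]) for t in range(horizon)]
        stds = [statistics.pstdev([seq[t] for seq in best]) + 1e-3 for t in range(horizon)]
    return means[0]  # only the first planned action is executed

def run_episode(start=0.0, steps=15, seed=0):
    """Replan from the current state at every step."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        action = plan(state, rng)
        state, _ = toy_model(state, action)
    return state

final_state = run_episode()
```

Replanning at every step means errors in later parts of a plan matter less, since only the first action of each plan is ever executed. In the paper the rollouts would run through the learned latent model rather than a known function like the one used here.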

Modular Architecture for StarCraft II with Deep Reinforcement Learning (Dennis Lee, Haoran Tang et al)

Deep learning

Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet (Anonymous) (summarized by Dan H): This paper proposes a bag-of-features model using patches as features, and they show that this can obtain accuracy similar to VGGNet architectures. They classify each patch and produce the final classification by a majority vote; Figure 1 of the paper tells all. In some ways this model is more interpretable than other deep architectures, as it is clear which regions activated which class. They attempt to show that, like their model, VGGNet does not use global shape information but instead uses localized features.
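The patch-then-vote procedure described in this summary can be sketched in a few lines. The brightness-threshold rule in `classify_patch` is a toy stand-in for the learned patch classifier, and the 4x4 image, patch size and label names are likewise assumptions for illustration.

```python
from collections import Counter

def extract_patches(image, size):
    """Split a 2D list into non-overlapping size x size patches."""
    patches = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches

def classify_patch(patch):
    """Toy stand-in for a learned patch classifier (hypothetical rule)."""
    pixels = [p for row in patch for p in row]
    return "bright" if sum(pixels) / len(pixels) > 0.5 else "dark"

def classify_image(image, size=2):
    """Label each patch independently, then take a majority vote."""
    votes = Counter(classify_patch(p) for p in extract_patches(image, size))
    return votes.most_common(1)[0][0], votes

image = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.7],
    [0.9, 0.6, 0.8, 0.9],
]
label, votes = classify_image(image)
```

Because each patch is classified on its own, shuffling the patches leaves the vote unchanged, which is the sense in which such a model ignores global shape. The per-patch votes also show directly which regions supported which class.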

Machine learning

Formal Limitations on The Measurement of Mutual Information (David McAllester and Karl Stratos)
