LessWrong.com News

A community blog devoted to refining the art of rationality

Please use real names, especially for Alignment Forum?

Published on March 29, 2019 2:54 AM UTC

As the number of AI alignment researchers increases over time, it's getting hard for me to keep track of everyone's names. (I'm probably worse than average in this regard.) It seems the fact that some people don't use their real names as their LW/AF usernames makes it harder than it needs to be. So I wonder if we could officially encourage people to use their real firstname and lastname as their username, especially if they regularly participate on AF, unless they're deliberately trying to keep their physical identities secret? (Alternatively, at least put their real firstname and lastname in their user profile/description?)


Parfit's Escape (Filk)

Published on March 29, 2019 2:31 AM UTC

To the tune of "Escape" (The Piña Colada Song), with apologies to Rupert Holmes.

I was lost in the desert, hopelessly dying of thirst
And I thought to myself, this can't get any worse
I heard the roar of an engine, surely it was my hearse
But then a tall shadow cooled me, it spoke and I heard

"If you like living not dying, I got an offer for you
If you give me your money, I'll give you water and food
I'll take you away to salvation, I'll get you out of this scrape
So just promise you'll pay me, to make Parfit's escape"

I hadn't solved decision theory, boy I sure wish that I had
'Cause my savior was Omega, and if I lied I'd be had
But CDT said don't pay, just take the ride for free
And though it seemed kind of foolish, I went ahead and agreed

"Yes, I like living not dying, so I'll make your offer good
When we get back to town, I've got money for you
I've got to get out of this desert now, I'm so tired of this place
Yes I two box on Newcomb, and I take Parfit's escape"

Omega dragged me to the car, put a canteen to my lips
We drove into the sunset, I felt nothing but bliss
I caught a glimpse of his face, and to my wondering eyes
My driver wasn't Omega, but Singer in disguise

"You don't owe me any money, you're the life I can save
And you should know I would have helped you, if you had lied to my face
I really hope you've learned a lesson though, about the perils of this place
Decision problems are dangerous, it's moral luck you escaped"

I said "Oh thank you Mr. Singer, let me buy you a drink
At a bar called O'Malley's, they make their Coladas pink
'Cause I'm eternally grateful, for all that you've done
Now let's get out of this desert, and be done with this song"


The trolley problem, and what you should do if I'm on the tracks

Published on March 29, 2019 1:24 AM UTC

Originally published in French. Translation by Épiphanie.

Trigger warning: Death, suicide, and murder. Trolley problem.

This is a quite conventional ethical conundrum: you are standing near train tracks, and a train is rolling down the hill. It is going to run over four people who are tied to the rails of the main track. However, you can divert the train to a secondary track by pulling a lever, so that it runs over only one person, also tied to the rails. Should you pull the lever?

I believe there is a more interesting way to frame it: what would you choose if you were yourself tied to the rails, alone on the secondary track, while the train is not heading toward you yet? My own answer is very simple: I want the person deciding where the train should go to have no doubt that they should pull the lever! Because, for lack of other context, I assume that the other four people are just me, or rather copies of me. That is a bit simplistic; of course they are not perfect clones. But as far as concrete predicates go, they are indistinguishable. That is to say, my odds of being alone on the tracks are 1 in 5, and my odds of being in the group are 4 in 5. And frankly, I would rather die with 20% probability because of something someone did, than die with 80% probability because no one was ever willing to take on the burden of responsibility.

I know many would not pull the lever, or at least would be very reluctant to. That is precisely why I am writing this post: I wish to make it public that I believe people should pull the lever. More importantly, I wish that many, many more people would share this opinion publicly. Then, if it became common knowledge, the one who has to pull the lever would know they can do it without remorse, as they will not face any social consequences for what they have done. All in all, this would raise my odds of survival by 60 percentage points! That's quite something.
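The "60 percentage points" arithmetic can be checked directly; a minimal sketch of the veil-of-ignorance calculation:

```python
from fractions import Fraction

# Five people total: one alone on the side track, four on the main track.
# Behind the veil of ignorance, you are equally likely to be any of them.
p_alone = Fraction(1, 5)   # you are the lone person
p_group = Fraction(4, 5)   # you are one of the four

# Nobody pulls the lever: the four on the main track die.
survival_no_pull = p_alone
# Everybody follows the policy and pulls: only the lone person dies.
survival_pull = p_group

gain = (survival_pull - survival_no_pull) * 100
print(gain)  # 60 (percentage points)
```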

But what if it were to truly happen?

Be aware that I am not saying anything more than what I have literally written. I reserve full rights to be inconsistent. Were I alone on the tracks, there is no telling what I would do. Maybe I would cry and plead for them not to pull the lever, maybe I would depict the other four people as worse than Hitler, or I would play on the person's guilt at bearing the burden of my death. Once I knew that my probability of dying had jumped from 20% to 100%, I would, I presume, fight hard for my survival. Similarly, if I am alone on the tracks and I am also the one who has to decide where the train should go, I am not pledging at all that I would pull the lever.

Or maybe I will. Maybe I will be consistent. Right now, up here in my apartment, where the worst thing I can imagine happening is my contract not being renewed, I cannot begin to imagine what it would be like. I can say what I hope for: I wish to be consistent in this situation. It would be extremely unfair and pointless to make another person bear my death on their hands, along with the guilt that goes with it.

I see only one problem with this policy, however. If this situation were iterated frequently enough that a selection mechanism operated on the survivors' behavior, then eventually the altruists would dwindle away in favor of the egoists. So I am going to be optimistic and assume, in this post, that this trolley situation happens only exceptionally.


As with many similar problems, it seems to me that one of the main flaws of the question lies in everything that is left unsaid. Everything that makes up the problem itself is ignored in favor of something so theoretical that the problem becomes too abstract to have any solution at all.

For instance, one of the important assumptions of this problem is that the people are chosen at random. Let us break this assumption: suppose the villain of this story waited to learn everyone's opinion before unfolding their scheme. The villain ties people of the opinion "do not change the train's direction" to the main track, and people of the opinion "change the train's direction" to the secondary track. Then I would be on the secondary track, and if one were to listen to me, I would die. Yet I could have survived had I just thought like everyone else: "don't pull the lever". Describing this situation, I realise there is something very paradoxical about it. But what really interests me is that once the hypothesis of randomness is removed, the 20/80 ratio no longer holds. That is what makes me a bit anxious about writing this post: I worry it becomes fuel in favor of killing people who hold the same position as I do, while changing nothing for the others.

Another assumption I have not taken into account thus far: who are the people tied to the rails? If the train kills me and the four others are worse-than-Hitler people, I would be a bit embarrassed (and dead). Theoretically, if I were tied up alone and the other group contained Hitler along with a few other dictators, I would still like to be saved. Note that I am against the death penalty, without exception, and in favor of a fair trial for anyone, without exception. My question is different, however, and I am still pondering what my answer would be. I have no intention of sacrificing my life for that of a few tyrants. But I also have no intention of being the kind of person who claims their life is worth more than four others', let alone that this can be decided in a few mere seconds.

The last ignored assumption is how we got here in the first place. I mean, it's not your mundane day-to-day situation! And this question matters all the more if we want to keep it from happening again. While the villain scenario is rather far-fetched, this kind of situation can happen in real life, where technical decisions have to be made fast. It can be, for instance, a choice between one pedestrian and a car full of passengers. Or between four pedestrians and a car with a single passenger. Or one worker on the rails versus several working together. And so on.

Let's be even more cynical: imagine a recurring accident in a company which may kill four managers (by destroying their office), and you can prevent that by killing a single worker instead. If people follow the policy of "always pull the lever", none of the managers has any real, strong incentive to ensure that this kind of incident never occurs again. They are always saved, and workers are killed one by one. If the incident is going to occur five times or more, never pulling the lever may actually save more lives in the long run. Once again, I am not necessarily asking you to pull the lever in this kind of situation.
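The break-even claim in the last sentence is simple arithmetic, under the post's own implicit assumption that once the managers bear the cost, the hazard actually gets fixed and never recurs (a sketch, not anything the post formalizes):

```python
# Always pull: one worker dies at every occurrence of the incident.
def deaths_always_pull(n_incidents):
    return 1 * n_incidents

# Never pull: the four managers die once, after which (by assumption)
# the incident is actually fixed and never recurs.
def deaths_never_pull(n_incidents):
    return 4

for n in (4, 5, 6):
    print(n, deaths_always_pull(n), deaths_never_pull(n))
# Break-even at four incidents; from five onward, never pulling kills fewer.
```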


This post is just a long way of saying: "I am taking a very strong precommitment for a situation whose assumptions are themselves so strong that I cannot fathom they would ever hold." Still, I was about to write a post on Atlas Shrugged (spoilers ahead), and I foresaw I would have to explain why I would never swear in good faith that "by my life and my love of it, [...] I will never live for the sake of another man, nor ask another man to live for mine."


The Unexpected Philosophical Depths of the Clicker Game Universal Paperclips

March 29, 2019 - 02:39

Announcing predictions

Published on March 28, 2019 9:01 PM UTC

Today I unveiled predictions, a command-line program to score predictions you’ve written down in a YAML file. In my estimation, the program and its supporting documentation are alpha quality. You’ll need to build it yourself, and the documentation isn’t all there or well-organized yet.

If you’re familiar with Go toolchains, have a look. I’d be happy to take feedback here in addition to on GitHub.
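The announcement doesn't show the YAML schema or the scoring rule, so the following is only a hypothetical illustration of the general idea: entries pairing a stated probability with an outcome, scored with the Brier rule (the real tool's format and scoring rule may well differ).

```python
# Hypothetical entries; a YAML file of {claim, p, happened} records would
# load into a list like this. The real tool's schema may differ.
predictions = [
    {"claim": "it rains on Saturday",  "p": 0.7, "happened": True},
    {"claim": "team A wins the final", "p": 0.2, "happened": False},
    {"claim": "the paper is accepted", "p": 0.9, "happened": False},
]

def brier_score(preds):
    """Mean squared gap between stated probability and outcome; 0 is perfect."""
    return sum((e["p"] - float(e["happened"])) ** 2 for e in preds) / len(preds)

print(round(brier_score(predictions), 3))  # 0.313
```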



How much money/reliability would you need to answer "hard" LW questions?

Published on March 28, 2019 8:07 PM UTC

One of the questions (ha) that we are asking ourselves on the LW team is "can the questions feature be bootstrapped into a scalable way of making intellectual progress?"

We'd like to make it easier to attach bounties to questions, and to make it easier to break questions into smaller, easier chunks.

What would it take for you (you, personally), to start treating "answer LW questions" as a thing you do semi-regularly, for money?

My assumptions (possibly incorrect) here are that you need a few things:

  • Enough money (and reasonable expectation of earning it) for a given question that working on it is worth the hours spent directly on it
  • Enough reliability of such questions showing up in your browser that you can build a habit of answering them, such that you reallocate some chunk of your schedule (that formerly went either to another paying job, or perhaps to some intellectual hobby that trades off easily against question answering)
  • A clear enough framework for answering questions, that relies on skills you already have (and/or a clear path towards gaining them)

Some types of intellectual labor I'm imagining here (which may or may not all fit neatly into the "questions" framework):

  • Take a scientific paper that's written in confusing academic-speak PDF format, and translate it into a plain-English blog post.
    • bonus points/money if you can do extra interpretive work to highlight important facts in a way that lets me use my own judgment to interpret them
  • Do a literature review on a topic
  • If you already know a given field, provide a handy link to the paper that actually answers a given question.
  • Figure out the answer to something that involves research
    • (can include contributing to small steps like 'identify a list of articles to read' or 'summarize one of those articles' or 'help figure out what related sub-questions are involved')
  • Conduct a survey or psych experiment (possibly on mechanical turk)

"Serious" questions could range from "take an afternoon of your time" to "take weeks or months of research", and I'm curious what the actual going rate for those two ends of the spectrum are, for LW readers who are a plausible fit for this type of distributed work.


[Link] IDA: 11-14/14: Future Directions

Published on March 28, 2019 6:56 PM UTC

This is a linkpost for https://app.grasple.com/#/level/1553

Every Thursday for 4 weeks, we've been posting lessons about Iterated Distillation and Amplification. They're largely based on Paul Christiano's sequence here on LW; he has graciously allowed us to use his work.

This is the final section, explaining the next steps, the current holes in the plan and how they might be fixed.

Also look out for our continuing video sequence on IDA, which will be posted on Robert Miles' YouTube channel as well as on our course.

Note that access to the lessons requires creating an account here.

Have a nice day!


Alignment Newsletter #50

Published on March 28, 2019 6:10 PM UTC

Alignment Newsletter #50: How an AI catastrophe could occur, and an overview of AI policy from OpenAI researchers

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter.


More realistic tales of doom (Paul Christiano): This Vox article does a nice job of explaining the first part of this post, though I disagree with its characterization of the second part.

The typical example of AI catastrophe has a powerful and adversarial AI system surprising us with a treacherous turn allowing it to quickly take over the world (think of the paperclip maximizer). This post uses a premise of continuous AI development and broad AI deployment and depicts two other stories of AI catastrophe that Paul finds more realistic.

The first story is rooted in the fact that AI systems have a huge comparative advantage at optimizing for easily measured goals. We already see problems with humans optimizing for the easily measured goals (scientific malpractice, outrage-inducing social media, etc.) and with AI these problems will be severely exacerbated. So far, we have been able to use human reasoning to ameliorate these problems, by changing incentives, enacting laws, or using common sense to interpret goals correctly. We will initially be able to use human reasoning to create good proxies, but over time as AI systems become more capable our ability to do this will lag further and further behind. We end up "going out with a whimper": ultimately our values are no longer shaping society's trajectory.

The second story starts out like the first story, but adds in a new complication: the AI system could develop internal goals of its own. AI performs a huge search over policies for ones that score well on the training objective. Unfortunately, a policy that optimizes for the goal of "having influence" will initially score well on most training objectives: when you don't already have influence, a good strategy for gaining influence is to do what your overseers want you to do. (Here "influence" doesn't mean just social influence; control over nukes also counts as influence.) At some point the system will be powerful enough that gaining influence no longer means doing what the overseers want. We will probably know about this dynamic through some catastrophic AI failures (e.g. an AI-run corporation stealing the money it manages), but may not be able to do anything about it because we would be extremely reliant on AI systems. Eventually, during some period of heightened vulnerability, one AI system may do something catastrophic, leading to a distribution shift which triggers a cascade of other AI systems (and human systems) failing, leading to an unrecoverable catastrophe (think something in the class of a hostile robot takeover). Note that "failure" here means an AI system "intentionally" doing something that we don't want, as opposed to the AI system not knowing what to do because it is not robust to distributional shift.

Rohin's opinion: Note that Paul thinks these scenarios are more realistic because he expects that many of the other problems (e.g. wireheading, giving AI systems an objective such that it doesn't kill humans) will be solved by default. I somewhat expect even the first story to be solved by default -- it seems to rest on a premise of human reasoning staying as powerful as it is right now, but it seems plausible that as AI systems grow in capability we will be able to leverage them to improve human reasoning (think of how paper or the Internet amplified human reasoning). The second story seems much more difficult -- I don't see any clear way that we can avoid influence-seeking behavior. It is currently my most likely scenario for an AI catastrophe that was a result of a failure of technical AI safety (or more specifically, intent alignment (AN #33)).

Read more: AI disaster won’t look like the Terminator. It’ll be creepier.

80K podcast: How can policy keep up with AI advances? (Rob Wiblin, Jack Clark, Miles Brundage and Amanda Askell): OpenAI policy researchers Jack Clark, Amanda Askell and Miles Brundage cover a large variety of topics relevant to AI policy, giving an outside-view perspective on the field as a whole. A year or two ago, the consensus was that the field required disentanglement research; now, while disentanglement research is still needed, there are more clearly defined important questions that can be tackled independently. People are now also taking action in addition to doing research, mainly by accurately conveying relevant concepts to policymakers. A common thread across policy is the framing of the problem as a large coordination problem, for which an important ingredient of the solution is to build trust between actors.

Another thread was the high uncertainty over specific details of scenarios in the future, but the emergence of some structural properties that allow us to make progress anyway. This implies that the goal of AI policy should be aiming for robustness rather than optimality. Some examples:

  • The malicious use of AI report was broad and high level because each individual example is different and the correct solution depends on the details; a general rule will not work. In fact, Miles thinks that they probably overemphasized how much they could learn from other fields in that report, since the different context means that you quickly hit diminishing returns on what you can learn.
  • None of them were willing to predict specific capabilities over more than a 3-year period, especially due to the steep growth rate of compute, which means that things will change rapidly. Nonetheless, there are structural properties that we can be confident will be important: for example, a trained AI system will be easy to scale via copying (which you can't do with humans).
  • OpenAI's strategy is to unify the fields of capabilities, safety and policy, since ultimately these are all facets of the overarching goal of developing beneficial AI. They aim to either be the main actor developing beneficial AGI, or to help the main actor, in order to be robust to many different scenarios.
  • Due to uncertainty, OpenAI tries to have policy institutions that make sense over many different time horizons. They are building towards a world with formal processes for coordinating between different AI labs, but use informal relationships and networking for now.

AI policy is often considered a field where it is easy to cause harm. They identify two (of many) ways this could happen: first, you could cause other actors to start racing (which you may not even realize, if it manifests as a substantial increase in some classified budget), and second, you could build coordination mechanisms that aren't the ones people want and that work fine for small problems but break once they are put under a lot of stress. Another common one people think about is information hazards. While they consider info hazards all the time, they also think that (within the AI safety community) these worries are overblown. Typically people overestimate how important or controversial their opinion is. Another common reason for not publishing is not being sure whether the work meets high intellectual standards, but in this case the conversation will be dominated by people with lower standards.

Miscellaneous other stuff:

  • Many aspects of races can make them much more collaborative, and it is not clear that AI corresponds to an adversarial race. In particular, large shared benefits make races much more collaborative.
  • Another common framing is to treat the military as an adversary, and try to prevent them from gaining access to AI. Jack thinks this is mistaken, since then the military will probably end up developing AI systems anyway, and you wouldn't have been able to help them make it safe.
  • There's also a lot of content at the end about career trajectories and working at OpenAI or the US government, which I won't get into here.

Rohin's opinion: It does seem like building trust between actors is a pretty key part of AI policy. That said, there are two kinds of trust you can have: first, trust that the statements made by other actors are true, and second, trust that other actors are aligned enough with you in their goals that their success is also your success. The former can be improved by mechanisms like monitoring, software verification, etc., while the latter cannot. The former is often maintained using processes that impose a lot of overhead, while the latter usually does not require much overhead once established. The former can scale to large groups comprising thousands or millions of people, while the latter is much harder to scale. I think it's an open question in AI policy to what extent we need each of these kinds of trust to exist between actors. This podcast seems to focus particularly on the latter kind.

Other miscellaneous thoughts:

  • I think a lot of these views are conditioned on a gradual view of AI development, where there isn't a discontinuous jump in capabilities, and there are many different actors all deploying powerful AI systems.
  • Conditional on the military eventually developing AI systems, it seems worth it to work with them to make their AI systems safer. However, it's not inconceivable that AI researchers could globally coordinate to prevent military AI applications. This wouldn't prevent it from happening eventually, but could drastically slow it down, and let defense scale faster than offense. In that case, working with the military can also be seen as a defection in a giant coordination game with other AI researchers.
  • One of my favorite lines: "I would recommend everyone who has calibrated intuitions about AI timelines spend some time doing stuff with real robots and it will probably … how should I put this? … further calibrate your intuitions in quite a humbling way." (Not that I've worked with real robots, but many of my peers have.)
Technical AI alignment

Problems

More realistic tales of doom (Paul Christiano): Summarized in the highlights!

The Main Sources of AI Risk? (Wei Dai): This post lists different causes or sources of existential risk from advanced AI.

Technical agendas and prioritization

Unsolved research problems vs. real-world threat models (Catherine Olsson): Papers on adversarial examples often suggest that adversarial examples can lead to real-world problems as their motivation. As we've seen previously (AN #19, AN #24), many adversarial example settings are not very realistic threat models for any real-world problem. For example, adversarial "stickers" that cause vision models to fail to recognize stop signs could cause an autonomous vehicle to crash... but an adversary could also just knock over the stop sign if that were their goal.

There are more compelling reasons that we might care about imperceptible perturbation adversarial examples. First, they are a proof of concept, demonstrating that our ML models are not robust and make "obvious" mistakes and so cannot be relied on. Second, they form an unsolved research problem, in which progress can be made more easily than in real settings, because it can be formalized straightforwardly (unlike realistic settings). As progress is made in this toy domain, it can be used to inform new paradigms that are closer to realistic settings. But it is not meant to mimic real world settings -- in the real world, you need a threat model of what problems can arise from the outside world, which will likely suggest much more basic concerns than the "research problems", requiring solutions involving sweeping design changes rather than small fixes.
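As a minimal instance of the "unsolved research problem" being described, a gradient-sign step against a plain linear classifier already shows how a tiny per-coordinate perturbation can flip a prediction (a toy sketch of the core phenomenon, not any particular paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000
w = rng.normal(size=d)          # linear classifier: predict sign(w @ x)
x = rng.normal(size=d)
x = x / (w @ x)                 # rescale so w @ x == 1: classified positive

eps = 0.01                      # max change per coordinate: imperceptibly small
x_adv = x - eps * np.sign(w)    # FGSM-style step against the decision margin

print(w @ x)                        # ~ 1.0: clean input, positive class
print(w @ x_adv)                    # strongly negative: prediction flipped
print(np.max(np.abs(x_adv - x)))    # 0.01
```

The perturbation moves every coordinate by at most 0.01, yet the margin drops by eps times the L1 norm of w, which for 1000 weights swamps the original margin.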

Rohin's opinion: I strongly agree with the points made in this post. I don't know to what extent researchers themselves agree with this point -- it seems like there is a lot of adversarial examples research that is looking at the imperceptible perturbation case and many papers that talk about new types of adversarial examples, without really explaining why they are doing this or giving a motivation that is about unsolved research problems rather than real world settings. It's possible that researchers do think of it as a research problem and not a real world problem, but present their papers differently because they think that's necessary in order to be accepted.

The distinction between research problems and real-world threat models seems to parallel the distinction between theoretical or conceptual research and engineering in AI safety. The former typically asks questions of the form "how could we do this in principle, making simplifying assumptions X, Y and Z", even though X, Y and Z are known not to hold in the real world, for the sake of having greater conceptual clarity that can later be leveraged as a solution to a real-world problem. Engineering work, on the other hand, typically tries to scale an approach to a more complex environment (with the eventual goal of getting to a real-world problem).

Learning human intent

Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning (Smitha Milli et al): In Cooperative Inverse Reinforcement Learning, we assume a two-player game with a human and a robot where the robot doesn't know the reward R, but both players are trying to maximize the reward. Since one of the players is a human, we cannot simply compute the optimal strategy and deploy it -- we are always making some assumption about the human, that may be misspecified. A common assumption is that the human is playing optimally for the single-player version of the game, also known as a literal human. The robot then takes the best response actions given that assumption. Another assumption is to have a pedagogic human, who acts as though the robot is interpreting her literally. The robot that takes the best response actions with this assumption is called a pedagogic or pragmatic robot.

However, any assumption we make about the human is going to be misspecified. This paper looks at how we can be robust to misspecification, in particular if the human could be literal or pedagogic. The main result is that the literal robot is more robust to misspecification. The way I think about this is that the literal robot is designed to work with a literal human, and a pedagogic human is "designed" to work with the literal robot, so unsurprisingly the literal robot works well with both of them. On the other hand, the pedagogic robot is designed to work with the pedagogic human, but has no relationship with the literal robot, and so should not be expected to work well. It turns out we can turn this argument into a very simple proof: (literal robot, pedagogic human) outperforms (literal robot, literal human) since the pedagogic human is designed to work well with the literal robot, and (literal robot, literal human) outperforms (pedagogic robot, literal human) since the literal robot is designed to work with the literal human.

They then check that the theory holds in practice. They find that the literal robot is better than the pedagogic robot even when humans are trying to be pedagogic, a stronger result than the theory predicted. The authors hypothesize that even when trying to be pedagogic, humans are more accurately modeled as a mixture of literal and pedagogic humans, and the extra robustness of the literal robot means that it is the better choice.

Rohin's opinion: I found this theorem quite unintuitive when I first encountered it, despite it being two lines long, which is something of a testament to how annoying and tricky misspecification can be. One way I interpret the empirical result is that the wider the probability distributions of our assumptions, the more robust they are to misspecification. A literal robot assumes that the human can take any near-optimal trajectory, whereas a pedagogic robot assumes that the human takes very particular near-optimal trajectories that best communicate the reward. So, the literal robot places probability mass over a larger space of trajectories given a particular reward, and does not update as strongly on any particular observed trajectory compared to the pedagogic robot, making it more robust.
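The broad-versus-narrow-likelihood intuition can be made concrete in a toy objective-learning model (my own construction for illustration, not the paper's experiment): two candidate rewards, three trajectories, a literal human who randomizes over optimal trajectories, and a pedagogic human who picks the single most disambiguating one.

```python
# Optimal trajectories: under reward A -> {t1, t2}; under reward B -> {t2, t3}.

# Literal human: uniform over the optimal trajectories for the true reward.
literal = {
    "A": {"t1": 0.5, "t2": 0.5, "t3": 0.0},
    "B": {"t1": 0.0, "t2": 0.5, "t3": 0.5},
}
# Pedagogic human: picks the one trajectory that best reveals the reward.
pedagogic = {
    "A": {"t1": 1.0, "t2": 0.0, "t3": 0.0},
    "B": {"t1": 0.0, "t2": 0.0, "t3": 1.0},
}

def posterior(human_model, traj):
    """Robot's posterior over rewards after seeing traj (uniform prior);
    None if the observation has zero likelihood under the assumed model."""
    likelihoods = {r: probs[traj] for r, probs in human_model.items()}
    z = sum(likelihoods.values())
    return {r: p / z for r, p in likelihoods.items()} if z > 0 else None

# The actual human is literal and happens to demonstrate t2.
print(posterior(literal, "t2"))    # {'A': 0.5, 'B': 0.5}: uncertain but sane
print(posterior(pedagogic, "t2"))  # None: the narrower model breaks entirely
```

The literal robot's broader likelihood spreads mass over every near-optimal trajectory, so no demonstration a real human might give has zero probability; the pedagogic robot's concentrated likelihood fails outright as soon as the human deviates from the assumed pedagogy.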


SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability (Maithra Raghu et al)


Call for Papers: ICML 2019 Workshop on Uncertainty and Robustness in Deep Learning (summarized by Dan H): Topics of this workshop include out-of-distribution detection, calibration, robustness to corruptions, robustness to adversaries, etc. Submissions are due April 30th.

AI strategy and policy

80K podcast: How can policy keep up with AI advances? (Rob Wiblin, Jack Clark, Miles Brundage and Amanda Askell): Summarized in the highlights!

A Survey of the EU's AI Ecosystem (Charlotte Stix): This report analyzes the European AI ecosystem. The key advantage that Europe has is a strong focus on ethical AI, as opposed to the US and China, which are more focused on capabilities research. However, Europe does face a significant challenge in staying competitive with AI, as it lacks both startup/VC funding and talented researchers (who often go to other countries). While there are initiatives meant to help with this problem, it is too early to tell whether they will have an impact. The report also recommends large multinational projects, along the lines of CERN and the Human Brain Project. See also Import AI.

Other progress in AI

Reinforcement learning

Assessing Generalization in Deep Reinforcement Learning (blog post) (Charles Packer and Katelyn Guo): This is a blog post summarizing Assessing Generalization in Deep Reinforcement Learning (AN #31).

Meta learning

Online Meta-Learning (Chelsea Finn, Aravind Rajeswaran et al)

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples (Eleni Triantafillou et al)

Copyright © 2019 Rohin Shah, All rights reserved.



What are the boldest near/medium-term predictions made by Drexler's CAIS model?

March 28, 2019 - 16:14
Published on March 28, 2019 1:14 PM UTC

Background and questions

Since Eric Drexler publicly released his “Comprehensive AI services model” (CAIS), there has been a series of analyses on LW, from rohinmshah, ricraz, PeterMcCluskey, and others.

Much of this discussion focuses on the implications of this model for safety strategy and resource allocation. In this question I want to focus on the empirical part of the model.

  • What are the boldest predictions the CAIS model makes about what the world will look like in <=10 years?

“Boldest” might be interpreted as those predictions which CAIS gives a decent chance, but which have the lowest probability under other “worldviews” such as the Bostrom/Yudkowsky paradigm.

A prediction which all these worldviews agree on, but which is nonetheless quite bold, is less interesting for present purposes (possibly something like that we will see faster progress than places like mainstream academia expect).

Some other related questions:

  • If you disagree with Drexler, but expect there to be empirical evidence within the next 1-10 years that would change your mind, what is it?
  • If you expect there to be events in that timeframe causing you to go “I told you so, the world sure doesn’t look like CAIS”, what are they?

Clarifications and suggestions

I should clarify that answers can be about things that would change your mind about whether CAIS is safer than other approaches (see e.g. the Wei_Dai comment linked below).

But I suggest avoiding discussion of cruxes which are more theoretical than empirical (e.g. how decomposable high-level tasks are) unless you have a neat operationalisation for making them empirical (e.g. whether there will be evidence of large economies-of-scope of the most profitable automation services).

Also, it might be really hard to get this down to a single prediction, so it might be useful to pose a cluster of predictions and different operationalisations, and/or to use conditional predictions.


General v. Specific Planning

March 27, 2019 - 15:34
Published on March 27, 2019 12:34 PM UTC

Epistemic Status: Everyone already knows about it?

I've been thinking a bit about two different manners of pursuing a goal.

I haven't come up with a catchy jargon for them, and I don't know of any existing catchy jargon for them either. "General v. specific planning" is a pretty bad name, but at least for the purpose of this post I'll stick with it.

I know they've been discussed here in one form or other, probably many times, but I don't think they've really been explicitly contrasted. I thought doing that might be useful.

Here are some suggestive, if imperfect, contrasts, illustrating what I mean.

  • How I try to win Chess versus how Chess grandmasters and AlphaZero try to win Chess.
    • I tend to play Chess by trying to get an advantage in valuable pieces, because that is generally useful to me. I then try to use these pieces to eventually obtain checkmate.
    • On the other hand, AlphaZero seems to play to obtain a specific, although gradually accumulated, positional advantage that ultimately results in a resounding victory. It is happy to sacrifice "generally useful" material to get this.
    • This isn't simply a matter of just using different techniques to get to the end. It has more to do with my inability to identify strong positions and picture a game very far into the future.
  • Peter Thiel's "indefinite optimism" about career success versus "definite optimism" about career success.
    • According to this schema, the typical indefinite optimist's life-path consists in getting instrumentally useful things, such as education, status, or money, without committing to a definite course of action. The stereotypical career for such a person is finance or consulting or "business." Their success is supposed to follow the pursuit of optionality.
    • The definite optimist's life path, on the other hand, is more likely to consist in researching and shooting for a single, particular course of action. The stereotypical career for such a person is as an inventor or entrepreneur. Their success is supposed to follow after giving up a great deal of optionality.
    • The indefinite optimist accumulates generally useful resources, while the definite optimist will give up many "generally useful" things in order to reach a single goal.
  • OpenAI's strategy versus DeepMind's strategy of handling AGI successfully.
    • OpenAI seems like it is trying to position itself to be able to influence policy, influence other researchers in other institutions, and perhaps--although probably not--develop AGI itself, and thereby ensure safe AGI. Similarly, they recently made themselves into a "capped profit" company to gain access to more money.
    • On the other hand, DeepMind's strategy seems to be simply to try to develop AGI itself. They seem to have other efforts, but these seem noncentral. (I might be entirely wrong about this, though.)
  • Stereotypically conservative, "white" advice for achieving happiness in life against stereotypically adventurous "red" advice for achieving happiness in life.
    • White advice for life would be to get a good education, to save money, etc., while expecting these to be instrumentally useful for a relatively undefined happiness. You plan to get the means for happiness, not happiness itself. This turns out badly when the presumed means pin you down and prevent you from getting what you really wanted.
    • Red advice, on the other hand, is more about picturing for yourself what you want in your inmost core, and saying to pursue that kind of happiness directly. You plan to get happiness itself, not the penumbral material for happiness. This turns out badly when what you thought would make you happy turns out not to make you happy, but now you lack general mobility.
    • The former is about pursuing particular things which are often useful for happiness; the latter is about trying to imagine what happiness is for yourself and pursuing that directly.
    • (I feel like this is giving short shrift to both perfected white and perfected red ideals, but I think it gets across what I mean.)
  • The kind of planning for success which frequently occurs in startups against the kind of planning which frequently tends to occur in established companies.
    • Startups are known for sacrificing instrumentally useful goods -- generally, money -- to carry out a longer-term plan, while established companies are known for carrying out short-term plans to gain an instrumentally useful good, and in many cases for ignoring long-term risks for short-term gains.
Sketches of Definitions

If we want to stop giving examples and start talking about featherless bipeds, there are a few different ways to describe the examples above.

  • The distinction might be in the kind of resource pursued, and as alternate techniques for obtaining a goal.
    • On one hand, you can pursue resources that are generally useful along multiple hypothetical paths towards one's goal.
    • On the other hand, you can pursue a resource that is mostly useful along a single path towards one's goal.
  • But you can also cast these as different phases in pursuit of a goal, and say that a great deal of goal-seeking behavior goes through both.
    • When you start pursuing a goal, you often seek out generally useful things.
    • Demis Hassabis (apparently) pursued money and experience managing before starting DeepMind. AlphaZero pursues general positional advantage towards the start of a Chess game. Anyone founding a startup might pursue general knowledge about the domain for the startup before beginning.
    • And then at some point while pursuing the goal, you cease to seek generically useful things, and begin to turn these things to your specific advantage.
    • Demis Hassabis uses his resources to get funding from Peter Thiel, and later get acquired by Google. AlphaZero pursues a particular strategy for checkmate (?). Anyone founding a startup stops pursuing general knowledge to actually, you know, start the startup.
  • There's obviously a continuum here.
  • For the sake of clarity, I take this distinction to be far different than the distinction between trying to try and actually trying.
    • There are certainly possible agents in possible worlds who, when trying to do their absolute best, must try to attain intermediate, generally instrumentally useful goals. If I were to try to play Chess like AlphaZero, I would do even worse than I actually do now.
Misc Notes
  • Generally, I've noticed a great deal of psychological resistance in myself to moving from the general to the specific.
    • General strategies feel safer (even if they aren't) both because they offer more visible options and there's a smaller social burden in following them. No one will fault you for pursuing money, influence, etc. Many people will think you're foolish for deciding to bet your life on a particular path.
  • On the other hand, the more specific strategy can be more fun.
    • It can be fun because specific planning is more mentally interesting than sort of punting off a strategy to the future by thinking "well, once we've gotten more influence / money / status, then we'll pursue X directly."
    • Tighter constraints can be more interesting to work with. The problem of solving with tighter constraints also exercises your mind more, perhaps.
    • (As rather an aside, I think many people exercise only the general sort of planning about their lives, while exercising only the specific sort about work, which could lead to a little life boredom. I could be wrong.)
    • The more general feels more epistemically humble (potentially in a bad way) while the more specific feels more epistemically interesting (also potentially in a bad way).
  • Each strategy is sometimes correct.
    • Sometimes you just need to study more math, and sometimes you should actually propose a solution to that RL problem.
    • And really this whole framework might be fake.


The low cost of human preference incoherence

March 27, 2019 - 13:23
Published on March 27, 2019 10:23 AM UTC

Note: working on a research agenda, hence the large amount of small individual posts, to have things to link to in the main documents.

In aiming for an adequate synthesis of human preferences, I tend to preserve quirks and even biases of humans, rather than aiming for elegant simplicity as many philosophers do. The main reason is that, feeling that value is fragile, I fear losing a valid preference more than allowing an illegitimate bias to get promoted to preference.

But another point is that I don't feel that the cost of excessive complexity is very high.

The small cost of one-off restrictions

An analogy could be with government interventions in the market: most naive restrictions will only slightly raise the cost, but will not have a large effect, as the market routes around the restrictions. Consider a government that created price controls on all bread, for some reason. If that was a one-off rule, then we know what would happen: loaves of bread would get gradually smaller, while bakeries would start producing more cake or cake-like products, which now occupy the niche that more expensive bread products would otherwise have filled.

There is likely an efficiency loss here - the balance between cake and bread is different from what the market would have given, with small losses to bakers and customers. But these are not huge losses, and they are certainly much smaller than would have happened if the bakers and customers had simply continued as before, except with the price control.

Note that if the government is dynamic with its regulations, then it can impose effective price controls. It could add regulations about bread size, then bread quality, repeatedly; it could apply price controls to cake as well or ban cakes; as the market adjusts, regulations could adjust too. This would impose large efficiency losses (possibly coupled with equity gains, in certain circumstances).

But the cost of one-off restrictions tends to be low.

Quirks and biases in human preferences

It seems to me that including quirks and biases in human preferences is akin to one-off restrictions in the market. Firstly, these quirks will be weighted and balanced against more standard preferences, so they would be weaker than "government regulations". Secondly, they would be one-off: quirks and biases tend to be, by definition, less coherent than standard preferences, so they wouldn't be particularly consistent or generalisable, making them easy to "route around".

Consider for example how governments deal with these biases today. People tend to be prejudiced against utilitarian calculations for life-and-death issues, and in favour of deontological rules. Yet healthcare systems are very much run on utilitarian lines, sometimes explicitly, sometimes implicitly. We resolve trolley problems (which most people hate) by ensuring that individuals don't encounter important trolley-like problems in their everyday lives (eg the Hippocratic oath, rules against vigilantism), while larger organisations take trolley-like decisions in bureaucratic ways that shield individuals from the pain of taking any specific decision. In effect, large organisations have become skilled at both respecting and getting around human quirks and biases.

Now, there are a few examples where human biases impose a large cost - such as our obsession with terrorism (and to a lesser extent, crime), or our fear of flying. But, to extend the analogy, governments are far from perfectly efficient, and have their own internal dynamics, blame, and power struggles. An AI dedicated to a) maximising human survival, and b) respecting people's fear of terrorism, would be much more efficient than our current system - providing the most efficient and spectacular uses of security theatre to make people feel safer, while prioritising terrorism deaths over other deaths as little as possible. This resembles the nearest unblocked strategy approach: the AI is respecting the bias as little as it needs to.

And, of course, as the AI's power increases, quirks and biases will become relatively cheaper to satisfy, as the space of possible strategies opens up. It is very hard to design a patch that AIs cannot route around; human quirks and biases are very far from being well-designed patches for this purpose.


So, while the issue is worth investigating further, it seems that the cost of extra preferences, especially of the quirks and biases types, may be rather low.



March 27, 2019 - 01:49
Published on March 26, 2019 10:49 PM UTC

I have become very, very interested in developing a skill that I call Dependability.

I believe the skill exists on a spectrum, and you can have less or more of it. It’s not a binary where you either have it or you don’t.

I believe this skill can be trained on purpose.

I will briefly describe each attribute that makes up the overall skill.

( All the examples are made up. Also, assume that the examples are only talking about endorsed actions and goals. )


Trying

To have a vision of a skill or a desirable end state and then be able to strive with deliberate effort towards making that vision a reality.

Ex1: I see a dance routine on YouTube that I think would be awesome if I could perform myself. I’ve never done anything like this before, and I’m somewhat self-conscious or skeptical of how likely I am to succeed. Regardless, I take concrete steps towards learning it (study the video, practice the moves that I see, repeat for many days until I’ve achieved competency at performing the dance). There is some possibility I fail for whatever reason, but this doesn’t stop me from giving it my full effort for at least a week.

Ex2: There’s a job that I really want. I’m unclear about what steps I need to take to acquire the job, and I’m not sure I’m qualified. I research what kinds of skills and traits are desirable in the job by asking people, Googling, and looking through applications. (I am more encouraged than not by this initial research.) I sign up for workshops and classes that will give me relevant training. I read books. I practice in my free time. I make useful connections / network. I build whatever reputational capital seems useful via blogging, social media, in-person meetings, running events. I apply for the job. If I fail, I figure out what needs work, fix it, and try again until I obtain the position.


Commitment

To form an intention to do something (generally on a longer time scale), be able to say it out loud to someone else, and then be certain it will happen one way or another, barring extreme circumstance.

Ex1: I commit myself via marriage to another person and promise that I will try everything to make the relationship work before giving up on it. I say it out loud as a vow to the other person in a marriage ceremony, in front of a bunch of people. Then I proceed to actually attempt to get as close to 100% chance of creating a permanent relationship situation with this person, using all the tools at my disposal.

Ex2: I tell someone that I will be there for them in times of emergency or distress, if they ask. I tell them I will make it a priority to me, over whatever else is going on in my life. A year or two later (possibly with very little contact with this person otherwise), they call me and ask for my help. I put everything aside and create a plan to make my way to them and provide my assistance.


Follow-through

To finish projects that you start, to not give up prematurely, to not lose the wind in your sails out of boredom, lack of short-term incentive or immediate reward, lack of encouragement, or feelings of uncertainty and fear.

Here’s an example of what it looks like to NOT have follow-through: I want to write a novel, but every time I start, I lose interest or momentum after initial drafting and planning. Maybe I manage to build the world, create characters, plan out a plot, but then I get to the actual writing, and I fail to write more than a few chapters. Or maybe I loosen the requirements and decide I don’t need to plan everything out in advance, and I just start writing, but I lose steam midway through. I know in my heart that I will never be able to finish it (at least, without some drastic change).

Having follow-through means having the ability to finish the novel to completion. It is somehow missing from the person I’ve described above.


To do what you say you’ll do (on a lesser scale than with commitment); to be where you say you’ll be, when you say you’ll be there; to cooperate proactively, consistently, and predictably with others when you’ve established a cooperative group dynamic.

This can also be summed up as: If you set an expectation in someone else, you don’t do something that would dramatically fail to meet their expectation. You either do the thing or you communicate about it.


Ex1: If someone is expecting to meet me at a time and place, I show up at the time and place. If there are delays, I let them know ahead of time. I don’t ever fail to show up AND not tell them in advance AND not explain afterwards (this would count as dramatically failing to meet an expectation).

Ex2: If someone asks me to complete a task within the month, and months later, I have both failed to do the task AND I have become incommunicado, this counts as dramatically failing to meet an expectation.

Note that it doesn’t actually matter if they feel upset by your failure to meet an expectation. They might be totally fine with it. But I still would not have the skill of reliability, by this definition.

The skill also includes an ability to “plug into” teams and cooperative situations readily. If you are on a team, you are relatively easy to work with. You communicate clearly and proactively. You take responsibility for the tasks that are yours.

Focused attention

To be able to set an intention and then keep your attention on something for a set amount of time (maybe up to about 20 minutes).

Ex1: If someone I care about is speaking to me and what they’re saying is important to them, even if it isn’t that important to me, I am able to pay attention, hear their words, and not get lost in my own thoughts such that I can no longer attend to their words.

Ex2: If I am trying to complete a <20-min task, I do not get distracted by other thoughts. I do not follow every impulse or urge to check Facebook or play a game or get food, such that I cannot complete the task. I’m able to stay focused long enough to finish the task.

Being with what is

To not flinch away from what is difficult, aversive, or painful. To be able to make space for sensations and emotions and thoughts, even if unpleasant. To be able to hold them in your mind without following an automatic reaction to move away or escape.

Ex1: If I am trying to introspect on myself, and I encounter ughy, aversive, or uncomfortable feelings, thoughts, or realizations, I am able to make space for that in my mind and stay with them. (This probably involves distancing myself somewhat from them so that they’re not overwhelming.)

Ex2: If someone expresses a loud, big, “negative” emotion (anger, fear, sadness, pain), I don’t panic or freeze or dissociate. I can stay calm, embodied, and grounded. And then I stay open to their emotional state and not assume it means something bad about me (“They hate me!” “I’m doing something wrong!” “They don’t want me around!”). I’m not overwhelmed by anxieties or stories about what their emotion means, which might cause me to go away or stop caring about them. I instead make room in myself for my feelings and their feelings so that they can both exist. I maintain an open curiosity about them.

More thoughts on Dependability

I claim that all these skills are tied together and related in some important way, and so I bundle them all under the word Dependability, although I do not myself understand exactly how they're related.

My sense is that the smaller-scale skills (e.g. focused attention, which occurs on a moment-to-moment scale) add to your ability to achieve the larger-scale skills (e.g. commitment, which occurs on a month-to-month scale).

If I had to point to the core of the Dependability skill and what the foundation of it is, it is based on two things: the ability to set an intention and the ability to stay with what is. And all the above skills apply these two things in some way.

In general, people seem able to set intentions, but the “staying” is the tricky part. Most people I’ve encountered have some of the Dependability skill, to some extent. But the skill is on a spectrum, and I’d grade most people as “middling.”

I think I’m personally much worse at setting intentions than average. In certain domains (emotions, realizations), I’m above average at staying with what is. In other domains (failure, setbacks, physical discomfort), I’m much, much worse at staying with what is.

I suspect children are not born with the overall skill. They develop it over time. The marshmallow test seems to assess part of the skill in some way?

My stereotype of a typical high school or college kid (relative to an adult) is terrible at the overall skill, and especially reliability. I was a prime example. You couldn’t rely on me for anything, and I was really bad at communicating the ways in which I was unreliable. So I just fell through on people a lot, especially people with authority over me. I would make excuses, ask for extensions and exceptions, and drop the ball on things.

Over time, I learned to do that way less. I’ve drastically improved in reliability, which was helped by having a better self-model, learning my limitations, and then setting expectations more appropriately. I’ve also just obtained more object-level skills such that I can actually do more things. I’ve learned to extend my circle of caring to beyond just myself and my needs, so I can care about the group and its needs.

The other skills, however, I am still quite bad at. Some of them I’m completely incapable of (commitment, follow-through).

How do you train Dependability?

I personally feel crippled without the skill. Like I will never achieve my most important goals without it. And also, I feel particularly disabled in gaining the skill, because of how I reacted to childhood trauma. My way of being, so far, has completely avoided making commitments, trying, and having follow-through. I’ve found workarounds for all those things such that I’ve lived my life without having to do them. And I got by just fine, but I won’t be able to achieve many of my goals this way.

(It’s a blessing and a curse that an intelligent, precocious person can get by without the trying skill, but here we are...)

Fortunately for me, I currently believe the skill is trainable with deliberate practice. Possibly better in combination with introspective, therapeutic work.

I don’t know what kind of training would work for others, but for myself, I’ve found one plausible way to train the skill deliberately.

I spent a week at a place called MAPLE, aka the Monastic Academy for the Preservation of Life on Earth. The people I met there exhibited above average skill in Dependability, and I was notably surprised by it. I was so surprised by it that I’ve spent a lot of time thinking about MAPLE and talking to people about it. And now I’ll be spending a month there as a trial resident, starting in April.

But this post isn’t where I talk about MAPLE. I mention it primarily as a hint that maybe this skill is attainable through deliberate practice.

It kind of makes sense that very deliberate, regular meditation could contribute to the skill. Because maybe the micro-skill (setting lots of tiny intentions, being with what is on a moment-to-moment basis) contributes to the macro-skill (setting large intentions, staying with what is on a larger scale).

The monastic lifestyle also includes being tasked with all kinds of somewhat aversive things (cleaning bathrooms, managing people, being responsible for things you’ve never been responsible for before). You join the team and are expected to contribute in whatever ways are needed to maintain and run the monastery. And it is supposed to be hard, but you are training even then.

It seems possible that this month at MAPLE, I will set more deliberate intentions than I have collectively in my life until then. Which tells you just how little I’ve done things on purpose, deliberately, and with intention in my life. The process of how that got broken in me is probably another story for another time.

But basically, I expect to do a bunch of repetitions of training Dependability on a second-to-second level. And I will be doing this not just during meditation but also during daily work. I will also likely spend a lot of time introspecting and trying to gain insight into my blocks around Dependability. I hope to see at least a little movement in this area in the next month but may need to spend a longer period of time at MAPLE to fully develop the skill. (I noticed that residents who’d been at MAPLE for multiple years had more of the skill than those who had been there for less time.)

[ Note: The following section might trigger people who are scrupulous in a particular way. I want to make clear that I’m not speaking from a place of obligation or shouldy-ness or fear of being a bad or unworthy person or self-judgment. I don’t feel shame or guilt about not having Dependability. I’m speaking from a place of actively wanting to grow and feeling excited about the possibility of attaining something important to me. And I hope the same for other people, that they will be motivated towards having nice things. Dependability seems like a nice thing to have, but I’m not into judging people (or myself) about it. ]

Not having Dependability is a major bottleneck for me. My ultimate goal is to live a life of arete, or excellence in all things. And an especially important part of that for me is living a virtuous life.

I believe that without Dependability, I will not be able to live a virtuous life: Be the kind of person who makes correct but difficult choices. Be the kind of person who is reliably there for her friends and family. Be the kind of person who can become part of or contribute to something bigger than herself. Be the kind of person who wouldn’t sell out humanity for money, fame, power, convenience, security, legacy. Be the kind of person who doesn’t lie to herself about “being a good person” who “does things for the sake of progress or for the good of others”—when in truth the underlying behaviors, cruxes, and motives have little to do with the rationalizations.

I consider it my duty as a human being to develop into a virtuous person, rather than just any kind of person. And I believe Dependability is an important feature of a virtuous person.

I notice I don’t meet my personal criteria for a virtuous person as of yet, and Dependability seems like a major missing piece.


Unsolved research problems vs. real-world threat models

March 27, 2019 - 01:10

A Concrete Proposal for Adversarial IDA

March 26, 2019 - 22:50
Published on March 26, 2019 7:50 PM UTC

Note: This post came out of a conversation with Geoffrey Irving and Buck Shlegeris.

Epistemic Status: I suspect Paul has already thought of most or all of the ideas presented here, though I nevertheless found the exercise of carefully specifying an IDA implementation helpful and suspect others may find reading it helpful as well.

This is a proposal for how to train a machine learning model to approximate HCH using Iterated Distillation and Amplification (IDA). This particular proposal came out of a desire to use a debate-like adversary to improve the amplification process, and the primary goal of this proposal is to show how one could do that. Though I have tried to retain a lot of the relevant detail, I have made two simplifications to make this proposal easier to specify: I am attempting to approximate something closer to weak HCH rather than strong HCH, and I am only allowing the generation of two subquestions at a time. I am confident that those simplifications could easily be dropped, though I think doing so here would only make this presentation more complicated.
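The weak-HCH-with-two-subquestions structure being approximated can be sketched as a simple recursion. This is a toy illustration under my own assumptions (all names are mine, not the post's), and it omits the distillation and adversary steps entirely: a stand-in `human` either answers a question directly or decomposes it into two subquestions plus a rule for combining their answers.

```python
def weak_hch(question, human, depth):
    """Answer `question` by recursively consulting `human` up to `depth` levels."""
    if depth > 0:
        decomposition = human.decompose(question)
        if decomposition is not None:
            (sub1, sub2), combine = decomposition
            return combine(weak_hch(sub1, human, depth - 1),
                           weak_hch(sub2, human, depth - 1))
    # Base case: out of budget, or the human chose not to decompose.
    return human.answer_directly(question)


# Toy stand-in "human" that sums a list of numbers by splitting it in half.
class ToyHuman:
    def answer_directly(self, question):
        return sum(question)

    def decompose(self, question):
        if len(question) <= 1:
            return None  # too simple to bother decomposing
        mid = len(question) // 2
        return (question[:mid], question[mid:]), lambda a, b: a + b


print(weak_hch([1, 2, 3, 4], ToyHuman(), depth=2))  # 10
```

In the actual IDA scheme, the recursive calls would be replaced by calls to the current distilled model, with the adversary used to select which subtrees are worth expanding with a real human.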

Before I proceed, I want to make one final note: this is not a proposal for how to build an aligned AGI. I think there are still a whole bunch of issues that would prevent this proposal from actually working.


We will start with some initial definitions:

  • Let Q be the set of all questions in natural language.
  • Let A be the set of all answers in natural language.
  • Let M be the sum type of either Q×Q or A, representing either an answer to the given question or two subquestions to help answer it.
  • Let H : Q → A be the answer that a human gives to the given question.
  • Let H_fanout : Q → M be the answer or subquestion pair generated by a human when asked what to do with the given question.
  • Let H_fanin : Q×(Q×A)×(Q×A) → M be the answer or two subquestions generated by a human for some question when given answers to two subquestions related to that question.
  • Let ML : Q → Δ(A) be a model (the training procedure for which we will describe below) mapping questions to probability distributions over strings representing answers. Specifically, we implement each distribution by having our model output an embedding vector which yields the distribution when fed into some trained language model (by repeatedly conditioning on previous characters and multiplying all the conditional probabilities).
  • Let ML_fanout : Q → Δ(M) be the function from questions to an embedding vector representing a distribution over answers or subquestions, generated by asking ML what to do with the given question. For the two-subquestion case we enforce that the distribution is symmetric with respect to interchange of the subquestions.
  • Let ML_fanin : Q×(Q×A)×(Q×A) → Δ(M) be the function from a question and two subquestion answers to an embedding vector representing a distribution over answers or subquestions, generated by asking ML to integrate the given subquestion answers into an answer to the original question. We again enforce symmetry with respect to interchange of the subquestions.
  • Let Adv : Q×A → ℝ be an adversary model which we will train (as described below) to predict how good the given answer to the given question is.
  • Let Adv_fanout : Q×M → ℝ be an adversary model for ML_fanout, generated by calling Adv via the same transformation used to implement ML_fanout from ML.
  • Let Adv_fanin : Q×(Q×A)×(Q×A)×M → ℝ be an adversary model for ML_fanin, generated by calling Adv via the same transformation used to implement ML_fanin from ML.
  • Let ϵ be a Gumbel random variable such that each use of ϵ below is its own independent sample from a Gumbel distribution. We use a Gumbel random variable so that we can use the Gumbel-max trick to effectively do Boltzmann exploration.
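As a quick concreteness check, the sum type M might be encoded in Python as follows (a hypothetical encoding, not part of the proposal: plain strings for answers, 2-tuples for subquestion pairs):

```python
from typing import Tuple, Union

Q = str                    # questions in natural language
A = str                    # answers in natural language
M = Union[A, Tuple[Q, Q]]  # either an answer or two subquestions

def is_answer(m: M) -> bool:
    """True when m is a direct answer rather than a subquestion pair."""
    return isinstance(m, str)
```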

The goal here is to use the adversary to determine which portions of the HCH tree we actually go down, so as to maximize our sample efficiency with respect to H. Specifically, the amplification procedure is as follows:

1. Generate some question Q ∈ Q, either from the training distribution or from the output of some adversary.

2. Call H_fanout(Q). If we get an answer, let A = H_fanout(Q) and skip to the end. Otherwise, let SQ1, SQ2 = H_fanout(Q).

3. Call Adv to generate scores S1 = Adv(SQ1, ML(SQ1)) and S2 = Adv(SQ2, ML(SQ2)).

4. Without loss of generality, let S1 + ϵ ≤ S2 + ϵ. Recurse on SQ1 (by setting Q = SQ1 in the subcall) to generate A1 (from the A of the subcall), and let A2 = ML(SQ2).

5. Call H_fanin(Q, (SQ1, A1), (SQ2, A2)). If we get an answer, let A = H_fanin(Q, (SQ1, A1), (SQ2, A2)). Otherwise, let SQ1, SQ2 = H_fanin(Q, (SQ1, A1), (SQ2, A2)) and return to step 3.
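To make the control flow concrete, here is a minimal Python sketch of steps 2–5, using hypothetical stub signatures (in the actual proposal the fan-out, fan-in, and Adv roles are a human and trained models, not Python functions):

```python
import math
import random
from typing import Callable, Tuple, Union

Q = str                    # a question
A = str                    # an answer
M = Union[A, Tuple[Q, Q]]  # an answer, or a pair of subquestions

def gumbel() -> float:
    """One independent Gumbel(0, 1) sample (for Boltzmann exploration)."""
    u = random.random() + 1e-300  # nudge away from 0 to avoid log(0)
    return -math.log(-math.log(u))

def amplify(q: Q,
            h_fan_out: Callable[[Q], M],
            h_fan_in: Callable[[Q, Tuple[Q, A], Tuple[Q, A]], M],
            ml: Callable[[Q], A],
            adv: Callable[[Q, A], float]) -> A:
    """Recurse only down the branch whose model answer the noised
    adversary scores lower; trust ML's answer on the other branch."""
    m = h_fan_out(q)                           # step 2
    while not isinstance(m, str):              # subquestions, not an answer
        sq1, sq2 = m
        s1 = adv(sq1, ml(sq1)) + gumbel()      # step 3: noised scores
        s2 = adv(sq2, ml(sq2)) + gumbel()
        if s1 > s2:                            # step 4: recurse on the
            sq1, sq2 = sq2, sq1                #   lower-scored subquestion
        a1 = amplify(sq1, h_fan_out, h_fan_in, ml, adv)
        a2 = ml(sq2)
        m = h_fan_in(q, (sq1, a1), (sq2, a2))  # step 5: integrate
    return m
```

With perfect stubs, either branch choice yields the same answer; the adversary only matters when ML is imperfect.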

This procedure should eventually produce an answer A, after which we compute the following losses for ML, ML_fanout, ML_fanin, Adv, Adv_fanout, and Adv_fanin:

L = −log(ML(Q) | A)
L_fanout = −log(ML_fanout(Q) | H_fanout(Q))
L_fanin = −log(ML_fanin(Q, (SQ1, A1), (SQ2, A2)) | H_fanin(Q, (SQ1, A1), (SQ2, A2)))
L_Adv = |S1 − log(ML(SQ1) | A1)|²
L_Adv_fanout = |Adv_fanout(Q, (SQ1, SQ2)) + L_fanout|²
L_Adv_fanin = |Adv_fanin(Q, (SQ1, A1), (SQ2, A2), A) + L_fanin|²
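As a toy illustration (representing a distribution as a plain dict from strings to probabilities, rather than an embedding fed to a language model), the log-likelihood loss and the adversary's squared-error loss look like:

```python
import math

def nll(dist: dict, target: str) -> float:
    """-log(dist | target): negative log-likelihood of the target
    answer under the model's output distribution."""
    return -math.log(dist[target])

def adversary_loss(score: float, dist: dict, target: str) -> float:
    """|S - log(dist | target)|^2: trains the adversary's score S
    toward the model's log-likelihood of the true answer."""
    return (score - math.log(dist[target])) ** 2
```

So a perfectly trained adversary's score equals the model's log-likelihood, i.e. the negative of the model's loss.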

Note that some of these losses are actually for the same model, since ML_fanout and ML_fanin are implemented internally by calling ML, and Adv_fanout and Adv_fanin are implemented internally by calling Adv.


For the distillation procedure, we want to improve ML by training it to approximate its own ML-consulting-ML tree. Specifically:

1. Generate some Q ∈ Q as in amplification.

2. Call ML_fanout(Q). If we get an answer, let A = ML_fanout(Q) and skip to the end. Otherwise, let SQ1, SQ2 = ML_fanout(Q).

3. Let A1 = ML(SQ1) and A2 = ML(SQ2).

4. Call ML_fanin(Q, (SQ1, A1), (SQ2, A2)). If we get an answer, let A = ML_fanin(Q, (SQ1, A1), (SQ2, A2)). Otherwise, let SQ1, SQ2 = ML_fanin(Q, (SQ1, A1), (SQ2, A2)) and return to step 3.
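The rollout above can be sketched the same way as amplification, with ML in every role (hypothetical stubs again); note there is no human call, no adversary, and no recursion down subquestions:

```python
from typing import Callable, Tuple, Union

Q = str
A = str
M = Union[A, Tuple[Q, Q]]  # an answer, or a pair of subquestions

def distill_rollout(q: Q,
                    ml_fan_out: Callable[[Q], M],
                    ml_fan_in: Callable[[Q, Tuple[Q, A], Tuple[Q, A]], M],
                    ml: Callable[[Q], A]) -> A:
    """ML-consulting-ML: ML proposes the split, ML answers both
    subquestions directly, and ML integrates the results."""
    m = ml_fan_out(q)                           # step 2
    while not isinstance(m, str):
        sq1, sq2 = m
        a1, a2 = ml(sq1), ml(sq2)               # step 3
        m = ml_fan_in(q, (sq1, a1), (sq2, a2))  # step 4
    return m
```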

Then, compute losses for ML and Adv analogous to those above.


Finally, we want to iterate the above procedure by replacing H, H_fanout, H_fanin in the amplification procedure with some H′, H′_fanout, H′_fanin. First, let α be some confidence threshold. Then, we will define the following primed Hs:

  • Let H′(Q) = if Adv(Q, ML(Q)) + ϵ < α + ϵ then H(Q) else ML(Q).
  • Let H′_fanout(Q) = if Adv_fanout(Q, ML_fanout(Q)) + ϵ < α + ϵ then H_fanout(Q) else ML_fanout(Q).
  • Let H′_fanin(Q, (SQ1, A1), (SQ2, A2)) = if Adv_fanin(Q, (SQ1, A1), (SQ2, A2), ML_fanin(Q, (SQ1, A1), (SQ2, A2))) + ϵ < α + ϵ then H_fanin(Q, (SQ1, A1), (SQ2, A2)) else ML_fanin(Q, (SQ1, A1), (SQ2, A2)).
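A minimal sketch of the primed-H gate (the fan-out and fan-in variants are identical in shape); since each ϵ is its own independent Gumbel sample, the comparison is a soft, Boltzmann-style threshold rather than a hard cutoff:

```python
import math
import random

def gumbel() -> float:
    """One independent Gumbel(0, 1) sample."""
    u = random.random() + 1e-300  # nudge away from 0 to avoid log(0)
    return -math.log(-math.log(u))

def h_prime(q, h, ml, adv, alpha):
    """Fall back to the human only when the noised adversary score
    falls below the noised confidence threshold alpha."""
    answer = ml(q)
    if adv(q, answer) + gumbel() < alpha + gumbel():
        return h(q)   # model judged unreliable here: spend a human query
    return answer     # model judged fine: save the human query
```

Far from the threshold the choice is effectively deterministic; near it, the Gumbel noise occasionally samples the other branch, giving some exploration.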

This procedure allows us to continue amplifying the model while using the adversary to require only minimal human data that is selected so as to be maximally helpful.


This proposal differs in a couple of ways from previous proposals made by Paul. First, Paul has recently moved away from discrete amplification/distillation steps. This proposal, however, provides a way to recover discrete steps while still collapsing the recursion. In practice, however, you might still just want to stick with the amplification procedure described here without doing the distillation step, as it isn't strictly necessary.

Second, this proposal uses an adversary to guide the training process. This technique is similar to the concept of importance sampling. The main benefit of this approach is that it takes advantage of active learning by allowing the system to choose which questions and subquestions would be most useful for it to have answered by a human.

Another benefit of the adversary, however, is that it could make transparency much easier. One of the major benefits of IDA is that ML gets trained to approximate its own ML-consulting-ML tree. As a result, the reasoning that went into the final answer produced by ML can be recovered by unfolding its tree (at least in the limit of perfect training). However, unfolding the entire tree is very expensive, as its cost is linear in the size of the tree. With an adversary, you can instead choose which portions of the tree to unfold first by calling the adversary, enabling you to find errors much more quickly; for a perfect adversary, this reduces the problem of finding an error from O(n) to O(log n).
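One way to see the O(log n) claim is a greedy unfolding that always descends the subquestion whose model answer the adversary scores worst (a hypothetical helper, not part of the proposal); a perfect adversary then walks straight down the path to the error, so the cost is the tree's depth rather than its size:

```python
from typing import Callable, List, Tuple, Union

Q = str
A = str
M = Union[A, Tuple[Q, Q]]  # an answer, or a pair of subquestions

def find_weakest_path(q: Q,
                      ml_fan_out: Callable[[Q], M],
                      ml: Callable[[Q], A],
                      adv: Callable[[Q, A], float],
                      max_depth: int = 64) -> List[Q]:
    """Descend the lower-scored branch at each node, returning the
    chain of questions a transparency check would inspect."""
    path = [q]
    for _ in range(max_depth):
        m = ml_fan_out(q)
        if isinstance(m, str):   # reached a leaf answer
            break
        sq1, sq2 = m
        q = sq1 if adv(sq1, ml(sq1)) <= adv(sq2, ml(sq2)) else sq2
        path.append(q)
    return path
```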

Thus, the hope is that the use of such an adversary could assist both in making IDA more competitive (by increasing sample efficiency and using active learning) and in making IDA safer (due to the increased ease of transparency).

It should be noted, however, that the use of such an adversary might also make the safety situation for IDA worse. First, it introduces the possibility of a robustness-to-relative-scale failure if either ML or Adv gets significantly stronger than the other. One possible way to resolve such an issue might be to give Adv the ability to call ML and vice versa, allowing each to use the other to boost its own capabilities. Second, if ML and Adv are themselves optimizers, with goals that don't perfectly match their loss functions, they could cooperate to make it arbitrarily unlikely that H is ever consulted on some specific question. Third, even if ML and Adv weren't cooperating, an RSA-2048-style failure could still prevent the identification of malicious cognition. Resolving failures of the second two types is still an open question (see "Inner Optimization," Hubinger, van Merwijk, Mikulik, Skalse, and Garrabrant, forthcoming).


Deck Guide: Biomancer’s Familiar

March 26, 2019 - 18:20
Published on March 26, 2019 3:20 PM UTC

Biomancer’s Familiar is a great card. I wanted it to happen so bad. I spent a substantial portion of my preparations for Cleveland trying to make Biomancer’s Familiar happen. I tried two color versions. I tried three color versions. I tried going big, going small, going wide, and everything else I could think of. I came close enough to consider buying the cards.

Ultimately, I could not make it happen. Blue was too strong and too structurally tough. You had to give up Pelt Collector and had to pay real mana for your spells. The format wanted different things than the Familiar deck could provide. Sideboarding wasn’t as impactful for you as I wanted. The deck was good. It was fun as hell. But not good enough. I switched to blue, tore up the ladder with it, and never looked back. Things didn’t work out at the Pro Tour, but given the overall results, I am confident I made the right decision.

There are three reasons to share the deck now.

The first reason is, as noted, that the deck is great fun. As constructed, it’s a step behind where it needs to be to win major tournaments, but it’s still good enough to get five wins more often than not in Traditional Constructed. Its best draws bury people if not answered, and it gets them often.

The second reason is that perhaps I missed something. There are a lot of good things going on, so the deck might be one card, or one idea or sideboard plan, away from being competitive. That might be a two color or three color build. It might involve a card from War of the Spark.

The third reason is that now that I write it out, I think this deck plays fine against what’s out there right now. My results weren’t as good as with blue, but perhaps things have changed.

Here is the strongest version of the deck:

3 Adventurous Impulse

2 Quench

2 Negate


4 Llanowar Elves

4 Biomancer’s Familiar

4 Growth-Chamber Guardian

4 Incubation Druid

2 Druid of the Cowl

4 Sphinx of Foresight

4 Frilled Mystic

4 Biogenic Ooze

10 Forest

5 Island

4 Hinterland Harbor

4 Breeding Pool


4 Kraul Harpooner

1 Thorn Lieutenant

4 Entrancing Melody

2 Negate

2 Spell Pierce

2 Dive Down

There are only eight adapt creatures in the deck for Biomancer’s Familiar. This seems light, but you have a lot of search, and more of that is not what you need. There are places where a Zegana, Utopia Speaker or two would be welcome for additional power, but I found that if we were going large and finding ways to close things out, it was better to just go straight to Biogenic Ooze. Ooze does often get a substantial boost from Biomancer’s Familiar, although those games are usually (but not always!) yours anyway.

This deck has a lot of mana. You have 23 lands, 3 Adventurous Impulse that will almost always hit mana if you want them to, and ten mana creatures. You also have a lot of ways to use that mana. Adventurous Impulse often finds a good sink if you want one, and Sphinx of Foresight lets you scry excess mana away once you’re set. Incubation Druid turns into a 3/5 even without a Biomancer’s Familiar. With Biomancer’s Familiar it turns into a giant mana sink. Growth-Chamber Guardian eats up a lot of mana. Other games, you have a lot of mana and use it to keep counters up while developing your board. That’s fine too.

Path one is to go for the quick easy win. You get a lot of easy wins from a quick Biomancer’s Familiar even without Llanowar Elves. Get one out on turn 2, play a Growth-Chamber Guardian on turn 3 and turn it to 4/4 on the spot. Next turn, you’ll have a 6/6 and a 4/4 (plus a 2/2) and from there it rapidly gets worse, so those two cards together will beat most draws that can’t remove the Biomancer’s Familiar, even without additional spells.

The other main path is to build to five or more mana, then go for it. On turn four, you can deploy the Biomancer’s Familiar and the Growth-Chamber Guardian, or boot up the Incubation Druid right away and tap it for mana to keep going, or both. Other times, you power out a quick Biogenic Ooze instead, which also works. Having Ooze gives you extra ‘packages’ to deploy if Kaya’s Wrath or Gates Ablaze sets you back. Given Incubation Druid, you can often do this reasonably early with counter backup.

Playing traditional aggro-control without the boost is also strong. Force them to either walk into your counters, especially Frilled Mystic, or pass and let you adapt, and either way things get steadily worse for them.

Sphinx of Foresight is a very good card that doesn’t have a good home elsewhere. This is its chance to shine, as you highly value the scry to set up your combinations, and going mana creature into turn three Sphinx of Foresight is often quite strong. If you untap with it, you often don’t have to ever tap out again and the extra scry triggers are more impactful than they appear, as once your mana is set, and especially once you find the first Growth-Chamber Guardian, you have a few very high impact cards and a lot of very low impact cards. Other times, they tap out dealing with it and you stick a Biogenic Ooze.

A nice bonus for this deck is that you know when Biomancer’s Familiar or Growth-Chamber Guardian isn’t doing anything for you. Sometimes you have a duplicate. Sometimes you have other uses for your mana. Sometimes there’s nothing to use the Biomancer’s Familiar on, and you have enough mana to work without it when that changes. In those cases, you can expose your creatures and let them get killed, soaking up mana and removal to make way for later. Smart opponents know that Biomancer’s Familiar is a space bunny and that space bunnies must die.

Sideboarding poses a problem. The deck’s cards, other than Biogenic Ooze, are all either counters, mana, or working towards making your central engine happen. What can we take out? If we put in good cards from the sideboard, are we improving matters? That’s why sideboarding wasn’t impactful enough. The new cards were good, but you have to give up a lot to put them in.



There are two problems in the blue matchup. The first is that you have a hard time stopping Curious Obsession. The four cheap counters help, but if you try that you get blown out by Spell Pierce, or by them not having Curious Obsession in the first place in games where deploying mana would have let you compete. Your best weapon against Curious Obsession is therefore Sphinx of Foresight, since they are probably not putting Obsession on Mist-Cloaked Herald or Tempest Djinn, and it’s often not possible for them to hold up a counter on turn three that stops a creature. Your other best weapon is to overpower them through it. If you have Biomancer’s Familiar and Growth-Chamber Guardian, or stick a Biogenic Ooze, it is not going to matter much that they draw two cards per turn. You can also try to use your counters to stop Tempest Djinn, which is difficult for them to defend with counter backup. Without the Djinns, they can’t bring much power to the table.

The other problem is that they have all the control. With lots of one drops, flyers, chump blockers, Merfolk Tricksters and counters, they choose where the battle is fought. Often you will have a lot more power, and they find a way to win regardless, especially if you had to take turns two and three setting up before things get rolling.

If they hang back and don’t tap mana, usually the right thing to do is develop your mana. If they fight it, you’ll still have enough mana and they’ll run out of counters. If they don’t fight it, you can pick up tempo and start double casting or holding counter backup later. You have a lot of threats that are quite frustrating for them, and you can make playing a Tempest Djinn quite perilous. Once you have them on the board, force them to make a move.

That doesn’t mean the matchup is great. It’s definitely not, but it is winnable.

You can do more or less sideboarding on the margin; the default is something like:

In: +4 Kraul Harpooner, +4 Entrancing Melody, +1 Spell Pierce, +1 Negate

Out: -3 Adventurous Impulse, -2 Druid of the Cowl, -4 Frilled Mystic, -1 Biogenic Ooze

Sideboarding offers you some very strong cards. Kraul Harpooner is perfect and fits right into your strategy. You also are very good with Entrancing Melody, with so much mana as to cast it often with counter backup, while the Kraul Harpooner keeps Siren Stormtamer from getting in the way. Your core strategy is to deploy creatures, so it’s hard for them to have a lot of defenses for Entrancing Melody like Negate or Dive Down, and if they try to respond in kind then that’s the type of mana exchange that favors you quite a bit.

We can consider more copies of Negate, or Spell Pierce, although the motivation for those cards lies elsewhere.

I tested Essence Capture in this and other places. It’s very cute and sometimes a true blowout when it turns on Incubation Druid or a smaller one on Growth-Chamber Guardian, but the double blue mana wasn’t quite compatible with our mana base once we see what we have to sideboard out.

Frilled Mystic is the easy cut. Playing a waiting game and refusing to tap mana against a deck full of one drops, where they can counter back, where the 3/2 body doesn’t have much impact, is not a good idea. The other cut turns out to be Adventurous Impulse. Even tapping one mana is often something you don’t have time for, and you’re bringing in a high impact spell for a high impact creature, which makes the card much worse. For similar reasons, you let go of Druid of the Cowl, as it doesn’t block anything and the draws it enables are often far too slow, or let them break you up with counters. That gives us room for a third Negate or first Spell Pierce. Depending on how you feel about where that leaves the mana and how much you are determined to fight Curious Obsession, you can then cut copies of Biogenic Ooze. Being on the play versus draw can also be a consideration.

If you wanted to improve matters further after board, you could play Crushing Canopy in the board, or have access to more copies of Quench. It’s not clear how else we can improve much.


They will kill as many creatures as they can on sight. This is wise. Your goal is to keep forcing them to do this until they run out of removal, or develop your mana so that you can slip in the engine or an Ooze while they’re tapped out. There’s a scary early phase where you can get overrun, and a scary later phase where you have to find a way to quickly turn the corner before you get burned out, and often won’t have good attacks that seem safe. Ooze is better than it looks here because it lets you close games quickly without the engine despite being low on life. Often you have to do this while holding up Frilled Mystic for many turns, which can make things tricky. Sometimes you get burned out before you can finish the job, or have to risk that happening to avoid giving them too much time.

The other way you lose is Experimental Frenzy or Rekindling Phoenix against an unimpressive board. Ideally you have counters ready for that, and there is a point in the game where this becomes your primary concern.

Thus, the early turns are largely about preventing them from getting creature damage in and establishing a board that will let you sit on counters. Druid of the Cowl is very good on turn two, as they have to take time off to kill it or you get to play a Sphinx, or a two drop with counter backup, on turn three.

In: +1 Thorn Lieutenant, +4 Entrancing Melody, +2 Negate

Out: -3 Adventurous Impulse, -4 Biogenic Ooze

Thorn Lieutenant is in the board for this matchup in particular. Thorn Lieutenant does exactly what you want. If they try to attack into it with one and two drops, it is a perfect wall. If they kill it, you get a free 1/1 that is surprisingly annoying. Later on, it turns around and attacks and is another way to exploit Biomancer’s Familiar. Cutting the activation from six mana to four makes things a lot easier. It’s nice to have, and it offers another long term threat in other matchups where you want that, but it’s an easy cut from the board if you want something else badly.

You do already have a ton of perfectly good two drops. But many of them are long term valuable, and you want the option to hold them for the right later opportunity. You also often want to cast two of them on your four mana turn, or one per turn while holding up counters.

Entrancing Melody gives you coverage against Rekindling Phoenix, and is also very strong when it takes Goblin Chainwhirler. That frees up the ground for you to attack, as they lost a good blocker and a good attacker and you picked up a great additional blocker, letting you close the game out quickly. Even taking a small creature prevents them from going wide.

With that, you no longer need Biogenic Ooze as much, which means Adventurous Impulse gets worse, so it comes out too. The extra counters let you stop Experimental Frenzy or prevent being burned out later.

Putting in Dive Down is reasonable as well, and can lead to the engine coming online, but is a way things can go wrong if they start aiming all their removal at your head, giving you dead cards. Watch how they play and act accordingly.

This matchup is quite good as configured. If you don’t care about it much, you can trim an additional Druid of the Cowl and give up Thorn Lieutenant, and things get worse but are still fine. If you care about it a lot, you can have access to more Thorn Lieutenants, including in the main, or a third maindeck Druid of the Cowl.


They can deploy a lot of power quickly. Your best draws go over the top of that fast enough to not die, unless they go completely nuts. Once you turn the corner, you can be very patient, as not much threatens you, but there is risk that they gain the ability to go wide and kill you with an alpha strike. There are games where you spend a lot of time pumping up your team but can’t get through and they’re making tokens or keep playing creatures, and closing it out gets tricky. That got a lot easier once we added a full four Biogenic Ooze, and either that or Sphinx of Foresight can close things out.


In: +4 Entrancing Melody, +1 Thorn Lieutenant

Out: -1 Negate, -4 Frilled Mystic

There is nothing you need to counter. There are things you’d like to counter, especially removal spells, but not enough to be thrilled about holding up mana. If they show a bunch of flyers that Kraul Harpooner can pick off, I don’t mind putting a few in. It’s also a solid blocker for the early turns.

Esper Control:

Kaya’s Wrath is your enemy. They can hit you with discard and then wipe your board. That is the most common way you lose. You also need to watch out for Cry of the Carnarium if you deploy creatures in the wrong order. The other way is they counter or kill everything one by one and you run out of threats. Biogenic Ooze gives you extra good threats, especially after they Kaya’s Wrath.

Once you have enough stuff, sit back on counters and don’t use them on spells that don’t change the path of the game. What matters is mostly Kaya’s Wrath. Know when you need to walk into it, when you can afford to play around it, and when they’ll get enough counter backup for it.


In: +2 Negate, +2 Spell Pierce, +2 Dive Down, +1 Thorn Lieutenant

Out: -2 Druid of the Cowl, -4 Sphinx of Foresight, -1 Adventurous Impulse

Sphinx is easy to answer, doesn’t hit hard, and costs too much to protect properly. Giving up the scry at the start of the game is unfortunate, but that isn’t enough to justify its presence. Adventurous Impulse gets substantially worse, but we still have a lot of strong hits and love finding Frilled Mystic, so it mostly stays despite Sphinx leaving. The Thorn Lieutenant gives you a threat that can close things out, and I’ve found it plays surprisingly well against control. But if you don’t have it, you won’t miss it much here.

The counters shore you up against Kaya’s Wrath and Teferi, Hero of Dominaria. Dive Down protects your key creatures against removal.

Nexus of Fate:

You’re playing a similar game to blue. They’re better at it here, as your extra power is mostly overkill, but even a worse version of this strategy still works well. Always counter Search for Azcanta, and almost always hang back on counters once they get to four.


In: +2 Spell Pierce, +2 Negate

Out: -4 Biogenic Ooze

You don’t need the power Ooze provides, so take it out, deploy stuff early then sit on counters. If you aren’t happy with the matchup, add more counters to the sideboard until you are satisfied.


You have a few different fears to worry about. Hostage Taker on your creatures is often quite bad. In corner cases it is so bad that you need to consider holding Biogenic Ooze. If a Wildgrowth Walker goes large, it can buy a lot of time. If the game goes long enough without you closing it out with your engine or a Biogenic Ooze or Sphinx of Foresight, they will cast Hydroid Krasis one time too many for escalating sizes.

Then there’s Finality. You need to be continuously aware of Finality. Sphinx of Foresight and Biogenic Ooze are both vulnerable, as are many of your cheaper creatures. Once you are clearly ahead, prioritize getting creatures to five toughness. Push a Growth-Chamber Guardian to 6/6 and leave one at 2/2 for now, which is usually right anyway. Get Incubation Druid to 3/5 even if it feels unnatural or slows things down a bit. If you can’t, consider paying a lot to hold up counters, and/or hold some creatures back. Holding up counters is how the last few turns are best handled most of the time in any case, if you have them available.


In: +4 Entrancing Melody, +2 Dive Down

Out: -3 Adventurous Impulse, -2 Druid of the Cowl, -1 Biogenic Ooze

You love the spells coming in, and need to make room. Their plan is mostly to trade cards with you in various forms and grind you out, so flooding on mana is a danger. Druid of the Cowl does no useful blocking and Adventurous Impulse can miss, while Entrancing Melody mostly only costs two mana and Dive Down costs one, and they rarely kill Llanowar Elves or a non-adapted Incubation Druid, so you’re not overly mana light.

I’m not sure how many copies of Negate you want. Finality is important, but so is Hostage Taker, and playing too many spells is how you run into trouble. I’m pretty unhappy that we’re cutting a Biogenic Ooze as it is to stay at eight answers.


Gruul smash. You build up. Who will do it better? Back when I tested no one was playing Gruul, so I don’t know. They can certainly deploy a lot of threats fast and pick off your creatures before you can do your thing; Pelt Collector is super efficient and Rekindling Phoenix is tough. If you can do your thing in full, you’ll win.


In: +4 Entrancing Melody, +2 Dive Down

Out: -2 Druid of the Cowl, -1 Biogenic Ooze, -1 Adventurous Impulse, -2 Negate

Dive Down is a better Negate, so it’s an easy swap. I can see going either up or down on answers, but I don’t think you have time for Negate, and Spell Pierce won’t play well in this context. Druid of the Cowl does not actually block, so go with your other two drops. That leaves two cards to bring out. Biogenic Ooze seems slow so I’m fine bringing one out, which in turn makes me like Adventurous Impulse less given how many spells I’m bringing in. This is a place to start, but it’s likely wrong.

Other matchups follow similar principles.

If you get a chance, take this deck for a spin and see what you think.


[Method] The light side of motivation: positive feedback-loop

26 March 2019 - 13:56
Published on March 26, 2019 10:56 AM UTC

I want to share this method I use sometimes to stay focused on my tasks, earn rewards from them, and build up a positive feedback-loop to do more difficult things. It's nothing new and has probably been written about a few times, but I have been using it subconsciously for years, and wanted to write it up explicitly for future use. If this sounds completely wrong to you, please ignore it or tell me in the comments.

It should go without mentioning that this is just one part of a well-tuned system. It works because other parts work and support it. If supportive systems are wired differently or broken, this approach may not work at all.

What you need
  • the ability to motivate yourself to some degree
  • some preparation, or else that you have a mental list of tasks that can be done
  • the ability to complete simple or moderately difficult tasks when you are already motivated (Side-note: Forcing yourself to do things when you really, really don't want to might be effective once or twice. But in the long run, it's going to build up an even stronger aversion. Then making yourself do the thing will be even more difficult. How to change your feelings about a task is not the main topic of this post.)
How it works
  • Step 1: Induce happiness. Make yourself feel confident and hopeful.
  • Step 2: Choose the (simple to moderately difficult) action you want to complete.
  • Step 3: Complete the action and earn reward for this! With the "evidence" of being able to complete actions, stir up your confidence of being able to complete tasks.
  • Step 4: Choose the next stack of actions. They should be moderately difficult and involve clear steps to the solution (no 'meta-actions'). Begin them immediately, before your confidence and motivation fades; in doing so, focus on the short-term future when you are going to feel accomplished about having completed them. Use your recently activated confidence for this. Don't focus on your potential dislike of them or any other feelings of avoidance. If they crop up regardless, ignore them and tell them that they are going to be defeated soon, so they should just leave. (I know how this sounds. But if you are anything like me, treating your mind like a dog to be trained can be really helpful in getting it to do things!)
  • Step 5: Relax from having successfully completed a stack of necessary and useful actions. Bask in the reward, but don't linger more than 10 minutes. Even if you are slightly exhausted, this is not the time to stop! Focus on building your resolve to tackle a more difficult action next; one that you know you can complete, but may have to work harder for.
  • Step 6: Make sure you have the necessary energy for this more difficult task. Eat or drink something small and healthy if you don't, or take a short (5 to 15 min) nap.
  • Step 7: Sit down in a clear and organised workspace. Be determined to do this! Don't stop until you are finished. Persist through exhaustion; this is a sign of working hard, not failure. When you are done, take the time to clean up your workspace.
  • Step 8: Rest a lot! Go to sleep or take a nap, eat something, read a light novel. Well done! You have completed your goal!

Caution! Do not use this to work yourself to exhaustion over time. This is meant to help in keeping up a healthy work-mentality; don't use it to trick your body or mind into giving more than it has. Take steps to make sure this doesn't happen; perhaps set up a reminder for some weeks later, checking that your habits don't stray into forbidden territory. The danger lies in not noticing until it is too late. Be prepared!

How to reward yourself

This might be really different for different people. I built up some habits over the years where, after completing some chosen task or thingy, I would internally congratulate myself, focus on the positive feelings this evoked, etc.

It might take some experimentation. Physical rewards, like a pleasant sound and light effect, a 'Well done!' stamp on a paper (humans are weird, but if it produces the desired results...), can also be effective. This works for children, pets and games, which is why I started using it.

These small rewards don't really matter at all, of course. They are just tools to build up the desired habits. Eventually, when you are working on the things that are important to you and making progress, that may become a reward of its own.


"Moral" as preference label

26 March 2019 - 13:30
Published on March 26, 2019 10:30 AM UTC

Note: working on a research agenda, hence the large amount of small individual posts, to have things to link to in the main documents.

In my quest to synthesise human preferences, I've occasionally been asked whether I distinguish moral preferences from other types of preferences - for example, whether preferences for Abba or Beethoven, or avocado or sausages, should rank as high as human rights or freedom of speech.

The answer is, of course not. But these are not the sort of things that should be built into the system by hand. This should be reflected in the meta-preferences. We label certain preferences "moral", and we often have the belief that these should have priority, to some extent, over merely "selfish" preferences (the extent of this belief varies from person to person, of course).

I deliberately wrote the wrong word there for this formalism - we don't have the "belief" that moral preferences are more important, we have the meta-preference that a certain class of beliefs, labelled "moral", whatever that turns out to mean, should be given greater weight. This is especially the case as there are a lot of cases where it is very unclear if a preference is moral or not (many people have strong moral-ish preferences over mainstream cultural and entertainment choices).

This is an example of the sort of challenges that a preference synthesis process should be able to figure out on its own. If the method needs to be constantly tweaked to get over every small problem of definition, then it cannot work. As always, however, it need not get everything exactly right; indeed, it needs to be robust enough that it doesn't change much if a borderline meta-preference such as "everyone should know their own history" gets labelled as moral or not.


What I've Learned From My Parents' Arranged Marriage

26 March 2019 - 09:40
Published on March 26, 2019 6:40 AM UTC

When I tell people my parents had an arranged marriage, I get a number of different reactions. Most people have the wrong idea of exactly what that looks like, and those who do have the right idea often wonder if my parents can even understand what dating is like, given they've never experienced it. I've heard people assume that my parents' arranged marriage meant they were completely unable to help or give advice when it came to my dating life, and I've found the opposite to be the case; the advice my parents gave me about dating was as valuable as anything I found anywhere else, and allowed me to pass that advice on to my friends. Growing up hearing their story taught me a lot about what was important to know about myself before I started dating anyone, and how a good couple functions and grows together. I found that much of this is less commonly talked about when it comes to Western dating, and so I want to share their story and what I learned from it with you. For background, I'll start with telling you what arranged marriage is actually like.
Although some parts of India still do the traditional "bride and groom don't meet until the wedding", these tend to be remote and rural parts. Most arranged marriages today function a little more like a blind date, but with your parents and their network finding you a match rather than your friends. On the more traditional end, families may set up a "bride viewing", which today functions like a first meeting where the parents introduce each half of the couple, then leave them alone to get to know each other. They later tell their parents if they agree to the marriage or not. On the more liberal end, a couple may go on many dates before agreeing. In some cases, young people will date and fall in love, and the parents will meet after and decide to "arrange" the marriage if all parties agree to it. In the case of my parents, my dad's cousin (who he was very close with) met my mother and thought they would be a good match due to compatible philosophical interests and tastes in literature. My mother had, at that point, not dated at all, despite being in graduate school; it is normal for young people in India to feel marriage is not something they have to worry too much about because they trust their families will find someone good for them. The fact that my dad's cousin met my mother and immediately thought of my father points at another way arranged marriages affect the culture: people are always on the lookout for a good match.
When you ask someone who has had an arranged marriage about love, the first thing they say is that the love will come naturally once the couple is married. As a child, I always found this thought strange. As I grew older, though, I noticed the truth of this in the stories my mother told me about her relationship early on with my father. When they married, he was living in the US, and she was finishing her master's in India; for the year it took to finish her degree, they wrote letters. The way they did this nourished their love for each other, and fostered growth in their relationship. Western romance is described as something that happens by accident, but arranged romance happens on purpose. Even relationships that start with falling in love can benefit from growing and deepening that bond in the same way. This happens because you water love like a plant, and give it the right kinds of nutrients so it can grow.
One of the values that my mom spoke to me about more explicitly is that of cultural compatibility. In India, marriage is arranged through the social network of the parents. Traditionally, this focused a lot on social standing and religion, because of the idea that families of the same groups will raise their children similarly, and have similar values. My parents both grew up valuing learning and knowledge. They would have been far less compatible with people who were more focused on material wealth, or spiritual minimalism. Because their families had similar values, those same values were instilled in each of them. This is reinforced by the fact that India is a more collectivist culture, and thus it is thought that your family knows you better than anyone else. Those who know you best are more likely to have a sense of who you would get along with, whether they're related to you or not. Further, getting along with the people your partner cares about most is important in any long term relationship. The fact that my mom got along well with my dad's cousin was a good sign; my mom connected more with the rest of my dad's family after the marriage, even though my dad had to go back to the US. Whether the relationship is arranged or not, fostering individual relationships with the people your partner cares about helps strengthen your relationship.
Compatibility includes not only what you value, but also what you want. Around the time my mother was getting married, many people her age were talking about wanting to move to the US. She was one of the few who wasn't fussed; she felt she'd be just as happy continuing to live in India. Of course, when she met my dad, that changed. For the right person, she was willing to move. There are people who wouldn't have been willing to make that move for anything, and there are those who wanted to move so badly that they didn't want to marry anyone willing to stay. This can be applied to anything one might want out of life, from living situation to religion to children, and more. In Western romantic media, this is often portrayed as being heartless. Ultimately, though, it's about trade-offs. Does your love for the person really overpower how much you want something? That answer differs for everyone. You can say that love conquers all, but a mismatch in this type of compatibility is one of the most common causes for divorce in the US. Knowing what you want your life to look like before you find the person to spend it with is going to be easier than trying to convince someone else to change what they want.
Of course, compatibility is nothing if you're not also complementary. This is where modern dating begins to look like marketing: know your target audience, and know what they want. If you know what kind of values you want your partner to have, you might already have a vague sense of what they would be like as a person. Knowing what you provide is crucial, especially when it comes to things like online dating. Traditional gender roles cover this well if you fit neatly into one or the other, but things don't work that way for everyone. Given that my dad lived in the US, the fact that he could provide citizenship was huge. But he would not have been satisfied with a marriage with someone who saw this as his biggest asset. The fact that my mother was not obsessed with moving to the US meant that their complementary focus had to happen elsewhere. They shared the value of intellectual engagement, but my dad was always more focused on abstract ideas, while my mother tended to think more concretely. Here was where they were able to complement each other, which gave their life together more balance, and helped foster their growth individually as well. Finding someone whose traits and skills complement yours can help cover areas of life you struggle with, provide perspective when needed, and encourage you to grow and learn new things.
As a child, I didn't see the story of my parents as a love story. Love stories were about falling madly, hopelessly, and deeply, all at once, and my parents never really had that. But as I grew older, I noticed the details of their relationship. When my dad bought her a nice dress, it was as much because he wanted to see her in it as it was because he knew she hated shopping. When she challenged his ideas, it was out of love and respect, more than anything else. When we did things together as a family, they made sure to take time to connect with each other as a couple, even if it was only briefly. And as I became more independent, they were able to spend more and more time together. Love that lasts over a lifetime doesn't stay the same; it grows and changes with you as you grow and change. Falling in love doesn't happen once, but again and again.


Do you like bullet points?

26 March 2019 - 07:30
Published on March 26, 2019 4:30 AM UTC

I think more naturally in bullet points, and I (sometimes) like reading posts that are written in bullet style. (This website is one of my favorites, and is written entirely in bullets).

(Disclaimer, although I wrote this post in bullet points because it was cute, I don't think it's the best exemplar of them. Or rather, it's an example of using bullet points to do rough thinking, rather than an example of using them to illustrate a complex argument)

I like bullet points because:

  • It's easier to skim, and build up a high level understanding of a post's structure. If you understand a concept you can skip it and move on, if you want to drill down and understand it better you can do so.
    • Relatedly, it exposes your cruxes more readily. You can pick out and refute points, in a way that can be harder with meandering prose.
  • It's easier to hash out early stage ideas. When I'm first thinking about something, my brain is jumping around and forming connections, developing a model at multiple levels of resolution. Bullet lists make this easier to keep track of.
    • I like this for other people's posts as well, since it feels more playful, like I can be part of their early generation process. I think LessWrong would be better if more people wrote more unpolished things to get early feedback on them, and bullet lists are a nice way to signal that something is still in development.
  • Prose often adds unnecessary cruft. In the transition from bullets-to-prose, posts can go 2x-3x as long (or, when I go to write a short bullet summary of something I wrote in prose, it turns out to be much shorter, and the prose mostly unnecessary)

I had assumed this was a common experience, and that it was in fact a weakness of humanity that we didn't have better, more comprehensive bullet-point tools.

But, alas, Typical Mind Fallacy. It turned out a couple people on the LessWrong team reacted very negatively to bullet points. Concerns include:

  • It's easy to think you've communicated more clearly than you have, because you didn't bother writing the connecting words between paragraphs.
  • They're harder to read straight through. If you include bold words, readers might not bother reading the non-bold words, and miss nuance.
  • "I like numbered arguments, since that makes it easier to respond to individual points. But unnumbered bullet lists are just hard to parse."
    • [Alas, the LessWrong website currently doesn't enable this very well because our Rich Editor's implementation of numbered lists was annoying]
  • "I dunno man it's just really hard to read. My brain keeps trying to collapse the bullets like they're code."

I asked a couple more people, and they said "I dunno, bullet points seem fine. Depends on the situation?"


I am curious what the LessWrong userbase thinks about them overall. Raise your hand if you think bullet points are fine? Terrible? Great? Any particular types of posts you prefer reading bullet-style, and types of posts you think fare poorly if not written in prose?


DanielFilan's Shortform Feed

26 March 2019 - 02:32
Published on March 25, 2019 11:32 PM UTC

Rationality-related writings that are more comment-shaped than post-shaped. Please don't leave top-level comments here unless they're indistinguishable to me from something I would say here.