LessWrong.com News

A community blog devoted to refining the art of rationality

Multi-agent predictive minds and AI alignment

December 13, 2018 - 02:48
Published on December 12, 2018 11:48 PM UTC

Abstract: An attempt to map a best-guess model of how human values and motivations work to several more technical research questions. The mind-model is inspired by the predictive processing / active inference framework and multi-agent models of the mind.

The text has a slightly unusual epistemic structure:

1st part: my current best-guess model of how human minds work.

2nd part: explores various problems which such a mind architecture would pose for some approaches to value learning. The argument is: if such a model seems at least plausible, we should probably extend the space of active research directions.

3rd part: a list of specific research agendas, sometimes specific research questions, motivated by the previous.

I put more credence in the usefulness of the research questions suggested in the third part than in the specifics of the model described in the first part. Also, you should be warned I have no formal training in cognitive neuroscience and similar fields, and it is completely possible I’m making some basic mistakes. Still, my feeling is that even if the model described in the first part is wrong, something from the broad class of “motivational systems not naturally described by utility functions” is close to reality, and understanding the problems from the 3rd part can be useful.

How minds work

As noted, this is a “best guess model”. I have large uncertainty about how human minds actually work. But if I could place just one bet, I would bet on this.

The model has two prerequisite ideas: predictive processing and the active inference framework. I'll give brief summaries here and leave the details to sources elsewhere.

In the predictive processing / active inference framework, brains constantly predict sensory inputs, in a hierarchical generative way. As a dual, action is also “generated” by the same machinery (changing the environment to match “predicted” desirable inputs and generating action which can lead to them). The “currency” on which the whole system is running is prediction error (or something in the style of free energy, in that language).

Another important ingredient is bounded rationality, i.e. a limited amount of resources being available for cognition. Indeed, the specifics of hierarchical modelling, neural architectures, and the principle of reusing and repurposing everything all seem to be related to quite brutal optimization pressure, likely related to the brain’s enormous energy consumption. (It is unclear to me whether this can also be reduced to the same “currency”. Karl Friston would probably answer "yes".)

Assuming all of this, how do motivations and “values” arise? The guess is that in many cases something like a “subprogram” is modelling/tracking some variable, “predicting” its desirable state, and creating the need for action by “signalling” prediction error. Note that such subprograms can work on variables on very different hierarchical layers of modelling - e.g. tracking a simple variable like “feeling hungry” vs. tracking a variable like “social status”. Such sub-systems can be large: for example, tracking “social status” seems to require a lot of computation.
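A minimal sketch of the “subprogram” idea, assuming each sub-agent can be caricatured as a scalar setpoint with a weight. The class name `SubAgent`, the variable names and all numbers are hypothetical illustrations, not part of the model itself.

```python
from dataclasses import dataclass


@dataclass
class SubAgent:
    """Hypothetical sub-program tracking one variable."""
    name: str
    predicted: float  # the "desirable" state the sub-agent predicts
    weight: float     # how strongly a mismatch is felt (a precision-like factor)

    def prediction_error(self, observed: float) -> float:
        # Weighted squared mismatch between the prediction and the observation.
        return self.weight * (self.predicted - observed) ** 2


def most_urgent(agents, observations):
    """Return the sub-agent currently signalling the largest prediction error."""
    return max(agents, key=lambda a: a.prediction_error(observations[a.name]))


# One low-level and one high-level tracked variable (made-up values).
agents = [
    SubAgent("satiety", predicted=1.0, weight=1.0),
    SubAgent("social_status", predicted=0.8, weight=2.0),
]
observations = {"satiety": 0.2, "social_status": 0.7}

errors = {a.name: a.prediction_error(observations[a.name]) for a in agents}
loudest = most_urgent(agents, observations)
print(errors, loudest.name)
```

Here the “need for action” is just whichever error is largest; a real hierarchical model would of course not reduce to a single `max`.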

How does this relate to emotions? Emotions could be quite complex processes, where some higher-level modelling (“I see a lion”) leads to a response in lower levels connected to body states, some chemicals are released, and this interoceptive sensation is re-integrated in the higher levels in the form of an emotional state, eventually reaching consciousness. Note that the emotional signal from the body is more similar to “sensory” data - the guess is that body/low-level responses are a way for genes to insert a reward signal into the whole system.

How does this relate to our conscious experience, and stuff like Kahneman's System 1/System 2? It seems that for most people the light of consciousness is illuminating only a tiny part of the computation, and most stuff is happening in the background. Also, S1 has much larger computing power. On the other hand, it seems relatively easy to “spawn background processes” from the conscious part, and it seems possible, through specialized techniques and efforts (for example, some meditation techniques), to illuminate a larger part of the background processing than is usually visible.

Another ingredient is the observation that a big part of what the conscious self is doing is interacting with other people, and rationalizing our behaviour. (Cf. press secretary theory, elephant in the brain.) It is also quite possible that the relation between acting rationally and the ability to rationalize what we did is bidirectional, and that a significant part of the motivation for some rational behaviour is that it is easy to rationalize.

Also, it seems important to appreciate that the most important part of the human “environment” is other people, and what human minds are often doing is likely simulating other human minds (even simulating how other people would be simulating someone else!).

Problems with prevailing value learning approaches

While the above sketched picture is just a best guess, it seems to me at least compelling. At the same time, there are notable points of tension between it and at least some approaches to AI alignment.

No clear distinction between goals and beliefs

In this model, it is hardly possible to disentangle “beliefs” and “motivations” (or values). “Motivations” interface with the world only via a complex machinery of hierarchical generative models containing all other sorts of “beliefs”.

To appreciate the problems for the value learning program, consider the case of someone whose predictive/generative model strongly predicts failure and suffering. Such a person may take actions which actually lead to this outcome, minimizing the prediction error.

A less extreme but also important problem is that extrapolating “values” outside of the area of validity of generative models is problematic and could be fundamentally ill-defined. (This is related to “ontological crisis”.)

No clear self-alignment

It seems plausible the common formalism of agents with utility functions is more adequate for describing the individual “subsystems” than whole human minds. Decisions on the whole-mind level are more like results of interactions between the sub-agents; results of multi-agent interaction are not in general an object which is naturally represented by a utility function. For example, consider the sequence of game outcomes in a repeated Prisoner's Dilemma (PD) game. If you take the sequence of game outcomes (e.g. 1: defect-defect, 2: cooperate-defect, ...) as a sequence of actions, the actions do not represent some well-behaved preferences, and in general do not maximize some utility function.
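A different but related illustration of the same point is a Condorcet cycle, used here in place of the repeated PD example: three hypothetical sub-agents each have perfectly consistent rankings, yet their majority-aggregated choices cannot be represented by any single ordering, and hence by no utility function over the options. The sub-agent names and options are invented for the sketch.

```python
from itertools import permutations

# Each hypothetical sub-agent has a consistent (transitive) ranking, best first.
rankings = {
    "hunger": ["eat", "rest", "work"],
    "fatigue": ["rest", "work", "eat"],
    "status": ["work", "eat", "rest"],
}
options = ["eat", "rest", "work"]


def majority_prefers(a, b):
    """True if most sub-agents rank option a above option b."""
    votes = sum(r.index(a) < r.index(b) for r in rankings.values())
    return votes > len(rankings) / 2


def utility_order_exists():
    """Search for a strict ordering that agrees with every pairwise majority choice."""
    for order in permutations(options):
        pairs = [(order[i], order[j])
                 for i in range(len(order)) for j in range(i + 1, len(order))]
        if all(majority_prefers(a, b) for a, b in pairs):
            return True
    return False


cycle = (
    majority_prefers("eat", "rest"),
    majority_prefers("rest", "work"),
    majority_prefers("work", "eat"),
)
print(cycle, utility_order_exists())
```

All three pairwise comparisons come out true, so the aggregate “prefers” eat to rest, rest to work, and work to eat, and the search for a consistent ordering fails.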

Note: This is not to claim VNM rationality is useless - it still has the normative power - and some types of interaction lead humans to approximate SEU optimizing agents better.

One case is if mainly one specific subsystem (subagent) is in control, and the decision does not go via too complex generative modelling. So, we should expect more VNM-like behaviour in experiments in narrow domains than in cases where very different sub-agents are engaged and disagree.
Another case is if sub-agents are able to do some “social welfare function” style aggregation, bargain, or trade - the result could be more VNM-like, at least at specific points in time, with the caveat that such a “point” aggregate function may not be preserved over time.

On the contrary, cases where the resulting behaviour is very different from VNM-like may be caused by sub-agents locked in some non-cooperative Nash equilibria.
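To make “locked in a non-cooperative Nash equilibrium” concrete, here is a small sketch using a one-shot Prisoner's Dilemma between two sub-agents. The payoff numbers are the usual textbook-style illustration and are an assumption, not something taken from the model above.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# Illustrative payoffs: PAYOFF[(row, col)] = (row player's payoff, column player's payoff).
PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def is_nash(row, col):
    """Neither player can gain by unilaterally switching action."""
    row_ok = all(PAYOFF[(row, col)][0] >= PAYOFF[(alt, col)][0] for alt in ACTIONS)
    col_ok = all(PAYOFF[(row, col)][1] >= PAYOFF[(row, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok


def pareto_dominated(profile):
    """True if another outcome is at least as good for both players and differs."""
    mine = PAYOFF[profile]
    return any(
        other[0] >= mine[0] and other[1] >= mine[1] and other != mine
        for other in PAYOFF.values()
    )


equilibria = [p for p in product(ACTIONS, ACTIONS) if is_nash(*p)]
print(equilibria, [pareto_dominated(p) for p in equilibria])
```

The only pure-strategy equilibrium is mutual defection, and it is Pareto-dominated by mutual cooperation: both sub-agents would do better elsewhere, but neither can get there alone.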

What we are aligning AI with

Given this distinction between the whole mind and sub-agents, there are at least four somewhat different notions of what alignment can mean.

1. Alignment with the outputs of the generative models, without querying the human. This includes for example proposals centered around approval. In this case, generally only the output of the internal aggregation has some voice.

2. Alignment with the outputs of the generative models, with querying the human. This includes for example CIRL and similar approaches. The problematic part of this is that, by carefully crafted queries, it is possible to give voice to different sub-agent-like systems (or, with more nuance, give them very different power in the aggregation process). One problem with this is that if the internal human system is not self-aligned, the results could be quite arbitrary (and the AI agent has a lot of power to manipulate).

3. Alignment with the whole system, including the human aggregation process itself. This could include for example some deep-NN-based black box trained on a large amount of human data, predicting what the human would want (or approve).

4. Adding layers of indirection to the question, such as defining alignment as a state where “A is trying to do what H wants it to do.”

In practice, options 1 and 2 can collapse into one, as long as there is some feedback loop between the AI agent's actions and the human reward signal. (Even in case 1, the agent can take an action with the intention of eliciting feedback from some subpart.)

We can construct a rich space of various meanings of "alignment" by combining basic directions.

Now, we can analyze how these options interact with various alignment research programs.

Probably the most interesting case is IDA. IDA-like schemes can probably carry forward arbitrary properties to more powerful systems, as long as we are able to construct the individual step preserving the property. (I.e. one full cycle of distillation and amplification, which can be arbitrarily small).

Distilling and amplifying the alignment in sense #1 (what the human will actually approve) is conceptually easiest, but, unfortunately, brings some of the problems of a potentially super-human system optimizing for manipulating the human for approval.

Alignment in sense #3 creates a very different set of problems. One obvious risk is mind-crimes. A more subtle risk is related to the fact that as the implicit model of human “wants” scales (becomes less bounded), (i) the parts may scale at different rates, and (ii) the outcome equilibria may change even if the sub-parts scale at the same rate.

Alignment in sense #4 seems more vague, and moves the burden of understanding the problem in part to the side of the AI. We can imagine that at the end the AI will be aligned with some part of the human mind in a self-consistent way (the part will be a fixed point of the alignment structure). Unfortunately, it is a priori unclear if a unique fixed point exists. If not, the problems become similar to case #2. Also, it seems inevitable the AI will need to contain some structure representing what the human wants the AI to do, which may cause problems similar to #3.

Also, in comparison with other meanings, it is much less clear to me how to even establish some system has this property.

Rider-centric and meme-centric alignment

Many alignment proposals seem to focus on interacting just with the conscious, narrating and rationalizing part of the mind. If this is just one part entangled in some complex interaction with other parts, there are specific reasons why this may be problematic.

One: if the “rider” (from the rider/elephant metaphor) is the part highly engaged with tracking societal rules, interactions and memes, it seems plausible the “values” learned from it will be mostly aligned with societal norms and the interests of memeplexes, and not “fully human”.

This is worrisome: from a meme-centric perspective, humans are just a substrate, and not necessarily the best one. A more speculative problem is that schemes learning the human memetic landscape and “supercharging” it with superhuman performance may create some hard-to-predict evolutionary optimization processes.

Metapreferences and multi-agent alignment

Individual “preferences” can often in fact be mostly a meta-preference to have preferences compatible with other people, based on simulations of such people.

This may make it surprisingly hard to infer human values by trying to learn what individual humans want without the social context (necessitating inverting several layers of simulation). If this is the case, the whole approach of extracting individual preferences from a single human could be problematic. (This is probably more relevant to some “prosaic” alignment problems.)


Some of the above-mentioned points of disagreement point toward specific ways in which some of the existing approaches to value alignment may fail. Several illustrative examples:

  • Internal conflict may lead to inaction (also to not expressing approval or disapproval). While many existing approaches represent such a situation only by the outcome of the conflict, the internal experience of the human seems to be quite different with and without the conflict.
  • Difficulty with splitting “beliefs” and “motivations”.
  • Learning inadequate societal equilibria and optimizing on them.

On the positive side, it could be expected the sub-agents still easily agree on things like “it is better not to die a horrible death”.

Also, the mind-model with bounded sub-agents which interact only with their local neighborhood and do not actually care about the world may be a viable design from the safety perspective.

Suggested technical research directions

While the previous parts are more in backward-chaining mode, here I attempt to point toward more concrete research agendas and questions where we can plausibly improve our understanding either by developing theory, or experimenting with toy models based on current ML techniques.

Often it may be the case that some research was already done on the topic, just not with AI alignment in mind, and high-value work could consist of “importing the knowledge” into the safety community.

Understanding hierarchical modelling.

It seems plausible the human hierarchical models of the world optimize some "boundedly rational" function. (Remembering all details is too expensive, too much coarse-graining decreases usefulness. A good bounded rationality model can work as a principle for how to select models. In a similar way to the minimum description length principle, just taking some more “human” (energy?) costs as cost function.)
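A toy sketch of such a selection principle, assuming the “model” is just a coarse-graining of scalar observations into `k` equal-width bins and the “human” cost is a flat, made-up price per bin. This is a penalized-error stand-in in the spirit of minimum description length, not an actual MDL computation, and the data are invented.

```python
def coarse_grain_error(data, k):
    """Squared error when each point is predicted by the mean of its bin,
    using k equal-width bins over the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    bins = {}
    for x in data:
        index = min(int((x - lo) / width), k - 1)  # clamp the maximum into the last bin
        bins.setdefault(index, []).append(x)
    error = 0.0
    for members in bins.values():
        mean = sum(members) / len(members)
        error += sum((x - mean) ** 2 for x in members)
    return error


def total_cost(data, k, cost_per_bin):
    """Prediction error plus a hypothetical per-bin "energy" cost for keeping detail."""
    return coarse_grain_error(data, k) + cost_per_bin * k


# Invented observations forming three clusters.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
costs = {k: total_cost(data, k, cost_per_bin=1.0) for k in range(1, 10)}
best_k = min(costs, key=costs.get)
print(best_k)
```

With these numbers the cheapest model keeps three bins: fewer bins blur distinct clusters together, while more bins pay for detail that no longer improves prediction.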

Inverse Game Theory.

Inverting agent motivations in MDPs is a different problem from inverting motivations in multi-agent situations where game-theory style interactions occur. This leads to the inverse game theory problem: observe the interactions, learn the objectives.
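One very simple way to pose the inverse problem, assuming a finite set of candidate payoff matrices and assuming observed play is always a pure Nash equilibrium of the true game: keep only the candidates under which every observed action profile is an equilibrium. The candidate games and their numbers below are hypothetical.

```python
ACTIONS = ("A", "B")

# Hypothetical candidate objectives: payoffs[(row, col)] = (row payoff, col payoff).
CANDIDATES = {
    "prisoners_dilemma": {
        ("A", "A"): (3, 3), ("A", "B"): (0, 5),
        ("B", "A"): (5, 0), ("B", "B"): (1, 1),
    },
    "stag_hunt": {
        ("A", "A"): (4, 4), ("A", "B"): (0, 3),
        ("B", "A"): (3, 0), ("B", "B"): (3, 3),
    },
}


def is_nash(payoffs, row, col):
    """Neither player gains by unilaterally deviating under these payoffs."""
    row_ok = all(payoffs[(row, col)][0] >= payoffs[(alt, col)][0] for alt in ACTIONS)
    col_ok = all(payoffs[(row, col)][1] >= payoffs[(row, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok


def consistent_candidates(observed_profiles):
    """Names of candidate games under which all observed profiles are equilibria."""
    return sorted(
        name for name, payoffs in CANDIDATES.items()
        if all(is_nash(payoffs, r, c) for r, c in observed_profiles)
    )


print(consistent_candidates([("B", "B")]))
print(consistent_candidates([("B", "B"), ("A", "A")]))
```

Seeing only (B, B) leaves both candidates standing, which shows how under-determined the objectives can be; also observing (A, A) rules out the Prisoner's Dilemma and leaves the stag hunt.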

Learning from multiple agents.

Imagine a group of five closely interacting humans. Learning values just from person A may run into the problem that a big part of A’s motivation is based on A simulating B, C, D and E (on the same “human” hardware, just incorporating individual differences). In that case, learning the “values” just from A’s actions could in principle be more difficult than observing the whole group and trying to learn some “human universals” and some “human specifics”. A different way of thinking about this could be by drawing a parallel with meta-learning algorithms (e.g. REPTILE), but in an IRL frame.
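A heavily simplified sketch of the meta-learning parallel, assuming each person's “values” can be caricatured as one scalar target and using a Reptile-style outer loop on a quadratic loss. This is not IRL: the scalar targets, step sizes and function names are all stand-ins chosen for illustration.

```python
def inner_sgd(theta, target, steps=5, lr=0.1):
    """A few gradient steps on the per-person loss (theta - target)**2."""
    for _ in range(steps):
        theta -= lr * 2 * (theta - target)
    return theta


def reptile(targets, meta_steps=200, meta_lr=0.1):
    """Reptile-style outer loop: nudge the shared point toward each adapted point."""
    shared = 0.0
    for step in range(meta_steps):
        target = targets[step % len(targets)]  # cycle through people deterministically
        adapted = inner_sgd(shared, target)
        shared += meta_lr * (adapted - shared)
    return shared


# Hypothetical per-person parameters for A, B, C, D, E.
targets = [1.0, 1.2, 0.8, 1.1, 0.9]
shared = reptile(targets)                  # rough analogue of "human universals"
specifics = [t - shared for t in targets]  # rough analogue of "human specifics"
print(round(shared, 2))
```

The shared point ends up close to the group average, and each person is then described by a small offset from it; the hope expressed above is that something analogous could be done for reward functions rather than scalars.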

What happens if you put a system composed of sub-agents under optimization pressure?

It is not clear to me what would happen if you, for example, successfully “learn” such a system of “motivations” from a human, and then put it inside of some optimization process selecting for VNM-like rational behaviour.

It seems plausible the somewhat messy system will be forced to get more internally aligned; for example, one way this can happen is that one of the sub-agent systems takes control and “wipes out the opposition”.

What happens if you make a system composed of sub-agents less computationally bounded?

It is not clear that the relative powers of sub-agents will scale in the same way as the whole system becomes less computationally bounded. (This is related to MIRI’s sub-agents agenda.)

Suggested non-technical research directions

Human self-alignment.

All other things being equal, it seems safer to try to align AI with humans who are self-aligned.

Notes & Discussion


Part of my motivation for writing this was an annoyance: there are plenty of reasons to believe that the view

  • human mind is a unified whole,
  • at first approximation optimizing some utility function,
  • this utility is over world-states,

is neither a good model of humans, nor the best model of how to think about AI. Yet it is the paradigm shaping a lot of thought and research. I hope that if the annoyance surfaced in the text, it is not too distracting.

Multi-part minds in literature

There are dozens of schemes describing the mind as some sort of multi-part system, so there is nothing original about this claim. Based on a very shallow review, it seems psychologists often conceptualize the sub-agents as subpersonalities, which are almost fully human. This seems to err on the side of sub-agents being too complex, and of anthropomorphising instead of trying to describe formally. (Explaining humans as a composition of humans is not very useful for AI alignment.) On the other hand, Minsky’s Society of Mind has sub-agents which often seem to be too simple (e.g. similar in complexity to individual logic gates). If there is some literature that gets sub-agent complexity right, with sub-agents placed inside predictive processing, I’d be really excited about it!


When discussing the draft, several friends noted something along the lines of: “It is overdetermined that approaches like IRL are doomed. There are many reasons for that and the research community is aware of them”. To some extent I agree this is the case; on the other hand, (1) the described model of mind may pose problems even for more sophisticated approaches, and (2) my impression is that many people still have something like a utility-maximizing agent as the central example.

The complementary objection is that while interacting sub-agents may be a more precise model, in practice it is often good enough to think about humans as unified agents, and this may be good enough even for the purpose of AI alignment. My intuition on this is based on the connection of rationality to exploitability: it seems humans are usually more rational and less exploitable when thinking about narrow domains, but can be quite bad when vastly different subsystems are in play. (Imagine on one side a person exchanging stock and money, and on the other side a person exchanging units of money, free time, friendship, etc. In the second case, many people are willing to trade at very different rates in different situations.)
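The link between inconsistent exchange rates and exploitability can be sketched as a one-line money pump, assuming the person will trade in both directions at the context-specific rate and ignoring transaction costs. The contexts and numbers are hypothetical.

```python
def exploiter_gain(rates, hours=1.0):
    """Best gain from buying free time in the context where the person prices it
    lowest and selling it back in the context where they price it highest.

    `rates` maps a context name to the person's money-per-hour exchange rate."""
    low = min(rates.values())
    high = max(rates.values())
    return hours * (high - low)


consistent = {"at_work": 20.0, "on_holiday": 20.0}
inconsistent = {"at_work": 10.0, "on_holiday": 30.0}
print(exploiter_gain(consistent), exploiter_gain(inconsistent))
```

A person with a single rate across contexts offers no gain, while one whose rate depends on which subsystem is engaged can in principle be traded against in a loop.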

I’d like to thank Linda Linsefors, Alexey Turchin, Tomáš Gavenčiak, Max Daniel, Ryan Carey, Rohin Shah, Owen Cotton-Barratt and others for helpful discussions. Part of this originated in the efforts of the “Hidden Assumptions” team at the 2nd AI Safety Camp, and my thoughts about how minds work are inspired by CFAR.


What went wrong in this interaction?

December 13, 2018 - 00:52
Published on December 12, 2018 7:59 PM UTC

I'm curious about an interaction I had a few weeks ago with someone in the rationality community. I was wondering if someone here could look at the conversation and evaluate what 'went wrong', so to speak.

It began with some comments I made on a blog post, where I disagreed with the author that 'metoo' was good, but rather than discuss the entire point I wanted just to address some counterexamples to something the author said about metoo never having gone too far. The post is here: http://benjaminrosshoffman.com/metoo-is-good/

After a bit of back and forth, it seemed like I should try to take it to private chat before it turned into a demon thread. This was the ensuing conversation: https://pastebin.com/epQmxZK2

It seems to me like my points weren't understood, and likewise I didn't quite get what the author was trying to make me understand.

The author also seemed hostile and unwilling to engage, and how he disengaged from the conversation seemed like a personal attack that was unjustified. But I’m biased, so I was wondering if it was something about my comments or behavior or tone that I was missing that provoked that response, or if I misread the hostility at all.

And any thoughts about why the author had that kind of reaction? It was not what I expected, since I thought most rationality community members would welcome an honest discussion like the one I was trying to start.


Internet Search Tips: how I use Google/Google Scholar/Libgen

December 12, 2018 - 17:50
Published on December 12, 2018 2:50 PM UTC


Should ethicists be inside or outside a profession?

December 12, 2018 - 04:40
Published on December 12, 2018 1:40 AM UTC

Originally written in 2007.

Marvin Minsky in an interview with Danielle Egan for New Scientist:

Minsky: The reason we have politicians is to prevent bad things from happening. It doesn’t make sense to ask a scientist to worry about the bad effects of their discoveries, because they’re no better at that than anyone else. Scientists are not particularly good at social policy.

Egan: But shouldn’t they have an ethical responsibility for their inventions?

Minsky: No they shouldn’t have an ethical responsibility for their inventions. They should be able to do what they want. You shouldn’t have to ask them to have the same values as other people. Because then you won’t get them. They’ll make stupid decisions and not work on important things, because they see possible dangers. What you need is a separation of powers. It doesn’t make any sense to have the same person do both.

The Singularity Institute was recently asked to comment on this interview - which by the time it made it through the editors at New Scientist, contained just the unvarnished quote “Scientists shouldn’t have an ethical responsibility for their inventions. They should be able to do what they want. You shouldn’t have to ask them to have the same values as other people.” Nice one, New Scientist. Thanks to Egan for providing the original interview text.

This makes an interesting contrast with what I said in my “Cognitive biases” chapter for Bostrom’s Global Catastrophic Risks:

Someone on the physics-disaster committee should know what the term “existential risk” means; should possess whatever skills the field of existential risk management has accumulated or borrowed. For maximum safety, that person should also be a physicist. The domain-specific expertise and the expertise pertaining to existential risks should combine in one person. I am skeptical that a scholar of heuristics and biases, unable to read physics equations, could check the work of physicists who knew nothing of heuristics and biases.

Should ethicists be inside or outside a profession?

It seems to me that trying to separate ethics and engineering is like trying to separate the crafting of paintings into two independent specialties: a profession that’s in charge of pushing a paintbrush over a canvas, and a profession that’s in charge of artistic beauty but knows nothing about paint or optics.

The view of ethics as a separate profession is part of the problem. It arises, I think, from the same deeply flawed worldview that sees technology as something foreign and distant, something opposed to life and beauty. Technology is an expression of human intelligence, which is to say, an expression of human nature. Hunter-gatherers who crafted their own bows and arrows didn’t have cultural nightmares about bows and arrows being a mechanical death force, a blank-faced System. When you craft something with your own hands, it seems like a part of you. It’s the Industrial Revolution that enabled people to buy artifacts which they could not make or did not even understand.

Ethics, like engineering and art and mathematics, is a natural expression of human minds.

Anyone who gives a part of themselves to a profession discovers a sense of beauty in it. Writers discover that sentences can be beautiful. Programmers discover that code can be beautiful. Architects discover that house layouts can be beautiful. We all start out with a native sense of beauty, which already responds to rivers and flowers. But as we begin to create - sentences or code or house layouts or flint knives - our sense of beauty develops with use.

Like a sense of beauty, one’s native ethical sense must be continually used in order to develop further. If you’re just working at a job to make money, so that your real goal is to make the rent on your apartment, then neither your aesthetics nor your morals are likely to get much of a workout.

The way to develop a highly specialized sense of professional ethics is to do something, ethically, a whole bunch, until you get good at both the thing itself and the ethics part.

When you look at the “bioethics” fiasco, you discover bioethicists writing mainly for an audience of other bioethicists. Bioethicists aren’t writing to doctors or bioengineers, they’re writing to tenure committees and journalists and foundation directors. Worse, bioethicists are not using their ethical sense in bio-work, the way a doctor whose patient might have incurable cancer must choose how and what to tell the patient.

A doctor treating a patient should not try to be academically original, to come up with a brilliant new theory of bioethics. As I’ve written before, ethics is not supposed to be counterintuitive, and yet academic ethicists are biased to be just exactly counterintuitive enough that people won’t say, “Hey, I could have thought of that.” The purpose of ethics is to shape a well-lived life, not to be impressively complicated. Professional ethicists, to get paid, must transform ethics into something difficult enough to require professional ethicists.

It’s, like, a good idea to save lives? “Duh,” the foundation directors and the review boards and the tenure committee would say.

But there’s nothing duh about saving lives if you’re a doctor.

A book I once read about writing - I forget which one, alas - observed that there is a level of depth beneath which repetition ceases to be boring. Standardized phrases are called “cliches” (said the author of writing), but murder and love and revenge can be woven into a thousand plots without ever becoming old. “You should save people's lives, mmkay?” won’t get you tenure - but as a theme of real life, it’s as old as thinking, and no more obsolete.

Boringly obvious ethics are just fine if you’re using them in your work rather than talking about them. The goal is to do it right, not to do it originally. Do your best whether or not it is “original”, and originality comes in its own time; not every change is an improvement, but every improvement is necessarily a change.

At the Singularity Summit 2007, several speakers alleged we should “reach out” to artists and poets to encourage their participation in the Singularity dialogue. And then a woman went to a microphone and said: “I am an artist. I want to participate. What should I do?”

And there was a long, delicious silence.

What I would have said to a question like that, if someone had asked it of me in the conference lobby, was: “You are not an ‘artist’, you are a human being; art is only one facet in which you express your humanity. Your reactions to the Singularity should arise from your entire self, and it’s okay if you have a standard human reaction like ‘I’m afraid’ or ‘Where do I send the check?’, rather than some special ‘artist’ reaction. If your artistry has something to say, it will express itself naturally in your response as a human being, without needing a conscious effort to say something artist-like. I would feel patronized, like a dog commanded to perform a trick, if someone presented me with a painting and said ‘Say something mathematical!’”

Anyone who calls on “artists” to participate in the Singularity clearly thinks of artistry as a special function that is only performed in Art departments, an icing dumped onto cake from outside. But you can always pick up some cheap applause by calling for more icing on the cake.

Ethicists should be inside a profession, rather than outside, because ethics itself should be inside rather than outside. It should be a natural expression of yourself, like math or art or engineering. If you don’t like trudging up and down stairs you’ll build an escalator. If you don’t want people to get hurt, you’ll try to make sure the escalator doesn’t suddenly speed up and throw its riders into the ceiling. Both just natural expressions of desire.

There are opportunities for market distortions here, where people get paid more for installing an escalator than installing a safe escalator. If you don’t use your ethics, if you don’t wield them as part of your profession, they will grow no stronger. But if you want a safe escalator, by far the best way to get one - if you can manage it - is to find an engineer who naturally doesn’t want to hurt people. Then you’ve just got to keep the managers from demanding that the escalator ship immediately and without all those expensive safety gadgets.

The first iron-clad steamships were actually much safer than the Titanic; the first ironclads were built by engineers without much management supervision, who could design in safety features to their heart’s content.  The Titanic was built in an era of cutthroat price competition between ocean liners.  The grand fanfare about it being unsinkable was a marketing slogan like “World’s Greatest Laundry Detergent”, not a failure of engineering prediction.

Yes, safety inspectors, yes, design reviews; but these just verify that the engineer put forth an effort of ethical design intelligence. Safety-inspecting doesn’t build an elevator. Ethics, to be effective, must be part of the intelligence that expresses those ethics - you can’t add it in like icing on a cake.

Which leads into the question of the ethics of AI. “Ethics, to be effective, must be part of the intelligence that expresses those ethics - you can’t add it in like icing on a cake.” My goodness, I wonder how I could have learned such Deep Wisdom?

Because I studied AI, and the art spoke to me.  Then I translated it back into English.

The truth is that I can’t inveigh properly on bioethics, because I am not myself a doctor or a bioengineer. If there is a special ethic of medicine, beyond the obvious, I do not know it. I have not worked enough healing for that art to speak to me.

What I do know a thing or two about, is AI. There I can testify definitely and from direct knowledge, that anyone who sets out to study “AI ethics” without a technical grasp of cognitive science, is absolutely doomed.

It’s the technical knowledge of AI that forces you to deal with the world in its own strange terms, rather than the surface-level concepts of everyday life. In everyday life, you can take for granted that “people” are easy to identify; if you look at the modern world, the humans are easy to pick out, to categorize. An unusual boundary case, like Terri Schiavo, can throw a whole nation into a panic: Is she “alive” or “dead”? AI explodes the language in which people are described, unbundles the properties that are always together in human beings. Losing the standard view, throwing away the human conceptual language, forces you to think for yourself about ethics, rather than parroting back things that sound Deeply Wise.

All of this comes of studying the math, nor may it be divorced from the math. That’s not as comfortably egalitarian as my earlier statement that ethics isn’t meant to be complicated. But if you mate ethics to a highly technical profession, you’re going to get ethics expressed in a conceptual language that is highly technical.

The technical knowledge provides the conceptual language in which to express ethical problems, ethical options, ethical decisions. If politicians don’t understand the distinction between terminal value and instrumental value, or the difference between a utility function and a probability distribution, then some fundamental problems in Friendly AI are going to be complete gibberish to them - never mind the solutions. I’m sorry to be the one to say this, and I don’t like it either, but Lady Reality does not have the goal of making things easy for political idealists.

If it helps, the technical ethical thoughts I’ve had so far require only comparatively basic math like Bayesian decision theory, not high-falutin’ complicated damn math like real mathematicians do all day. Hopefully this condition does not hold merely because I am stupid.

Several of the responses to Minsky’s statement that politicians should be the ones to “prevent bad things from happening” were along the lines of “Politicians are not particularly good at this, but neither necessarily are most scientists.” I think it’s sad but true that modern industrial civilization, or even modern academia, imposes many shouting external demands within which the quieter internal voice of ethics is lost. It may even be that a majority of people are not particularly ethical to begin with; the thought seems to me uncomfortably elitist, but that doesn’t make it comfortably untrue.

It may even be true that most scientists, say in AI, haven’t really had a lot of opportunity to express their ethics and so the art hasn’t said anything in particular to them.

If you talk to some AI scientists about the Singularity / Intelligence Explosion they may say something cached like, “Well, who’s to say that humanity really ought to survive?” This doesn’t sound to me like someone whose art is speaking to them. But then artificial intelligence is not the same as artificial general intelligence; and, well, to be brutally honest, I think a lot of people who claim to be working in AGI haven’t really gotten all that far in their pursuit of the art.

So, if I listen to the voice of experience, rather than to the voice of comfort, I find that most people are not very good at ethical thinking. Even most doctors - who ought properly to be confronting ethical questions in every day of their work - don’t go on to write famous memoirs about their ethical insights. The terrifying truth may be that Sturgeon’s Law applies to ethics as it applies to so many other human endeavors: “Ninety percent of everything is crap.”

So asking an engineer an ethical question is not a sure-fire way to get an especially ethical answer. I wish it were true, but it isn’t.

But what experience tells me, is that there is no way to obtain the ethics of a technical profession except by being ethical inside that profession. I’m skeptical enough of nondoctors who propose to tell doctors how to be ethical, but I know it’s not possible in AI. There are all sorts of AI-ethical questions that anyone should be able to answer, like “Is it good for a robot to kill people? No.” But if a dilemma requires more than this, the specialist ethical expertise will only come from someone who has practiced expressing their ethics from inside their profession.

This doesn’t mean that all AI people are on their own. It means that if you want to have specialists telling AI people how to be ethical, the “specialists” have to be AI people who express their ethics within their AI work, and then they can talk to other AI people about what the art said to them.

It may be that most AI people will not be above-average at AI ethics, but without technical knowledge of AI you don’t even get an opportunity to develop ethical expertise because you’re not thinking in the right language. That’s the way it is in my profession. Your mileage may vary.

In other words:  To get good AI ethics you need someone technically good at AI, but not all people technically good at AI are automatically good at AI ethics. The technical knowledge is necessary but not sufficient to ethics.

What if you think there are specialized ethical concepts, typically taught in philosophy classes, which AI ethicists will need? Then you need to make sure that at least some AI people take those philosophy classes. If there is such a thing as special ethical knowledge, it has to combine in the same person who has the technical knowledge.

Heuristics and biases are critically important knowledge relevant to ethics, in my humble opinion. But if you want that knowledge expressed in a profession, you’ll have to find a professional expressing their ethics and teach them about heuristics and biases - not pick a random cognitive psychologist off the street to add supervision, like so much icing slathered over a cake.

My nightmare here is people saying, “Aha! A randomly selected AI researcher is not guaranteed to be ethical!” So they turn the task over to professional “ethicists” who are guaranteed to fail: who will simultaneously try to sound counterintuitive enough to be worth paying for as specialists, while also making sure to not think up anything really technical that would scare off the foundation directors who approve their grants.

But even if professional “AI ethicists” fill the popular air with nonsense, all is not lost. AIfolk who express their ethics as a continuous, non-separate, non-special function of the same life-existence that expresses their AI work, will yet learn a thing or two about the special ethics pertaining to AI. They will not be able to avoid it. Thinking that ethics is a separate profession which judges engineers from above, is like thinking that math is a separate profession which judges engineers from above. If you’re doing ethics right, you can’t separate it from your profession.


A hundred Shakespeares

December 12, 2018 - 02:11
Published on December 11, 2018 11:11 PM UTC

In his post on science slowing down, Scott said:

  • "Are there a hundred Shakespeare-equivalents around today? This is a harder problem than it seems – Shakespeare has become so venerable with historical hindsight that maybe nobody would acknowledge a Shakespeare-level master today even if they existed – but still, a hundred Shakespeares?"

I'd argue that there are way more than a hundred Shakespeares around today, and there were several in Shakespeare's time. By Shakespeares, I mean authors who could have produced works of comparable quality to Shakespeare, by some reasonable measure of quality.

This seems surprising; there do not seem to be a hundred living authors who are almost universally agreed to be must-reads in the same way that Shakespeare is.

But this lack hints at a resolution of the paradox: we just don't have space for a hundred authors with the same fervour as we make space for Shakespeare. Neither as individuals nor as cultures can we fit these in. Shakespeare was a literary superstar. And superstars are rare, due to network effects and the power law of fame.

So my thesis would be that:

  • There are many non-superstars who could plausibly have become superstars, and if they had, they would have produced works of comparable quality to the superstars.

Part of this is the halo effect: superstars just get judged as better than anyone else.

Also, just by being famous, the interpretation of their work is altered. Bits of Shakespeare have permeated popular culture, and many articles and theories have been created about him. When we watch a Shakespeare play, we don't just see the words; we see the layers of cultural meaning and interpretation that have accumulated on it.

I'd argue that, just by knowing that a play is by Shakespeare, we assume that it's deep and meaningful, and read in deeper interpretations and symbolism than we would otherwise. If we rediscovered two old plays, and they were word for word identical, but one was believed to be by Shakespeare and the other by some forgotten minor playwright, I'd expect that the first one would be a better play, just by what the audience would bring to it.

Apart from those effects, superstars have the unique ability to focus more on their own vision. They have great self-confidence, and they can afford to trust that their audiences will have the patience to follow them where they want to go - rather than expecting immediate literary gratification. This would tend to result in works that are better than the average work of someone of equivalent skill, and more likely to be "deep", "insightful", or "timeless". This effect might be even more obvious with bloggers than with authors.

So, though the number of superstars is severely limited, the number of potential superstars of equivalent skill can and most likely does increase with population.

Superstars in science

I'd argue that there's also a superstar effect in science. But here it combines with Scott's explanation 3: low hanging fruit. Newton did not come up with general relativity; Einstein didn't find quantum field theory; Tesla didn't invent the laser. You can't develop an idea until certain pre-requisites are met.

And, unlike those solitary geniuses, most of science and technology is collaborative. Superstars get to be part of the best teams, interact with the best other scientists, and are more free to focus on the biggest, sexiest problems. I expect that there are many non-superstars who would have developed a certain part of theory, if a superstar hadn't got there first. It seems plausible to me that a single scientific superstar could have done the equivalent of derailing a hundred promising careers, just by getting to the key insight faster - without necessarily being much smarter (if at all) than the ones they preempted.

Then, as discoveries pour in from superstars and from the far less productive non-superstars, the domain of science changes, and new avenues of discovery open up. And these new avenues are going to be claimed by the next generation of superstars, who will get there first. I expect that if we removed every single superstar of science in the last two hundred years, we'd get roughly comparable scientific progress, with alternate superstars rising to the fore.


Norms of Membership for Voluntary Groups

December 12, 2018 - 01:10
Published on December 11, 2018 10:10 PM UTC

Epistemic Status: Idea Generation

One feature of the internet that we haven’t fully adapted to yet is that it’s trivial to create voluntary groups for discussion.  It’s as easy as making a mailing list, group chat, Facebook group, Discord server, Slack channel, etc.

What we don’t seem to have is a good practical language for talking about norms on these mini-groups — what kind of moderation do we use, how do we admit and expel members, what kinds of governance structures do we create.

Maybe this is a minor thing to talk about, but I suspect it has broader impact. In past decades voluntary membership in organizations has declined in the US — we’re less likely to be members of the Elks or of churches or bowling leagues — so lots of people who don’t have any experience in founding or participating in traditional types of voluntary organizations are now finding themselves engaged in governance without even knowing that’s what they’re doing.

When we do this badly, we get “internet drama.”  When we do it really badly, we get harassment campaigns and calls for regulation/moderation at the corporate or even governmental level.  And that makes the news.  It’s not inconceivable that Twitter moderation norms affect international relations, for instance.

It’s a traditional observation about 19th century America that Americans were eager joiners of voluntary groups, and that these groups were practice for democratic participation.  Political wonks today lament the lack of civic participation and loss of trust in our national and democratic institutions. Now, maybe you’ve moved on; maybe you’re a creature of the 21st century and you’re not hoping to restore trust in the institutions of the 20th. But what will be the institutions of the future?  That may well be affected by what formats and frames for group membership people are used to at the small scale.

It’s also relevant for the future of freedom.  It’s starting to be a common claim that “give people absolute ‘free speech’ and the results are awful; therefore we need regulation/governance at the corporate or national level.”  If you’re not satisfied with that solution (as I’m not), you have work to do — there are a lot of questions to unpack like “what kind of ‘freedom’, with what implementational details, is the valuable kind?”, “if small-scale voluntary organizations can handle some of the functions of the state, how exactly will they work?”, “how does one prevent the outcomes that people consider so awful that they want large institutions to step in to govern smaller groups?”

Thinking about, and working on, governance for voluntary organizations (and micro-organizations like online discussion groups) is a laboratory for figuring this stuff out in real time, with fairly low resource investment and risk. That’s why I find this stuff fascinating and wish more people did.

The other place to start, of course, is history, which I’m not very knowledgeable about, but intend to learn a bit.  David Friedman is the historian I’m familiar with who’s studied historical governance and legal systems with an eye to potential applicability to building voluntary governance systems today; I’m interested in hearing about others. (Commenters?)

In the meantime, I want to start generating a (non-exhaustive) list of types of norms for group membership, to illustrate the diversity of how groups work and what forms “expectations for members” can take.

We found organizations based on formats and norms that we’ve seen before.  It’s useful to have an idea of the range of formats that we might encounter, so we don’t get anchored on the first format that comes to mind.  It’s also good to have a vocabulary so we can have higher-quality disagreements about the purpose & nature of the groups we belong to; often disagreements seem to be about policy details but are really about the overall type of what we want the group to be.

Civic/Public Norms

  • Roughly everybody is welcome to join, and free to do as they like in the space, so long as they obey a fairly minimalist set of ground rules & behavioral expectations that apply to everyone.
  • We expect it to be easy for most people to follow the ground rules; you have to be deviant (really unusually antisocial) to do something egregious enough to get you kicked out or penalized.
  • If you dislike someone’s behavior but it isn’t against the ground rules, you can grumble a bit about it, but you’re expected to tolerate it. You’ll have to admit things like “well, he has a right to do that.”
  • Penalties are expected to be predictable, enforced the same way towards all people, and “impartial” (not based on personal relationships). If penalties are enforced unfairly, you’re not expected to tolerate it — you can question why you’re being penalized, and kick up a public stink, and it’s even praiseworthy to do so.
  • Examples: “rule of law”, public parks and libraries, stores and coffeeshops open to the public, town hall meetings

Guest Norms

  • The host can invite, or not invite, anyone she chooses, based on her preference.  She doesn’t have to justify her preferences to anyone.  Nobody is entitled to an invitation, and it’s very rude to complain about not being invited.
  • Guests can also choose to attend or not attend, based on their preferences, and they don’t have to justify their preferences to anyone either; it’s rude to complain or ask for justification when someone declines an invitation.
  • Personal relationships and subjective feelings, in particular, are totally legitimate reasons to include or exclude someone.
  • The atmosphere within the group is expected to be pleasant for everyone.  If you don’t want to be asked to leave, you shouldn’t do things that will predictably bother people.
  • Hosts are expected to be kind and generous to guests; guests are expected to be kind and generous to the host and each other; the host is responsible for enforcing boundaries.
  • Criticizing other people at the gathering itself is taboo. You’re expected to do your critical/judgmental pruning outside the gathering, by deciding whom you will invite or whether you’ll attend.
  • We don’t expect that everyone will be invited to be a guest at every gathering, or that everyone will attend everything they’re invited to. It can be prestigious to be invited to some gatherings, and embarrassing to be asked to leave or passed over when you expected an invitation, but it’s normal to just not be invited to some things.
  • Examples: private parties, invitation-only events, consent ethics for sex

Kaizen Norms

  • Members of the group are expected to be committed to an ideal of some kind of excellence and to continually strive to reach it.
  • Feedback or critique on people’s performance is continuous, normal, and not considered inherently rude. It’s considered praiseworthy to give high-quality feedback and to accept feedback willingly.
  • Kaizen groups may have very specific norms about the style or format of critique/feedback that’s welcome, and it may well be considered rude to give feedback in the wrong style.
  • Receiving some negative feedback or penalties is normal and not considered a sign of failure or shame.  What is shameful is responding defensively to negative feedback.
  • You can lose membership in the group by getting too much negative feedback (in other words, failing to live up to the minimum standards of the group’s ideal.)  It’s not expected to be easy for most people to meet these standards; they’re challenging by design.  The group isn’t expected to be “for everyone.”
  • The feedback and incentive processes are supposed to correlate tightly to the ideal. It’s acceptable and even praiseworthy to criticize those processes if they reward and punish people for things unrelated to the ideal.
  • Conflict about things unrelated to the ideal isn’t taboo, but it’s somewhat discouraged as “off-topic” or a “distraction.”
  • Examples: competitive/meritocratic school and work environments, sports teams, specialized religious communities (e.g. monasteries, rabbinical schools)

Coalition Norms

  • The degree to which one is “welcome” in the coalition is the degree to which one is loyal, i.e. contributes resources to the coalition.  (Either by committing one’s own resources or by driving others to contribute their resources.   The latter tends to be more efficient, and hence makes you more “welcome.”)
  • Membership is a matter of degree, not a hard-and-fast boundary.  The more solidly loyal a member you are, the more of the coalition’s resources you’re entitled to.  (Yes, this means membership is defined recursively, like PageRank.)
  • People can be penalized or expelled for not contributing enough, or for doing things that have the effect of preventing the coalition gaining resources (like making it harder to recruit new members.)
  • Conflict, complaint, and criticism over the growth of the coalition (and whether people are contributing enough, or whether they’re taking more than their fair share) is acceptable and even praiseworthy; criticisms about other things are discouraged, because they make people less willing to contribute resources or pressure others to do so.
  • Membership in the coalition is considered praiseworthy.  Non-membership is considered shameful.
  • Examples: political coalitions, proselytizing religions
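
The PageRank comparison in the Coalition list can be made concrete with a small sketch. Everything here is an illustrative assumption rather than something the post specifies: the names, the `damping` factor, and the rule that a member's standing is their own contribution plus a discounted share of the standing of the members they brought in. It is a recursive score computed by fixed-point iteration, in the spirit of PageRank rather than PageRank itself.

```python
def coalition_standing(own_contribution, recruits_of, damping=0.5, iterations=50):
    """Recursive standing: own contribution plus a discounted share of the
    standing of every member this person drove to contribute.

    The update rule and the damping value are hypothetical, chosen only to
    illustrate a recursively defined membership score.
    """
    standing = dict(own_contribution)
    for _ in range(iterations):
        updated = {}
        for member, base in own_contribution.items():
            # Standing of recruits feeds back into the recruiter's standing.
            driven = sum(standing[r] for r in recruits_of.get(member, ()))
            updated[member] = base + damping * driven
        standing = updated
    return standing


# Hypothetical coalition: ann recruited bob, bob recruited cat.
own = {"ann": 1.0, "bob": 1.0, "cat": 1.0}
recruits = {"ann": ["bob"], "bob": ["cat"]}
standing = coalition_standing(own, recruits)
print(standing)
```

Under these made-up numbers, ann ends up with the highest standing even though all three contribute the same amount directly, which matches the claim that driving others to contribute makes you more "welcome."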

Tribal Norms

  • Membership in the group is defined by an immutable, unchosen characteristic, like sex or heredity (or, to a lesser extent, geographic location.)  It is difficult to join, leave, or be expelled from the group; you are a member as a matter of fact, regardless of what you want or how you behave.
  • It’s not considered shameful not to be a member of the group; after all, it isn’t up to you.
  • Since expulsion is difficult, behavioral norms for the group are maintained primarily by persuasion/framing, reward, and punishment, so these play a larger role than they do in voluntary groups.  Important norms are framed as commandments or simply how things are.
  • Examples: families, public schools, governments, traditional cultures

Some comparisons-and-contrasts:

Honor and Shame

Kaizen and Guest group norms say that being a member of the group is an honor and comes with high expectations, but that not being a member is normal and not especially shameful.

Civic norms say that being a member of the group is normal and easy to attain, but not being a member is shameful, because it indicates egregiously bad behavior.

Coalition norms say that being a member is an honor and comes with high expectations and that not being a member is shameful.  This means that most people will have something to be ashamed of.

Tribal norms say that being a member is not an honor (though it may be a privilege), and that not being a member is no shame.


Civic and Kaizen norms say that it’s okay to protest “unfair” treatment by the governing body.  In a Civic context, “fair” means “it’s possible for everyone to stay out of trouble by following the rules” — it’s okay for rules to be arbitrary, but they should be clear and consistent and not so onerous that most people can’t follow them.  In a Kaizen context, “fair” means “corresponding to the ideal” — it’s okay to “not do things by the book”  if that gets you better performance, but it’s not okay if you’re rewarding bad performance and punishing good.

Guest and Coalition norms say that it’s not okay to protest “unfair” treatment; if you get kicked out, arguing can’t help you get back in.  Offering the decisionmakers something they value might work, though.

In Tribal norms, protest and argument can be either licit or taboo; it depends on the specific tribe and its norms.

Examples of debates that are about what type of group you want to be in:

Asking for “inclusiveness” is usually a bid to make the group more Civic or Coalitional.

Making accusations of “favoritism” is usually a bid to make the group more Civic or Kaizen.

Complaining about “problem members” is usually a bid to make the group more Coalitional, Guest, or Kaizen.

Not A Taxonomy

I don’t think these are the definitive types of groups. The idea is to illustrate how you can have different starting assumptions about what kind of thing the group is for. (Is it for achieving a noble goal? For providing a public forum or service open to all? For meeting the needs of its members?)

I suspect these kinds of aims are prior to mechanisms (things like “what is a bannable offense” or “what incentive systems do we set up”?)  Before diving into the technical stuff about the rules of the game, you want to ask what kinds of outcomes or group dynamics you want the “game structure” to achieve.



Quantum immortality: Is decline of measure compensated by merging timelines?

December 11, 2018 - 22:39
Published on December 11, 2018 7:39 PM UTC

I wrote an article about quantum immortality, which, I know, is a controversial topic, and I would like to get comments on it. The interesting twist suggested in the article is the idea of a measure increase that could compensate for the declining measure in quantum immortality. (There are other topics in the article, such as the history of QM, its relation to multiverse immortality, the utility of cryonics, the impossibility of euthanasia, and the relation of QI to different decision theories.)

The standard argument against quantum immortality in MWI runs as follows. One should calculate the expected utility by multiplying the expected gain by the measure of existence (roughly equal to one's share of the world’s timelines). In that case, if someone expects to win 10,000 USD in the quantum suicide lottery with a 0.01 chance of survival, her actual expected utility is 100 USD (ignoring the negative utility of death). So the rule of thumb is that the measure declines very quickly after a series of quantum suicide experiments, and thus this improbable timeline should be ignored. The following equation could be used: U(total) = mU, where m is the measure and U is the expected win in the lottery.

However, if everything possible exists in the multiverse, there are many pseudo-copies of me, which differ from me in a few bits; for example, they have a different phone number or a different random childhood memory. The difference is small but just enough not to regard them as my copies.

Imagine that this different childhood memory is 1 kb in size (if compressed). Now, one morning both I and all my pseudo-copies forget this memory, and we all become exactly the same copies. In some sense, our timelines have merged. This could be interpreted as a jump in my measure, by a factor as high as 2^1024, or roughly 10^308. If I use the equation U(total) = mU, I get an extreme jump in my utility. For example, if I have 100 USD and my measure has increased trillions of trillions of times, I supposedly get the same utility as if I had become a mega-multi-trillionaire.
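
As a quick check of the arithmetic in the standard argument, here is a minimal sketch of U(total) = mU. The function names are mine, and treating repeated rounds as independent events with the same survival chance is an assumption made for illustration.

```python
from fractions import Fraction


def measure_weighted_utility(measure, payoff):
    """U(total) = m * U: the payoff weighted by the measure of timelines that get it."""
    return measure * payoff


def measure_after_rounds(survival_chance, rounds):
    """Measure left after repeated rounds, assuming independent rounds (an assumption)."""
    return survival_chance ** rounds


# The lottery from the text: 10,000 USD prize, 0.01 chance of survival.
single = measure_weighted_utility(Fraction(1, 100), 10_000)
print(single)  # 100

# How quickly the measure declines over a series of experiments.
for n in (1, 2, 5):
    print(n, float(measure_after_rounds(Fraction(1, 100), n)))
```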

As a result of this absurd conclusion, I can spend the evening hitting my head with a stone and thus losing more and more memories, and getting higher and higher measure, which is obviously absurd behaviour for a human being - but could be a failure mode for an AI, which uses the equation to calculate the expected utility. 

In the case of the quantum suicide experiment, I can add to the bomb, which kills me with 0.5 probability, a laser that (if I survive) kills just one neuron in my brain, which, let's assume, is equal to forgetting 1 bit of information. In that case, QS halves my measure, but forgetting one bit doubles it. Obviously, if I play the game for too long, the laser will damage my brain, but brain cells die so often in an aging brain (millions a day) that the loss will be completely unobservable.

BTW, Pereira suggested a similar idea as an anthropic argument against the existence of any superintelligence: https://arxiv.org/abs/1705.03078
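
The bookkeeping behind this compensation can be sketched in a few lines. The assumption that forgetting k bits multiplies the measure by 2^k is the article's premise; the function names and the round-by-round model are mine, for illustration only.

```python
from fractions import Fraction


def merge_factor(bits_forgotten):
    """Measure multiplier from merging with pseudo-copies that differed in these bits."""
    return 2 ** bits_forgotten


def measure_after_game(rounds, survival_chance=Fraction(1, 2), bits_per_round=1):
    """Relative measure after each round applies the bomb, then the laser."""
    measure = Fraction(1)
    for _ in range(rounds):
        measure *= survival_chance               # quantum suicide halves the measure
        measure *= merge_factor(bits_per_round)  # forgetting one bit doubles it
    return measure


print(measure_after_game(10))                    # 1
print(measure_after_game(10, bits_per_round=0))  # 1/1024

# Forgetting a 1024-bit memory: 2^1024 has 309 decimal digits.
print(len(str(merge_factor(1024))))              # 309
```

With the laser, the measure stays at its starting value however many rounds are played; without it, ten rounds leave 1/1024 of the original measure.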


Bounded rationality abounds, not explicitly defined

December 11, 2018 - 22:34
Published on December 11, 2018 7:34 PM UTC

Last night, I did not register a patent to cure all forms of cancer. Even though it’s probably possible to figure such a cure out, from basic physics and maybe a download of easily available biology research papers.

Can we then conclude that I don’t want cancer to be cured – or, alternatively, that I am pathologically modest and shy, and thus don’t want the money and fame that would accrue?

No. The correct and obvious answer is that I am boundedly rational. And though an unboundedly rational agent – and maybe a superintelligence – could figure out a cure for cancer from first principles, poor limited me certainly can’t.

Modelling bounded rationality is tricky, and it is often accomplished by artificially limiting the action set. Many economic models feature agents that are assumed to be fully rational, but who are restricted to choosing between a tiny set of possible goods or lotteries. They don’t have the options of developing new technologies, rousing the population to rebellion, going online and fishing around for functional substitutes, founding new political movements, begging, befriending people who already have the desired goods, setting up GoFundMe pages, and so on.
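
To make the trick concrete, here is a toy sketch of a "fully rational" maximizer whose apparent limits come entirely from the action set the modeller hands it. The action names and utility numbers are invented for illustration and are not taken from any economic model.

```python
def best_action(actions, utility):
    """A fully rational chooser: pick the available action with the highest utility."""
    return max(actions, key=utility)


# Hypothetical utilities over a wider space of actions than the model admits.
utility = {
    "buy_apples": 3,
    "buy_bread": 5,
    "start_fundraiser": 40,
    "invent_new_technology": 1000,
}

# The model only lets the agent choose between two goods.
modelled_actions = ["buy_apples", "buy_bread"]

print(best_action(modelled_actions, utility.get))  # buy_bread
print(best_action(utility, utility.get))           # invent_new_technology
```

The same maximizer looks modest or superhuman depending only on which actions it is allowed to consider, which is why a good fit for the restricted model says little about how rational the agent really is.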

There’s nothing wrong with modelling bounded rationality via action set restriction, as long as we’re aware of what we’re doing. In particular, we can’t naively conclude that because such a model fits with observation, humans therefore actually are fully rational agents. Though economists are right that humans are more rational than we might naively suppose, thinking of us as rational, or “mostly rational”, is a colossally erroneous way of thinking. In terms of achieving our goals, as compared with a rational agent, we are barely above agents acting randomly.

Another problem with using small action sets, is that it may lead us to think that an AI might be similarly restricted. That is unlikely to be the case; an intelligent robot walking around would certainly have access to actions that no human would, and possibly ones we couldn’t easily imagine.

Finally, though action set reduction can work well in toy models, it is wrong about the world and about humans. So as we make more and more sophisticated models, there will come a time when we have to discard it, and tackle head-on the difficult issue of defining bounded rationality properly. And it’s mainly for this last point I’m writing this post; we’ll never see the necessity of better ways of defining bounded rationality, unless we realise that modelling it via action set restriction is a) common, b) useful, and c) wrong.


Figuring out what Alice wants: non-human Alice

December 11, 2018 - 22:31
Published on December 11, 2018 7:31 PM UTC

url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}

I’ve shown that we cannot deduce the preferences of a potentially irrational agent. Even simplicity priors don’t help. We need to make extra ‘normative’ assumptions in order to be able to say anything about these preferences.

I then presented a more intuitive example, in which Alice was playing poker, and had two possible beliefs about Bob’s hand, and two possible preferences: wanting money, or wanting Bob (which, in that situation, translated into wanting to lose to Bob).

That example illustrated the impossibility result, within the narrow confines of that situation – if Alice calls, she could be a money-maximiser expecting to win, or a love-maximiser expecting to lose.
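The ambiguity in the poker example can be made concrete with a small sketch. The stake and ante amounts, and the names `net_money`, `best_action`, and `explanations`, are hypothetical choices made for illustration; the only thing taken from the example is that Alice has two candidate beliefs and two candidate preferences, and that wanting Bob translates into wanting to lose to him.

```python
from itertools import product

STAKE = 10  # hypothetical amount won or lost if Alice calls
ANTE = 1    # hypothetical amount lost if Alice folds

def net_money(action, alice_wins):
    """Alice's net money for an action, given who holds the better hand."""
    if action == "fold":
        return -ANTE
    return STAKE if alice_wins else -STAKE

# Two candidate preferences: wanting money, or wanting to lose money to Bob.
PREFERENCES = {
    "money": lambda money: money,
    "love": lambda money: -money,
}

# Two candidate beliefs about how the hand will go.
BELIEFS = {"expects_to_win": True, "expects_to_lose": False}

def best_action(preference, belief):
    """Action a rational agent with this preference and belief would take."""
    utility = PREFERENCES[preference]
    alice_wins = BELIEFS[belief]
    return max(("call", "fold"), key=lambda a: utility(net_money(a, alice_wins)))

def explanations(observed_action):
    """All (preference, belief) pairs that rationalise the observed action."""
    return [
        (p, b)
        for p, b in product(PREFERENCES, BELIEFS)
        if best_action(p, b) == observed_action
    ]

print(explanations("call"))
```

Under these assumed payoffs, calling is explained equally well by a money-maximiser who expects to win and by a love-maximiser who expects to lose, so the single observation cannot separate the two.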

As has been pointed out, this uncertainty doesn’t really persist if we move beyond the initial situation. If Alice was motivated by love or money, we would expect to be able to tell which one, by seeing what she does in other situations – how does she respond to Bob’s flirtations, what does she confess to her closest friends, how does she act if she catches a peek of Bob’s cards, etc…

So if we look at her more general behaviour, it seems that we have two possible versions of Alice: Am, who clearly wants money, and A♡, who clearly wants Bob. The actions of these two agents match up in the specific case I described, but not in general. Doesn’t this undermine my claim that we can’t tell the preferences of an agent from their actions?

What’s actually happening here is that we’re already making a lot of extra assumptions when we’re interpreting Am or A♡’s actions. We model other humans in very specific and narrow ways, and other humans do the same – and their models are very similar to ours (consider how often humans agree that another human is angry, or that being drunk impairs rationality). The agreement isn’t perfect, but is much better than random.

If we set those assumptions aside, then we can see what the theorem implies. There is a possible agent A′m, whose preference is for love, but that nevertheless acts identically to Am (and the reverse for money-loving A′♡ versus A♡). A′m and A′♡ are perfectly plausible agents – they just aren’t ‘human’ according to our models of what being human means.

It’s because of this that I’m somewhat optimistic we can solve the value learning problem, and why I often say the problem is “impossible in theory, but doable in practice”. Humans make a whole host of assumptions that allow them to interpret the preferences of other humans (and of themselves). And these assumptions are quite similar from human to human. So we don’t need to solve the value learning problem in some principled way, nor figure out the necessary assumptions abstractly. Instead, we just need to extract the normative assumptions that humans are already making and use these in the value learning process (and then resolve all the contradictions within human values, but that seems doable if messy).


Assuming we've solved X, could we do Y...

December 11, 2018 - 21:13
Published on December 11, 2018 6:13 PM UTC

The year is 1933. Leó Szilárd has just hypothesised the nuclear chain reaction. Worried researchers from proto-MIRI or proto-FHI ask themselves "assuming we've solved the issue of nuclear chain reactions in practice, could we build a nuclear bomb out of it"?

Well, what do we mean by "assuming we've solved the issue of nuclear chain reactions"? Does it mean that "we have some detailed plans for viable nuclear bombs, including all the calculations needed to make them work, and everything in the plans is doable by a rich industrial state"? In that case, the answer to "could we build a nuclear bomb out of it?" is a simple and trivial yes.

Alternatively, are we simply assuming "there exists a collection of matter that supports a chain reaction"? In which case, note that the assumption is (almost) completely useless. In order to figure out whether a nuclear bomb is buildable, we still need to figure out all the details of chain reactions - that assumption has bought us nothing.

Assuming human values...

At the recent AI safety unconference, David Krueger wanted to test, empirically, whether debate methods could be used for creating aligned AIs. At some point in the discussion, he said "let's assume the question of defining human values is solved", wanting to move on to whether a debate-based AI could then safely implement it.

But as above, when we assume that an underdefined definition problem (human values) is solved, we have to be very careful what we mean - the assumption might be useless, or might be too strong, and end up solving the implementation problem entirely.

In the conversation with David, we were imagining a definition of human values related to what humans would answer if we could reflectively ponder specific questions for thousands of years. One could object to that definition on the grounds that people can be coerced or tricked into giving the answers that the AI might want, hence the circumstances of that pondering are critical.

If we assume X="human values are defined in this way", could an AI safely implement X via debate methods? Well, what about coercion and trickery by the AI during the debate process? It could be that X doesn't help at all, because we still have to resolve all of the same issues.

Or, conversely, X might be too strong - it might define what trickery is, which solves a lot of the implementation problem for free. Or, in the extreme case, maybe X is expressed in computer code, and solves all the contradictions within humans, dealing with ontology issues, population changes, what an agent is, and all other subtleties. Then the question "given X, could an AI safely implement it?" reduces to "can the AI run code?"

In summary, when the issue is underdefined, the boundary between definition and implementation is very unclear, and so is the meaning of assuming that one of them is solved.

How to assume (for the good of all of us)

The obvious way around this issue is to be careful and precise in what we're assuming. So, for example, we might assume "we have an algorithm A that, if run for a decade, would compute what humans would decide after a thousand years of debate". Then we have two practical and well defined subproblems to work on: can we approximate the output of A within reasonable time, and is "what humans would decide after a thousand years of debate" a good definition of human values?

Another option, when we lack a full definition, is to focus on some of the properties of that definition that we feel are certain or likely. For example, we can assume that "the total extinction of all intelligent beings throughout the cosmos" is not a desirable feature according to most human values, and argue whether debate methods will lead to that outcome. Or, at smaller scale, we might assume that telling us informative truths is compatible with our values, and check whether the debate AI would do that.


Who's welcome to our LessWrong meetups?

December 10, 2018 - 16:31
Published on December 10, 2018 1:31 PM UTC

As part of announcing meetups publicly, it's good to write in the meetup description what kind of people would likely be a good match for the meetup. I still haven't come up with a good description myself.

How would you describe the kind of people we are in words that are clear to outsiders?


How Old is Smallpox?

December 10, 2018 - 13:50
Published on December 10, 2018 10:50 AM UTC

The conventional view is that smallpox has been around since antiquity, but more recent evidence has suggested it's actually only around 500 years old.

So I have a research/rationality question: how conclusive is the "500 years old hypothesis"? I don't really have the expertise to evaluate it.

The Wikipedia entry briefly notes the new findings, but doesn't seem to have rewritten the overall history section:

The earliest credible clinical evidence of smallpox is found in the smallpox-like disease in medical writings from ancient India (as early as 1500 BC),[54][55] Egyptian mummy of Ramses V who died more than 3000 years ago (1145 BC)[56] and China (1122 BC).[57] It has been speculated that Egyptian traders brought smallpox to India during the 1st millennium BC, where it remained as an endemic human disease for at least 2000 years. Smallpox was probably introduced into China during the 1st century AD from the southwest, and in the 6th century was carried from China to Japan.[26] In Japan, the epidemic of 735–737 is believed to have killed as much as one-third of the population.[14][58] At least seven religious deities have been specifically dedicated to smallpox, such as the god Sopona in the Yoruba religion. In India, the Hindu goddess of smallpox, Sitala Mata, was worshiped in temples throughout the country.[59] A different viewpoint is that smallpox emerged 1588 AD and the earlier reported cases were incorrectly identified as smallpox.[60][61]

Paper: 17th Century Variola Virus Reveals the Recent History of Smallpox

The paper arguing the 500 years hypothesis is here.


  • Variola virus genome was reconstructed from a 17th century mummified child
  • The archival strain is basal to all 20th century strains, with same gene degradation
  • Molecular-clock analyses show that much of variola virus evolution occurred recently


Smallpox holds a unique position in the history of medicine. It was the first disease for which a vaccine was developed and remains the only human disease eradicated by vaccination. Although there have been claims of smallpox in Egypt, India, and China dating back millennia [1, 2, 3, 4], the timescale of emergence of the causative agent, variola virus (VARV), and how it evolved in the context of increasingly widespread immunization, have proven controversial [4, 5, 6, 7, 8, 9]. In particular, some molecular-clock-based studies have suggested that key events in VARV evolution only occurred during the last two centuries [4, 5, 6] and hence in apparent conflict with anecdotal historical reports, although it is difficult to distinguish smallpox from other pustular rashes by description alone. To address these issues, we captured, sequenced, and reconstructed a draft genome of an ancient strain of VARV, sampled from a Lithuanian child mummy dating between 1643 and 1665 and close to the time of several documented European epidemics [1, 2, 10]. When compared to vaccinia virus, this archival strain contained the same pattern of gene degradation as 20th century VARVs, indicating that such loss of gene function had occurred before ca. 1650. Strikingly, the mummy sequence fell basal to all currently sequenced strains of VARV on phylogenetic trees. Molecular-clock analyses revealed a strong clock-like structure and that the timescale of smallpox evolution is more recent than often supposed, with the diversification of major viral lineages only occurring within the 18th and 19th centuries, concomitant with the development of modern vaccination.


Why should EA care about rationality (and vice-versa)?

December 10, 2018 - 01:03
Published on December 9, 2018 10:03 PM UTC

There's a lot of overlap between the effective altruism movement and the LessWrong rationality movement in terms of their membership, but each also has many people who are part of one group and not the other. For those in the overlap, why should EA care about rationality and rationality care about EA?


Measly Meditation Measurements

December 9, 2018 - 23:54
Published on December 9, 2018 8:54 PM UTC

A few months ago, I decided to start meditating regularly, around an hour a day. It seemed like a good opportunity to measure possible effects, so I asked for advice on what to measure. This post summarizes the results. In short, while the subjective effects of meditation were strong, the measurements didn't show anything. This is a fine place to stop reading; I'm mostly posting this because I promised to.

What I Measured
  • My performance on the tasks looked entirely random. It wasn't better or worse after meditating, and it didn't get better or worse over time.
  • I have no idea how to do experience sampling. I understand that some people have moods. I'm almost always in a neutral mood, and so wasn't sure what to put most of the time. Also, I'm apparently often away from my phone, and missed many (most?) pings.
What I Learned
  • The Mind Illuminated is as good a guide as I hoped it would be.
  • A few measly months of meditation isn't going to change anything like your performance on reaction-time-like tasks.
  • A few measly months of meditation will give you a fascinating look into your own mind. It's not what you think. I'd say more, but I'm deeply confused and don't have a good model.
  • Meditation retreats are great. I went on a two-day one, whose format wasn't particularly well-suited for me, and even this had a large effect on my practice.


Review: Slay the Spire

December 9, 2018 - 23:40
Published on December 9, 2018 8:40 PM UTC

Epistemic Status: Many hours played

Spoiler-Free Bottom Line: Slay the Spire is an amazing single-player roguelike deckbuilding game. When I wrote that Artifact was the most fun I’ve had gaming in a long time, the only alternative to give me pause was Slay the Spire. Each game, you work your way up the spire, with each room an opportunity to improve your deck, either with rewards from battle or other opportunities. Each turn of each battle, you see what the enemy is going to do, and by default you have three energy to spend on any combination of five drawn cards, to prepare to block their attacks while dealing damage back. If you die, that’s it, time to start over.

Early plays ideally involve discovery of what cards are out there, what decks are possible to assemble, what enemies there are and what they do, and everything else the spire has to offer. As you gain in skill and experience, you play it on additional levels and in new ways.

I highly recommend playing the game, and I highly recommend not learning more or reading further before doing so. Figuring the game out is half the fun.

My Mostly-Spoiler-Free Journey Through the Spire

I started off knowing the basics above, but nothing else. The game was in (earlier) early access, so a bunch of the details were different, but aside from missing the third class (The Defect) the game was largely the same as it is now.

I played my first few games as The Ironclad. At first things were tough, but a little experience went a long way. My first run ended on the Act I boss. My second run ended on the Act II boss. In my third run, I managed to get all the way through and win.

That surprised me quite a bit. Roguelike games are supposed to be way harder than that! I chalked it up to a lot of luck and a lot of deckbuilding game experience, and moved on to the second character class, The Outcast.

Once again, there was a learning curve, but once again it didn’t seem that hard, and on my second try I got all the way through. I assumed I was fortunate to win so fast, but it seemed powerful things would come my way reasonably often.

At that point, I stopped playing. What more was there to do? I saw some talk of trying to win *more consistently*, and there was the option to use ‘Ascension’ to make the game harder, but I did not see the appeal in either approach. When The Defect became available, I tried it and won on the third try. So after eight games, I had won with all three classes. It had been fun. At $20 I felt I’d had more than my money’s worth, but I figured that was it.

Later articles on the website Rock Paper Shotgun, which I use as my main source of computer gaming news, convinced me to give the daily climb a shot. In each daily climb, all players are given the same random seed, which contains the contents of the spire and a bunch of modifications to spice things up. Then you compete for the high score, as determined by whether you made it the whole way but also by how elegantly you did it. You get rewards for killing extra elite monsters, for not taking damage, for building a bigger deck and so forth. With points to maximize, there’s a constant balance between going for more points, strengthening yourself for later on, and not dying. I spent a few weeks playing the daily climb each day, but after a while that too started to feel repetitive, and once again I was ready to move on.

Then, a few weeks ago, the game released the ending. Five games later I had won with each of the three characters again, and it was time to start gathering keys on my climb to the final boss. On my second try, I reached the fourth and final act… and promptly got completely destroyed. I’d brought a relatively poor deck that was fortunate to get that far, so I tried again. Two games later I was back with a much stronger deck… and I got completely destroyed again.

Finally, we had a challenge I could get behind. If you came with a relatively normal deck, it was clear you were going to have a bad time.

Further games were not about the first three acts. The first three acts contained checkpoints, and ways you could die if you got too aggressive, but they were not the point. The point was to win that final fight. A third try did a little better, but was still not close. A fourth had a lot going for it, I thought I had it, and then I had to use one card too many on the last turn, couldn’t find what I needed, and died to exact damage the turn before I was going to win.




Several tries later, and after several important lessons learned, the plan came together and the heart died in a barrage of Static Lightning.


Two attempts later, The Outcast too was victorious, thanks to a truly absurd amount of poison damage.


I still haven’t quite won with the Ironclad. I actually should have, but I forgot that the heart had an artifact, chose the wrong attack, and came up exactly two points short on the last turn. The Ironclad has the toughest path, but there are doubtless still ways.


Perhaps I’ll try some games in ascension mode. Interestingly, the first ascension level is more likely to kill you, but arguably makes it easier to kill the heart, since you end up with extra relics.

What Makes Slay the Spire Work

The player has all the fun.

Even when you are first discovering the game, it is easy to understand what is about to happen and why. You get a steady stream of meaningful choices. If you choose wisely, you get to do lots of cool things.

Slay the Spire’s central innovation is enemy intents. Giving the player all the fun is its genius. Each turn, you can see what each enemy is planning to do – attack you for some amount, defend, use a buff, inflict a status. At first actions other than attacking and blocking can be mysterious, but you still have a general idea of what is happening, and in time you learn the patterns of each enemy and how they tick.

At first, I thought lack of enemy diversity was a fatal flaw. There were only so many fights, so I would quickly tire of them. Later, I came around to this lack of diversity being actively good. Consider the difference between planning for a wide-open Magic metagame, where you could face anything at all, and planning for a particular metagame with a handful of opponents. Both are interesting in their own way. You get to enjoy both, with a wide open and unknown metagame early on, then a known set of enemies to target later on.

Slay the Spire offers the same. In your first explorations anything can happen, then later on you are planning for the exact enemies and patterns you will face. Because your own deck is constantly changing, it is good, once you have enough experience to use the information, to know exactly what you are up against and what you must do. That is why the game shows you, at the start of each act, which final boss you will face at the end of that act, to allow you to plan and prepare. Later plays of Slay the Spire are all about having a plan, getting what you need to face down exactly the challenges coming your way, and pushing yourself as far as you can but no farther. In my recent playthroughs, there was a laser focus on what my deck must do to claim victory in the final fight, knowing exactly the attack patterns and challenges I would face.

Another huge advantage of Slay the Spire is simplicity. The game could be simpler, but not without sacrifice. Every bit of complexity counts. You draw five cards a turn, you can play three energy worth of cards (most cost one, some zero or two, a few cost more or scale with what you spend), they mostly do damage or stop the enemy from doing damage, and the complexity is added slowly from there by the cards and relics.
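To make that core loop concrete, here is a minimal sketch of a single turn: draw five cards, then play whatever fits into three energy. The card names, costs and numbers are placeholders I made up for illustration, and the cheapest-first rule is just an assumption to keep the example short, not how anyone should actually play.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    cost: int
    damage: int = 0
    block: int = 0


# Hypothetical starter deck; names and values are illustrative only.
STARTER_DECK = (
    [Card("Attack", 1, damage=6)] * 5
    + [Card("Guard", 1, block=5)] * 4
    + [Card("Heavy Attack", 2, damage=8)]
)


def draw_hand(deck, rng, hand_size=5):
    """Draw up to `hand_size` cards from the deck without replacement."""
    return rng.sample(deck, min(hand_size, len(deck)))


def play_turn(hand, energy=3):
    """Play cards cheapest-first until no remaining card is affordable.

    This greedy rule is an assumption for the sketch, not a strategy.
    """
    played = []
    for card in sorted(hand, key=lambda c: c.cost):
        if card.cost <= energy:
            played.append(card)
            energy -= card.cost
    return played


rng = random.Random(0)
hand = draw_hand(STARTER_DECK, rng)
played = play_turn(hand)
total_damage = sum(c.damage for c in played)
total_block = sum(c.block for c in played)
```

Everything else in the game (relics, upgrades, enemy intents) layers on top of a loop about this small, which is the point of the paragraph above.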

Slay the Spire also lets you do tons of good, powerful things all the time. You start with a basic deck, and every move makes you stronger. Relics give you special abilities and advantages, cards are upgraded at forges, you get a new card after each battle and so on, and there is no attempt whatsoever to balance those cards. The good cards are already a welcome relief compared to the starting cards. The great cards are fantastic.

You get a mostly random set of relics, and can choose what path to take and which of a few options or cards to take at other junctures. You have enough customization to have a ton of influence over how your deck develops, but you are also at the mercy of events and forced to make the most of what you are offered. Again, there is zero attempt to balance things other than to make them fun, so often you’ll face a choice between the more powerful thing and the thing you actually want. Other times, you’ll be handed a huge gift, and other times still you’ll have no use for the relics and cards you’ll find and even sometimes intentionally pass them up (which you are allowed to do). In one recent playthrough I chose not to take a boss relic from the Act 2 boss, which is a huge kick in the nuts, but that’s the way it goes. Building your deck around the relics you are granted is a huge part of what keeps Slay the Spire interesting and fresh.

This general idea of ‘give you lots of choices, each a randomized multiple choice’ pays big dividends. You get a choice of three cards, or a dozen things for sale at the merchant, or which of your fifteen cards to upgrade. Random events usually give you two or three choices. The story of the sum of these random choices becomes the story of the climb. So is the general story of figuring out how to get super powerful things out of your deck when you get the chance.

Hearthstone’s Arena pioneered a similar simplified form of drafting, giving only three choices at a time and not forcing you to adjust to what others around you are doing. It lacks the richness of Magic booster drafts or Artifact drafts, but is much richer and more interesting than it first appears. There is likely much room to enrich such formats while retaining this simple essential nature. Even in Magic booster draft, you still are choosing one from up to fifteen options, so the difference there is mostly in degree – the lack of dynamic opponents is the bigger fundamental distinction.

Slay the Spire also does a great job giving you lots of goals each climb. If you’re not sure if you will beat the Act 3 boss, that’s the goal. If you know you can’t, you can try to get as high as you can. If you know you’ll beat the Act 3 boss, you can try to score more points, or later to set up for the finale. It’s up to you and I found all the goals and the battles satisfying. More than anything, the game does a great job of making the battles fun, and not giving you too many with any one deck or against one type of opponent, before the game ends.


Slay the Spire is highly recommended. It shows how to use simple choices and abilities that combine in unique ways to create varied, interesting and fun puzzles. Its emphasis on letting the player have all the fun, and ensuring there is lots of fun to be had, is even more central than I had previously realized. Slay the Spire offers lots of lessons and innovations that can be used by other games, including multiplayer customizable card games. This is especially true in their limited formats, and for the creation of unique and interesting leagues and special events.

I have strategic thoughts on the game as well, but have cut them from this review. I may or may not choose to write them out at another time.


Kindergarten in NYC: Much More than You Wanted to Know

December 9, 2018 - 18:36
Published on December 9, 2018 3:36 PM UTC

My son is turning five next year, which means one of the most important transitions in his childhood and potentially his life: starting Kindergarten. I always thought New York City moms who obsessed over this were clearly crazy.

Now I am one of those moms.

Why do we do this to ourselves? It’s not the one year of kindergarten. It’s securing that spot in the school where you want them to stay until middle school and potentially high school, and probably send your other kids to as well. It’s all of the social and class insecurities that come with choosing a school and its associated peer group. It’s the fear that if you choose poorly, your child will age 100 years and his face will melt off in front of you.

Not quite that severe. Still, you worry you’ll mess up their life and they’ll become drug addled sociopaths living on your couch until you kick them out when they bring back that prostitute.

Maybe going overboard again. They’ll go to State College, move to the suburbs, and work in retail.

Whoa, whoa, whoa, let’s not be unrealistic. Retail won't be around in 10 years. Your kid will be horribly miserable for the next 14 years, go through depressive episodes, and blame you for all of it. That’s what I’m actually worried about. Both my husband and I had horrible elementary school experiences. We still carry scars. We don’t want that for our sons.

So why not home school? All the cool kids are doing it. We have personal reasons why this would not work for our family. Our son has some social deficits, but is extremely bright. Literally everyone we’ve spoken to who knows our son agrees that he would do better in a structured environment with peers. We have observed his profound social-emotional growth upon starting the school year. We saw back-sliding over the summer when he lacked structure or regular peer interactions. He will not listen to us when we teach him. He is a different child in the school setting, soaking up knowledge.

People can rant all they like about how horrible school is philosophically, but that does not negate what we’ve personally witnessed in our own child. Philosophy aside, home-schooling is a lot of work and coordination. We both work full-time. While we would pick home-school over the horrid elementary school experiences we had, we hope we can do better and find a school where he will be happy.

That is much easier said than done. Especially for unique children. Our son has done well in a private preschool with 15 children and 3 teachers. A public kindergarten in NYC has a class of 26 children and one teacher. This goes up to as high as 32 in first grade. That is a lot of kids in a small space. It presents two options. Either you get a very noisy and unruly class, or a strictly controlled group which conforms precisely with everyone sitting quietly and doing the same thing at the same time. We have seen both. Neither is pretty. Our son has sensory issues, and will not tolerate a very noisy classroom. We expect he also would not tolerate a conformist one. Him tolerating it would scare us even more.

If he went to public school, we might well be pressured to put him into a resource room, with children much worse off than himself. Children with emotional disturbance, severe autism, retardation and other severe problems. My mother has worked in such classrooms and what she describes is unacceptable. Those are her stories to tell, but I would not put him there. Ever.

So what can we do? Sue the city! That’s what everyone told us to do. Say the public schools can’t meet your kid’s needs, since they clearly cannot do so. Find a nice, private special needs school, and sue for tuition.

So we saw some special needs schools. Like public schools, they varied a fair bit and we liked some more than others. What they all had in common was a severely impaired peer group. He would be one of the most functional students in the class. We don’t want that for him. We want him to be challenged and learn from peers who can be models for him.

So what next? Private school! Private schools also vary a lot, but have one thing in common. They are expensive.

I’m not sure you understand how bad this situation is. I spent time looking around. The average private elementary school charges about $45,000 per year.

Yup. You saw that right, $45,000. That’s more than most students' college tuition. Before aid or loans. And it’s post-tax income. And we have more than one child.

With two (and perhaps more) children, that would be most if not all of my post-tax income as a psychiatrist.

People have the audacity to say “But you can afford it.” Don’t get Zvi started on that phrase.

Even if you want to send your kid to private school, you have to apply and be accepted. Most good private schools are selective. Most do not want to deal with a child with special needs.

We have been lucky to find one nearby private school that charges considerably less (though still far from cheap) and happens to have an educational philosophy we think would suit our son. It’s a Waldorf school. It emphasizes practical skills such as cooking, gardening, carpentry, foreign language, and trade. Since we believe our son is gifted academically, being less academic does not concern us. He will learn that stuff at home whether we want him to or not. Thus, we wait with bated breath for his trial period there to see if they’ll accept him. We don’t have a back-up option that comes close at present.

What’s been really interesting to me through this process is how vastly schools differ from each other. Often people speak about ‘school’ as if it is one thing. Either you agree with sending kids to ‘school’ or you don’t. This is not the case. One reason New York City moms go berserk over this is that there are *vast* differences between schools even a few blocks away from each other. Within the public schools, class is everything. Most children go to their ‘zoned’ school, and so people will pay higher rents near the ‘good’ schools to get their kids in. One of the public schools we saw looked and felt like a prison, had no music or art program, and only let the kids outside for 20 minutes a day. Another 10 blocks north in the neighboring district collected $500K/yr from the PTA and had full music and art programs, book fairs, a large library, and extra in-classroom assistants.

We live in a district which has weird rules about admissions. Instead of having a zoned school, you make a rank-list of schools in the district and apply to all of them. In an attempt to integrate the schools more, the city has imposed rules about who can be admitted by class. The schools are required to accept 67% of ‘diversity’ applicants who qualify either for low income, English as second language, or living in shelters (i.e. homeless). There is a lot of evidence that peer group is a major factor in child development and life outcome. Political incorrectness aside, this is not a wonderful peer group. It also far reduces the chances that your child will get into the particular school you want them to go to. Since priority is first given to siblings, the ‘nice’ school in this district (that we would have previously been zoned for) now only has four ‘non-diversity’ spots open for admission this year. Even if we were willing to send him there, he probably wouldn’t get in. Because of this, many better-off families are moving out of the district entirely. This is reflected in the rents within our community – rent jumps considerably right at the district line. People respond to incentives. If we sent our kids to public school we would be forced to do the same. If you have any money at all, you go to the district where the PTA funds the nice art program, not the one with the metal detector in the lobby.

Going private for education hopefully means you avoid true disaster, and the peer group is relatively wealthy and educated. But even private schools differ vastly in their philosophy towards education. Some are super academic, drilling kids to get high SAT scores and become doctors and lawyers. Some are more laid back. Some hardly seem to teach anything at all. There are small schools with one class per grade, others that are much larger. Religious and secular schools. Science schools and arts schools. If you’re willing to pay for it odds are there is some school that you would like. That’s a big if though.

My practical advice: If your only option is public school, move to an area that has a nice school at least one full school year before you intend to apply. You can tour schools just by saying you have a kid in the district, and they don’t force you to prove it. Once you find a school you like, you can move to that school’s zone, and you will have a high chance of admission. To be safe, you should make sure there are 1-2 back up schools you find acceptable in the district. If you cannot afford to live anywhere with reasonable public schools, you should seriously consider leaving the city. I am told of reasonable schools in NJ…

If you can’t stand public school, because at the end of the day they all follow common core, take those tests, and have 32 kids in a class, then you have to consider what you can afford. Home school has no tuition, but will require all-day child care, any educational materials/classes you want to use, and a large coordination effort on your part. If you’re a stay at home parent this might appeal to you anyway. For the most part the people who choose to do it are happy with it.

Private school is expensive, but requires less advance planning, since they don’t care what district you’re in as long as you can pay. You might still need to consider moving for private school if you don’t want your child to have an infinitely long commute. The city will pay for busing to private schools for bus routes which are 0.25 – 2.0 miles. Keep in mind that they are measuring distance along bus routes and not geographically. Even if you are physically within 2 miles of the school, the bus route might be over 2 miles and you will be out of luck. To be fair, if you’re willing to spend $50,000/year on a school, then what’s another $40/day to hire someone to take them to school?

I am now going to write some school reviews. I will leave out specific names, but if you are interested you can message me privately, and I will let you know which is which. Zvi saw some schools I did not, which I haven’t written about, and we still have some tours planned at local public schools.

Public Schools:

District 1 (our district – the one with the integration)

Public School A:

I was pleasantly surprised by this school’s philosophy of education. They were laid back and progressive. Kids sit at tables instead of desks. Group conversations and creative expression were encouraged. No mandatory homework. Starting in 1st grade, kids learn chess and have the opportunity in 3rd and 4th grade to compete in tournaments. In 3rd grade the kids learn basic computer programming. There is a year of free music lessons. They have a theater and a roof-top garden. Gym is non-competitive until 4th grade. 45 minutes of daily outdoor time. I really liked everything they *said* and the principal was super cool. However, the actual classrooms were tiny and crammed full of students. It was loud. I felt claustrophobic there, and I don’t have sensory issues in general. Plus, the district just implemented the diversity criteria this year, so the students I was seeing are not the peer group my son is going to have if he went. And, of course, they only have four non-sibling, non-diversity spots available.

Public School B:

This place is a prison. There is an angry security guard at the entrance to the grime-encrusted orange walls. Multiple signs above the guard state ‘theft is a crime.’ The slit-like windows at the top of the rooms let in thin beams of daylight to an otherwise flickering-fluorescent landscape. This is hell. There is no music or art program – no room in the budget. So ‘we do that within our lessons’. 20 minutes of yard time a day. Everything is centered around standardized tests. The only white faces were part of a special program. No one with any choice would ever let their kid set foot in this place unless they were in the special program. Not worth it. It’s social control of minorities. Straight up. If SJWs want a cause, here’s one for you. And no, forcing white or wealthy children to go there is not going to work. They won’t.

District 2 (the nice one)

Public School C:

The platonic ideal of school. When you think school, you think this school. The people who designed it thought ‘what is school?’ and then based the design off of every trope and meme about school, ever. Charts of everything on the walls. ‘Task leaders.’ Bulletin boards. Window decals. Those weird cartoon people you only see in school ever. Worksheets, worksheets, worksheets. Chalk boards. White boards. This place has it all! The place felt nice. Larger rooms, more light. Nice enrichment activities. A music and art program. A nice library and computer lab. Several outdoor spaces and playground equipment. The place gets $500k/yr from the PTA to keep the place great. Mostly white faces sitting quietly in circles while the teacher spoke to them in exaggerated tones with big faces while pointing to a white board.

Looked like the children of the corn. Completely conformist. But conformists at least a year ahead academically. It is disturbing to see kindergarteners completing reading worksheets and pushing papers around, but they were able to do it. This is the place for upper-middle class white people who move into the ‘good’ part of the neighborhood.

Private Schools:

Private School A: Preparatory School

EXPENSIVE. Beautiful school and facility. It is a ‘Quaker’ school, but mostly secular. Has a beautiful chapel where kids have ‘community assembly and quiet time’ once a week. Other parents were very well dressed – a lot of suits and jewelry. Academically rigorous without being oppressively conformist. Perhaps because the class size is 20 instead of 30, so there is more room to maneuver. A fine school as schools go, but not that much of an upgrade from PSC given the price. Also difficult to get into and unwilling to accommodate special needs.

Private School B: Jewish School

I loved this school! I really did. It’s a progressive, laid-back atmosphere that is still academically oriented. It is very Jewish. The boys wear kippahs and the curriculum is fully bilingual with one teacher speaking English and the other speaking Hebrew. They have all the usual stuff such as music and art. They go outside for 1 hr/day. They are willing to work with special needs. They know how to work with gifted and talented kids and make special assignments for children who are ahead. LOVE IT. Problem was, it is about 1 hour away by bus and it’s a 7.5 hr day. Not doing that to my kid. Not willing to move close enough to make it work. At least not this coming year.

Private School C: Waldorf School

This is a very unique nearby school that happens to be less expensive than the others. It has a unique education philosophy (a Waldorf school) which emphasizes embodiment and practical skills over academic ones. The curriculum includes foreign languages, cooking, washing, gardening, carpentry, and trade. The kindergarten is entirely non-academic and includes copious time for free play and an hour of outdoor activity. The later grades teach traditional academics, but do so in somewhat unusual ways, which I don’t have a strong opinion on at present. Since the main reason we are sending our son to school is for socialization, and since he’s already brilliant, I’m less worried about academics, especially in the younger grades. The school requested a drastic reduction in our child’s screen time, which at first freaked me out (who are they to tell me what to do in my own home), but I kind of understand. It’s a very small school (only 1 class per grade) and they are currently considering whether or not they can accommodate his needs. This is our top choice at present.

Special Needs Schools:

SNS A: Social Justice Away!

This school is an ‘integrated’ private school – meaning it’s a private school for regular kids which also accepts children with learning disabilities and has services for them. This means you can get the tuition paid by the city, unlike regular private schools, with a relatively normal peer group. It’s a great idea. The school itself is beautiful and has All The Things.

However there is a catch. The school has an agenda. It’s a social justice school. In the sense that other schools are reading and math schools. They call themselves ‘Advocates for Social Justice’ in their opening lines. I wouldn’t have thought this mattered for elementary age children. Sure, loving each other is wonderful! Accepting your neighbors is wonderful! But this is not where they draw the line. Social Justice is taught in every aspect of the curriculum. There are 7 year olds discussing their ‘identities’, an 8 year old talking about how his hero is Colin Kaepernick, that guy who kneeled during the national anthem. The teachers then praise his 'activism' for writing about it. The other sample lesson is on how Christopher Columbus was a white colonialist oppressor. And the children absorb this. The school accepts all kinds – unless you happen to be a *gasp* Republican. No diversity of thinking. If you don’t fully swallow the SJW philosophy in all its forms, or don't want them forced down your child's throat, this is not the place for you.

SNS B: Soothing Gardens…

Beautiful place. Therapeutic environment. Has the things. Didn’t want us to see the children – which was strange. When we peeked in at them, they were, well, very special. Seems like a great place for very special kids. If I had one that needed all that, I’d consider sending him there.

SNS C: Jews with learning problems

While not specifically a Jewish school, there were clearly a lot of Jewish children and teachers. I actually liked this place a lot. It was very laid back and gave the kids a lot of leeway to be who they are. It didn’t feel at all oppressive. They group kids into separate reading and math groups not by age, but by reading and math level, which I liked. The kids seemed less special than at SNS B, but still clearly special. The school didn’t have its own outdoor space and so kids only go outside twice a week with a bunch of parent-volunteers, since they want one adult per kid when crossing the streets. What was particularly disappointing was that they were clearly quite academically behind. The classes were so laid back that there didn’t seem to be a challenge, and the teachers were fine with whatever they produced. I can imagine certain children this would be very good for. I have vastly higher hopes for our son.


New Ratfic: Nyssa in the Realm of Possibility

December 9, 2018 - 08:00
Published on December 9, 2018 5:00 AM UTC

For NaNoWriMo, I decided to do a rationality themed pastiche of the Phantom Tollbooth. It is complete and serializing at http://nyssa.elcenia.com on Saturdays and Wednesdays. There are three chapters up as of this posting.


What precisely do we mean by AI alignment?

December 9, 2018 - 05:23
Published on December 9, 2018 2:23 AM UTC

We sometimes phrase AI alignment as the problem of aligning the behavior or values of AI with what humanity wants or humanity's values or humanity's intent, but this leaves open the questions of just what precisely it means for an AI to be "aligned" with just what precisely we mean by "wants," "values," or "intent". So when we say we want to build aligned AI, what precisely do we mean to accomplish beyond vaguely building an AI that does-what-I-mean-not-what-I-say?


What is "Social Reality?"

December 8, 2018 - 20:41
Published on December 8, 2018 5:41 PM UTC

Eliezer's sequences touch upon this concept but I'm not sure they actually use the phrase. Much of my understanding of it came from in-person conversations. Various comments and posts have discussed it but to my knowledge there isn't a clear online writeup.


Prediction Markets Are About Being Right

December 8, 2018 - 17:00
Published on December 8, 2018 2:00 PM UTC

Response To (Marginal Revolution): If you love prediction markets you should love the art world.

Previously on prediction markets: Prediction Markets: When Do They Work?, Subsidizing Prediction Markets

I’ll quote the original in full, as it is short, and I found it interestingly and importantly wrong. By asking the question of why this perspective is wrong, we see what is so special about prediction markets versus other markets.

Think of art markets, and art collecting, as an ongoing debate over what is beautiful and also what is culturally important.  But unlike most debates, you have a very direct chance to “put your money where your mouth is,” namely by buying art (it is very difficult to sell art short, however).  In this regard, debates over artistic value may be among the most efficient debates in the world.  At least if you are persuaded by the basic virtues of prediction markets.  The prices of various art works really do aggregate information about their perceived values.

I have, however, noted a correlation, how necessary or contingent I am not sure.  The “white male nerd types” who are enamored of prediction markets tend to be especially skeptical of the market judgments of particular art works, most of all for conceptual and contemporary art.

In my view, discussions about the value of art, as they occur in the off-the-record, proprietary sphere, are indeed of high value and they deserve to be studied more closely.  Imagine a bunch of people competing to make “objects that are interesting but not interesting for reasons related to their practical value.”  And then we debate who has succeeded, or not.  And those debates reflect many broader social, political, and economic issues.  And it is all done with very real money on the line.  The money concerns not just the value of individual art works, but also the prestige and social capital value that arises from having assembled a prestigious and insightful collection.

That’s exactly why (almost) everyone who loves prediction markets hates the high-end, expensive art markets, even if they love art and artists and buy original paintings to hang on their walls. This goes beyond ‘skepticism of the market judgments.’ Expensive art markets are not fundamentally markets. They are fundamentally a political status game.

Consider three (non-exhaustive) types of markets: Consumption markets, commercial markets and prediction markets.

Consumption markets are where the buyer is buying the item in order to use it.

The buyer who pays more than necessary is sad in one sense, and the one who got the best deal is happy in that sense. But that sense isn’t the important one for the buyer. If you are ‘right’ it is because you indeed got good use of the item that justified the purchase. If you are ‘wrong’ it is because you didn’t.

Thus, we can point to a ‘naive’ participant who doesn’t ‘play the game’ of that market, and say ‘look how much they could have saved’, or did ‘save’, but that doesn’t actually impact them.

Liquid commercial markets are where the buyer plans to sell the item to someone else.

Middle men, arbitrage, investment, greater fools, that sort of thing. Buy low, sell high.

If you buy a stock, or a commodity piece of art, or inventory for your store, or a cryptocurrency, and others want to buy it for more, it goes up in price and you make money. If they want to sell it for less, it goes down in price and you lose money.

The buyer who pays more than necessary is sad, and loses money, in the only sense that matters. If the price goes down, that too makes the buyer sad. Paying a locally good price, or having the price go up, makes the buyer happy. The key is to buy before others buy, so they drive the price up.

You might reply, no, the bigger key is to buy what is cheap and sell what is expensive, based on fundamentals, and that will bear out over time.

Well, maybe.

Yes, often buyers and sellers are driven by fundamentals. But in an important sense, that is a coincidence. What is actually good news is often considered bad news, and vice versa. Prices are often largely driven by who is thinking about what and the emotional state and financial needs of participants. The market can stay insane longer than you can stay solvent. The people who say such non-fundamental movements are random, are mostly saying they aren’t good enough to understand and predict them in this case.

Yes, eventually fundamentals might take over.  Or they might not. Low prices cause damage or make items impossible to justify storing or stocking. High prices trigger media attention and create opportunity. Low prices trigger margin calls, get the company bought out, or cause its employees and partners to quit. High prices trigger short squeezes and make everyone want to work with and for you. And so on. Momentum trading works, damn it (like everything else on this blog, not investment advice!).

Ideally the commercial market is anchored by connection to a consumption market – someone wants the goods, or is willing to collect the profits from the stock, or what not. The stronger that anchor versus speculative factors, the more accurate the prices.

Prediction markets have elements of both.

Prediction market traders can choose to mostly act like traders. If you think that others will think that the Patriots will win next week, you can bet on the Patriots now and then bet against them later when the odds change, and make money. You can be a market maker, or a block trader, or any other traditional market role.

In doing this, a trader cares about future social reality. They are people predicting what others will, in the future, predict that others will predict that others will predict, and so on. World events can help or hurt them, as they change perception, but they care about that perception and not the reality. By the time reality sets in, who knows what positions the trader will have?

In prediction markets there is another option. You can care about future reality. The market predicts a future outcome, and importantly you can stay solvent longer than the market can stay insane. Either the Patriots will win next week, or they will not. You can do better by using your commercial market tactics to grab the best possible price on the Patriots winning or losing, but the important thing is that you win if you are right about the concrete thing, and you lose if you are wrong. 

This works because there is an objective outcome, and it occurs quickly. Thus it functions in its own way like a consumption market.
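To make the two options concrete, here is a minimal sketch of the arithmetic. It assumes a simple binary contract that pays 1.0 if the event happens and 0.0 otherwise, and it ignores fees and lock-up costs. The function names and prices are hypothetical, chosen only for illustration.

```python
def trade_out_profit(buy_price: float, sell_price: float, shares: int = 1) -> float:
    """Profit from buying a binary contract and selling it before resolution.

    The result depends only on how the market price moved (social reality),
    not on whether the event actually happens.
    """
    return (sell_price - buy_price) * shares


def hold_to_resolution_profit(buy_price: float, event_happened: bool, shares: int = 1) -> float:
    """Profit from buying and holding until the contract resolves.

    Assumes the contract pays 1.0 per share if the event happens, else 0.0,
    so the result depends only on the outcome (reality), not on later prices.
    """
    payout = 1.0 if event_happened else 0.0
    return (payout - buy_price) * shares


# The trader: buys 100 shares at 0.50, sells at 0.75 once the odds change.
print(trade_out_profit(0.50, 0.75, shares=100))

# The truth-seeker: buys 100 shares at 0.50 and waits for the game.
print(hold_to_resolution_profit(0.50, event_happened=True, shares=100))
print(hold_to_resolution_profit(0.50, event_happened=False, shares=100))
```

In the first case the outcome of the game never enters the calculation. In the second, nothing but the outcome does.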

Truth matters. 

If you choose, only truth matters. I don’t have to care what other people think. They don’t determine if I win or lose.

That’s what I love, more than anything, about prediction markets. That’s the reason behind many of the requirements of well-functioning prediction markets: They enable this sole reliance on truth, without imposing virtual taxes via long lock-up periods. This also enables prediction markets to output accurate predictions.

That’s also a lot of what I love about trading. With a sufficiently deep and liquid market, you win if and only if you are right. No one gets to take that away from you and decide who gets the credit and the money. Only your skill matters, and you reap what you deserve.

I strongly encourage the type of people who read this blog to strive to identify and work in such realms. Be where being right, rather than being approved of, is rewarded.

The world mostly does not work like this.

The world mostly hates prediction markets, because they predict concrete consequences and outcomes accurately, without taking into account what those with power and high social status want the prediction to be.

Mostly, winners and losers are determined by social processes, status, coalitions, power, money and so on.

Credit and compensation mostly isn’t based on who knew the truth and predicted accurately, or who did the work or created the value, or even what was stated in the contract. It is based on who has power and what they decide, based on what is good for them. History, along with everything else, gets decided by the winners.

That’s life.

That’s also expensive art, and expensive art markets, of the type Tyler speaks of. Only more so.

As I understand it (from, mostly, following Marginal Revolution links and posts) a small group determines who succeeds and fails, and buys art from each other, and manipulates the social reality of the art world and its prices to suit its fancies. Its fancies are mostly about the pursuit of conspicuous consumption, high social status and its associated rewards, wealth storage, money laundering and tax evasion, plus suckering outsiders and scamming them out of their money. Artistic merit, or aesthetics, are mostly a minor consideration.

Recall Tyler’s description:

Imagine a bunch of people competing to make “objects that are interesting but not interesting for reasons related to their practical value.”  And then we debate who has succeeded, or not.  And those debates reflect many broader social, political, and economic issues.  And it is all done with very real money on the line.  The money concerns not just the value of individual art works, but also the prestige and social capital value that arises from having assembled a prestigious and insightful collection.

In this context, what does it mean for an object to ‘be interesting’? It means having a high price, but mostly it means being judged as interesting by a high social status cabal that is primarily designed as an alliance of the high status connected people against everyone else. This need not be explicit at all – it is how such people instinctively operate, and you either learn those instincts or you never make it into the club.

There is no reason to think any of this will ever “return to fundamentals” in any sense. The system sustains itself. There is (almost) no there, there. There never will be.

Thus, if I buy art, and people don’t like me, they will find ways to charge me a lot more than they’d have charged an insider, and then say that my art is therefore not so valuable. Because I was buying it, and now I own it.

If I hadn’t bought that piece, would it have become valuable? We’ll never know. Was it valuable before I bought it? Also impossible to say.

That game is rigged, man. The only way to win is not to play.

If I think those people are wrong, I can consume the art by displaying it in my house and admiring it. If I want to spend a few hundred or thousand dollars on something I love, by all means I should go for it, but have zero illusions about the work becoming ‘valuable.’

What I cannot do is predict that they are wrong, and wait for events to prove me right. There is no judgment day. No profit stream. No right. No wrong.

There are only cliques who watch each other to see if they are favoring the others in the clique, and use this to exploit others, because that’s what winners and clique members with power and money do. It’s sort of a market, like everything else. But in important senses, it is badly named, and something people like me despise. It is our failure mode and our doom, the way that prediction markets are our success mode and our hope.

Thus, if you love art markets you likely despise prediction markets, at least outside of their designated safe areas like sports and elections. And if you love prediction markets, you likely despise art markets whether or not you find them informative and fascinating in their own way.

What no one, whether they love or hate either market type, should be fooled into doing is accepting, in a non-skeptical fashion, the ‘market prices’ of ‘art’ in the art market. That is flat out not what is going on, at all. Such trades are not about the exchange of cash value for art value. Trying to use them to value the artwork misses the point entirely.

Are these art-market games worth understanding for what they can teach us about the world and how people work? Absolutely. Such shadowy practices do not get the light shined on them that they deserve. Scams and exploitation and manipulation should be exposed. Political games as well. To blame and ideally punish those responsible, to protect people against them and against having to play such games to succeed. But more than that, to educate us about how people, and how such systems, work. Mostly, those who do understand how such things work only understand them from the inside, and do so in a non-intellectual fashion. With exposure, and as they see such actions succeed, they adapt their actions, views, instincts and very identity towards perpetuating such systems through imitation, usually without ever understanding what is going on in either themselves or the system at large.

Actually understanding how such things work might be the first step towards containing or overcoming such systems, or at least minimizing the damage they inflict on our lives, our status, our wealth and our souls.

It is also possible that such systems are in fact how anything actually gets done at all, and the exposure of more and more hypocritical and exploitative systems is making society unable to function, which would be far worse.

That’s a risk I am willing to take.