LessWrong.com News

A community blog devoted to refining the art of rationality

Reading/listening list for the US failing or other significant shifts?

Published on November 13, 2020 3:34 PM GMT

I have a tentative hypothesis that the US is quite precarious. Although I believe we have likely avoided the worst outcomes this year, I think the next 10-30 years have some reasonable chance of going very badly. I don't know what that means, nor what the implications are for top cause areas.

 

I want to create a reading/listening list for exploring all this. Things I'm interested in:

  • history, particularly broad arcs/analysis of history (e.g., maybe I should read some Peter Turchin stuff)
  • history, particularly significant changes in global power dynamics (where power also involves technological advancement or similar)
  • …other things I'm not thinking of that feel relevant

 

I have no background here at all. So even very basic recommendations would be appreciated.



Discuss

Misalignment and misuse: whose values are manifest?

Published on November 13, 2020 10:10 AM GMT

Crossposted from world spirit sock puppet.

AI-related disasters are often categorized as involving misaligned AI, or misuse, or accident. Where:

  • misuse means the bad outcomes were wanted by the people involved,
  • misalignment means the bad outcomes were wanted by AI (and not by its human creators), and
  • accident means that the bad outcomes were not wanted by those in power but happened anyway due to error.

In thinking about specific scenarios, these concepts seem less helpful.

I think a likely scenario leading to bad outcomes is that AI can be made which gives a set of people things they want, at the expense of future or distant resources that the relevant people do not care about or do not own.

For example, consider autonomous business strategizing AI systems that are profitable additions to many companies, but in the long run accrue resources and influence and really just want certain businesses to nominally succeed, resulting in a worthless future. Suppose Bob is considering whether to get a business strategizing AI for his business. It will make the difference between his business thriving and struggling, which will change his life. He suspects that within several hundred years, if this sort of thing continues, the AI systems will control everything. Bob probably doesn’t hesitate, in the way that businesses don’t hesitate to use gas vehicles even if the people involved genuinely think that climate change will be a massive catastrophe in hundreds of years.

When the business strategizing AI systems finally plough all of the resources in the universe into a host of thriving 21st Century businesses, was this misuse or misalignment or accident? The strange new values that were satisfied were those of the AI systems, but the entire outcome only happened because people like Bob chose it knowingly (let’s say). Bob liked it more than the long glorious human future where his business was less good. That sounds like misuse. Yet also in a system of many people, letting this decision fall to Bob may well have been an accident on the part of others, such as the technology’s makers or legislators.

Outcomes are the result of the interplay of choices, driven by different values. Thus it isn’t necessarily sensical to think of them as flowing from one entity’s values or another’s. Here, AI technology created a better option for both Bob and some newly-minted misaligned AI values that it also created—‘Bob has a great business, AI gets the future’—and that option was worse for the rest of the world. They chose it together, and the choice needed both Bob to be a misuser and the AI to be misaligned. But this isn’t a weird corner case, this is a natural way for the future to be destroyed in an economy.

Thanks to Joe Carlsmith for conversation leading to this post.



Discuss

Sunday November 15th, 12:00PM (PT) — talks by Abram Demski, Daniel Kokotajlo and (maybe) more!

Published on November 13, 2020 12:53 AM GMT

This Sunday at 12pm (PT), we're running another session of "lightning talks" by curated LessWrong authors (see here for previous weeks' transcripts). Again fully in Gather Town.

  • For the first hour, we will have a series of lightning talks, each lasting about 5 minutes and followed by discussion. The talks will be short and focus on presenting one core idea well, rather than rushing through a lot of content.
  • From 1PM to 2PM, we'll just be casually hanging out in Gather Town. If you are not interested in the talks, feel free to just show up for this part (or the other way around).
  • We want to give top LessWrong writers an interesting space to discuss their ideas, and have more fruitful collaboration between users. Think of it like a cross between an academic colloquium and some friends chatting by a whiteboard.

If you're a curated author and interested in giving a 5-min talk at a future event, which will then be transcribed and edited, sign up here.

Speakers
  • Abram Demski: "Maybe something about a potential direction for raising the sanity waterline which I've been pondering"
  • Daniel Kokotajlo: Why we should build an AI Alignment Hub in Singapore
Details

When? Sunday November 15, 12:00PM (PT)

Where? garden.lesswrong.com



Discuss

Notes on Respect-for-Others

Published on November 12, 2020 11:33 PM GMT

This post examines the virtue of respect-for-others. It is meant mostly as an exploration of what other people have learned about this virtue, rather than as me expressing my own opinions about it, though I’ve been selective about what I found interesting or credible, according to my own inclinations. I wrote this not as an expert on the topic, but as someone who wants to learn more about it. I hope it will be helpful to people who want to know more about this virtue and how to nurture it.

What is this virtue?

The word “respect” is ambiguous; it covers several different things. For example: You can respect a person’s position or rank by granting them authority. You can respect a person’s reputation or skills or character or taste. You can respect the threat a potentially dangerous person or thing poses to you. You can show respect for someone as a form of showing submission to them.

The virtue of respect-for-others I mean to cover in this post is different. It has to do with understanding that other people have lives just as subjectively rich as your own, that they have their own perspectives, goals, desires, and priorities, and so forth, and that your own do not have objective priority over theirs.

This virtue is well summed up by the version of Kant’s Categorical Imperative that goes like this: “So act that you treat humanity… always at the same time as an end, never merely as a means” (Groundwork of the Metaphysics of Morals).

There are a couple of ways people tend to describe how this variety of respect works. These are not mutually-exclusive, but people may emphasize one more than the other:

  1. “I give every person some minimum baseline of respect that everyone deserves just by virtue of being a member of the human family, no matter who they are or what they’ve done.”
  2. “I give everyone I meet a certain default amount of respect, and then adjust that amount up or down as I get to know them better.”
Related virtues

It seems odd to me that there isn’t a word in English that precisely encapsulates this virtue. Some related virtues that touch on respect-for-others include:

  • concern, consideration, thoughtfulness, compassion
  • sympathy, empathy
  • civility, politeness, tact
  • acknowledgement, recognition
  • liberality, tolerance
  • humility (in the sense of not overvaluing oneself compared with others)

It also harmonizes with “humanism” and “philanthropy” in the sense of valuing human beings highly, relative to institutions, ambitions, or other-worldly values.

In human development

Children develop respect for others slowly, in stages, over many years. Early on, children have difficulty imagining that other people have their own perspectives and viewpoints or even their own versions of knowledge. Young children may see other people as extensions of themselves, and try to learn to manipulate them in the same spirit as they try to learn how to coordinate the movements of their bodies.

People with autism and Asperger’s tend to have more difficulty forming a “theory of other minds,” as do some people with schizophrenia.

But beyond just having the awareness of other independent minds, respect-for-others requires that those other minds be ungrudgingly allowed some independence from one’s own projects and preferences. People with narcissistic personality disorder exaggerate their own importance or centrality relative to other people and expect other people to go along with that. People with antisocial personality disorder act as though they do not believe other people have any inherent value or that their preferences and pursuits are worthy of respect.

At the other extreme, people subject to abuse may become so fixated on understanding the motives of their abuser (in order to try to fend off the abuse) that they end up suppressing their own egos and becoming extensions of the ego of their abuser. The Stockholm Syndrome is one astonishing way this can play out.

Too much respect for others’ points of view can lead to conformity pathologies such as those pointed out in the Asch conformity experiment.

Some people seem never to confidently develop their own identities and viewpoints, and feel the need to assume off-the-shelf identities instead, or to merge their own identities with a charismatic figure or leader. They express borrowed opinions, assume fashionable tastes, speak in clichés, and so forth, seemingly under the delusion that they are not entitled to an identity of their own or that it would be too much trouble to maintain one. A particularly grotesque version of this is the sort of internalized führerprinzip displayed for example by Adolf Eichmann, who adopted Hitler’s values in place of his own and later tried to claim that he could for that reason assign the guilt for his actions in implementing the Holocaust to Hitler while remaining innocent himself.

What good is it? And the egoist objection

Respect for others is a sort of things-I-learned-in-kindergarten virtue. It’s implied in the Golden Rule that has emerged in some form or other in folk ethics just about everywhere.

It plays an important role in other social virtues (e.g. friendship, love, trust, justice), and in some moral systems is the foundation on which the other virtues rest. For example, a person may be honest not so much because they love the truth as because they respect the person they are communicating with. Respect for others also is often found at the core of theories of political justice, in forms like “human rights,” “inalienable rights,” “equality before the law,” and other such formulations.

But in spite of all of these credentials, is there a case for stopping short of respect-for-others? What if you were to acknowledge that other people have their own subjective experiences, projects, and priorities that are just as important to them as yours are to you, but not see this as any reason not to prioritize your own as being the only really important ones? At the very least, when the chips are down isn’t it true that “every man for himself” rules the day? A straightforward egoism seems at first like it might be a reasonably strategic choice.

But even Ayn Rand, who disparaged altruism at every opportunity, included respect for others in her virtues. One’s own self-interested pursuits ought to be undertaken, she wrote, with the understanding that other people are also entitled to their own such pursuits, and you should not expect them to be mere ingredients in your own plans: “[E]very living human being is an end in himself, not the means to the ends or the welfare of others — and, therefore… man must live for his own sake, neither sacrificing himself to others nor sacrificing others to himself” (The Objectivist Ethics).

Simone de Beauvoir pointed out how lonely and pointless the utterly egoistic viewpoint is. “If I were really everything, there would be nothing beside me; the world would be empty. There would be nothing to possess, and I myself would be nothing” (The Ethics of Ambiguity). She felt that “man can find a justification of his own existence only in the existence of other men,” and this only if we see other people as complete people like ourselves, not mere props or extras. What good is the admiration, love, and so forth, of people whose viewpoints we do not respect or cannot disentangle from our own? The egoist attitude that “winning isn’t everything; it’s the only thing” often seems to result in winning comparatively small and silly things in the grand scheme of things.

On the other hand, if you oppose egoism on the grounds that it actually won’t work out well for you, that kind of sounds like you aren’t really opposing egoism but some malformed and failed attempt at it. Maybe what you mean to say is that egoism and respect for others are compatible after all.

Who deserves respect?

If you buy that you ought to have respect for others, whom do you include in the set of such others? foreigners? heathen? children? babies? fetuses? animals? the disabled? bad people? sacred objects? the dead? nature? φ-zombies? nations? Do different classes of beings get different varieties of respect, or is it more all-or-nothing? What is it about others that makes them respect-worthy in this way, and do some people not have whatever that is, or do some non-people have it? Can you gain or lose it, or is it yours to keep once you have it?

Children are one tricky case. On the one hand, parents probably ought to respect their children as independent beings with their own preferences and characters, rather than trying inflexibly to fit them into preconceived molds. On the other hand, you wouldn’t trust an immature child with a barely-formed view of the world to make major decisions about his or her destiny. In such a case, respect for the child seems to include an evolving and tentative sort of respect for a slowly-emerging autonomy. But we may want to pay more attention to the small ways in which we may show (and teach) disrespect for children, such as lying to them (e.g. about Santa or about where babies come from) or tickling them without their consent.

But as important and interesting as such questions are, in this post I want to side-step them. Assuming you believe that you ought to have respect for others, and assuming you have some adequate way of determining who those others are, what next?

What does “respect” entail?

Assuming you do respect someone in this manner, what does that amount to in a practical way? If you want to treat someone “as an end” rather than a means merely, how do you go about it?

Part of respecting someone is to respect them as a person: that is, being fully cognizant of their humanity rather than considering them as, for instance, a physical obstacle on the sidewalk between you and your destination. A friend of mine told me she is in the habit of giving people a little nod “in acknowledgement of their individual who-ness” as she passes them. “Often there is no response, but sometimes folks break out in a big grin and I feel like they appreciated having their selves respected, just for being.”

Another part of respecting someone is to respect them as an individual as opposed to a unit in an aggregate or a sample of a type. If you are thinking of someone primarily as a voter, a Native American, a human resource, or something of that nature, this can mean that you’ve already shoehorned them into some schema or project of your own as an interchangeable part, without allowing their own choices and interests to enter into it.

Another part of respecting someone is to respect them on their terms. This requires insightful attention. It might also present obstacles (for instance, if someone seems to demand respect in an unreasonable or unethical or overtaxing way).

Some ways people show respect: being courteous, giving people the benefit of the doubt (and being on guard against the fundamental attribution error), being tolerant of differences, being willing to share and take turns, exercising communication skills such as tact, being sensitive to people with vulnerabilities that you do not have direct experience with, respecting other people’s autonomy rather than trying to make choices for them or manipulate them or act on them without their consent, being aware of cultural differences (for example, in body language), not mocking or humiliating others or gossiping about them in their absence, and being helpful and cooperative unless there’s a good reason not to be.

Examples like those might be part of the respect package that a person with a strong sense of respect for others offers by default. For an example of more of a minimum baseline respect standard, the non-aggression principle is one concise formulation that is popular among political libertarians.

Obstacles to respect for others

One way I often see respect-for-others neglected is in commercial contexts. Customers will sometimes treat wait staff, cashiers, and such with no more regard than if they were vending machines. Or, employees will sometimes treat customers as merely potential sales.

Something about being paid-to-do-it seems to make some people willing to go way beyond the bounds of what they would otherwise find decent. For some forms of employment it almost seems de rigueur to treat people disrespectfully. In the wake of the Milgram experiment, Milgram offered this interpretation of the results: “a person comes to view themselves as the instrument for carrying out another person’s wishes, and they therefore no longer see themselves as responsible for their actions.” Something similar seems to happen to some employees, where they consider themselves to be not responsible for things that they do if they do them as part of their jobs.

It takes mental energy to model another person as a complex subjectivity with their own access to knowledge, their own models of the world, their own motivations, and so forth. We have to use guesswork and approximations and heuristics at the best of times. When our minds are also occupied with other tasks, these approximations can reduce to caricatures that may eventually be oversimplified so much that they might as well be mannequins. In order to respect people we have to permit them enough room in our mental models, and enough of our attention, so that they can appear to us as fully-dimensional people. A frequently encountered failure of respect-for-others is absent-minded inconsiderateness, in which a person whose mind is fully occupied with other things gives insufficient regard to the people around them.

One way to bolster respect for others, then, might be to periodically tune down whatever else is going on in your mind, look at the people around you, and attend to trying to understand them more vividly. The cashier who is checking out your groceries: does she appear relaxed or tense? do you think she is at the end of her shift or the beginning? is she new on the job or well-practiced? has she been standing for a long time or did she have a break recently? is she daydreaming or focused on her job? are there ways I put my groceries on the conveyor belt that made it easier or harder for her to process them? How does she answer when I ask “how has your day been going today?”

Another way lack-of-respect seems to bloom these days is in on-line interactions. Anonymity, pseudonymity, or even just being physically remote but virtually present, seems to bring out the worst in some people. If you’ve got a yen to be flamboyantly disrespectful to a stranger, by god you’re living in a golden age. You don’t even have to get up off the couch. You can be disrespectful to people by the thousands almost at the push of a button.

It is difficult even for otherwise well-behaved people to resist the temptation to, for example, share a video of some stranger embarrassing themselves in a particularly entertaining way. Is it respectful to help make someone notorious for some foible, weakness, indiscretion, or misjudgement… probably not. But if I were to judge myself by that standard, I’d fail the test.

If you are frequently shown disrespect — if people do not often reciprocate the respect you show for them — you will probably respond by giving people less respect by default and making them earn the rest. In this way, the typical standard of respect within a culture may decay. Sometimes subcultures develop that try to nurture and defend standards of mutual respect superior to those in society at large (fraternal orders, religious sects, the “PLUR” ethos, William S. Burroughs’s “Johnson family,” and things of that nature).

Thought experiments and games

There are some thought experiments that are designed to promote respect for others by evoking a “had fate so decided, our positions might have been switched” feeling. The simplest is just that: imagine what it would be like if you were in their place and they in yours.

A more complex version of this is a favorite of modern political philosophy: the “veil of ignorance” invented by John Rawls. Imagine that before you came into the world you had no idea who you would become, but you had a voice in what sort of world you would inhabit. What would be the ideal sort of political arrangement you would design from such an original position, if you knew you might end up assigned to any role within it?

My favorite is one that Alan Watts frequently returned to. I think he thought of it as more than a thought experiment: a revealed truth of some sort. Imagine that you are God, but being bored with being omniscient, omnipotent, and so forth, you decide to invent Creation and then go there to hide from yourself, a bit like a king putting on grubbies to mingle with the commoners for a day. In this telling, God incarnates himself in each of us, hiding from himself so thoroughly that he forgets who he is and how he got here. The upshot of this is that while you are experiencing your life, including all of your triumphs and follies and pleasures and pains… the very same “you” is experiencing your neighbor’s life just as vividly. Imagine the respect you would feel for your neighbor if you knew that deep down you were them as well.

That thought experiment is one of those things I am tempted to believe not because I think there is any good reason to believe that it is true, but just because I like the implications if it were true. I expect that means I will now have to do LessWrong penance of some sort.

Another way to build the skills of respecting other people might be through game play. You can’t be successful at chess, for example, if you can’t understand your opponent’s position and motivations. Role-playing games permit you to try on personalities and perspectives with goals and motivations unlike your own and may help you broaden your respect for different viewpoints. I wonder whether the many make-believe games of children — cops & robbers and the like — are in part exercises along those same lines.



Discuss

What risks from vaccines?

Published on November 12, 2020 11:32 PM GMT

I was chatting with friends about the Covid vaccine and they were concerned about side effects. I don’t expect significant side effects but don’t really have a principled analysis as to why not.

What kind of probability should one assign to significant side effects from a vaccine, particularly long term side effects which are less likely to be picked up in shorter testing regimes?



Discuss

Final Babble Challenge (for now): 100 ways to light a candle

Published on November 12, 2020 11:17 PM GMT

It is time. The final challenge in the 7-week babble challenge series. 

Let’s become stronger. Let’s go out with a bang. 

On the table in front of you is a candle.

This candle will burn as a metaphor for the light of Science, a little beacon of rationality. It will represent the will to keep practicing and honing our Art. 

Your task is simple. 

Light it. 

You have 1 hour to come up with 100 ways. 

Looking back

Here are the rankings before the final round. (You gain a star for completing a challenge, and lose one for missing a week. I’m not including myself since I’m the gamemaster.)  

Fantastic work, everyone.

★★★★★★ gjm 

★★★★★ Yonge

★★★★ Slider

★★★ Bucky

★★ Tetraspace Grouping, supposedlyfun 

★ NunoSempere, Elizabeth, Mark Xu

Overall, since starting on September 30th, there have now been more than 100 completed babble challenges.

As a result, we have babbled over 5000 ideas.

I haven’t counted how many unique users joined, but plausibly more than 70. For many of them, the babble challenge was one of their first comments ever on LessWrong. Welcome to you all.

I want to thank everyone who joined this quest, it’s been an honor practicing creativity with you.

There were really too many good submissions over the weeks to list them all. What’s more, much of the value comes in just being able to think of many different ones, rather than a single idea being excellent. Nonetheless, to celebrate and inspire you for the final challenge, I gathered some great ones from previous weeks: 

Ways of going to the moon

I leave it on Earth; eventually, in 4 billion years, the sun will have absorbed both the thing and the Moon and hopefully some parts of both will mix. (Vanilla_cabs)

Break out of the simulation, then reprogram myself to be on the moon. (mr-hire)

Use CRISPR to make myself smarter. Do whatever plan smarter Neel comes up with (Neel Nanda)

Send spaceships out to the asteroid belt to collect asteroids and bring them to earth. Not to extract valuable minerals, just to make the earth bigger and heavier. Both the increased radius and the increased gravity will bring the moon closer. Eventually it will be close enough that I can just reach out and put my object on the moon. (gjm)

…and more than 1000 more ideas!

Ways of escaping a locked room

Metal bars on the windows? Pee on them, take apart the phone, then connect one terminal of the phone battery to the bar and the other to the urine, and wait for the bar to be eaten away. (johnswentworth)

If I find myself in this situation I hereby pre-commit myself to using all of my available resources not to escape but to reign down hellfire remotely on whoever put me in there (avoid getting put in this situation in the first place) (Bucky)

Or, the solution that probably at least 5 people arrived at...

Write a LessWrong question post about being trapped in a locked room, pretending it’s a challenge to practice rationality. 

...and, again, more than 1000 more ideas!

Ways of hiding Einstein’s pen from evil forces for 50 years

Create a duplicate, hide it badly, and let it be stolen. Just keep the real one in a safe at your house. (Ericf)

Disguise it as, or hide it inside, something else, and then give that to someone else to hide, giving them an entirely false story about what it is and why it needs to be hidden. (gjm)

Sell the evil forces the pen for a high price. Invest the money. 50 years later, you will be rich and easily able to buy the pen back. (Mark Xu)

Memorize a binary sequence using a memory palace, which I use as an XOR cipher on a series of coin flips which indicate: "heads: go north 100 feet; tails: go east 100 feet". Flip 100 coins and write down the result, and then bury the coin in the place indicated by the flips XOR the sequence. (This is basically a one-time pad for north-eastern lattice paths) (TurnTrout)

Consequences on the world of the discovery of intelligent ant colonies

Small robots could be used to invade hive-minds and either spy on them or implant and manipulate thoughts (Slider)

Ants get good at hiding their colonies, but you can hire ants to find other ant colonies. (Elizabeth)

A lot of people think 'Oh that's interesting' - and then continue doing exactly what they would have done anyway. (Yonge)

Moving Forwards

This is it. Week 7 out of 7. 

Following the excursions into different forms of babble, I’m returning to where we started. A simple, constrained task. I thought that was most fun, and also most useful in feeling like it actually pushed the limits of my creativity. 

This will be the final babble challenge I host for now. But it won’t be the end of my attempts to build a culture of practice on LessWrong. I’m working on other plans, and hope to announce them soon. 

Next week I might write a longer Babble post-mortem. But one of the core things I take away is what I wrote already in the 3rd week, after noting how many people had participated: 

[The turnout] fills me with excitement and ambition. 

We’ve made a discovery. 

Who knew that there was all this latent excitement for doing weekly rationality challenges? That so many people were willing to actually roll their sleeves up, and show up every week to test the limits of our art? 

There’s a spark here waiting to be fanned into a flame. Imagine where we could go if we keep this up. 

If you feel the same, I invite you to join me. Find ways of practicing in your own life. Stay connected to that deliberateness and the relentless will to self-improve with your scientist hat on. 

Run your own challenges on LessWrong. 

In fact, there have recently been several things happening on LessWrong that move in this direction.

I'm excited to see where this will go.

Rules
  • 100 answers or nothing. Shoot for 1 hour. 

Any answer must contain 100 ideas to count. That’s the final babble challenge. We're raising the bar. Let's do this!

However, the 1 hour limit is a stretch goal. It’s fine if it takes longer to get to 100. 

  • Post your answers inside of spoiler tags. (How do I do that?)
  • Celebrate others’ answers.

This is really important. Sharing babble in public is a scary experience. I don’t want people to leave this having back-chained the experience “If I am creative, people will look down on me”. So be generous with those upvotes. 

If you comment on someone else’s post, focus on making exciting, novel ideas work — instead of tearing apart worse ideas. 

  • Not all your ideas have to work. 

I've often found that 1 great idea can hide among 10 bad ones. You just need to push through the worse ones. Keep talking. To adapt Wayne Gretzky's great quote: "You miss 100% of the ideas you never generate." 

It’s fine to say “build a volcano in my backyard and use it to light the candle”, “bribe a dragon to help me” or "rub my hands together real fast until they create fire". 

  • My main tip: when you’re stuck, say something stupid. 

If you spend 5 min agonising over not having anything to say, you’re doing it wrong. You’re being too critical. Just lower your standards and say something, anything. Soon enough you’ll be back on track. 

This is really, really important. I wrote this the first week. I still think it’s true, having now done 6 weeks of babble challenges. The freedom and lightness that comes with just babbling something, even if stupid, proves really helpful for also generating great ideas. 

---

Now, go forth and babble! 100 ways to light a candle!



Discuss

Communication Prior as Alignment Strategy

Published on November 12, 2020 10:06 PM GMT

Alice has one of three objects:

  • A red triangle
  • A blue square
  • A red circle

She wants Bob to learn which object she has. However, Alice may only send one of three messages:

  • “My object is round”
  • “My object is red”
  • “This is not a pipe”

The rules of the game (i.e. the available messages) are common knowledge before the game starts. What message should Alice send for each object, and what object should Bob deduce from each message?

Let’s think it through from Bob’s standpoint. A clever human might reason like this:

  • “My object is round” implies it’s the red circle, because that’s the only round object.
  • “My object is red” implies it’s the red triangle, because only the triangle and circle are red, and Alice could have perfectly conveyed the information with “My object is round” if it were the circle.
  • “This is not a pipe” implies it’s the blue square, because Alice could have perfectly conveyed the information with one of the other two messages otherwise.

If you’ve played the game CodeNames, then this sort of reasoning might look familiar: "well, 'blue' seems like a good hint for both sky+sapphire and sky+water, but if it were sky+water they would have said 'weather' or something like that instead, so it's probably sky+sapphire...".

Intuitively, this sort of reasoning follows from a communication prior - a prior that someone is choosing their words in order to communicate. In everyday life, this comes up in e.g. the difference between connotation and denotation: when someone uses a connotation-heavy word, the fact that they used that word rather than some more neutral synonym is itself important information. More generally: the implication of words is not the same as their literal content. A communication prior contains a model of how-and-why-the-words-were-chosen, so we can update on the words to figure out their implications, not just their literal meanings.

Communication priors suggest an approach to certain problems in AI alignment. Intuitively, rather than saying “I want X” and the AI taking that completely literally (as computers generally do), the AI instead updates on the fact that I said “I want X”, and tries to figure out what those words imply about what I actually want. It’s like pushing the “do what I mean” button - the AI would try to figure out what we mean, rather than just doing what we say. Indeed, we could even have the AI treat its own source code as a signal about what I mean, rather than as instructions to be taken literally - potentially recognizing when the program we wrote is not quite the program we intended, and doing what we intended instead. (Obviously the program itself would need some open-ended introspection/self-modification capabilities to allow this.) As long as the initial code and initial model of me is “close enough”, the AI could figure out what I meant, and we’d have a “basin of convergence” - any close-enough code/model would converge to what we actually intended.

Of course, that all requires formalizing communication priors. This post sketches out a relatively simple version based on the Alice/Bob example above, then talks about the more complicated version needed for alignment purposes, and about what the approach does and does not do.

Formalizing a Communication Prior

We’ll continue to use the Alice/Bob example with the colored shapes from earlier, though we’ll use more general formulas. We’ll call the message M and the intended meaning (i.e. object) X.

Our receiver (i.e. Bob) starts with some naive guess at the meaning X, just based on the literal content of the message - i.e. “My object is red” would, taken literally, imply that it’s either the triangle or the circle. We’ll write this naive guess as

P_0[X \mid ``M''] = \frac{1}{P[M]} \, P[M \mid X] \, P[X]

This is basically just a Bayesian update. The only subtlety is the quotes around ‘‘M” - this makes a distinction between the message ‘‘M” (i.e. the letters “My object is red” on a screen) and the literal meaning M of the message (the fact that the object is red). The formula says that the naive guess at the intended meaning given the message (i.e. P0[X|‘‘M”]) is just a Bayesian update on the literal meaning of the message.

At this stage, assuming a uniform prior on the three objects, Bob would say that:

  • “My object is round” means it’s the circle
  • “My object is red” gives ½ chance each to circle and triangle
  • “This is not a pipe” gives ⅓ chance to each object
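
To make those numbers concrete, here is a minimal Python sketch of the naive update P0 (my own illustration, not code from the post): the object names, message strings, and the literal-truth table are my encoding of the example, and a uniform prior over the three objects is assumed.

```python
# Naive update P_0: a Bayesian update on the literal content of the message.
OBJECTS = ["red triangle", "blue square", "red circle"]
MESSAGES = ["my object is round", "my object is red", "this is not a pipe"]

# LITERAL[m][x] = 1 if message m is literally true of object x, else 0.
LITERAL = {
    "my object is round": {"red triangle": 0, "blue square": 0, "red circle": 1},
    "my object is red":   {"red triangle": 1, "blue square": 0, "red circle": 1},
    "this is not a pipe": {"red triangle": 1, "blue square": 1, "red circle": 1},
}

PRIOR = {x: 1 / 3 for x in OBJECTS}  # uniform P[X]

def p0(message):
    """P_0[X | ''M'']: condition the prior on the message being literally true."""
    unnorm = {x: LITERAL[message][x] * PRIOR[x] for x in OBJECTS}
    z = sum(unnorm.values())
    return {x: v / z for x, v in unnorm.items()}

for m in MESSAGES:
    print(m, "->", p0(m))
# "round" -> circle with probability 1; "red" -> 1/2 triangle, 1/2 circle;
# "not a pipe" -> 1/3 each, exactly the values listed above.
```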

But at this point, Bob hasn’t accounted for all his information. He also knows that Alice chose the message to maximize the chance that Bob would guess the right object. So, let’s do another Bayesian update on the assumption that Alice chose the message to maximize the probability assigned to X under P0.

P_1[X \mid ``M''] = \frac{1}{Z} \, P[(``M'' \text{ maximizes } P_0[X \mid ``M'']) \mid X] \; P_0[X \mid ``M'']

(Side note: Z here is a generic symbol for the normalizer in the update, which would normally be P[‘‘M”]. I’ll continue to use it going forward, since the exact things we’re implicitly conditioning P[‘‘M”] on can be a bit confusing in a way which doesn’t add anything.) This is another Bayesian update, but this time starting from P0 rather than the original prior. At this stage, Bob would say that:

  • “My object is round” means it’s the circle
  • “My object is red” means it’s the triangle, since “My object is red” is not the message which gives highest P0[X|‘‘M”] when X is the circle.
  • “This is not a pipe” means it’s the square, since “This is not a pipe” would not give the highest P0[X|‘‘M”] when X is the circle or triangle.

Let’s do one more step, just to illustrate. Bob still hasn’t used all his information - it’s not just that Alice chose the message to maximize the probability assigned to X under P0, she also chose it to maximize the probability assigned to X under P1. How did she choose the message to maximize both of these simultaneously? Well, given our formulas above, if ‘‘M” maximizes P1[X|‘‘M”], then that implies that ‘‘M” maximizes P0[X|‘‘M”] as well. However, the implication does not go back the other way in general; the fact that ‘‘M” maximizes P1[X|‘‘M”] is stronger.

Intuitively, we’re “ruling out” messages for each X at each stage. Any message not ruled out at stage 1 was also not ruled out at stage 0 - the messages “not ruled out” for X are precisely those which assign maximal probability to X at all earlier stages.

Upshot: by choosing ‘‘M” to maximize P1[X|‘‘M”], Alice also implicitly chose ‘‘M” to maximize P0[X|‘‘M”].

Anyway, next step: we form P2[X|‘‘M”] by updating on the fact that ‘‘M” maximizes the probability assigned to X under P1.

P_2[X \mid ``M''] = \frac{1}{Z} \, P[(``M'' \text{ maximizes } P_1[X \mid ``M'']) \mid X] \; P_0[X \mid ``M'']

Note that we’re still using P0 as our prior in this update; that’s to avoid double-counting the fact that Alice is maximizing P0, while still accounting for the literal content M. If we continue the chain, each subsequent step will look like

P_{k+1}[X \mid ``M''] = \frac{1}{Z} \, P[(``M'' \text{ maximizes } P_k[X \mid ``M'']) \mid X] \; P_0[X \mid ``M'']

In this case, we find that P2 is exactly the same as P1 - the calculation has converged in finite time. More generally, we can say that Bob’s final probabilities should be

P[X|‘‘M"]=P∞[X|‘‘M”]=limk→∞Pk[X|‘‘M”]

As a Fixed Point

The argument above is very meta, and hard to follow. We can simplify it by using a fixed point argument instead.

Instead of the whole sequence of updates, we’ll just start from P0 (i.e. the literal content of the message), and update in a single step on the fact that Alice is optimizing the message: Alice chooses the message ‘‘M” to maximize the final probability P[X|‘‘M”].

P[X \mid ``M''] = \frac{1}{Z} \, P[(``M'' \text{ maximizes } P[X \mid ``M'']) \mid X] \; P_0[X \mid ``M'']

This is a fixed-point formula for P[X|‘‘M”]. Formally, the “communication prior” itself is (‘‘M” maximizes P[X|‘‘M”]).

This is intuitively simple, but unfortunately P[X|‘‘M”] is extremely underdetermined by the fixed-point formula; there are many possible P[X|‘‘M”] we could choose, and lim(k→∞) Pk[X|‘‘M”] is just one of them. Intuitively: we could map messages to objects any way we want, as long as we respect the literal content of the message. As long as Alice and Bob both know the mapping, we choose P[X|‘‘M”] according to the mapping, and everything works out.
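As a quick illustration of that underdetermination, here is a second solution of the fixed-point formula (my own construction, verifiable with the step function from the sketch above). It respects literal content, but it is not a winning strategy: Bob can no longer tell the triangle from the square.

```python
# A second fixed point (a hypothetical convention of my own, reusing the
# code above): both "round" and "red" point to the circle, while
# "This is not a pipe" splits evenly between triangle and square.
P_alt = {
    "My object is round": np.array([1.0, 0.0, 0.0]),
    "My object is red":   np.array([1.0, 0.0, 0.0]),
    "This is not a pipe": np.array([0.0, 0.5, 0.5]),
}
print(all(np.allclose(step(P_alt)[m], P_alt[m]) for m in P_alt))  # True
```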

The fixed point formula is a criterion which any winning strategy must satisfy, but there are still many winning strategies.

Our particular choice of P[X|‘‘M”] = lim(k→∞) Pk[X|‘‘M”] comes from iteratively expanding the fixed-point formula, with initial point P0. If either Alice or Bob decides to use this model, and the other knows that they’re using it, then it’s locked in.

More generally: each player’s optimal choices depend heavily on their model of the other player. Alice wants to act like Bob’s model of Alice, and Bob wants to act like Alice’s model of Bob. Then there’s the whole tower of Alice’s model of Bob’s model of Alice’s model of…. Our Pk[X|‘‘M”] sequence shows what that tower looks like for one particular model of Alice/Bob.

Beyond Idealized Agents

The (‘‘M” maximizes P[X|‘‘M”]) communication prior is where Alice and Bob’s models of each other enter. In this case, we’re effectively assuming that Alice is a perfect agent - i.e. she picks her message to perfectly optimize Bob’s posterior. This is an idealized communication prior for idealized agents.

For alignment, we instead want a model of how humans communicate - as people who’ve played CodeNames can confirm, humans do not reliably think through many levels of implications of their word-choices! We really want to update on something like (<rough-model-of-human> thinks ‘‘M” results in high P[X|‘‘M”]). The better the model of how the human chose ‘‘M” based on what they want, the better the AI will be able to guess what we want (i.e. X) from our “messages”.
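For illustration, one simple way to soften the perfect-optimizer assumption (my own sketch, in the spirit of rational-speech-acts-style models, not something from this post): replace the hard argmax with a softmax over literally-true messages, so the modeled human only loosely optimizes their word choice.

```python
# Continuing the earlier sketch: a boundedly-rational speaker model.
# Instead of the 0/1 indicator, assume the human picks "M" with probability
# proportional to exp(beta * Pk[X|"M"]) among literally-true messages.
# beta -> infinity recovers the idealized agent; small beta is a sloppier human.
def soft_step(Pk, beta=5.0):
    msgs = list(Pk)
    choice = np.zeros((len(msgs), len(objects)))  # P["M" | X] under this model
    for xi in range(len(objects)):
        truthful = np.array([1.0 if literal[m][xi] > 0 else 0.0 for m in msgs])
        w = truthful * np.exp(beta * np.array([Pk[m][xi] for m in msgs]))
        choice[:, xi] = w / w.sum()
    Pnext = {}
    for mi, m in enumerate(msgs):
        p = choice[mi] * P0_table[m]  # likelihood of "M" given X, times the P0 prior
        Pnext[m] = p / p.sum()
    return Pnext

print(soft_step(P0_table)["My object is red"].round(2))
# ~[0.1, 0.9, 0.]: mostly triangle, but some weight stays on the circle,
# because a sloppy speaker might say "red" even when holding the circle.
```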

To the extent that the AI is modelling the human modelling the AI, we still get the meta-tower and possibly a fixed point formula (depending on how good the model of the AI in the human’s head is). The AI can treat both its own code and the human-model as “messages”, and so potentially correct sufficiently-small errors in them.

What This Does And Does Not Do

In some sense, this idea solves basically none of the core problems of alignment. We still need a good-enough model of a human and a good-enough pointer to human values. We’d still like an AI architecture with goals stable under successor-construction. For maximum safety, we’d still ideally like some good-enough scaled-down tests and/or proofs that some subcomponents actually work the way we intuitively expect. Etc.

What this does buy us is a basin of convergence. On all of the key pieces, we just need to be “close enough” for the whole thing to work. Potentially being able to recover even from small bugs in the source code is a pretty nice selling point. Of course, there are probably basins of convergence for many approaches, but this one offers at least the possibility of being able to explicitly model the basin. How sensitive is the end result to errors along different dimensions of the human-model? That’s the sort of question which could be addressed (either theoretically or empirically) in toy models along these lines, and potentially lead to generalizable insights about which pieces matter more or less. In other words: we could potentially say things about how big the basin of convergence is, and along which directions it’s wide/narrow.

That said, I still think the biggest blocker - both for this approach and many others - is figuring out pointers to human values, and how pointers to real-world abstract objects/concepts work more generally. Right now, we don’t even understand the type-signature of a “pointer” in this sense, so it’s rather difficult to talk about a basin-of-convergence for human-value-pointers.



Discuss

Socratic Grilling

13 ноября, 2020 - 00:16
Published on November 12, 2020 9:16 PM GMT

NOTE: This is at the park area by the store. Wear your mask and practice social distancing at this event.  Please finish eating before coming. Gather to hang out starting at 1:30 pm, with the main activity commencing at 2:30 pm.

Inspired by the SlateStarCodex post, we will do the activity version of Socratic Grilling.

As explained in the article, rationalist practice requires noticing when you're confused, and attempting to resolve it. But asking such questions sometimes looks like trolling!

So, to see this in action, we'll take turns -- possibly splitting into groups -- having someone speak on a topic very prone to misconceptions. Listeners will commit in advance to either:

a) ask honest clarifying questions about points of confusion, or
b) troll and "trap" the speaker.

At the end, we'll guess who was doing what. (Remember, the best trolls look like "they're just trying to ask questions, man!")

Examples of such topics include evolution, the Efficient Markets Hypothesis, quantum computing, germ theory, monetary policy, and whatever else you've noticed from experience.

We had a lot of fun with this before, and I'm excited to try it out again!
***
The meetup is at Central Market, 4001 N. Lamar, in the park area, near the stone tables by the pond. Look for the LW and SSC signs. 


Reminders of rules for in-person events.

1. Do not attend if you have a fever, cough, or other symptoms.
2. Wear a mask over your nose and mouth at all times during the meetup.
3. Stay >6 feet away from all other attendees. (Exception: people who are already part of the same household)
4. Do not eat or drink during the meetup (because this would defeat the purpose of wearing masks).

We'll see you then!



Discuss

Tips on organizing online meetups

12 ноября, 2020 - 23:02
Published on November 12, 2020 8:02 PM GMT

I have been co-organizing the SlateStarCodex online meetup series with ~40-140 participants in each meetup. We've experimented to get it to work smoothly and have some tips.

This advice is for meetups which have a talk and QA and post-talk socializing. I will appreciate similar tips from meetup organizers.

For socializing, we use breakout rooms as mentioned below; but for pure socializing, Icebreaker works even better, putting people together for short conversations (like speed dating). I saw this work well in some Effective Altruism Icebreaker sessions.

Another good format is Rump Sessions: four-minute lightning talks, given by on-the-spot volunteers (no need to pre-apply); for each, the audience votes on a three-minute extension if the speaker wants one. That worked well for the preliminary session in the series, but all the rest had invited speakers.

Here's how we do it.

  • We advertise using a regular (non-event) post on LessWrong, as LessWrong does not support non-geographical events. We also make an event post under LessWrong Tel Aviv, allowing people to get a notification or to see it on the events page.
  • We also post to /r/slatestarcodex
  • We post to Facebook. Few Facebook groups in the rationality-sphere are really active, so we do not share to any of those. (I do post to my local LessWrong Israel group.)
  • We have a MailChimp mailing list, which has worked very well to get out the word. People can sign up when they register for a meetup.
  • For each session, we ask people to register; then we send them an invitation. So, we do not publicize a direct video-meet link. This approach gives us partial visibility into who will join.
  • We use Google Meet because the paid version allows more than 100 participants, and the paid version of Zoom does not. Usually someone can offer a paid version from their work or school, but note that the owner of the account will need to administer the call, so the owner cannot just hand off to the meeting organizers.
  • I suggest in the invitation and in my introduction that --  both for the main event and the after-talk socializing -- participants turn on  video, and connect with their real name, for social connectivity. However, this is not mandatory.
  • We do it Sunday 18:30 UTC because this accommodates the US, Europe, and Israel times of day and workdays.  (This leaves out areas east of Israel to the Pacific, unfortunately).
  • Schedule: I connect 15 minutes before the session to test the audiovisuals. I recommend that the speaker connect a few minutes earlier, as we have had audiovisual problems in the past. The session  begins 1 minute after the scheduled time, and I give a 2-minute technical introduction about muting mics, videorooms, etc., and a 30-second intro for the speaker. The speaker gives a talk for 0 to 90 minutes, and then we do Q&A. After that, we go to videorooms.
  • We ask everyone to mute mics to minimize background noise. I mute people if they forget to  do it themselves.
  • Participants chat freely in text chat during the whole thing.
  • Participants ask questions in text chat, prefixed by a "Q". My co-organizer curates them and feeds me questions to read out. This approach, compared to oral questions, shares the time fairly, avoiding  time-hogs. It also supports those with a bad audio connection. Afterwards, we do allow oral questions if there is time.
  • For the last phase, following Q&A, we share links to 3-4 Jitsi videorooms for socializing. Participants can switch between rooms. This is meant to simulate post-talk chats in a face-to-face conference.


Discuss

Covid 11/12: The Winds of Winter

12 ноября, 2020 - 17:30
Published on November 12, 2020 2:30 PM GMT

Spoiler Alert below the fold, can’t be helped.

We have perhaps the best possible news. Pfizer’s vaccine is over 90% effective, and they are going to ask for emergency use authorization in a few weeks. Woo-hoo!

We also have very bad news. It was clear the numbers were going to get worse, and they are substantially worse than I expected. Things are about to get quite bad out there. Also the attempts to overturn the results of the election aren’t great.

There’s a thought I can’t shake, though.

You ever think a writer is getting a bit lazy when it comes time to wrap things up? 

An inhuman threat that grows more deadly as it gets colder, that no ordinary sword can slay and that cares nothing for politics, grows exponentially in power by the day. 

There remain two who would rule the land. 

One, who is currently in the seat of power, claims they and their family are the best at business but their longtime revenue sources are bankrupt and they rely heavily on loans from banks to stay afloat, while their true accounting method is being vindictive as hell. Infighting has caused the most competent people in the administration to abandon it. Having seized control in the crucial moment amidst divided opposition and maintained it against claims of their illegitimacy and worse, they put up a surprisingly strong fight. When provided with evidence of the inhuman threat right in front of their face, menacing them directly, they briefly acknowledge it, but mostly ignore it and are happy to let others deal with it. It’s not their problem. They warn that their rival would destroy all if allowed to rule. They refuse to admit defeat and keep only the counsel of very close family, even after the verdict is clear. 

The other used to be high up the line of succession before power changed hands, and has returned with allies of varied ethnic origins. They have assembled a broad coalition, many with pasts they are ashamed about, and some of whom have proven difficult to control and prone to rioting. Many are there purely because they would welcome almost any change. They have suffered great tragedy in their life, losing two children and a spouse. They arrive to restore the old norms, for honor and dignity, and to free the people. They pledge to fight systemic injustice. They claim they will rule for all the people, especially those who need help the most. 

As the conflict between the two looks to be reaching its zenith, the inhuman threat threatens to overtake us all. 

The second claimant, with almost no help from the other, puts their campaign on hold and attempts to rise to the challenge, assisting allies in developing new weapons while their rival consolidates their base. But their efforts to defeat the inhuman threat seem to have no plausible path to success and to be in vain. The enemy’s ranks swell and all seems to be lost.

Then, as our darkest hour approaches, a hero answers the call! An outside force has been working the whole time to hone advanced techniques that many people do not believe exist or would prove ineffective. Just in time, they deliver the blow we need to defeat the inhuman enemy before all is lost. Shortly thereafter, the second claimant dispatches the first. 

We are saved, but winter is upon us and much has been lost. It will take much time to rebuild, on many levels. The urge to celebrate is understandable, but the coming months will be the deadliest. Even if most would prefer to pretend otherwise.

I need to stop this aimless musing. Time to focus. Let’s run the numbers, then deal with the vaccine news, then double back to the shorter term in light of both halves. 

The Numbers

I am sorry about all this, I really wish it wasn’t necessary to point out who won the election before going over the Covid numbers, but it is. If you want to know how we will handle the pandemic, it is important to know who will be leading the fight. 

By all means, skip this first section if you don’t need it.

Votes

You really, really should know this already, and I’m sure most of you do, but remarkably large numbers of people are in denial or lying about it, including the President. People are crazy, the world is mad. So I’m going to pause here up front and say it.

Congratulations to President-Elect Joe Biden, who defeated Donald Trump in a free and fair election, and will be the 46th President of the United States at noon on January 20, 2021. Votes are still being counted in several states, so the popular vote margin will continue to grow. Trump continues to file frivolous lawsuits and all-caps tweets alleging voter fraud and that the election was stolen, just as he promised to do before the election. In those lawsuits, no evidence of fraud has been provided. 

Trump is successfully undermining faith in our elections among many of his supporters. Most top Republicans are choosing their words carefully and refusing to acknowledge Biden as President-Elect. Some are choosing their words less carefully and indicating their support for ending our democracy and keeping power regardless of the vote. But Trump has no legal path. 

Trump has also fired most of the civilian Pentagon leadership and replaced it with loyalists, while continuing to claim he won the election. Secretary of State Pompeo said there will be ‘a smooth transition to a second Trump administration’ and when given an opportunity later refused to say he was joking. 

Prediction markets are offering a 10% return even now for betting on Biden, including at BetFair where the resolution rule is ‘projected winner’ so according to their rules as written Biden has already won and they should have paid out last week. You can bet hundreds of thousands of dollars that way at any time at those odds. PredictIt’s markets are even crazier. 

We really do live in (at least and probably more than) two disjoint realities. 

Any number of things could happen (probably won’t, almost certainly won’t, but could) between now and January 20.

Also, the pandemic is completely out of control.

So for many reasons, it wouldn’t be a terrible idea to maintain and even top off the emergency supply stash just in case things turn truly ugly. Very low probability but highly dangerous tail risk events are worth guarding against. 

The anti-politics rules for the comment section still apply, now more than ever. Do not discuss politics in the comments section except as it directly relates to Covid-19. Definitely stay away from any discussions or debates about voter fraud or who is or isn’t committing a coup or autogolpe. This is not the place. Reign of terror rules apply here. 

If you know me and want to discuss politics to better understand what is happening, I’m not against that at all, but contact me privately. 

(Or if you want to talk about other stuff, that sounds good too, I don’t talk to friends often enough.)

Deaths

Date            West    Midwest   South   Northeast
Sep 10-Sep 16   1159       954     3199      373
Sep 17-Sep 23   1016       893     2695      399
Sep 24-Sep 30    934       990     2619      360
Oct 1-Oct 7      797      1103     2308      400
Oct 8-Oct 14     782      1217     2366      436
Oct 15-Oct 21    804      1591     2370      523
Oct 22-Oct 28    895      1701     2208      612
Oct 29-Nov 4     956      1977     2309      613
Nov 5-Nov 11    1089      2688     2535      870

We knew it was coming. You still hate to see it. The Midwest and Northeast numbers shot way up and there is no good news anywhere. New York’s number rose from 95 to 158 and is now clearly distinguishable from zero on the chart. There is no reason to not expect an even larger percentage rise in deaths in the next few weeks. We will doubtless hit 2,000 per day soon. I still would be very surprised to hit the scare tactic number of 400k deaths in 2020, as that would require averaging 3,340 per day from here, and I don’t think there is enough time for that to happen. 

Positive Tests

Date            West      Midwest    South     Northeast
Sep 10-Sep 16    45050      75264    115812      23755
Sep 17-Sep 23    54025      85381    127732      23342
Sep 24-Sep 30    55496      92932    106300      27214
Oct 1-Oct 7      56742      97243    110170      34042
Oct 8-Oct 14     68284     125744    117995      38918
Oct 15-Oct 21    75571     149851    133238      43325
Oct 22-Oct 28    94983     181881    158123      57420
Oct 29-Nov 4    112684     252917    167098      70166
Nov 5-Nov 11    157378     384862    206380     108581

The Midwest is completely out of control, with cases doubling in the last two weeks. Other areas are better, but not much better. Things in South Dakota are very, very bad and that was several days ago. The Midwest and West are already at record levels. The Northeast should be at record levels within a week or two, the South within one to three. Testing is increasing, but it is only a small part of the story and the testing we have is increasingly inadequate to the task at hand in most states.

Positive Test Percentages

Date             Northeast   Midwest   South     West
9/3 to 9/9         1.97%      6.02%     8.48%    4.13%
9/10 to 9/16       2.41%      5.99%    11.35%    4.49%
9/17 to 9/23       2.20%      5.96%     7.13%    4.11%
9/24 to 9/30       2.60%      6.17%     6.18%    4.27%
10/1 to 10/7       2.61%      6.05%     6.74%    4.23%
10/8 to 10/14      2.57%      8.14%     7.09%    4.75%
10/15 to 10/22     2.95%      8.70%     7.85%    5.36%
10/22 to 10/28     3.68%      9.87%     8.58%    6.46%
10/29 to 11/4      4.28%     12.79%     8.86%    7.04%
11/5 to 11/11      5.56%     17.51%     9.89%    8.31%

Once again, disaster across the board, but the Midwest is in especially deep trouble. Iowa and South Dakota are above 50% positive test results. I would have thought that was essentially the maximum; it means we have no idea how many infections are being missed, and we can expect to undercount deaths as well because many will go undiagnosed. But North Dakota is now at 72% positive, which was never seen even in the worst places in New York City. So it’s possible. 

Test Counts

Date            USA tests    Positive %   NY tests     Positive %   Cumulative Positives
Sep 3-Sep 9     4,849,134       5.3%        552,624       0.9%            1.93%
Sep 10-Sep 16   4,631,408       5.8%        559,463       0.9%            2.01%
Sep 17-Sep 23   5,739,853       5.2%        610,802       0.9%            2.10%
Sep 24-Sep 30   5,839,627       5.1%        618,378       1.1%            2.19%
Oct 1-Oct 7     6,021,807       5.2%        763,935       1.3%            2.29%
Oct 8-Oct 14    6,327,972       5.8%        850,223       1.1%            2.40%
Oct 15-Oct 21   6,443,371       6.5%        865,890       1.2%            2.52%
Oct 22-Oct 28   6,936,300       7.5%        890,185       1.4%            2.68%
Oct 29-Nov 4    7,244,347       8.6%        973,777       1.6%            2.87%
Nov 5-Nov 11    8,185,154      10.3%      1,059,559       2.4%            3.13%

So much for New York being able to keep things under control. No place in America is safe. 

This will likely only get worse. Yesterday’s positive rate was 12.7%. Next week looks in expectation to be something like a 12.9% positive rate on 9 million tests and an average of 1,200 deaths per day. Which means half the time it will be worse than that. Yikes. 

Europe

Lockdowns in the United Kingdom, France and Belgium (see below section for Belgium, which I keep off the charts because of scale issues) seem to have at least stabilized matters, although at a terrible level, and deaths will continue to rise for several weeks. Belgium is now making clear progress. France might be, but I don’t believe the spike can be as sharp as the above graph suggests, and it looks like reported tests for that last day are way down, so it’s too early to know what’s happening there. Germany’s half measures seem to be slowing down the rate at which things get worse, but not doing enough to stop things getting worse.

Getting away from an unambiguous hockey stick is great news compared to the alternative, but at first glance it looks like a lot of countries are signing up for long periods in limbo again, with highly damaging lockdown restrictions that aren’t strong enough to quickly get the job done. The exception, perhaps, is the hardest hit place of all: Belgium.

Lockdown in Belgium

Belgium’s lockdown is rather strict. It turns out that when you care enough to do it properly, yes, lockdowns still work. And they work fast. 

They don’t work quite as fast as this indicates…

…because either they are doing less testing or the reporting of tests lags a bit, but it does seem that things are turning around. 

Whereas other European nations are doing relatively half-assed lockdowns while keeping schools open. That is not going as well. For graphs, see the Europe numbers section. 

The song remains the same. Either lockdown and try to win for real, or accept defeat for real. Half measures end up in limbo.

All I Want For Christmas Is a Covid Vaccine

It’s happening!

Not for me until next year. Not for most of you either. But it is happening.

The question now is: What does this mean in practical terms?

In the short term, for the next few months and the wave of infections currently upon us, it means nothing. This wave will almost certainly peak before the vaccine has time to have a non-trivial impact on the pandemic.

A few months from now, health care workers will hopefully get vaccinated. That expands the capacity of the hospital system and other health care a substantial amount, because workers can take fewer precautions and fewer of them will be out sick. That’s great, but on its own it won’t move the needle that much on the overall course of the pandemic.

Early next year, there will be enough people vaccinated that it has a substantial impact on the overall arc of the pandemic. Alas, it is likely that people’s control systems will kick in, and rather than take less risk in order to wait for a vaccine, the majority of people will instead take more risk because they’ll sense things are less dangerous. So if you are a responsible person waiting for your turn to get the vaccine, you’ll need to continue to be careful up until your turn. With vaccination on the horizon, the value of not being infected will be very high.

My guess is that some time between March and July, there will be enough vaccine doses that anyone who actively seeks out the vaccine will be able to get it. At that point, you’ll be able to return to your old life. Dr. Fauci said April for widespread availability after I’d written that guess, smack in the middle of the range, so it seems like the right estimate.

A few months after that, enough people will be vaccinated that life overall starts to feel normal again. I hope.

A lot of things could change that timeline. A key question will be whether we only have one vaccine available, or whether we will have multiple vaccines. The Pfizer vaccine is harder to scale than the others, because it requires extreme cold storage and two doses, and the effects of having multiple available vaccines would mostly be additive.

CEO of Pfizer will take the vaccine first in order to reassure people that it is safe. Skin in the game at its finest. Also a great excuse to get yourself first in line for the vaccine. That’s all right. He’s earned it.

Don’t Care How I Want It Now

When should we begin distribution of the vaccine? How should we choose who gets the vaccine? How much should we be ramping up production? What about other vaccine candidates?

First question is easy. We should begin vaccine distribution yesterday. The ‘good’ news is that not doing so is a small mistake due to limited dose capacity, and the ability for now to keep what doses do exist in cold storage until needed. So by waiting, we are moving some vaccinations from November into December, but not (as I understand it) delaying vaccination in general. That’s unfortunate because getting our health care workers vaccinated now would be a big help, but far less bad than many other mistakes that are being made.

Second question is not as easy. A lot more people want the vaccine as soon as possible than we will have doses available any time soon, so we must choose who gets the vaccine first. It’s clear that using prices is a non-starter because people wouldn’t stand for it, so I won’t bother making that case. 

Doing distribution via lottery allows us to conduct (semi) natural experiments to see how effective and how safe the vaccine is on larger sample sizes, at little extra cost. With prices unavailable, some form of lottery should clearly be part of the solution, whether it’s by birthday, or by area, or something else. I don’t expect us to do this.

I do hold out hope we can give priority to essential workers, especially health care workers. As noted above, vaccinating health care workers not only helps stop the spread, it expands our ability to provide treatment. It also seems highly equitable. These people are putting themselves on the front lines, they get the vaccine first. 

Other essential workers that need to interact with others to do their jobs can follow after that. 

The core argument there is that what matters is getting the pandemic under control and being able to provide essential services, so we should concentrate on the most exposed first. 

The other argument is that we should instead protect the most vulnerable first, and give the vaccine to the elderly or those with other conditions that put them at higher risk of death.

Both sides can point to models that say their way is better. Both sides can make a moral case that they are right. Both factors matter, so it’s a trade-off question.


The important thing is that we get the vaccine out, as quickly as possible, to as many people as possible.

Then there are those who disagree. They have other priorities.

Andrew Cuomo Is The Worst 

Can we finally all agree that Andrew Cuomo is a giant douchebag and always was? 

He has already extensively lectured everyone on how we can’t trust a vaccine that was developed under Trump’s watch. Now that one is coming, he’s stepped up his game.

He is now saying that it is “bad news” that the vaccine was developed while Trump was in office and he is going to “work with other governors to stop distribution of the vaccine.” Because, you see, they’re having “private providers” distribute the vaccine, which will “leave out” some communities. Seriously. Listen to the clip. 

So the vaccine is a cupcake that you have to throw away because you didn’t bring enough for the rest of the class and – seriously listen to the clip if you don’t believe me but this is what he is actually saying – some people don’t live close enough to a CVS, so no one should get vaccinated until Biden is in the White House.

Alternatively: If we don’t do something to stop it, someone somewhere might have two cows.

He is saying that everyone needs to have equal access to health care, and to achieve that he’s going to actively stop others from getting it. Because otherwise, when Biden takes office, he can’t “undo” what Trump has done, and those people will have permanently gotten the vaccine earlier than some other people.

Lest anyone accuse me of false equivalence on Covid-19, let me be clear. This isn’t equivalent to denying there is pandemic or engaging in literal piracy and banditry of medical equipment. 

This is worse. 

If you don’t believe one should ever hate anyone or anything, then I congratulate and salute you on your enlightened attitude. However, if you believe it is good and right to sometimes hate at all, and you hate this with less than the fire of a thousand suns, you aren’t hating it with the fire of enough suns. 

Pitchforks are available at Home Depot and many other fine stores.

We’ll Need More

How much should we ramp up production? We should do all the ramping up of all the production.

Look at what happened to the stock market on the day Pfizer announced its results, note that Pfizer was only a tiny fraction of total gains and went up far less than many other businesses, and then note that the stock market’s gains are a small fraction of the real human gains in economic terms. And the economic damage is only part of the toll on human lives even for those who stay healthy. 

If you do the math on how much it is worth to end this pandemic one day sooner, the answer comes back to a much higher order of magnitude than it would cost to speed up vaccine production and make that happen. 

We missed our chance to do this in the development stage, but we can still do it in the distribution stage. The more money we can throw at this problem to get more production faster, the better. I do not care what it costs. 

We will also need more different vaccines. Even with the right incentives, Pfizer will not be able to produce enough quickly enough on its own, so the other candidates (that turn out to be effective) need to help out as well. 

There are several nightmare scenarios that need to be avoided. If you are in any kind of position to ensure that these do not come to pass, please do everything you can to help!

The first nightmare is if vaccine trials are not allowed to continue once the first emergency use authorization is given. The argument will be that it is not “ethical” to continue to not vaccinate the control group. This is pure insanity, and it is even more pure insanity when there isn’t enough vaccine to go around and those in the control group could not otherwise have gotten it anyway.

I’m still naive enough to think we’re not that insane. At least not yet. But I’m still worried this might happen! Both for the Pfizer trial, and for other vaccine trials. We need to make sure that doesn’t happen. 

Even worse would be denying the EUA because of fears that the trials would end if the EUA was given. This is seriously something we have to worry about – that our “ethics” principles are so reversed that we need to deny everyone the vaccine in order to deny the vaccine to the few people we need to not (yet) vaccinate.

There’s the nightmare that Andrew Cuomo or others like him hold up distribution, as described above in his own section. Please do not let this happen.

The mistake I fear the most is that we might refuse to allow a second vaccine because it is ‘less effective’ than the first vaccine. That Pfizer’s comes in at let’s say 91% effective, and then AstraZeneca’s is 87% effective, and because of that they refuse to approve it, even though Pfizer won’t have enough doses available for years. From what I can tell, this is a real worry and this could actually happen. Similar things happen all the time. 

Immunity to Covid-19 For Some, Miniature American Flags for Others

Thanks to a combination of the existing anti-vaccination movement, and the fears that were raised during the campaign by the likes of Andrew Cuomo and Kamala Harris, there will be a lot of people who are reluctant to take the vaccine, or at least to take it relatively early. It might even be a majority of Americans who are not interested. 

When I posted to Twitter about the vaccine, several responses were to inform me that the person responding was most definitely not interested in taking a Covid-19 vaccine, and found me rather crazy for wanting to take one.

In the long term, that’s a problem. We need to get enough people vaccinated that we can resume normal activities. If we only get half the people, especially the half we’d get in this scenario, that on its own won’t be enough. 

In the short term, this may sound like a problem, but actually this is great. We don’t have enough vaccine doses for everyone who wants it. If half the people volunteer to opt out of the process, then the rest of us can get vaccinated twice as fast, and can be less worried about not being high in the priority queue. 

Then, in the long term, there will be hundreds of millions of us around the world who have gotten a Covid-19 vaccine. Our experiences will prove it is safe, and most of the people doubting now can follow later. If they like, they can feel smart for having waited to be sure. Or, if they prefer, they can acknowledge their mistake. Either way seems fine. 

That still leaves the hardcore anti-vax people, but we were never winning them over in any case, and I don’t think there will be enough of them left to cause a major problem. 

What The Hell Is Happening To Us?

Vivek Murthy, newly tapped as one of the heads of the new Covid-19 advisory board, points out this story, that those who had Covid-19 have a 20% chance of having gotten a new mental illness in the last 90 days, twice the rate of the general population.

It’s easy to bury the lead here. Yes, 20% of people who survive Covid-19 getting a new mental illness is really bad. I was going to ask whether this result was expected. With the broad ways we today define mental illness, combined with the trauma of dealing with the virus, it would make sense that a lot of people would develop a problem at least in the short term under our technical definitions. 

I’d also note that this is among people who know they had Covid-19, not the entire group of people who did have Covid-19. That’s a big difference, and is likely pushing up the difference quite a bit. It’s also the better thing to track, though. It’s no surprise that it is not traumatic to get asymptomatic Covid-19.

But that all misses the point. 20% of people who survive symptomatic Covid-19 having a new mental illness is bad. But 10% of the entire population developing a new mental illness every three months is much worse! 

That’s more than a third of the population each year. That’s… really terrible.

Multiple friends of mine with no connection to each other responded that 10% matches their observations. That’s how bad it is to live under these conditions. How much of that is the pandemic versus the rest of American life today is an open question, but it’s clear something is deeply, deeply screwed up here.

For comparison to pre-Covid, I googled and found this, which claims 26% of Americans have a diagnosable (not necessarily diagnosed) mental disorder in a given year, and this source claims 46% will suffer from a mental illness at some point in their lives, which presumably are all pre-Covid stats. These numbers are much higher than for the rest of the world. 

I don’t know the usual duration of a mental illness, but if you combine 26% suffering in a given year with 46% suffering at least once at some point, you definitely get a radically different picture than looking at a 10% chance of a new problem every three months.

Someone I know reports that the majority of her daughter’s remote learning class is newly suffering from depression. When she described how the school was running things, and how their entire lives had become dominated by a combination of being tied to screens and then potential and real interruptions demanding proofs of work at any and all hours of the day in order to destroy the lives of both the kids and their parents, I didn’t have to wonder why. 

(Then I may have gotten the parents to form a union that fought back and forced the school to change its policies. But that’s another story.)

This is all only part of the non-economic cost of our countermeasures to contain the virus. What percent chance of death should a person risk to avoid developing a mental illness?

The question of ‘why were so many of us were already so mentally screwed up’ is an important one, but beyond the scope of this column. 

In Other News

In other Andrew Cuomo being the worst news, gyms, bars and restaurants with liquor licenses can stay open but have to close by 10pm. I do not wonder what will happen to capacity utilization when we cut the number of hours available. The bars might be a net benefit because people act stupid at night, maybe even the restaurants, but closing the gyms some of the time definitely makes us less safe. He’s also limiting indoor gatherings to 10 people. Which is at least somewhat helpful. Although, I wonder to what extent telling people the limit is ten causes ten person gatherings.

I remember a few days ago when there were musings like this: Will a small, long-shot U.S. company end up producing the best coronavirus vaccine? They still might. It’s great that we have so many backup plans.

Did you know that Congress set aside money so airports could do Covid-19 tests, and then the FAA kept the submitted proposal in limbo, where it still is today? Thus no screenings at airports. 

On a related note, what happened to the rapid tests? Exactly what you would think. Regulations and regulatory uncertainty. 

This thread claims that a 22-day delay from positive tests to deaths best matches state data. That is on the extreme high end of my range based on the approaches I’ve taken, which focus more on the national and New York levels, but is plausible. Longer delays are worse news given the current situation. My guess is that you see the biggest direct impact at more like 14 days, but with other deaths that take longer, so if you pick one number it will depend on how you define your error term. 

Deutsche Bank proposes a 5% tax on those who work from home voluntarily, to ‘support those whose jobs are under threat.’ I know I was talking about maximizing harm in the name of showing concern, but even I am impressed by this one.

Marginal Revolution offers words of wisdom on how we are told to put our lives on hold, yet the authorities don’t feel they have to figure out how to approve and distribute a vaccine while continuing to conduct clinical trials, and this is threatening to take away months of our lives for actual zero reason.

Did you know that being indoors with others is ‘safe’ under a certain threshold of infections but ‘unsafe’ over that threshold? “Some experts” are here to remind us that this is how they model the world. In case it wasn’t obvious, these actions were never safe.

Anecdotes are what they are, but this was only the latest example of someone reporting falling mask use. I’m guessing this is a lot of why things are getting so bad. This isn’t complicated.

In case you are wondering how we handled Covid-19 in prisons, this answer from Texas is not reassuring, and reminds us how bad Covid-19 can get under worst-case conditions.

CDC finally admits, this week, that mask wearing benefits the wearer and not only those around them. Better (ridiculously over the top amounts of) late than never, I suppose.

North Denmark in lockdown over mutated virus in mink farms, and all the minks were going to be killed, because minks are highly vulnerable to Covid-19 and there was worry about potential mutations. I was all ready to give kudos to Denmark for treating this problem with most of the seriousness it deserved, and thinking that maybe, just maybe, there is a threshold where we get ourselves together. Then some Danish MPs started complaining about the threat to livelihoods and claiming that the cull order was illegal, and the government backed down. It seems you can lock down the people and destroy their livelihoods together, but threaten the mink farm lobby and there will be hell to pay. A sobering turn of events.

Also, somewhat off topic, but relevant to how decisions are being made these days: while Denmark is farming fifteen million minks in cages and cares more about doing that than guarding against the mutation of a global pandemic, they also banned butchering of kosher and halal meat in 2015. I would wonder why, except that I don’t wonder why.

Even more off topic but worth pointing out: The Netflix series The Queen’s Gambit is excellent and you should watch it. Tier one, must see.

The Winds of Winter

A vaccine is coming. If you want it but do not have priority, chances are you will get it some time around April. That’s five months from now.

Meanwhile, cases are soaring. Positive test rates are over 10% and rising at >15% per week. Deaths are rising over 15% per week. Hospital systems are starting to become overloaded in the hardest hit areas, with the worst yet to come. Things are even worse in Europe.

The value of staying safe is higher than it has been since April. For at least the next two months, that value will only rise.

The chances of becoming infected are at an all-time high. If you become ill during the next few months, there is serious danger that the hospital system you arrive at will be rationing care. Our care has improved greatly, but temporarily it will likely get worse. 

Whereas if you make it to April, you can get a vaccine, and be (not entirely, but mostly) safe permanently. The cost of staying safe is only about five months of precautions, rather than an open-ended nightmare.

There are many variables to consider, but a rough estimate would be that the effective risk level of a given activity is somewhere between double and ten times the rate it was a month ago. That’s why I’ve been saying for weeks that if you have a risk that you need to take, that is worth taking, take it sooner rather than later. 

Whatever level of precautions you decided were appropriate before, you should increase that level.

Because everything works in power laws, it isn’t obvious that this will have a major impact on the way you live your life over the next five months, if you were already working or going to school from home and generally playing on the ‘safe’ side of things. I often mock calling some things ‘safe’ and some things ‘unsafe’ as a binary, but most things are clearly on one side or the other of that binary. There aren’t that many things that are borderline choices! 

If you are forced to take risks, now is the time to do everything possible to minimize those risks. 

But as always, think ahead! You need to retain your safety and sanity for at least several months before things improve, and likely at least five before the vaccine is available. Do not put yourself on a path that you cannot sustain.

Do I still think the worst is probably behind us? I do, but I am more concerned that the medical system will collapse, in which case things could quickly become very bad. It also seems more plausible that people’s control systems have broken down - people may be so sick of it all that they won’t adjust, at the very time safety is most valuable both personally and collectively. 

The next few months are going to be difficult, but assuming a peaceful transition of power (even after everything, I still can’t quite fully believe that I need to write those words, but I do) the supply chains will hold. Our health care workers and other essential employees will have the equipment they need. Many of the most vulnerable will have a chance to protect themselves. A vaccine is on its way. And it’s a grim thing to say, but already 3.13% of the United States has tested positive, with the likely true infection rate closer to 20% and rising quickly. 

It may not feel like it, but this wave is the climax. We are eight months in and more than halfway home, but we need to hold things together until the tide turns. 

It can be done. Let’s hope we have the will to do it.



Discuss

What are Examples of Great Distillers?

12 ноября, 2020 - 17:09
Published on November 12, 2020 2:09 PM GMT

A distiller (name based on https://distill.pub/) is a writer who can explain a complex topic, one with a lot of research debt, so that knowledgeable undergraduates in the subject can get a good intuition and picture of the subject. I'm looking for examples of people you consider great distillers in a field in which you have advanced knowledge. For example, Terence Tao is a great distiller of Mathematics, and Scott Aaronson is a great distiller of Complexity Theory and Quantum Computing. What are your favorite distillers, and why?

I'm asking because I'm trying to improve my own distilling skills, and studying/stealing from the masters is a great way to get better. Only issue is that you need to know the masters. 



Discuss

Any work on honeypots (to detect treacherous turn attempts)?

12 ноября, 2020 - 08:41
Published on November 12, 2020 5:41 AM GMT

I know the idea of making a "honeypot" to detect when an AI system would attempt a treacherous turn if given the opportunity has been discussed (e.g. IIRC, in Superintelligence).  But is there anyone actually working on this?  Or any work that's been published?



Discuss

Does there exist a detailed Bayesian COVID tracker?

12 ноября, 2020 - 08:06
Published on November 12, 2020 5:06 AM GMT

[bounty: $100 for recommending a tool that I use for more than two weeks]

I would love (and happily pay for) a piece of software that I could tell (a) a bunch of recent interactions between people ("Alice, masked, spent 3 hours, indoors, distanced, with Bob, unmasked, on Nov 1"), and (b) a bunch of evidence about their health over time ("Dolores tested negative on Nov 3"), and make queries about how likely various people are to have COVID.

I'd want it to be capable of complicated inferences like "Alice met with Bob yesterday; Bob met with Charlie 4d ago; Charlie separately met with Dolores that same day. Dolores just tested negative; given that, Alice is now less likely to be incubating COVID."

Ideally, it would take into account things like contagiousness-over-time profiles, and incubation periods, and asymptomatic cases, and how all those things differ between people, and tests' false negative rates -- but I realize that's a lot to ask.

Some non-solutions:

  • https://microcovid.org is great at what it does, but what it does is analyze individual activities, not make inferences between people and across time.
  • ^ The associated MicroCOVID spreadsheet does better on this front, but (AFAICT) doesn't capture correlations between people's risk levels, or make inferences like "Zelda tested negative, therefore all the microcovids she inflicted over the last couple weeks should be somewhat discounted."
  • Privacy-conscious COVID tracking apps can't offer the level of sophistication I want. I want to be able to account for masked-ness and ventilation, which flatly isn't captured by "How many pings did Alice's phone hear from Bob's?"


Discuss

A Correspondence Theorem in the Maximum Entropy Framework

12 ноября, 2020 - 01:46
Published on November 11, 2020 10:46 PM GMT

Classical mechanics didn't work any less well once we discovered quantum, Galilean relativity and Newtonian gravity didn't work any less well once we discovered special and general relativity, etc. This is the correspondence principle, aka Egan's Law: in general, to the extent that old models match reality, new models must reproduce the old.

This sounds like it should be a Theorem, not just a Law - a "correspondence theorem".

This post presents such a theorem within one particular framework - maximum entropy. It's not the theorem I'd eventually like - even my correspondence theorem from last month has more substance to it. But this one is simple, intuitive, and both its strong points and shortcomings help to illustrate what I want out of a correspondence theorem. (In fact, I wrote down this one before the other correspondence theorem, and the shortcomings of this theorem partially inspired that one.)

Background: Principle of Maximum Entropy

By "maximum entropy framework", I mean roughly the technique used by Jaynes - see e.g. the widget problem in Logic of Science, page 440. If you've seen the principle of maximum entropy in the context of statistical mechanics, it's the same thing. I expect a lot of people who are otherwise familiar with maximum entropy distributions are not familiar with this particular framework (especially outside of a physics context), so a bit of background is in order.

We'll start with the widget problem: a company produces red, green and blue widgets, and for inventory purposes we wish to predict how many of each color will be sold on a given day. Based on sales aggregated over the past year, the average sales per day are roughly 40 green, 20 red, and 10 blue. Given only this information, what distribution should we assign to tomorrow's sales?

In a high-school statistics class, the answer would be "insufficient information" - the averages are not enough info to figure out the whole distribution. But this isn't a high-school statistics class. We're Bayesians, we've received relevant information, we have to update somehow.

Jaynes argues that the canonically-correct way to do this is maximum entropy: we find the model M which maximizes entropy, subject to the constraint that the expected sales under M match the observed averages.
local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}  which has the highest possible entropy, subject to the constraints E[#green|M]=40, E[#red|M]=20, and E[#blue|M]=10. A couple ways to interpret this idea:

  • The classical argument from stat mech: the vast majority of distributions are very close to the maximum entropy distribution. Equivalently, we implicitly have a (non-normalizable) uniform prior over distribution-space which we're updating (via Bayes' rule) based on the expectations.
  • The argument from information: maximizing entropy means minimizing the information assumed in the model. By maximizing entropy subject to the expectation constraint, we're accounting for the expectation but assuming as little as possible aside from that (in an information-theoretic sense).

Regardless of how we interpret it, the math works out the same. If our variable is X and our constraints are $E[f_i(X)|M]=\mu_i$, then the maximum entropy distribution is

$$P[X=x|M]=\frac{1}{Z}e^{\lambda \cdot (f(x)-\mu)}$$

where

$$Z=\min_\lambda \int_x e^{\lambda \cdot (f(x)-\mu)}\,dx$$

and $\lambda$ is the minimizing argument. You can find derivations and discussions and all that in any stat mech book, or in Jaynes. For the widget problem above, X would be the colors of each widget ordered in a day, $f_{green}(X)$ would be the number of green widgets ordered, and the constraints would say $E[f_{green}(X)|M]=40$, etc. To compute the $\lambda$'s, we'd evaluate the integral analytically and then minimize (see Jaynes).
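
As a concrete illustration, here's a minimal numeric sketch of the widget fit (the truncation of daily counts at 200, and all names, are assumptions for illustration). Since the constraints factor across colors, each color gets its own one-dimensional problem; rather than evaluating the integral analytically, we minimize $\log Z$ directly:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

counts = np.arange(201)  # assumed truncated support for a day's count
mus = {"green": 40, "red": 20, "blue": 10}

def fit_lambda(mu):
    # Minimize log Z(lam) = log sum_x e^{lam * (x - mu)}. The gradient of
    # this objective is E[x|M] - mu, so the minimum satisfies the constraint.
    log_Z = lambda lam: logsumexp(lam[0] * (counts - mu))
    return minimize(log_Z, x0=[0.0]).x[0]

for color, mu in mus.items():
    lam = fit_lambda(mu)
    p = np.exp(lam * (counts - mu))
    p /= p.sum()
    print(color, lam, (p * counts).sum())  # recovered expectation should be ~mu
```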

Anyway, the main point here is that we can now "update" on certain kinds of information by adding constraints to our maximum entropy problem. For instance, if we find out that the daily variance in green widget sales is 10, then we'd add a constraint saying $E[(f_{green}(X)-40)^2|M']=10$. Our maximum entropy distribution would then have one additional $\lambda$, and an additional term $((f_{green}(x)-40)^2-10)$ in the exponent. All written out, we'd go from

$$P[X=x|M]=\frac{1}{Z}e^{\lambda_{green}(f_{green}(x)-40)+\lambda_{red}(f_{red}(x)-20)+\lambda_{blue}(f_{blue}(x)-10)}$$

to

$$P[X=x|M']=\frac{1}{Z'}e^{\lambda'_{green}(f_{green}(x)-40)+\lambda'_{red}(f_{red}(x)-20)+\lambda'_{blue}(f_{blue}(x)-10)+\lambda'_{var}((f_{green}(x)-40)^2-10)}$$

... and we'd have to solve the modified minimization problem to find λ′ and Z′.
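
Continuing the sketch above (same assumed support and imports), the variance constraint just adds a second feature column and a second $\lambda$ to the same minimization:

```python
def fit_with_variance(mu, var):
    # Two features: (x - mu) for the mean constraint, and
    # ((x - mu)^2 - var) for the variance constraint.
    feats = np.stack([counts - mu, (counts - mu) ** 2 - var], axis=1)
    log_Z = lambda lam: logsumexp(feats @ lam)
    return minimize(log_Z, x0=np.zeros(2)).x

lam_green, lam_var = fit_with_variance(40, 10)
p = np.exp(lam_green * (counts - 40) + lam_var * ((counts - 40) ** 2 - 10))
p /= p.sum()
print((p * counts).sum(), (p * (counts - 40) ** 2).sum())  # ~40 and ~10
```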

Conceptual takeaway: rather than updating on individual data points, in this framework we're given a sequence of summaries of "features" of the dataset, of the form "the expectation of $f_i(X)$ is $\mu_i$" (found by, e.g., computing the average of $f_i$ over a large data set). Each such feature becomes a new constraint in an optimization problem. This turns out to be equivalent to a Bayesian update in situations where a Bayesian update makes sense, but is more general - roughly speaking, it can work directly with virtual evidence updates.

Correspondence Theorem for Maximum Entropy Updates

On to the interesting part.

Let's imagine that two analysts are (separately) building maximum-entropy models of some data. They each query the giant database for average values of certain features - $f_1(X)$ for the first analyst, $f_2(X)$ for the second. They end up with two models:

  • M1 is the maximum-entropy model with features $E[f_1(X)|M_1]=\mu_1$
  • M2 is the maximum-entropy model with features $E[f_2(X)|M_2]=\mu_2$

We'll assume that both of these are "correct", in the sense that the average values $\mu$ actually do match the data.

Let's say that model 2 is "better" than model 1, in the sense that it has better predictive power on the real data-generating process D: $E[\ln P[X|M_2]|D] > E[\ln P[X|M_1]|D]$. (Up to sign and a change of log base, this is the average number of bits used by each model to encode a data point X from D.) So, the analysts' boss plans to just use model 2. But let's stretch the story to AI-alignment-style concerns: what if model 2 is using some weird ontology? What if the things the company cares about are easy to express in terms of the features $f_1$, but hard to express in terms of the features $f_2$?

Now for the claim.

We have two possibilities. Either:

  • We can construct a third model M′ which has strictly better predictive power than M2, OR
  • The features $E[f_1(X)|M_1]=\mu_1$ are already implied by M2; those features are already "in there" in some sense.

The proof will show the sense in which this is true.

Proof

The obvious thing to do is combine the two models into a single maximum-entropy model M′ with both the features $E[f_1(X)|M']=\mu_1$ and $E[f_2(X)|M']=\mu_2$. How does the predictive power of this model look?

For maximum-entropy models in general, the predictive power $E[\ln P[X|M]|D]$ has a simple expression, assuming the values $\mu$ are correct for D (i.e. $E[f(X)|D]=\mu$):

$$E[\ln P[X|M]|D]=\int_x \left[\lambda \cdot (f(x)-\mu)-\ln Z\right] p[x|D]\,dx=\lambda \cdot (\mu-\mu)-\ln Z=-\ln Z$$

... so it's just the negative log of the normalizer Z. So, M′ has higher predictive power than M2 if-and-only-if $Z' < Z_2$.

Now, recall that Z comes from a minimization problem. Specifically:

$$Z'=\min_{\lambda_1,\lambda_2}\int_x e^{\lambda_1 \cdot (f_1(x)-\mu_1)+\lambda_2 \cdot (f_2(x)-\mu_2)}\,dx$$

$$Z_2=\min_{\lambda_2}\int_x e^{\lambda_2 \cdot (f_2(x)-\mu_2)}\,dx$$

Key thing to notice: the objective for $Z_2$ is just the objective for $Z'$ with $\lambda_1$ set to zero. In other words: the space which model M2 searches for minima is a subset of the space which model M′ searches for minima. Thus, $Z'$ is always at least as small as $Z_2$; model M′ has predictive power at least as high as M2.
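
Here's a minimal numeric check of that argument on a toy space (the 6x6 space, the particular features, and the random "true" distribution are all made-up illustrations):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
xs = np.array([(a, b) for a in range(6) for b in range(6)])
D = rng.dirichlet(np.ones(len(xs)))            # a "true" distribution
f1 = xs[:, 0].astype(float)                    # analyst 1's feature
f2 = (xs[:, 0] + xs[:, 1] ** 2).astype(float)  # analyst 2's feature
mu1, mu2 = D @ f1, D @ f2                      # true feature averages

def log_Z(feats, mus):
    F, m = np.stack(feats, axis=1), np.asarray(mus)
    return minimize(lambda lam: logsumexp(F @ lam - lam @ m),
                    x0=np.zeros(len(m))).fun

log_Z2 = log_Z([f2], [mu2])           # model M2: feature f2 only
log_Zp = log_Z([f1, f2], [mu1, mu2])  # combined model M'
print(log_Zp <= log_Z2 + 1e-6)        # Z' <= Z2: M' predicts at least as well
```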

Furthermore, let's assume that the optima in these problems are unique - that's not necessarily the case, but it is usually true in practice. (The objectives are convex, so uniqueness of the minimum can only fail in specific ways - I'll leave the "fun" of working that out as an exercise to the reader.) We know that M′ reduces to M2 when $\lambda_1=0$; if the optima are unique and $Z'=Z_2$, then $\lambda_1$ is indeed 0, so M′=M2.

... but M′ has to satisfy the constraints $E[f_1(X)|M']=\mu_1$. So if M′=M2, then M2 also satisfies those constraints. That's the sense in which the features $E[f_1(X)|D]=\mu_1$ are "already present" in M2: $E[f_1(X)|M_2]=\mu_1$.

So, we have two cases:

  • M′≠M2: M′ has strictly better predictive power (i.e. $E[\ln P[X|M']|D] > E[\ln P[X|M_2]|D]$)
  • M′=M2: the features $E[f_1(X)|D]=\mu_1$ from model 1 are already implicit in model 2 (i.e. $E[f_1(X)|M_2]=\mu_1$)
What's Really Going On Here?

If we strip away the math, the underlying phenomenon here seems kind of trivial.

The key is that we assume $\mu$ is correct for the true data-generating process D. We justify this by imagining that we have some very large number of data points, so the law of large numbers kicks in and we can correctly estimate the averages of our features. We're not just collecting noisy data points; we're directly learning facts about the true distribution, and we're learning those facts with perfect certainty.

So our theorem is saying something like... two models both contain some facts about the (observable) true distribution. Either:

  • we can combine them into a strictly better model which contains all the facts from both, OR
  • all the facts from one model are already contained in the other, and the combined model makes the same predictions as the "better" original model.

(A warning, however: this intuitive story is not perfect. Even if the combined model makes the same predictions as one of the original models about the data, it can still update differently as we learn new facts.)

Is This Trivial?

Let's go back to the starting point: it all adds up to normality. New models need to reproduce the old models in all the places where the old models worked - otherwise the new models are strictly suboptimal.

The obvious-but-trivial formalization of this is that new models have to make the same predictions about the data as the old models, in all the places where the old models predicted correctly. Corollary: any features (i.e. functions) of the data correctly predicted by the old models must also be correctly predicted by the new models.

... and that's basically what we've proven. Within the maximum entropy framework, any features of the data (specifically long-run average values) correctly predicted by an "old" model must also be correctly predicted by a "new" model, else the new model is strictly suboptimal. So in that sense, it seems pretty trivial.

However, there are two senses in which it's nontrivial. First, in the case where the new model incorrectly predicts some feature-values μ encoded in the old model, we've explicitly constructed a new model which outperforms the old. It's even a pretty simple, clean model - just another maximum entropy model.

Second, even the "trivial" idea that new models must make the same predictions about the data in places where the old model was right can cover some pretty nontrivial cases, because "features" of the data distribution can be pretty nontrivial. For instance, we can have a whole class of features of the form $E[g_i(X_1)g_j(X_2)]=E[g_i(X_1)]E[g_j(X_2)]$. With infinitely many such constraints, we can encode independence of $X_1$ and $X_2$. After all, independence of two observed variables is a property of the data distribution, so it's something we can use as a "feature". Likewise for conditional independence. (Of course, once we get into infinite-feature territory we do need to be more careful about applying the theory to real, finite data sets...)
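
On a finite space, finitely many indicator features already suffice, and the maximum entropy fit recovers the product distribution. A toy sketch (the 5-element spaces, names, and random marginals are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
m1, m2 = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))  # "measured" marginals
xs = [(a, b) for a in range(5) for b in range(5)]

# Features g_i(X1) * g_j(X2) with g_i(x) = 1[x == i]; the constraint values
# are the products of the measured marginals.
feats = np.array([[float(a == i) * float(b == j)
                   for i in range(5) for j in range(5)] for a, b in xs])
mus = np.outer(m1, m2).ravel()

lam = minimize(lambda l: logsumexp(feats @ l - l @ mus),
               x0=np.zeros(25)).x
p = np.exp(feats @ lam)
p /= p.sum()
print(np.allclose(p, mus, atol=1e-3))  # maxent fit = the product distribution
```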



Discuss

[Link] Digital Democracy Is Within Reach

November 12, 2020 - 01:12
Published on November 11, 2020 10:12 PM GMT

EPISODE SUMMARY

Imagine a world where every country has a digital minister and technologically-enabled legislative bodies. Votes are completely transparent and audio and video of all conversations between lawmakers and lobbyists are available to the public immediately. Conspiracy theories are acted upon within two hours and replaced by humorous videos that clarify the truth. Imagine that expressing outrage about your local political environment turned into a participatory process where you were invited to solve that problem and even entered into a face to face group workshop. Does that sound impossible? It’s ambitious and optimistic, but that's everything that our guest this episode, Audrey Tang, digital minister of Taiwan, has been working on in her own country for many years. Audrey’s path into public service began in 2014 with her participation in the Sunflower Movement, a student-led protest in Taiwan’s parliamentary building, and she’s been building on that experience ever since, leading her country into a future of truly participatory digital democracy. 



Discuss

Learning Normativity: A Research Agenda

November 12, 2020 - 00:59
Published on November 11, 2020 9:59 PM GMT

(Related to Inaccessible Information, Learning the Prior, and Better Priors as a Safety Problem. Builds on several of my alternate alignment ideas.)

 

I want to talk about something which I'll call learning normativity. What is normativity? Normativity is correct behavior. I mean something related to the fuzzy concept humans convey with the word "should". I think it has several interesting features:

  • Norms are the result of a complex negotiation between humans, so they shouldn't necessarily be thought of as the result of maximizing some set of values. This distinguishes learning normativity from value learning.
  • A lot of information about norms is present in the empirical distribution of what people actually do, but you can't learn norms just by learning human behavior. This distinguishes it from imitation learning.
  • It's often possible to provide a lot of information in the form of "good/bad" feedback. This feedback should be interpreted more in the spirit of approval-directed learning than of RL. However, approval should not be treated as a gold standard.
  • Similarly, it's often possible to provide a lot of information in the form of rules, but rules are not necessarily 100% true; they are just very likely to apply in typical cases.
  • In general, it's possible to get feedback which is very rich in type, but very sparse: humans get all sorts of feedback, including not only instruction on how to act, but also on how to think.
  • Any one piece of feedback is suspect. Teachers can make mistakes, instructions can be wrong, demonstrations can be imperfect, dictionaries can contain spelling errors, reward signals can be corrupt, and so on.
Example: Language Learning

 A major motivating example for me is how language learning works in humans. There is clearly, to some degree, a "right way" and a "wrong way" to use a language. I'll call this correct usage.

One notable feature of language learning is that we don't always speak, or write, in correct usage. This means that a child learning language has to distinguish between mistakes (such as typos) and correct usage. (Humans do sometimes learn to imitate mistakes, but we have a notion of not doing so. This is unlike GPT systems learning to imitate the empirical distribution of human text.)

This means we're largely doing something like unsupervised learning, but with a notion of "correct"/"incorrect" data. We're doing something like throwing data out when it's likely to be incorrect.

A related point is that we are better at recognizing correct usage than we are at generating it. If we say something wrong, we're likely able to correct it. In some sense, this means there's a foothold for intelligence amplification: we know how to generate our own training gradient.

Another fascinating feature of language is that although native speakers are pretty good at both recognizing and generating correct usage, we don't know the rules explicitly. The whole field of linguistics is largely about trying to uncover the rules of grammar.

So it's impossible for us to teach proper English by teaching the rules. Yet, we do know some of the rules. Or, more accurately, we know a set of rules that usually apply. And those rules are somewhat useful for teaching English. (Although children have usually reached fluency before the point where they're taught explicit English grammar.)

All of these things point toward what I mean by learning normativity:

  • We can tell a lot about what's normative by simply observing what's common, but the two are not exactly the same thing.
  • A (qualified) human can usually label an example as correct or incorrect, but this is not perfect either.
  • We can articulate a lot about correct vs incorrect in the form of rules; but the rules which we can articulate never seem to cover 100% of the cases. A linguist is a lot like a philosopher: taking a concept which is understood at an intuitive level (which a great many people can fluently apply in the correct manner), but struggling for years to arrive at a correct technical definition which fits the intuitive usage.

In other words, the overriding feature of normativity which I'm trying to point at is that nothing is ever 100%. Correct grammar is not defined by any (known) rules or set of text, nor is it (quite) just whatever humans judge it is. All of those things give a lot of information about it, but it could differ from each of them. Yet, on top of all that, basically everyone learns it successfully. This is very close to Paul's Inaccessible Information: information for which we cannot concoct a gold-standard training signal, but which intelligent systems may learn anyway.

Another important feature of this type of learning: there is a fairly clear notion of superhuman performance. Even though human imitation is most of the challenge, we could declare something superhuman based on our human understanding of the task. For example, GPT is trained exclusively to imitate, so it should never exceed human performance. Yet, we could tell if a GPT-like system did exceed human performance:

  • Its spelling and grammar would be immaculate, rather than including humanlike errors;
  • its output would be more creative and exciting to read than that of human authors;
  • when good reasoning was called for in a text, its arguments would be clear, correct, and compelling;
  • when truth was called for, rather than fiction, its general knowledge would be broader and more accurate than a human's.

It seems very possible to learn to be better than your teachers in these ways, because humans sometimes manage to do it.

Learning in the Absence of a Gold Standard

In statistics and machine learning, a "gold standard" is a proxy which we treat as good enough to serve as ground truth for our limited purposes. The accuracy of any other estimate will be judged by comparison to the gold standard. This is similar to the concept of "operationalization" in science.

It's worth pointing out that in pure Bayesian terms, there is nothing especially concerning about learning in the absence of a gold standard. I have data X. I want to know about Y. I update on X, getting P(Y|X). No problem!

However, that only works if we have the right prior. We could try to learn the prior from humans, which gets us 99% of the way there... but as I've mentioned earlier, human imitation does not get us all the way. Humans don't perfectly endorse their own reactions.

(Note that whether "99% of the way" is good enough for AI safety is a separate question. I'm trying to define the Big Hairy Audacious Goal of learning normativity.)

Actually, I want to split "no gold standard" into two separate problems.

  1. There's no type of feedback which we can perfectly trust. If humans label examples of good/bad behavior, a few of those labels are going to be wrong. If humans provide example inferences for learning the prior, some of those example inferences are (in a very real sense) wrong. And so on.
  2. There's no level at which we can perfectly define the loss function. This is a consequence of no-perfect-feedback, but it's worth pointing out separately.
No Perfect Feedback

I think I've made the concept of no-perfect-feedback clear enough already. But what could it mean to learn under this condition, in a machine-learning sense?

There are some ideas that get part of the way:

  • Jeffrey updates let us update to a specific probability of a given piece of feedback being true, rather than updating to 100%. This allows us to, EG, label an image as 90%-probable cat, 9%-probable dog, 1% broad distribution over other things. (A small code sketch of this, and of virtual evidence, follows this list.)
    • This allows us to give some evidence, while allowing the learner to decide later that what we said was wrong (due to the accumulation of contrary evidence).
    • This seems helpful, but we need to be confident that those probability assignments are themselves normatively correct, and this seems like it's going to be a pretty big problem in practice.
  • Virtual evidence is one step better: we don't have to indicate what actual probability to update to, but instead only indicate the strength of evidence.
    • Like Jeffrey updates, this means we can provide strong evidence while still allowing the system to decide later that we were wrong, due to the accumulation of contradicting evidence.
    • Unlike Jeffrey updates, we don't have to decide what probability we should update to, only the direction and strength of the evidence.
  • Soft labels in machine learning provide a similar functionality. In EM learning, a system learns from its own soft labels. In LO-shot learning, a system leverages the fact that soft labels contain more information than hard labels, in order to learn classes with fewer than one example per class.
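
As a small sketch of the first two ideas (the joint prior and all the particular numbers are invented for illustration):

```python
import numpy as np

# Hypothetical joint prior P(H, color): rows are H in {not defective, defective},
# columns are a widget's color in {green, red, blue}.
P = np.array([[0.50, 0.20, 0.10],
              [0.10, 0.05, 0.05]])

# Jeffrey update: the labeler reports "90% green, 9% red, 1% blue" rather
# than a hard label. Each column is rescaled to the reported color marginal.
q = np.array([0.90, 0.09, 0.01])
P_jeffrey = P / P.sum(axis=0) * q   # P(H|color) times the new P(color)
print(P_jeffrey.sum(axis=1))        # updated distribution over H

# Virtual evidence: we give only relative likelihoods ("9x likelier if green
# than if red, 90x likelier than if blue"), and the prior stays in play.
L = np.array([9.0, 1.0, 0.1])
P_virtual = P * L
P_virtual /= P_virtual.sum()
print(P_virtual.sum(axis=1))
```

The difference shows in the last step: the Jeffrey update fixes the posterior color marginal outright, while virtual evidence only shifts the odds, so accumulated contrary evidence can still win out later.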

However, although these ideas capture weak feedback in the sense of less-than-100%-confidence feedback, they don't capture the idea of interpretable feedback:

  • A system should ideally be able to learn that specific types of feedback are erroneous, such as corrupted-feedback cases in reinforcement learning. A system might learn that my feedback is lower quality right before lunch, for example.
  • A system should be able to preserve the overall meaning of a label despite an ontology shift. For example, deciding that fruit/vegetable is not a useful taxonomic or culinary distinction should not destroy the information gained from such labels. Or, if human feedback includes formal English grammar, that information should not be totally discarded if the system realizes that the rules don't fully hold and the supposed grammatical categories are not as solid as claimed.
  • Feedback should be associated with a cloud of possible interpretations. When humans say "weird", we often mean "unusual", but also sometimes mean "bad". When humans say we don't understand, we often really mean we don't endorse. A system should, ideally, be able to learn a mapping from the feedback humans actually give to what they really mean. This is, in any case, the general solution to the previous bullet points.

But "learning a mapping from what feedback is given to what is meant" appears to imply that there is no fixed loss function for machine learning to work on, which would be a serious challenge. This is the subject of my point #2 from earlier:

No Perfect Loss Function

We can frame (some) approaches to the value specification problem in a sequence of increasingly sophisticated approaches (similar to the hierarchy I discussed in my "stable pointers to value" posts (1,2,3)):

  1. Direct specification of the value function. This fails because we don't know what values to specify, and expect anything we can write down to be highly Goodhart-able.
  2. Learning human values. We delegate the specification problem to the machine. But, this leaves us with the meta problem of specifying how to learn. Getting it wrong can lead to wireheading and human manipulation. Even in settings where this is impossible, we face Stuart's no-free-lunch results.
  3. Learning to learn human values. Stuart suggests that we can get around the no-free-lunch results by loading the right prior information into the learner, in keeping with his more general belief that Bayesian reasoning is fine as long as it has the right prior information. But this seems to go back to the problem of learning the human prior. So we could apply a learning approach again here. But then we again have a specification problem for the loss function for this learning...
  4. ...

You get the picture. We can keep pushing back the specification problem by learning, learning to learn, learning to learn to learn... Each time we push the problem back, we seem to gain something, but we're also stuck with a new specification problem at the meta level.

Could we specify a way to learn at all the levels, pushing the problem back infinitely? This might sound absurd, but I think there are ways to accomplish this. We need to somehow "collapse all the levels into one learner" -- otherwise, with an infinite number of levels to learn, there would be no hope. There needs to be very significant generalization across levels. For example, Occam's razor is a good starting rule of thumb at all levels (at least, all levels above the lowest). However, because Occam is not enough, it will need to be augmented with other information.

Recursive reward modeling is similar to the approach I'm sketching, in that it recursively breaks down the problem of specifying a loss function. However, it doesn't really take the same learning-to-learn approach, and it also doesn't aim for a monolithic learning system that is able to absorb information at all the levels.

I think of this as necessary learning-theoretic background work in order to achieve Stuart Armstrong's agenda, although Stuart may disagree. The goal here is to provide one framework in which all the information Stuart hopes to give a system can be properly integrated.

Note that this is only an approach to outer alignment. The inner alignment problem is a separate, and perhaps even more pressing, issue. The next section could be of more help to inner alignment, but I'm not sure this is overall the right path to solve that problem.

Process-Level Feedback

Sometimes we care about how we get the answers, not just what the answers are. That is to say, sometimes we can point out problems with methodology without being able to point to problems in the answers themselves. Answers can be suspect based on how they're computed.

Sometimes, points can only be effectively made in terms of this type of feedback. Wireheading and human manipulation can't be eliminated through object-level feedback, but we could point out examples of the wrong and right types of reasoning.

Process-level feedback blurs the distinction between inner alignment and outer alignment. A system which accepts process-level feedback is essentially exposing all its innards as "outer", so if we can provide the appropriate feedback, there should be no separate inner alignment problem. (Unfortunately, it must be admitted that it's quite difficult to provide the right feedback -- due to transparency issues, we can't expect to understand all models in order to give feedback on them.)

I also want to emphasize that we want to give feedback on the entire process. It's no good if we have "level 1" which is in charge of producing output, and learns from object-level feedback, but "level 2" is in charge of accepting process-level feedback about level 1, and adjusting level 1 accordingly. Then we still have a separate inner alignment problem for level 2.

This is the same kind of hierarchy problem we saw in "No Perfect Loss Function". Similarly, we want to collapse all the levels down. We want one level which is capable of accepting process-level feedback about itself.

Learning from Process-Level Feedback

In a Bayesian treatment, process-level feedback means direct feedback about hypotheses. In theory, there's no barrier to this type of feedback. A hypothesis can be ruled out by fiat just as easily as it can be ruled out by contradicting data. 

However, this isn't a very powerful learning mechanism. If we imagine a human trying to inner-align a Bayesian system this way, the human has to find and knock out every single malign hypothesis. There's no generalization mechanism here.

Since detecting malign hypotheses is difficult, we want the learning system to help us out here. It should generalize from examples of malign hypotheses, and attempt to draw a broad boundary around malignancy. Allowing the system to judge itself in this way can of course lead to malign reinterpretations of user feedback, but hopefully allows for a basin of attraction in which benevolent generalizations can be learned.

For example, in Solomonoff induction, we have a powerful hierarchical prior in the distribution on program prefixes. A program prefix can represent any kind of distribution on hypotheses (since a program prefix can completely change the programming language to be used in the remainder of the program). So one would hope that knocking out hypotheses would reduce the probability of all other programs which share a prefix with that hypothesis, representing a generalization "this branch in my hierarchical prior on programs seems iffy". (As a stretch goal, we'd also like to update against other similar-looking branches; but we at least want to update against this one.)

However, no such update occurs. The branch loses mass, due to losing one member, but programs which share a prefix with the deleted program don't lose any mass. In fact, they gain mass, due to renormalization.
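
A toy version of the renormalization problem (the four "programs" and their prior masses are made up):

```python
import numpy as np

prior = np.array([0.4, 0.3, 0.2, 0.1])    # four programs; 0 and 1 share a prefix
branch = np.array([True, True, False, False])

posterior = prior.copy()
posterior[0] = 0.0                        # knock out program 0 by fiat
posterior /= posterior.sum()

print(prior[1], "->", round(posterior[1], 3))  # prefix-mate gains: 0.3 -> 0.5
print(prior[branch].sum(), "->", round(posterior[branch].sum(), 3))  # 0.7 -> 0.5
```

The branch as a whole loses mass only because it lost a member; the surviving prefix-mate's own mass goes up, which is the opposite of the generalization we wanted.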

It seems we don't just want to update on "not this hypothesis"; we want to explicitly model some sort of malignancy judgement (or more generally, a quality-of-hypothesis judgement), so that we can update estimations of how to make such judgements. However, it's difficult to see how to do so without creating a hierarchy, where we get a top level which isn't open to process-level feedback (and may therefore be malign).

Later, I'll present a Bayesian model which does have a version of generalization from feedback on hypotheses. But we should also be open to less-Bayesian solutions; it's possible this just isn't captured very well by Bayesian learning.

Prospects for Inner Alignment

I view this more as a preliminary step in one possible approach to inner alignment, rather than "a solution to inner alignment".

If (a) you want to learn a solution to inner alignment, rather than solving it ahead of time, and (b) you agree with the framing of process-level feedback / feedback on hypotheses, and (c) you agree that we can't rely on a trusted meta-level to take process-level feedback, but rather need to accept feedback on "the whole process", then I think it stands to reason that you need to specify what it means to learn in this setting. I view the preceding sections as an argument that there's a non-obvious problem here.

For example, Stuart Armstrong has repeatedly argued that Bayesian learners can overcome many safety problems, if only they're given the right prior information. To the extent that this is a claim about inner alignment (I'm not sure whether he would go that far), I'm claiming that we need to solve the problem of giving process-level feedback to a Bayesian learner before he can make good on his claim; otherwise, there's just no known mechanism to provide the system with all the necessary information.

Anyway, even if we accomplish this step, there are still several other obstacles in the way of this approach to inner alignment.

  1. Transparency: It's unrealistic that humans can provide the needed process-level feedback without powerful transparency tools. The system needs to correctly generalize from simpler examples humans provide to the more difficult examples which a human can't understand. That will be difficult if humans can only label very very simple examples. 
  2. Basin of Attraction: Because the system could use malign interpretations of human feedback, it's very important that the system start out in a benign state, making trusted (if simplistic) generalizations of the feedback humans can provide.
  3. Running Untrusted Code: A straightforward implementation of these ideas will still have to run untrusted hypotheses in order to evaluate them. Giving malign hypotheses really low probability doesn't help if we still run really low-probability hypotheses, and the malign hypotheses can find an exploit. This is similar to Vanessa's problem of non-Cartesian daemons.

Regardless of these issues, I think it's valuable to try to solve the part of the problem I've outlined in this essay, in the hope that the above issues can also be solved.

Summary of Desiderata

Here's a summary of all the concrete points I've made about what "learning normativity" should mean. Sub-points are not subgoals, but rather, additional related desiderata; EG, one might significantly address "no perfect feedback" without significantly addressing "uncertain feedback" or "interpretable feedback". 

  1. No Perfect Feedback: we want to be able to learn with the possibility that any one piece of data is corrupt.
    1. Uncertain Feedback: data can be given in an uncertain form, allowing 100% certain feedback to be given (if there ever is such a thing), but also allowing the system to learn significant things in the absence of any certainty.
    2. Interpretable Feedback: ideally, we want rich hypotheses about the meaning of feedback, which help the system to identify corrupt feedback, and interpret the information in imperfect feedback.
  2. No Perfect Loss Function: we don't expect to perfectly define the utility function, or what it means to correctly learn the utility function, or what it means to learn to learn, and so on. At no level do we expect to be able to provide a single function we're happy to optimize.
    1. Learning at All Levels: Although we don't have perfect information at any level, we do get meaningful benefit with each level we step back and say "we're learning this level rather than keeping it fixed", because we can provide meaningful approximate loss functions at each level, and meaningful feedback for learning at each level. Therefore, we want to be able to do learning at each level.
    2. Between-Level Sharing: Because this implies an infinite hierarchy of levels to learn, we need to share a great deal of information between levels in order to learn meaningfully.
  3. Process Level Feedback: we want to be able to give feedback about how to arrive at answers, not just the answers themselves.
    1. Whole-Process Feedback: we don't want some segregated meta-level which accepts/implements our process feedback about the rest of the system, but which is immune to process feedback itself. Any part of the system which is capable of adapting its behavior, we want to be able to give process-level feedback about. 
    2. Learned Generalization of Process Feedback: we don't just want to promote or demote specific hypotheses. We want the system to learn from our feedback, making generalizations about which kinds of hypotheses are good or bad.
Initial Attempt: Recursive Quantilizers

I'll give an initial stab at solving these problems, as a proof-of-concept. (Otherwise I fear the above desiderata may look like they're simply impossible.)

This is a formalization of the recursive quantilizers idea which I described previously.

A quantilizer is a mild optimizer which avoids catastrophic outcomes with high probability, averting Goodhart's Law. It accomplishes this by refusing to 100% trust its value function. This seems like a good building block for us, since it significantly addresses "no perfect loss function."

A quantilizer requires a value function, V, which it mildly optimizes, and a safe distribution, S, which is a distribution over outputs which is assumed to have a low probability of catastrophic outcomes. It also requires an optimization parameter, p. The quantilizer Q(V,S,p) mildly optimizes by randomly taking the top p% of outputs from S, as ranked by V.
(p can be derived from an estimate of the probability of catastrophe in S, combined with a level of tolerance for catastrophic risk.)
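
For concreteness, here is a minimal sketch of this base operation in Python (illustrative only: V can be any scoring function, sample_S any sampler for the safe distribution, and the sample size n is a made-up knob):

    import random

    def quantilize(V, sample_S, p, n=1000):
        """Mildly optimize: draw n outputs from the safe distribution S,
        rank them by the value function V, and return a uniformly random
        element of the top p fraction."""
        outputs = [sample_S() for _ in range(n)]
        outputs.sort(key=V, reverse=True)
        top = outputs[:max(1, int(p * n))]
        return random.choice(top)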

In line with our learning-at-all-levels desideratum, we want to learn V, S, and p. Moreover, we'll produce all of these via quantilization, and recursively learn the inputs to all of those quantilizers as well.

In order to get between-level sharing, we will actually learn one big quantilization process which is equivalent to all those infinitely many levels.

Here's the setup.

In order to unify all the levels, I'll have quantilizers select a unified type for all arguments (UTAAs): a tuple (V,S,p,o) where o is the object-level output we want out of the whole system (e.g., plans, strings of text, whatever). So now we understand the quantilization function Q(a,b,c) to take 3 UTAAs. It will take the value function from the first, the safe distribution from the second, and the optimization parameter from the third. It then produces a UTAA. The user can take the object-level output of that UTAA as the result.*
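
Continuing the sketch above (a hypothetical representation; per the footnote, real UTAAs would be compact programs rather than literal tuples):

    from collections import namedtuple

    # V scores UTAAs; sample_S samples UTAAs from the safe distribution;
    # p is the optimization parameter; o is the object-level output.
    UTAA = namedtuple("UTAA", ["V", "sample_S", "p", "o"])

    def Q(a, b, c):
        """Quantilize with the value function from a, the safe distribution
        from b, and the optimization parameter from c. The result is a UTAA;
        its .o field is the object-level output the user can take."""
        return quantilize(a.V, b.sample_S, c.p)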

 

What we want is a notion of stationary distribution on UTAAs. This is a distribution D∈ΔUTAA such that if we sample 3 elements from D, and feed those three elements into Q, the distribution on outputs of Q is just D again.

In effect, a stationary distribution D is all quantilized out; we can quantilize again, but it won't make any difference. So, we can just sample from D and take the object-level output from the resulting UTAA.

This is very similar to the notion of a stationary distribution for Markov chains. The difference is that we have a Markov tree instead of a chain.

So, just as finding the stationary distribution of a Markov-chain Monte Carlo algorithm is similar to running the algorithm long enough that running it any longer is pointless, finding a stationary distribution for recursive quantilizers is like building a learning-to-learn hierarchy that's large enough that we don't get anything by going any further.
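
A crude particle approximation of "running it long enough", assuming the UTAA and Q stand-ins sketched above:

    def quantilization_step(population):
        """One generation: each new member is Q applied to three
        independent draws from the current population, which stands in
        for samples from D."""
        return [Q(random.choice(population),
                  random.choice(population),
                  random.choice(population))
                for _ in population]

    def approximate_stationary(population, generations=50):
        """Iterate until (we hope) further quantilization changes nothing."""
        for _ in range(generations):
            population = quantilization_step(population)
        return population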

That's all well and good, but how are we supposed to find a stationary distribution we like? We can't just take a fixed point and hope it's useful and benign; there'll be lots of crazy fixed points. How do we steer this thing toward desirable outcomes?

Parameterizing Stationary Distributions

If a markov chain has multiple stationary distributions, we can parameterize them through a distribution on starting states. A distribution on starting states just means a probability of picking any one starting element, so this relationship is completely linear: by interpolating between different starting elements, we interpolate between the stationary distributions which those starting elements eventually reach.

We can similarly parameterize stationary distributions via initial distributions. However, we don't get the same linearity. Because we have to select many starting elements for the 3^n inputs to an n-level tree, and we select those elements as independent draws from the initial distribution, we can get nonlinear effects. (This is just like flipping a biased coin (with sides labelled 1 and 0) twice and sending the two results through an XOR gate: the probability of getting a 1 out of the XOR is nonlinear in the bias.)
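
Concretely, if each flip comes up 1 with probability b, the XOR of two independent flips is 1 with probability

P(XOR = 1) = b(1−b) + (1−b)b = 2b(1−b),

which is quadratic, not linear, in b.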

This means we can't reduce our uncertainty over initial distributions to uncertainty over a single initial UTAA. (There may be some other tricks we can use to simplify things, but they probably aren't worth exploring in this post.)

So we can parameterize our uncertainty over stationary distributions via uncertainty over initial distributions. But, this is just turning uncertainty over one kind of distribution into uncertainty over another. What's the benefit of this?

  1. The set of stationary distributions is hard to know, but the set of possible initial distributions is clear. So this gives us an easy-to-work-with representation of stationary distributions.
  2. We know every stationary distribution is in the set, since we can start out in a stationary distribution.
  3. We can easily define the mapping from initial distributions to stationary distributions; it's just the stationary distribution you get by running things long enough, sampling from the given initial distribution. (Of course we may not get to any stationary distribution at all, but we can formally solve this by introducing a cutoff in program size, or through other devices.)
  4. We can therefore define learning: an update against a UTAA produces an update against initial distributions which produce that UTAA.

This is, of course, a very computationally intensive procedure. Unless better algorithms are found, the only way we can update is by producing a large quantilization tree (which we hope has converged) and running it many times to evaluate the outcome of a given initial distribution.
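
A naive Monte Carlo version of that update might look like the following (hypothetical: run_tree stands for building and sampling a hopefully-converged quantilization tree from a given initial distribution, and is assumed rather than defined):

    def update_against_utaa(initial_dists, weights, bad_utaa, runs=100):
        """Downweight each candidate initial distribution in proportion
        to the estimated probability that its quantilization tree
        produces the UTAA we are updating against."""
        for i, dist in enumerate(initial_dists):
            hits = sum(run_tree(dist) == bad_utaa for _ in range(runs))
            weights[i] *= 1 - hits / runs
        total = sum(weights)
        return [w / total for w in weights]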

However, the resulting system has many marvelous properties. If we want to give feedback at any level in the hierarchy, we can convert this into feedback about UTAAs, and update our prior over initial distributions accordingly. For example:

  • We can label outputs as bad/incorrect by updating against all UTAAs which include those outputs.
  • We can give evidence about the value function over outputs, and convert this to an update about UTAAs based on the value function they contain. So, we can do value learning.
  • We can learn about the safe distribution over outputs. For example, one proposal for finding safe distributions is to model human behavior. Data-sets of human behavior could induce updates over UTAAs by checking how well a UTAA's proposed safe distribution fits the data.
  • At the same time, we can learn about the loss function by which we score safe distributions. If we have an update about this loss function, we translate it to an update about UTAAs by checking how a UTAA's value function examines the safe distribution of another UTAA when scoring it. Updating UTAAs based on this will, effectively, change the way safe distributions are selected in the second-to-last quantilization step. (Of course, it really changes all the quantilization steps, but when we anchor ourselves in how changes to the initial distribution alter our distribution on actual outputs, the easiest way to understand what's going on is to see this as a change to the second-to-last step.)
  • Similarly, we can learn about the loss function by which we score loss functions. So in the same system, we can directly learn from feedback, we can do value learning, and we can do meta-value-learning where we learn how to interpret evidence in value-learning.
  • Similarly, we can learn the safe distribution for meta-loss functions, the safe distribution over safe distributions, and on and on.
  • We can also allow process-level feedback by enabling UTAA value functions to examine the source code of other UTAAs (e.g. looking at how those UTAAs compute their value functions and safe distributions). We can teach UTAAs to detect suspicious code in other UTAAs and rate those UTAAs very poorly.

Wouldn't it be fascinating to be able to provide all those types of learning in one system?

Let's examine how we did in terms of the criteria which I gave.

  1. No Perfect Feedback: This wasn't addressed directly, but might be indirectly addressed via #2.
    1. Uncertain Feedback: I didn't specify any way to provide uncertain feedback, but it would be easy enough to do so.
    2. Interpretable Feedback: I think this is a big failing of the approach as it stands. 
  2. No Perfect Loss Function: Very significantly addressed by quantilization.
    1. Learning at All Levels: Very significantly addressed by the recursive quantilization setup.
    2. Between-Level Sharing: Significantly addressed. I didn't really talk about how this works, but I think it can work well in this setup.
  3. Process Level Feedback: Significantly addressed. The process which creates a given output is essentially the big tree that we sample. We can give any kind of feedback about that tree that we want, including any computations which occur inside of the value functions or safe distributions or elsewhere.
    1. Whole-Process Feedback: Somewhat addressed. There is a question of whether the initial distribution constitutes a meta-level beyond the reach of process-level feedback.
    2. Learned Generalization of Process Feedback: Significantly addressed. Process-level feedback can be given directly, as evidence against a specific UTAA, in which case there will be some generalization as we update against anything which thought that UTAA was a good idea. Or it could be given more indirectly, as general (level-independent) information about how value functions should judge UTAAs. In that case there may be more generalization, as we update on how to judge UTAAs generally. (Or maybe not? An equivalence theorem about these different types of feedback would be nice.)

I think the most significant problem here is the lack of interpretable feedback. When we give feedback about something, we have to figure out how to translate it into an update about UTAAs (which can then be translated into an update about initial distributions). This update is fixed forever. This means the updates we make to the system aren't really tied to the value functions which get learned. So, for example, learning better value-learning behavior doesn't directly change how the system responds to updates we give it about the value function. (Instead, it may change how it interprets some other set of data we give it access to, as input to UTAAs.) This makes the "learning-to-learn" aspect of the system somewhat limited/shallow.

The second most significant concern here is whether we've really achieved whole-process feedback. I was initially optimistic, as the idea of stationary distributions appeared to collapse all the meta levels down to one. However, now I think there actually is a problem with the highest level of the tree. The initial distribution could be predominantly malign. Those malign UTAAs could select innocent-looking (but deadly) UTAAs for the next generation. In this way, the malign code could disappear, while achieving its goals by introducing subtle bugs to all subsequent generations of UTAAs.

The way I've specified things, trying to update against these malign UTAAs wouldn't work, because they're already absent in the stationary distribution. 

Of course, you could directly update against them in the initial distribution. This could eliminate select malign UTAAs. The problem is that this kind of process feedback loses generalizability again. Since it's the top level of the tree, there's nothing above it which is selecting it, so we don't get to update against any general selection behaviors which produced the malign UTAAs.

The only way out of this I see at present is to parameterize the system's beliefs directly as a probability distribution over stationary distributions. You can think of this as assuming that the initial distribution is already a stationary distribution. This way, when we update against malign UTAAs at the beginning of the process, we update against them occurring at any point in the process, which means we also update against any UTAAs which help select malign UTAAs, and therefore get generalization power.

But this seems like an annoyingly non-constructive solution. How are we supposed to work with the set of fixed points directly without iterating (potentially malign) code to find them?

*: Actually, a UTAA should be a compact specification of such a tuple, such as a program or neural network which can output the desired objects. This is necessary for implementation, since, e.g., we can't store V as a big table of values or S as a big table of probabilities. It will also allow for better generalization and process-level feedback.



Discuss

CHAI Internship Application

12 ноября, 2020 - 00:10
Published on November 11, 2020 9:10 PM GMT

Hi everyone,

I'm the Assistant Director at CHAI and as some of you may know, CHAI is currently accepting applications for our 2021 internship program.

The early deadline is 11/23 for applicants who require an earlier response from us. The normal deadline is 12/13.

You can find more information and the application itself here

Please e-mail me at chai-admin@berkeley.edu if you have any questions!



Discuss

How can we lobby to get a vaccine distributed faster?

12 ноября, 2020 - 00:01
Published on November 11, 2020 9:01 PM GMT

Fellow Rats,

Tyler Cowen has argued that we can release vaccines now without compromising phase III trials through randomization. We could thus benefit from the expected value of inoculating more people earlier and of getting an answer sooner. He has proposed two mechanisms.

  1. Randomly distribute treatments and placebos to at risk groups like bus drivers. This seems like a great idea, since bus drivers are in unusual danger and need.

  2. Use a "tie-breaker" design, which is a hybrid of regression discontinuity and randomized control trial. Basically you want to treat some subset of the population, but want an impact assessment. So you randomize only near the cutoff for service. So we could vaccinate some of the most at-risk persons and randomize at the liminal cases, achieving an optimal tradeoff between present benefits and information. The abstract of Owen and Varians article is below.

For some reason, the US is currently implementing neither idea. Our approach is to stockpile lots of vaccines and wait for a green light from a single conventional information-only trial. Cowen is mostly being ignored.

Lobbying

We can lobby the government and Pfizer to change this. The US government has plenty of avenues for lobbying to force discussion of these ideas. Here are a few, off the top of my head.

  • Tweet at public health experts in the style of 1Day Sooner
  • Call our senators, complain about FDA regulations
  • Call our representatives, complain
  • Go to our representatives' offices and demand to speak to the staff. Show them the paper. Demand a meeting.
  • Call local television stations. Read the paper in detail and prepare a speech. Build publicity
  • Tweet at Donald Trump directly
  • Call the FDA
  • Call think tanks affiliated with party leadership

I am uncertain which body needs to approve such a policy change. It could be mandatable from the White House, require legislation, be mandatable from the FDA, or be entirely under the pharma companies' control. The easiest body to pressure is the FDA because they answer to our legislators.

Appendix A: Motivated by customer loyalty plans and scholarship programs, we study tie-breaker designs which are hybrids of randomized controlled trials (RCTs) and regression discontinuity designs (RDDs). We quantify the statistical efficiency of a tie-breaker design in which a proportion Δ of observed subjects are in the RCT. In a two line regression, statistical efficiency increases monotonically with Δ, so efficiency is maximized by an RCT. We point to additional advantages of tie-breakers versus RDD: for a nonparametric regression the boundary bias is much less severe and for quadratic regression, the variance is greatly reduced. For a two line model we can quantify the short term value of the treatment allocation and this comparison favors smaller Δ with the RDD being best. We solve for the optimal tradeoff between these exploration and exploitation goals. The usual tie-breaker design applies an RCT on the middle Δ subjects as ranked by the assignment variable. We quantify the efficiency of other designs such as experimenting only in the second decile from the top. We also show that in some general parametric models a Monte Carlo evaluation can be replaced by matrix algebra.



Discuss

Time in Cartesian Frames

11 ноября, 2020 - 23:25
Published on November 11, 2020 8:25 PM GMT

This is the twelfth and final post in the Cartesian Frames sequence. Read the first post here.

Up until now, we have (in the examples) mostly considered agents making a single choice, rather than acting repeatedly over time.

The actions, environments, and worlds we've considered might be extended over time. For example, imagine a prisoner's dilemma where "cooperating" requires pushing a button every day for five years.

However, our way of discussing Cartesian frames so far would treat "push the button every day for five years" as an atomic action, a single element a∈A.

Now, we will begin discussing how to use Cartesian frames to explicitly represent agents passing through time. Let us start with a basic example.

 

1. Partial Observability

Consider a process where two players, Yosef and Zoe, collaboratively choose a three-digit binary number. Yosef first chooses the first digit, then Zoe chooses the second digit, then Yosef chooses the third digit. The world will be represented by the three-digit number. The Cartesian frame from the perspective of Yosef looks like this:

C0 = \begin{pmatrix}
000 & 010 & 000 & 010 \\
001 & 011 & 001 & 011 \\
000 & 011 & 000 & 011 \\
001 & 010 & 001 & 010 \\
100 & 110 & 110 & 100 \\
101 & 111 & 111 & 101 \\
100 & 111 & 111 & 100 \\
101 & 110 & 110 & 101
\end{pmatrix}.

Here, C0=(A0,E0,⋅0) is a Cartesian frame over W0={000, 001, 010, 011, 100, 101, 110, 111}.

The four possible environments from left to right represent Zoe choosing 0, Zoe choosing 1, Zoe copying the first digit, and Zoe negating the first digit.

The eight possible agents can be broken up into two groups of four. In the top four possible agents, Yosef chooses 0 for the first digit, while in the bottom four, he chooses 1. Within each group, the four possible agents represent Yosef choosing 0 for the third digit, choosing 1 for the third digit, copying the second digit, and negating the second digit.

Consider the three partitions W1, W2, and W3 of W0, representing the first, second, and third digits respectively. Each Wi = {w_i^0, w_i^1}, where w_1^0 = {000, 001, 010, 011}, w_1^1 = {100, 101, 110, 111}, w_2^0 = {000, 001, 100, 101}, w_2^1 = {010, 011, 110, 111}, w_3^0 = {000, 010, 100, 110}, and w_3^1 = {001, 011, 101, 111}.

Clearly, by the definition of observables, W2 is not observable in C0. But there is still a sense in which this does not tell the whole story. Yosef can observe W2 for the purpose of deciding the first digit, but can't observe W2 for the purpose of deciding the third digit.

There are actually many ways to express this fact, but I want to draw attention to one specific way to express this partial observability: ExternalW1(C0) can observe W2.

Indeed, we have 

ExternalW1(C0) ≃ C1 = \begin{pmatrix}
000 & 010 & 100 & 110 \\
000 & 010 & 100 & 111 \\
000 & 010 & 101 & 110 \\
000 & 010 & 101 & 111 \\
000 & 011 & 100 & 110 \\
000 & 011 & 100 & 111 \\
000 & 011 & 101 & 110 \\
000 & 011 & 101 & 111 \\
001 & 010 & 100 & 110 \\
001 & 010 & 100 & 111 \\
001 & 010 & 101 & 110 \\
001 & 010 & 101 & 111 \\
001 & 011 & 100 & 110 \\
001 & 011 & 100 & 111 \\
001 & 011 & 101 & 110 \\
001 & 011 & 101 & 111
\end{pmatrix}.

It may seem counter-intuitive that when you externalize W1, and thus take some control out of the hands of the agent, you actually end up with more possible agents. This is because the agent now has to specify what the third digit is, not only as a function of the second digit, but also as a function of the first digit. The agent could have specified the third digit as a function of the first digit before, but some of the policies would have been identical to each other.

The four possible environments of C1 specify the first two digits, while the 16 possible agents represent all of the ways to have the third digit be a function of those first two digits. It is clear that W2 is observable in C1.
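
Both claims can be checked mechanically for finite frames. Here is an illustrative Python sketch (my encoding, not part of the original formalism): rows are possible agents, columns possible environments, and a subset S is observable iff for every pair of rows there is a row that agrees with the first wherever its outcome lands in S and with the second wherever it lands outside S.

    from itertools import product

    def observable(matrix, S):
        """Check observability of S in a finite Cartesian frame given
        as a list of rows (agents) of worlds indexed by environment."""
        for a0 in matrix:
            for a1 in matrix:
                if not any(all(((w in S) and w == w0) or
                               ((w not in S) and w == w1)
                               for w, w0, w1 in zip(a2, a0, a1))
                           for a2 in matrix):
                    return False
        return True

    C0_rows = [
        ["000", "010", "000", "010"], ["001", "011", "001", "011"],
        ["000", "011", "000", "011"], ["001", "010", "001", "010"],
        ["100", "110", "110", "100"], ["101", "111", "111", "101"],
        ["100", "111", "111", "100"], ["101", "110", "110", "101"],
    ]
    envs = ["00", "01", "10", "11"]  # C1's environments fix the first two digits
    C1_rows = [[e + bit for e, bit in zip(envs, bits)]
               for bits in product("01", repeat=4)]
    # The part of W2 where the second digit is 0:
    second_digit_0 = {w for w in map("".join, product("01", repeat=3))
                      if w[1] == "0"}

    print(observable(C0_rows, second_digit_0))  # False: W2 not observable in C0
    print(observable(C1_rows, second_digit_0))  # True: W2 observable in C1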

This gives us a generic way to define a type of partial observability:

Definition: Given a Cartesian frame C over W, and partitions V and T of W, we say V is observable in C after time T if V is observable in ExternalT(C).

 

2. Partitions as Time

Built into the above definition is the fact that we are thinking of (at least some) partitions of W as representing time. This makes a lot of sense when we think of W as a set of possible complete world histories. For any given time, this gives a partition where world histories are in the same subset if they agree on the world history up to that point in time.

For example, the above partition W1 was the partition that we got by considering a time after Yosef chooses the first digit, but before Zoe chooses the second digit.

Further, this gives us a sequence of nested partitions, since the partition associated with one time is always a refinement of the partition associated with an earlier time.
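
In toy form (my illustration, with world histories as digit strings, as in the earlier example):

    def partition_at_time(worlds, t):
        """Group world histories by their prefix up to time t: two
        histories share a part iff they agree on everything that has
        happened so far."""
        parts = {}
        for w in worlds:
            parts.setdefault(w[:t], set()).add(w)
        return list(parts.values())

    W = ["000", "001", "010", "011", "100", "101", "110", "111"]
    # partition_at_time(W, 1) recovers W1 from Section 1; increasing t
    # yields successively finer partitions, i.e. the nested sequence.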

Note that this is a multiplicative/updateless view of time. There is also an additive/updateful view of time, in which time is a nested sequence of subsets. In the additive view, possible worlds are eliminated as you pass through time. In the multiplicative view, possible worlds are distinguished from each other as you pass through time. We will focus on the multiplicative view, which I consider better-motivated.

 

3. Nested Subagents

Let C=(A,E,⋅) be a fixed Cartesian frame over a world W. Let T0,⋯,Tn be a sequence of nested partitions of W, with T0={W}, Tn={{w} | w∈W}, and Ti+1 a refinement of Ti.

This gives a nested sequence of multiplicative superagents CTn◃×⋯◃×CT0, where CTi=ExternalTi(C), which follows from the lemma below. 

Lemma: Given a Cartesian frame C over W, if U and V are partitions of W and U is a refinement of V, then ExternalU(C)◃×ExternalV(C).

Proof: Let C=(A,E,⋅), and let u:W→U and v:W→V send each element of W to its part in U and V respectively. Let ExternalU(C)=(A/BU,BU×E,⋅U), where BU={{a′∈A | ∀e∈E,u(a′⋅e)=u(a⋅e)} | a∈A}. Similarly, let ExternalV(C)=(A/BV,BV×E,⋅V), where BV={{a′∈A | ∀e∈E,v(a′⋅e)=v(a⋅e)} | a∈A}. Let bU:A→BU and bV:A→BV send each element of A to its part in BU and BV respectively. 

Since U is a refinement of V, there exists a v′:U→V, such that v′∘u=v. Further, we have that BU is a refinement of BV, so there exists a b′V:BU→BV such that b′V∘bU=bV.

It suffices to show there exist three sets X, Y, and Z, and a function f:X×Y×Z→W such that ExternalU(C)≃(X,Y×Z,⋄) and ExternalV(C)≃(X×Y,Z,∙), where ⋄ and ∙ are given by x⋄(y,z)=f(x,y,z) and (x,y)∙z=f(x,y,z).

We will take X to be A/BU and Z to be BV×E. We define Y to be the set of all right inverses to b′V, Y={y:BV→BU | ∀b∈BU, b′V(y(b))=b}. We will let f(x,y,(b,e))=x(y(b))⋅e.

First, we show

ExternalU(C)=(A/BU,BU×E,⋅U)≃(X,Y×Z,⋄).

We define

(g0,h0):(A/BU,BU×E,⋅U)→(X,Y×Z,⋄)

and

(g1,h1):(X,Y×Z,⋄)→(A/BU,BU×E,⋅U)

as follows. Let g0 and g1 be the identity on X=A/BU, and let h0:Y×Z→BU×E be given by h0(y,(b,e))=(y(b),e). Finally, let h1:BU×E→Y×Z be chosen to satisfy h1(b,e)=(y,(b′V(b),e)), where y is such that y(b′V(b))=b, and for b′≠b′V(b), y(b′) is chosen arbitrarily to be any preimage of b′ under b′V.

We have that (g0,h0) is a morphism, because for all x∈A/BU and (y,(b,e))∈Y×Z,

g0(x)⋄(y,(b,e))=f(x,y,(b,e))=x(y(b))⋅e=x⋅U(y(b),e)=x⋅Uh0(y,(b,e)).

Similarly, (g1,h1) is a morphism, because for all x∈X and (b,e)∈BU×E, we have

g1(x)⋅U(b,e)=x⋅U(b,e)=x(b)⋅e=x(y(b′V(b)))⋅e=f(x,y,(b′V(b),e))=x⋄(y,(b′V(b),e))=x⋄h1(b,e),

where y is as given in the definition of h1. Since g0∘g1 and g1∘g0 are both the identity, we have that (g0,h0)∘(g1,h1) and (g1,h1)∘(g0,h0) are both homotopic to the identity, so ExternalU(C)≃(X,Y×Z,⋄).

Next, we show

ExternalV(C)=(A/BV,BV×E,⋅V)≃(X×Y,Z,∙).

We define

(g2,h2):(A/BV,BV×E,⋅V)→(X×Y,Z,∙)

and

(g3,h3):(X×Y,Z,∙)→(A/BV,BV×E,⋅V)

as follows. Let h2 and h3 be the identity on Z=BV×E, and let g3:X×Y→A/BV be given by g3(x,y)=x∘y. To see that x∘y is in A/BV, we need to verify that bV∘x∘y is the identity on BV. Indeed,

bV∘x∘y=b′V∘bU∘x∘y=b′V∘y,

which is the identity on BV. Let g2:A/BV→X×Y be given by g2(q)=(q′,bU∘q), where q′∈A/BU is chosen such that for all b∈BV, q′(bU(q(b)))=q(b), and for b′ not in the image of bU∘q, q′(b′)∈b′. We can do this simultaneously for all inputs of the form bU(q(b)), since bU∘q is injective, since it has a left inverse, b′V.

We have that (g2,h2) is a morphism, because for all q∈A/BV and (b,e)∈Z, we have

g2(q)∙(b,e)=(q′,bU∘q)∙(b,e)=f(q′,bU∘q,(b,e))=q′(bU(q(b)))⋅e=q(b)⋅e=q⋅V(b,e)=h2(q)⋅V(b,e),

where q′ is as in the definition of g2. Similarly, (g3,h3) is a morphism, because for all (x,y)∈X×Y and (b,e)∈BV×E, we have

g3(x,y)⋅V(b,e)=x∘y⋅V(b,e)=x(y(b))⋅e=f(x,y,(b,e))=(x,y)∙(b,e)=(x,y)∙h3(b,e).

Since h3∘h2 and h2∘h3 are both the identity, we have that (g2,h2)∘(g3,h3) and (g3,h3)∘(g2,h2) are both homotopic to the identity, so ExternalV(C)≃(X×Y,Z,∙), completing the proof. □

The sequence CT0,…,CTn represents the agent persisting across time, but each subagent CTi does not really represent a single time-slice of the agent. Instead, CTi represents an agent persisting across time starting at the time Ti.

I think that this is actually the more natural notion. However, if we want to think about an agent persisting across time as a sequence of single time-slices of the agent, we could also do that. Since CTi+1 is a multiplicative subagent of CTi, CTi+1 must have a sister DTi+1 in CTi, so we could consider the sequence DT1,…,DTn.

 

4. Controllables Decrease and Observables Increase Over Time

An interesting fact about these sequences CT0,…,CTn is that controllables decrease and observables increase over time, so for i≤j we have Obs(CTi)⊆Obs(CTj) and Ctrl(CTi)⊇Ctrl(CTj) (and Ensure(CTi)⊇Ensure(CTj) and Prevent(CTi)⊇Prevent(CTj)), which follows directly from the following two lemmas.

Lemma: Given a Cartesian frame C over W, if U and V are partitions of W and U is a refinement of V, then Ctrl(ExternalV(C))⊇Ctrl(ExternalU(C)).

Proof: Let CV=ExternalV(C), and let CU=ExternalU(C). We will actually only need to use the fact that CU◃×CV, and that both CU and CV have nonempty agents. CU and CV do in fact have nonempty agents because, as we have shown, externalizing a partition of W always produces nonempty agents.

It suffices to establish that Ensure(CV)⊇Ensure(CU); the analogous claim for Prevent follows by taking complements, and the result for Ctrl follows trivially.

Since CU◃×CV, there exist X, Y, Z, and f:X×Y×Z→W such that CU≃(X,Y×Z,⋄) and CV≃(X×Y,Z,∙), where ⋄ and ∙ are given by x⋄(y,z)=f(x,y,z) and (x,y)∙z=f(x,y,z). Let C′U=(X,Y×Z,⋄), and let C′V=(X×Y,Z,∙). Observe that X and Y are nonempty.

Since Ensure is preserved by biextensional equivalence, it suffices to show that Ensure(C′V)⊇Ensure(C′U). Let S∈Ensure(C′U). Thus, there exists some x0∈X, such that for all (y,z)∈Y×Z, x0⋄(y,z)=f(x0,y,z)∈S. Since Y is nonempty, we can take an arbitrary y0∈Y, and observe that for all z∈Z, (x0,y0)∙z=f(x0,y0,z)∈S. Thus, S∈Ensure(C′V). □

Lemma: Given a Cartesian frame C over W, if U and V are partitions of W and U is a refinement of V, then Obs(ExternalV(C))⊆Obs(ExternalU(C)).

Proof: Let C=(A,E,⋅), and let u:W→U and v:W→V send each element of W to its part in U and V respectively. Let ExternalU(C)=(A/BU,BU×E,⋅U), where BU={{a′∈A | ∀e∈E,u(a′⋅e)=u(a⋅e)} | a∈A}. Similarly, let ExternalV(C)=(A/BV,BV×E,⋅V), where BV={{a′∈A | ∀e∈E,v(a′⋅e)=v(a⋅e)} | a∈A}. Let bU:A→BU and bV:A→BV send each element of A to its part in BU and BV respectively. 

Since U is a refinement of V, there exists a v′:U→V, such that v′∘u=v. Further, we have that BU is a refinement of BV, so there exists a b′V:BU→BV such that b′V∘bU=bV.

Let S∈Obs(ExternalV(C)). Thus, for every pair q0,q1∈A/BV, there exists a q2∈A/BV such that q2∈if(S,q0,q1). Thus, we can define an f:A/BV×A/BV→A/BV  such that for all q0,q1∈A/BV, f(q0,q1)∈if(S,q0,q1). 

Our goal is to show that S∈Obs(ExternalU(C)). For this, it suffices to show that for any q0,q1∈A/BU, there exists a q2∈A/BU such that q2∈if(S,q0,q1). 

Let q0,q1∈A/BU be arbitrary. Given an arbitrary b∈BU, let qbi∈A/BV be any element that satisfies qbi(b′V(b))=qi(b). This is possible because qi(b)∈b⊆b′V(b). It does not matter what qbi does on other inputs. Let q2:BU→A be such that for all b∈BU, q2(b)=f(qb0,qb1)(b′V(b)).

To complete the proof, we need to show that q2∈A/BU and q2∈if(S,q0,q1). 

To show that q2∈A/BU, we need that for all b∈BU, q2(b)∈b. Let b∈BU be arbitrary. Since q0(b)∈b, by the definition of BU, it suffices to show that for all e∈E, u(q2(b)⋅e)=u(q0(b)⋅e). Further, since q1(b)∈b, we already have that for all e∈E, u(q1(b)⋅e)=u(q0(b)⋅e). Thus, it suffices to show that for all e∈E, either q2(b)⋅e=q0(b)⋅e or q2(b)⋅e=q1(b)⋅e. Indeed, if q2(b)⋅e∈S, then

q2(b)⋅e=f(qb0,qb1)(b′V(b))⋅e=qb0(b′V(b))⋅e=q0(b)⋅e,

and similarly, if q2(b)⋅e∉S, then q2(b)⋅e=q1(b)⋅e. Thus, we have that for all e∈E, u(q2(b)⋅e)=u(q0(b)⋅e), so for our arbitrary b∈BU, q2(b)∈b, so q2∈A/BU.

Let (b,e)∈BU×E  be such that q2⋅U(b,e)∈S. We want to show that q2⋅U(b,e)=q0⋅U(b,e). Indeed,

q2⋅U(b,e)=q2(b)⋅e=f(qb0,qb1)(b′V(b))⋅e=f(qb0,qb1)⋅V(b′V(b),e)=qb0⋅V(b′V(b),e)=qb0(b′V(b))⋅e=q0(b)⋅e=q0⋅U(b,e).

Symmetrically, if (b,e)∈BU×E is such that q2⋅U(b,e)∉S, we have q2⋅U(b,e)=q1⋅U(b,e). Thus q2∈if(S,q0,q1).

Thus, since q0 and q1 were arbitrary, we have that S∈Obs(ExternalU(C)), completing the proof. □

This result allows us to think of time as a sort of ritual in which control of the world is sacrificed in exchange for ability to condition on the world.

 

5. Directions for Future Work

As I noted at the start of this sequence, Cartesian frames take their motivation from Hutter, attempting to improve on the cybernetic agent model; they take their angle of attack from Pearl, using combinatorics to infer functional structure from relational structure; and they take their structure from game theory, working with base objects that look similar to normal-form games.

Building up from very simple foundations, we have found that Cartesian frames yield elegant notions of agents making choices and observations, of agents acting over time, and of subagent relations. At the same time, Cartesian frames allow us to switch between different levels of description of the world and consider many different ways of factorizing the world into variables.

I suspect that this is the last post I will write on Cartesian frames for a while, but I am excited about the framework, and would really like to get more people working on it.

To help with that, I've commented below with various directions for future work: ways that I think the framework could be extended, made better, or applied.

I've erred on the side of inclusion in these comments: some may point to dead ends, or may be based on false assumptions.

If you have questions or want to discuss Cartesian frames, I'll be hosting a fourth and final office hours / discussion section this Sunday at 2pm PT on GatherTown.



Discuss

[AN #125]: Neural network scaling laws across multiple modalities

11 ноября, 2020 - 21:20
Published on November 11, 2020 6:20 PM GMT

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.
Audio version here (may not be up yet).
Please note that while I work at DeepMind, this newsletter represents my personal views and not those of my employer.

HIGHLIGHTS

Scaling Laws for Autoregressive Generative Modeling (Tom Henighan, Jared Kaplan, Mor Katz et al) (summarized by Asya): This paper looks at scaling laws for generative Transformer models of images (predicting pixels or parts of image encodings), videos (predicting frames of image encodings), multimodal image <-> text (predicting captions based on images or images based on captions), and mathematical problem solving (predicting answers to auto-generated questions about algebra, arithmetic, calculus, comparisons, integer properties, measurement, polynomials, and probability). The authors find that:

- Cross-entropy loss as a function of compute follows a power law + constant in all these data modalities (just as it does in language (AN #87)). Information theoretically, this can be interpreted as scaling a 'reducible loss' which estimates the KL divergence between the true and model distributions, and an 'irreducible loss' which estimates the entropy of the true data distribution.

- Performance on ImageNet classification fine-tuned from their generative image model also follows such a power law, whereas ImageNet classification trained from scratch actually gets worse with sufficiently large model sizes. Interestingly, this classification power law continues even past model sizes where the generative cross-entropy loss starts bending as a result of irreducible loss. The authors conclude that approaching the irreducible loss for some dataset does not necessarily indicate diminishing returns for representation quality or semantic content.

- Optimal model size as a function of compute follows a power law with an exponent very close to ~0.7 for all data modalities they've studied so far. This implies that in the current compute regime, as compute budgets grow, it's best to devote a majority of compute towards making models bigger and a minority towards training on more data.

- Larger models perform better on extrapolating to math problems more difficult than those seen in training, but only insofar as they do better on the training distribution (no benefits to 'strong generalization').

- Larger models are able to take advantage of more multimodal information, but the scaling is extremely slow-- a 1-billion-parameter model uses 10% of the information in a caption to define an image, while using 20% of the information would require a 3-trillion-parameter model.

As in the language models paper (AN #87), extrapolating the steep power laws found for optimally-used compute seems to eventually paradoxically result in loss lower than the bound given by shallower power laws for optimally-used training data. The authors offer a potential hypothesis for resolving this inconsistency-- in the regime of less compute and smaller model sizes, increasing model size effectively increases the amount of information you extract from each data point you train on, resulting in the steepness of the current compute law. As compute increases past a certain point, however, the amount of information extracted per data point approaches the maximum amount possible, so the curve switches to a shallower regime and marginal compute should be used increasingly on dataset increases rather than model size increases. If this hypothesis is true, we should eventually expect the scaling laws for compute to bend towards laws set by dataset size, and perhaps should think they will ultimately be set by trends for overfitting (see this post for another explanation of this).
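
To make the functional forms concrete, here is a sketch with placeholder constants (not the paper's fitted values):

    def cross_entropy_loss(compute, L_irr=1.0, c0=1e3, alpha=0.3):
        """Power law plus constant: a 'reducible' term that decays with
        compute, plus an 'irreducible' term estimating the entropy of
        the true data distribution."""
        return L_irr + (c0 / compute) ** alpha

    def optimal_model_size(compute, k=1.0):
        """Optimal parameter count scales as a power law in compute,
        with an exponent close to 0.7 across the modalities studied."""
        return k * compute ** 0.7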

Read more: the scaling “inconsistency”: openAI’s new insight



Asya's opinion: I would also recommend listening to Jared Kaplan's talk on this.

I was really excited to learn about more empirical work here. These results suggest that scaling behavior predictable with smooth power-laws is likely a feature of most generative models, not just text. I found it surprising that optimal model size given a compute budget scales the same way across data modalities-- it does seem to suggest that there's something more fundamental going on here that I don't understand (but which may be explained in this theory paper that I haven't read). It's also interesting that pretraining on a generative model (rather than training from scratch) seems to confer real benefits to scaling behavior for image classification-- this lends some support to the view that a lot of the learning that needs to happen will come from unsupervised settings.

A lot of the most salient questions around current scaling laws for me still lie in the translation between cross-entropy loss in these domains and performance on downstream tasks we care about. I feel very unsure about whether any of the fine-tuned generative models we (currently) have the data to train are likely to have transformative performance within even the next 5 orders of magnitude of compute scaling.

Rohin's opinion: In addition to the points Asya made above, I wanted to speculate on the implications of these scaling laws for AGI. I was particularly struck by how well these scaling laws seem to fit the data. This was also true in the case of mathematics problems, at least for the models we have so far, even though intuitively math requires “reasoning”. This suggests to me that even for tasks that require reasoning, capability will increase smoothly along a spectrum, and the term “reasoning” is simply a descriptor of a particular capability level. (An alternative position is that “reasoning” happens only to the extent that the neural net is implementing an algorithm that can justifiably be known to always output the right answer, but this sort of definition usually implies that humans are not doing reasoning, which seems like a deal-breaker.)

Note however that we haven't gotten to the level of performance that would be associated with "reasoning", so it is still possible that the trends stop holding and reasoning then leads to some sort of discontinuous increase in performance. I just wouldn't bet on it.

   TECHNICAL AI ALIGNMENT
 MESA OPTIMIZATION

Confucianism in AI Alignment (John Wentworth) (summarized by Rohin): Suppose we trained our agent to behave well on some set of training tasks. Mesa optimization (AN #58) suggests that we may still have a problem: the agent might perform poorly during deployment, because it ends up optimizing for some misaligned mesa objective that only agrees with the base objective on the training distribution.

This post suggests that in any training setup in which mesa optimizers would normally be incentivized, it is not sufficient to just prevent mesa optimization from happening. The fact that mesa optimizers could have arisen means that the incentives were bad. If you somehow removed mesa optimizers from the search space, there would still be a selection pressure for agents that without any malicious intent end up using heuristics that exploit the bad incentives. As a result, we should focus on fixing the incentives, rather than on excluding mesa optimizers from the search space.

Clarifying inner alignment terminology (Evan Hubinger) (summarized by Rohin): This post clarifies the author’s definitions of various terms around inner alignment. Alignment is split into intent alignment and capability robustness, and then intent alignment is further subdivided into outer alignment and objective robustness. Inner alignment is one way of achieving objective robustness, in the specific case that you have a mesa optimizer. See the post for more details on the definitions.



Rohin's opinion: I’m glad that definitions are being made clear, especially since I usually use these terms differently than the author. In particular, as mentioned in my opinion on the highlighted paper, I expect performance to smoothly go up with additional compute, data, and model capacity, and there won’t be a clear divide between capability robustness and objective robustness. As a result, I prefer not to divide these as much as is done in this post.

  FORECASTING

Measuring Progress in Deep Reinforcement Learning Sample Efficiency (Anonymous) (summarized by Asya) (H/T Carl Shulman): This paper measures historic increases in sample efficiency by looking at the number of samples needed to reach some fixed performance level on Atari games and virtual continuous control tasks. The authors find exponential progress in sample efficiency, with estimated doubling times of 10 to 18 months on Atari, 5 to 24 months on state-based continuous control, and 4 to 9 months on pixel-based continuous control, depending on the specific task and performance level. They find that these gains were mainly driven by improvements in off-policy and model-based deep RL learning approaches, as well as the use of auxiliary learning objectives to speed up representation learning, and not by model size improvements. The authors stress that their study is limited in studying only the published training curves for only three tasks, not accounting for the extent to which hyperparameter tuning may have been responsible for historic gains.



Asya's opinion: Following in the footsteps of AI and Efficiency (AN #99), here we have a paper showing exponential gains in sample efficiency in particular. I'm really glad someone did this analysis. I think I'm surprised by how fast progress is, though as the paper notes it's unclear exactly how to relate historic improvements on fixed-task performance to a sense of overall improvement in continuous control (though several of the main contributors listed in the appendix seem fairly general). I also really appreciate how thorough the full paper is in listing the limitations of this work.

Since these papers are coming up in the same newsletter, I'll note the contrast between the data-unlimited domains explored in the scaling laws paper and the severely data-limited domain of real-world robotics emphasized in this paper. In robotics, it seems we are definitely still constrained by algorithmic progress that lets us train on fewer samples (or do better transfer from simulations (AN #72)). Of course, maybe progress in data-unlimited domains will ultimately result in AIs that make algorithmic progress in data-limited domains faster than humans ever could.

OTHER PROGRESS IN AI
REINFORCEMENT LEARNING

DeepSpeed: Extreme-scale model training for everyone (DeepSpeed Team et al) (summarized by Asya): In this post, Microsoft announces updates to DeepSpeed, its open-source deep learning training optimization library. The new updates include:

- '3D parallelism', a scheme for carefully optimizing how training runs are split across machines. Training runs that use 3D parallelism demonstrate linear scaling of GPU memory and compute efficiency, making it theoretically possible to train extremely large models of over a trillion parameters on as few as 800 NVIDIA V100 GPUs.

- 'ZeRO-Offload', which allows CPU memory to be used during training runs, making it possible to train models of up to 13 billion parameters on a single NVIDIA V100 GPU (see the configuration sketch after this list).

- 'DeepSpeed Sparse Attention', a technique that reduces the compute and memory requirements of the attention computations used in models like Transformers. Compared to models that use densely computed attention, this enables attention over sequences up to 10x longer, with training up to 6.3x faster.

- '1-bit Adam', a scheme for compressing the communication between machines during training runs that use the Adam optimizer. 1-bit Adam enables up to 5x less communication and up to 3.5x faster training.
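As a rough illustration of how one of these features is enabled, here is a minimal, hypothetical sketch of turning on ZeRO-Offload through DeepSpeed's JSON-style config. The field names and call signature follow recent DeepSpeed documentation as I understand it (older versions pass a config file path via command-line arguments instead), so treat this as a sketch and check the docs for your version.

    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)  # stand-in for a real model

    ds_config = {
        "train_batch_size": 16,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,
            # ZeRO-Offload: keep optimizer state in CPU memory to fit larger models
            "offload_optimizer": {"device": "cpu"},
        },
    }

    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )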

Fast reinforcement learning through the composition of behaviours (André Barreto et al) (summarized by Flo): While model-based RL agents can easily adapt their policies to changed rewards in the same environment, planning is expensive and learning good models can be challenging for many tasks. On the other hand, it is hard to get model-free agents to adapt their policies to a new reward without extensive retraining. An intermediate solution is to use so-called successor features: instead of a value function V(π,s) representing the expected discounted reward for a policy π starting in state s, successor features are a vector-valued value function ψ(π,s) representing the expected discounted sum of feature vectors ϕ. If our reward can be written as r = w ⋅ ϕ for some weight vector w, we can easily recover the original value function by taking the dot product of the successor features with the weight vector: V(π,s) = w ⋅ ψ(π,s). Successor features thus allow us to evaluate a fixed policy π for all rewards that are linear in ϕ, which is called generalized policy evaluation.

Now that we can evaluate policies for different preferences, we would like to efficiently find a good policy for a given novel preference. Inspired by the way human learning often combines previously learned skills, we can employ generalized policy improvement. In vanilla policy improvement, we improve upon a policy π that we can evaluate by choosing the action that maximizes the immediate reward plus the discounted value V(π,s') of following π from the next state s'. In generalized policy improvement, we have multiple policies and choose the action that maximizes the immediate reward plus the discounted value of following the best of these policies from the next state s'. To obtain a policy for the new preference, we "stitch together" all the policies we learnt for previous preferences, and the resulting policy performs at least as well as each of the old policies with respect to the new preference. As generalized policy improvement does not require any additional environment samples, it enables zero-shot transfer to new preferences. Empirically, generalized policy improvement is very sample efficient even when the weight vector w has to be learnt from reward signals. Additional samples can then be used to further improve the policy using standard RL.
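Here is a minimal sketch of generalized policy evaluation and improvement in a tabular setting, assuming successor features have already been learned for a set of old policies; all names and array shapes are my own illustrative choices, not the paper's API.

    import numpy as np

    def gpi_action(psi_list, w, state):
        """Generalized policy improvement over a set of old policies.

        psi_list: successor features, one array of shape (n_states, n_actions, d)
                  per old policy, where d is the dimension of the features phi.
        w:        weight vector defining the new task's reward, r = w . phi.
        """
        # Generalized policy evaluation: Q_i(s, a) = w . psi_i(s, a).
        q_values = np.stack([psi[state] @ w for psi in psi_list])  # (n_policies, n_actions)
        # Act greedily with respect to the best old policy's value in this state.
        return int(np.argmax(q_values.max(axis=0)))

The max over policies is what makes the resulting behaviour at least as good as each old policy on the new task.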

Read more: Fast reinforcement learning with generalized policy updates



Flo's opinion: I really like the idea of successor features. Similar to model-based systems, they allow us to evaluate policies for many different rewards, which can be useful for anticipating problematic behaviour before deploying a system. However, note that we still need to execute the policy obtained by generalized policy improvement in order to evaluate it for different rewards: the only guarantee we have is that it is at least as good as the previous policies for the reward for which the improvement step was carried out (and potentially some weaker bounds based on the similarity of different rewards).



γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction (Michael Janner et al) (summarized by Flo): Long planning horizons are often necessary for competitive performance of model-based agents, but single-step models become less and less accurate over longer planning horizons as errors accumulate. Model-free algorithms don't have this problem, but they are usually reward- and policy-specific, so transfer to other tasks can be hard. The paper proposes policy-specific γ-models as an intermediate solution: instead of learning the distribution of the next state given a state-action pair (s,a), or the distribution of the final state of an n-step rollout given (s,a) and a policy π, a γ-model learns the distribution of the final state of a rollout with a stochastic, geometrically distributed length. Unlike for n-step models with n > 1, this distribution follows a Bellman-style decomposition into the single-step distribution and the discounted distribution for the next state s', which allows for off-policy training of the model by bootstrapping the target distribution.

Now, if rewards are consequentialist in the sense that they depend only on the state, the expected reward under this distribution equals (1-γ) times the Q-value of (s,a) under π, so we can use the model for policy evaluation given arbitrary consequentialist rewards. Similar to how single-step models (0-models) can be rolled out to obtain (less accurate) multi-step models, sequential rollouts of a γ-model can be reweighted to obtain a γ-model with larger γ. While this introduces some error, it reduces the bootstrap error during training, which grows with γ. Being able to interpolate between rollouts of single-step models, which accumulate error at test time, and models with large γ, which accumulate error during training, allows us to find a sweet spot between the two extremes.
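To make the identity concrete, here is a sketch of how a trained γ-model could be used for policy evaluation. The sampler name and signature are hypothetical illustrations, not the authors' API.

    import numpy as np

    def q_value_from_gamma_model(sample_gamma_model, reward_fn, s, a, gamma,
                                 n_samples=1000):
        # sample_gamma_model(s, a) is assumed to draw a state from the learned
        # discounted occupancy distribution of policy pi, starting from (s, a).
        # For a reward depending only on the state, the identity
        # E[r(s_gamma)] = (1 - gamma) * Q(s, a) gives:
        samples = [sample_gamma_model(s, a) for _ in range(n_samples)]
        return np.mean([reward_fn(sp) for sp in samples]) / (1 - gamma)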

In practice, single-step models are often used for model-based value expansion (MVE), where only N steps are rolled out and a value function is used to evaluate longer-term consequences. The authors' algorithm, γ-MVE, instead uses N rollouts of the γ-model and adjusts the weighting of the value function accordingly. γ-MVE performs strongly, both in terms of sample efficiency and final performance, on a set of low-dimensional continuous control tasks.



Flo's opinion: I am a bit surprised that this works so well, as both bootstrapping and learning generative models for distributions can be unstable and the method combines both. On the other hand, there is a long tradition of continuous interpolations between different RL algorithms and their performance at the sweet spot is often significantly stronger than at the extremes.
