
LessWrong.com News

A community blog devoted to refining the art of rationality

Sunday 22nd: The Coordination Frontier

Published on November 19, 2020 10:09 PM GMT

For this Sunday's LessWrong meetup, I'll be giving a short presentation on some moral/game-theoretic/coordination problems and solutions I've been thinking about, which I would like feedback on.

Three related puzzles:

  • Sometimes, people disagree on coordination protocols, or on moral frameworks. What do we do about that in the general case?
  • Many rationalists explore novel coordination principles. This can result in us learning about "new and exciting ways that everyone is defecting on each other all the time". If a principle is novel, it's hard to coordinate around. How do we handle disagreements about that?
  • Sometimes, Alice and Bob disagree on how to coordinate, but Bob thinks Alice has overall demonstrated better judgment, and it's (relatively) easy for Bob to defer to Alice. But other times, Alice and Bob don't trust each other, and each of them thinks the other is somewhat less sophisticated (despite having overall similar worldviews). How should they handle that situation?

"The Coordination Frontier" is my (placeholder) term for "the parts of morality/game-theory/coordination that aren't obvious, especially new principles that are novel advances on the current state-of-the-art." I think it's a useful concept for us to collectively have as we navigate complex new domains in the coming years

I have some existing thoughts that I'd like feedback on, and I'd generally like to spark discussion about this topic.

Approximate format will be:

  1. I give a short(ish) presentation
  2. General Q&A and public discussion
  3. Brainstorming in a google doc, and then splitting into smaller groups to discuss particular subtopics.
  4. Return to a centralized conversation and share insights.

This will take place in the Walled Garden, in the Manor (to the north of the central map).

http://garden.lesswrong.com?code=MNlp&event=the-coordination-frontier



Discuss

Simpson's paradox and the tyranny of strata

Published on November 19, 2020 5:46 PM GMT

Simpson's paradox is an example of how the same data can tell different stories. Most people think of this as an odd little curiosity, or perhaps a cautionary tale about the correct way to use data.

You shouldn't see Simpson's paradox like that. Rather than some little quirk, it's actually just the simplest case of a deeper and stranger issue. This is less about the “right” way to analyze data and more about limits to what questions data can answer. Simpson's paradox is actually a bit misleading, because it has a solution, while the deeper issue doesn't.

This post will illustrate this using no statistics and (basically) no math.

I Zeus

You are a mortal. You live near Olympus with a flock of sheep and goats. Zeus is a jerk and has taken up shooting your animals with lightning bolts.

He doesn't kill them; it's just boredom. Transforming into animals to seduce love interests gets old eventually.

Anyway, you wonder: Does Zeus have a preference for shooting sheep or goats? You decide to keep records for a year. You have 25 sheep and 25 goats, so you use a 5x5 grid with one cell for each animal.

At first glance, it seems like Zeus dislikes goats more than He dislikes sheep. (If you're worried about the difference being due to random chance, feel free to multiply the number of animals by a million.)

II Colors

Thinking more, it occurs to you that some animals have darker fur than others. You go back to your records and mark each animal accordingly.

You re-do the analysis, splitting into dark and light groups.

Overall, sheep are zapped less often than goats. But dark sheep are zapped more often than dark goats (7⁄11 > 10⁄16) and light sheep are zapped more often than light goats (5⁄14 > 3⁄9). This is the usual paradox: The conclusion changes when you switch from analyzing everyone to splitting into subgroups.

How does that reversal happen? It's simple: For both sheep and goats, dark animals get zapped more often, and there are more dark goats than dark sheep. Dark sheep are zapped slightly more than dark goats, and similarly for light sheep. But dumping all the animals together changes the conclusion because there are so many more dark goats. That's all there is to the regular Simpson's paradox. Group-level differences can be totally different than subgroup differences when the ratio of subgroups varies.
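To make the arithmetic concrete, here is a minimal Python sketch (not part of the original post) that recomputes the rates from the counts above and confirms the reversal:

```python
from fractions import Fraction

# Zap counts from the text: (zapped, total) per subgroup.
sheep = {"dark": (7, 11), "light": (5, 14)}
goats = {"dark": (10, 16), "light": (3, 9)}

def rate(zapped, total):
    return Fraction(zapped, total)

# Within each subgroup, sheep are zapped MORE often than goats...
for group in ("dark", "light"):
    s, g = rate(*sheep[group]), rate(*goats[group])
    print(group, float(s), ">", float(g), s > g)

# ...but pooled together, sheep are zapped LESS often than goats,
# because the heavily-zapped "dark" subgroup contains more goats.
s_all = rate(sum(z for z, _ in sheep.values()), sum(t for _, t in sheep.values()))
g_all = rate(sum(z for z, _ in goats.values()), sum(t for _, t in goats.values()))
print("overall", float(s_all), "<", float(g_all), s_all < g_all)
```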

This probably seems like a weird little edge case so far. But let's continue.

III Stripes

Thinking even more, you notice that many of your (apparently mutant) animals have stripes. You prepare the data again, marking each animal according to stripes, rather than color.

You wonder, naturally, what happens if you analyze these groups.

The results are similar to those with color. Though sheep are zapped less often than goats overall (12⁄25 < 13⁄25), plain sheep are zapped more often than plain goats (5⁄14 > 3⁄9), and striped sheep are zapped more often than striped goats (7⁄11 > 10⁄16).

IV Colors and stripes

Of course, rather than just considering color or stripes, nothing stops you from considering both.

You decide to consider all four subgroups separately.

Now sheep are zapped less often in each subgroup. (1⁄4 < 2⁄7, 6⁄7 < 8⁄9, etc.)

When you compare everyone, there's a bias against goats. When you compare by color, there's a bias against sheep. When you compare by stripes, there's also a bias against sheep. Yet when you compare by both color and stripes, there's a bias against goats again.

Type of animals compared    Who gets zapped more often?
All                         Goats
Light                       Sheep
Dark                        Sheep
Plain                       Sheep
Striped                     Sheep
Dark Plain                  Goats
Dark Striped                Goats
Light Plain                 Goats
Light Striped               Goats

How can this happen?

To answer that, it's important to realize that anything can happen. There could be basically any biases that reverse (or don't) in whatever way when you split into subgroups. In the table above, essentially any sequence of goats / sheep is possible in the right-hand column.

But how, you ask? How can this happen? I think this is the wrong question. Instead we should ask if there is anything to prevent this from happening. There are a huge variety of possible datasets, with all sorts of different group averages. Unless there is some special structure forcing things to be "orderly", essentially arbitrary stuff can happen. There is no special force here.

V Individuals

So far, this all seems like a lesson about the right way to analyze data. In some cases, that's probably true. Suppose you are surprised to read that Prestige Airways is more often delayed than GreatValue Skybus. Looking closer, you notice that Prestige flies mostly between snowy cities while Skybus mostly flies between warm dry cities. Prestige can easily have a better track record for all individual routes, but a worse track record overall, simply because they fly hard routes more often. In this case, it's probably correct to say that Prestige is more reliable.

But in other cases, the lesson should be just the opposite: There is no "right" way to analyze data. Often the real world looks like this:

There's no clear dividing line between "dark" and "light" animals. Stripes can be dense or sparse, thick or thin, light or dark. There can be many dark spots or few light spots. This list can go on forever. In the real world, individuals often vary in so many ways that there's no obvious definition of subgroups. In these cases, you don't beat the paradox. To get answers you have to make arbitrary choices, yet the answers depend on the choices you make.

Arguably this is a philosophical problem as much as a statistical one. We usually think about bias in terms of "groups". If prospects vary for two "otherwise identical" individuals in two groups, we say there is a bias. This made sense for airlines above: If Prestige was more often on time than GreatValue for each route, it's fair to say Prestige is more reliable.

But in a world of individuals, this definition of bias breaks down. Suppose Prestige mostly flies in the middle of the day on weekends in winter, while Skybus mostly flies at night during the week in summer. They vary from these patterns, but never enough that they are flying the same route on the same day, at the same time, at the same time of year. If you want to compare, you can group flights by cities or day or time or season, but not all of them. Different groupings (and sub-groupings) can give different results. There simply is no right answer.

This is the endpoint of Simpson's paradox: Group-level differences often really are misleading. You can try to solve that by accounting for variability within groups. There are lots of complex ways to try to do that which I haven't talked about, but none of them solve the fundamental problem of what bias means when every example is unique.



Discuss

Notes on Prudence

Published on November 19, 2020 4:14 PM GMT

This post examines the virtue of prudence. It is meant mostly as an exploration of what other people have learned about this virtue, rather than as me expressing my own opinions about it, though I’ve been selective about what I found interesting or credible, according to my own inclinations. I wrote this not as an expert on the topic, but as someone who wants to learn more about it. I hope it will be helpful to people who want to know more about this virtue and how to nurture it.

What is prudence?

Prudence is one of the four cardinal virtues. From there it became part of the seven traditional Christian virtues. It turns up again and again in virtue traditions. I can’t very well ignore it.

And yet… the word “prudence” has gone through such a dramatic shift in meaning that it’s difficult to know how to tackle this one.

“Prudence” was a common English translation of the Greek word phrónēsis, which has implications that range from having how-to skills to things like choosing your goals wisely and exercising good judgment when picking paths to those goals. In short, it is wisdom applied to practical, real-world decision-making, where the rubber meets the road.

When prudence was incorporated into the traditional Christian virtues, it was via the Latin word prudentia, which can mean things like rationality, insight, discernment, foresight, wisdom, or skill. Again, though, the focus is on the quality of your process of making practical decisions, so this isn’t too far off.

Dana Carvey as President G.H.W. Bush on Saturday Night Live

But nowadays when you call someone “prudent” you usually mean that they are cautious: they plan ahead, look before they leap, avoid taking unnecessary risks, save for a rainy day, and that sort of thing. The word now has an old-fashioned sound to it, and is rare enough as a compliment that it's sometimes even deployed as an insult, to imply that the “prudent” person is over-cautious, timid, afraid to take chances, or reluctant to innovate. (The resemblance of the word “prudence” to the etymologically distinct word “prudish” has also contributed to giving the word a stuffy connotation.)

Because of this meaning shift, when you see someone singing the praises of “prudence” it’s important to investigate further to find out which sort of prudence they’re praising. Sometimes authors will even drift from one definition to the other without seeming to realize that they’re doing so (see for example “In Praise of Prudence” from Positive Psychology News).

Prudence as practical wisdom / decision theory

The science of what is a rational decision to make, given certain goals and constraints and uncertainties, is called Decision Theory. It is complex and interesting and I am thankful that there is a marvelous Decision Theory FAQ on LW so I don’t have to try to summarize it myself.

Prudence (in the sense of “practical wisdom”) might be considered decision theory put into practice. Being practically skilled at making rational decisions is something that goes beyond theoretical understanding of good decision-making processes.
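As a toy illustration of the theory side (an invented example, not from the post or the FAQ; the options, probabilities, and utilities are all made up), this is the kind of calculation decision theory formalizes, while prudence is the skill of doing something like it well with messy real-world particulars:

```python
# Toy expected-utility choice: hypothetical actions, outcomes, and utilities,
# purely to illustrate "rational decision given goals and uncertainty".
options = {
    "take the umbrella":  {"rain": 0.3, "dry": 0.7},   # P(outcome) -- here independent of action
    "leave the umbrella": {"rain": 0.3, "dry": 0.7},
}
utility = {
    ("take the umbrella", "rain"): 0,    ("take the umbrella", "dry"): -1,
    ("leave the umbrella", "rain"): -10, ("leave the umbrella", "dry"): 0,
}

def expected_utility(action):
    # Sum utility of each outcome, weighted by its probability.
    return sum(p * utility[(action, outcome)] for outcome, p in options[action].items())

best = max(options, key=expected_utility)
print({a: expected_utility(a) for a in options}, "->", best)
```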

Aristotle explained the difference this way: While it’s possible for a young person to be a savant with a genius understanding of something like mathematics, prudence seems to be something that must be acquired through long experience. This is because expertise in mathematics largely requires an intellectual understanding of abstract universals, while prudence requires actual encounters with real-life particulars. When you teach a young savant a mathematical truth, he or she grasps it as a truth immediately; but when you teach a truth of prudence, the same student may have reason to be skeptical and to need to see that truth exemplified in real-life examples first before he or she can internalize it into his or her worldview.

You exercise prudence when you:

  1. Recognize that you are faced with a decision and are not indifferent to the outcome.
  2. Use a skillful process of evaluating your alternatives to come up with the best choice.
  3. Follow through on that decision by actually acting as you have decided to act. (This may also involve the virtue of self-control.)

Psychologist Barry Schwartz has made prudence (in the sense of practical wisdom) a focus of his work. Here are links to videos of some of his talks on the subject:

In part what Schwartz is doing is pushing back against theories that what we need to do to improve society is to create better rules and institutions on the one hand, or cleverly manipulate incentives on the other. He believes, and says that his research supports, that those things are insufficient. To make things better, you need to improve not the incentives or structures that people act within, but the characters of the people themselves.

If I squint and turn my head at an angle, this looks to me like the practical version of the theoretical ethics debate between deontologists, consequentialists, and virtue ethicists. Deontologists might advocate better rules and institutions; consequentialists might argue for the importance of incentives; and virtue ethicists emphasize the need for character.

Prudence as well-honed caution / risk management

You often hear that people have become too meek and risk-averse: afraid to try new things, to make bold experiments, to boldly go where no one has gone before, and so forth. On the other hand, the VIA Institute on Character, which uses a virtue-oriented assessment test to find people’s “inventory of character strengths,” found that “the least prevalent character strengths in [those they tested] are prudence, modesty, and self-regulation.” (The VIA Institute uses the prudence-as-caution definition: “Prudence means being careful about your choices, stopping and thinking before acting. It is a strength of restraint.”)

People often estimate risks poorly, and plan for them badly. Lists of typical human cognitive biases show few that are not also ways risk-assessment can go awry. We seem to have a variety of contradictory heuristics that are good enough to help us make the day-to-day quick decisions we need to muddle through life, but that reveal themselves to be shockingly absurd when examined closely.

The popularity of casino gambling, and its addictiveness in some people, suggests that even when we gamify simple scenarios of risk management and provide prompt negative feedback for poor risk assessment, people can fail to correct appropriately.

Certainly if the stakes are high enough and we have enough time to think about it, we would be wise to insist on more rational methods than “just eyeballing it” with our ramshackle instincts. This is especially true in circumstances in which we are exposed to risks very different from those our ancestors would have faced — such as driving on the freeway, starting a course of chemotherapy, or sharing an unguarded opinion on an internet forum. In such cases we can expect even less reliable help from our instinctual heuristics.

There is some similarity between prudence in this sense (appropriate response to risk) and courage (appropriate response to fear). However, fear and risk may be only loosely correlated because of the difficulties we face when trying to assess risk. Much of the challenge of courage has to do with our emotional response to fear, whereas much of the challenge of prudence has to do with the cognitive challenge of assessing risk well. Still, there is some overlap, and some people who think of themselves as overly risk-averse may need to work on courage as much as or more than on risk-assessment.



Discuss

Covid 11/19: Don’t Do Stupid Things

Published on November 19, 2020 4:00 PM GMT

There is very good news. We have a second vaccine! Both Pfizer and Moderna’s vaccines have now shown 94%+ effectiveness in treating Covid-19. Not to be outdone by Moderna’s report of 94.5% effectiveness, Pfizer has its final results and they are excellent. 95% effective, 94% effective in people over 65, no major safety concerns. They plan to apply for emergency use authorization within days and likely already have done so by the time you read this. Estimates are that we can vaccinate 20 million people by year’s end.

There is also very bad news. We have more positive test results than ever before, more hospitalizations than ever before, and a substantially increased positive test rate. Deaths lag tests, but are on track to rise proportionally to the rise in tests. 

Within that bad news there is good news. There are signs that the Midwest in particular, where things are maximally bad right now, is getting ready to peak. My estimates for how bad things will get before they get better, and for how long it will be before that happens, have both tentatively gone down somewhat. Bad news can still be good news if you expected even worse news.

There is also an important note before we get to business.

Last week in my weekly post, I said some things about the election that did not relate to Covid-19. Doing so broke the norms of my personal blog and the norms of LessWrong, where these updates are reposted. I felt it was the least bad option under the circumstances. I could have executed it better, but I stand by my decision, and accept the consequences. 

I do not intend to let it happen again. 

One of the consequences of breaking this norm is that it is even more important for me to strongly reassert the norm going forward. Tempted as I may become, I will be extra careful not to discuss politics except as it directly relates to Covid-19 or requires us to take precautions for our own safety. I also plan to enforce the rule strongly in the comments. 

If you feel I have crossed that line in this or a future post in this series, once you have verified no one else has yet called me out, please do not hesitate to call me out on that.

If you feel I have crossed that line in a post not in this series, and I didn’t explicitly acknowledge that I was doing a politics-necessary thing like a prediction market post, call me out on that too.

Thank you. Let’s run the numbers.

The Numbers

Deaths

Date           West   Midwest  South  Northeast
Sep 17-Sep 23  1016   893      2695   399
Sep 24-Sep 30  934    990      2619   360
Oct 1-Oct 7    797    1103     2308   400
Oct 8-Oct 14   782    1217     2366   436
Oct 15-Oct 21  804    1591     2370   523
Oct 22-Oct 28  895    1701     2208   612
Oct 29-Nov 4   956    1977     2309   613
Nov 5-Nov 11   1089   2712     2535   870
Nov 12-Nov 18  1255   2934     2818   1127

Deaths are rising everywhere. That’s baked in for at least the next few weeks. Deaths generally continue to reliably follow cases proportionally after a 14-21 day delay, and case counts have yet to stabilize. The Midwest number here is good news after last week’s giant jump, but mostly there are no surprises here.

Positive Tests

Date           West     Midwest  South    Northeast
Sep 17-Sep 23  54025    85381    127732   23342
Sep 24-Sep 30  55496    92932    106300   27214
Oct 1-Oct 7    56742    97243    110170   34042
Oct 8-Oct 14   68284    125744   117995   38918
Oct 15-Oct 21  75571    149851   133238   43325
Oct 22-Oct 28  94983    181881   158123   57420
Oct 29-Nov 4   112684   252917   167098   70166
Nov 5-Nov 11   157495   387071   206380   108581
Nov 12-Nov 18  211222   452265   255637   150724

Things are still getting worse everywhere, but the rate at which the Midwest in particular is getting worse is slowing down. Hopefully people there are starting to get the message. It's also likely that herd immunity effects are having an impact. With positive test rates in the double digits, there's little doubt a large majority of cases are being missed, and parts of the Midwest now have positive test counts that imply the actual case counts should be very high, in some cases high enough that herd immunity should be rapidly approaching. If anything, there is the worry that the Dakotas are providing evidence that the herd immunity threshold is on the higher end of its potential range, because we are already at or near the lower end.

Positive Test Percentages

Date            Northeast  Midwest  South   West
9/10 to 9/16    2.41%      5.99%    11.35%  4.49%
9/17 to 9/23    2.20%      5.96%    7.13%   4.11%
9/24 to 9/30    2.60%      6.17%    6.18%   4.27%
10/1 to 10/7    2.61%      6.05%    6.74%   4.23%
10/8 to 10/14   2.57%      8.14%    7.09%   4.75%
10/15 to 10/22  2.95%      8.70%    7.85%   5.36%
10/22 to 10/28  3.68%      9.87%    8.58%   6.46%
10/29 to 11/4   4.28%      12.79%   8.86%   7.04%
11/5 to 11/11   5.56%      17.51%   9.89%   8.31%
11/12 to 11/18  6.99%      18.90%   11.64%  10.66%

Percentages and test counts tell the same story: a smaller relative increase in the Midwest that bodes relatively well, and signs that aren't great in the other regions.

Test Counts

Date           USA Tests   Positive %  NY Tests   Positive %  Cumulative Positives
Sep 10-Sep 16  4,636,140   5.8%        559,463    0.9%        2.00%
Sep 17-Sep 23  5,737,919   5.2%        610,802    0.9%        2.09%
Sep 24-Sep 30  5,833,757   5.1%        618,378    1.1%        2.18%
Oct 1-Oct 7    6,009,845   5.2%        763,935    1.3%        2.28%
Oct 8-Oct 14   6,322,865   5.7%        850,223    1.1%        2.39%
Oct 15-Oct 21  6,439,781   6.5%        865,890    1.2%        2.52%
Oct 22-Oct 28  6,933,156   7.5%        890,185    1.4%        2.67%
Oct 29-Nov 4   7,245,600   8.6%        973,777    1.6%        2.86%
Nov 5-Nov 11   8,285,495   10.6%       1,059,559  2.4%        3.13%
Nov 12-Nov 18  8,924,338   12.3%       1,155,670  2.9%        3.47%

My prediction last week was 12.9% positive rate on 9 million tests. We got a 12.3% positive rate on 8.9 million tests. Which is still way worse than last week’s rate of 10.8%, but better than my expectations due to things stabilizing somewhat the last few days of this week.

For next week my best guess is 13.4% positive rate on 9.5 million tests. That’s still headed in the wrong direction, but at a slower pace. One must constantly redefine what counts as good news.
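For concreteness, a quick sketch (not in the original post; it only restates the figures above) of what last week's numbers and this prediction imply in absolute positives:

```python
# Figures from the test-count table and the prediction above.
last_week_tests, last_week_rate = 8_924_338, 0.123   # actual, Nov 12-18
predicted_tests, predicted_rate = 9_500_000, 0.134   # next week's guess

last_week_positives = last_week_tests * last_week_rate
predicted_positives = predicted_tests * predicted_rate
growth = predicted_positives / last_week_positives - 1

print(f"last week ~{last_week_positives:,.0f} positives")
print(f"prediction ~{predicted_positives:,.0f} positives ({growth:.0%} weekly growth)")
```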

The basic facts are unchanged and worth repeating. It is bad out there and is continuing to get worse. Cases will likely increase for a while longer, deaths for a month longer than that. You need to decide whether you are willing to hold out until the vaccines are available. 

If you do not wish to get Covid-19, and I do not think you want to get Covid-19, then unless you have already had it, now is the time to be even more cautious than ever before (with the exception of the greater New York region in March/April). That means among other things: mask up, do everything outdoors whenever possible and especially don’t spend 15+ minute periods together with others indoors unless they’re in your pod, socially distance, avoid being in the direct path of anyone talking let alone yelling or singing, supplement Vitamin D. Most of all don’t do stupid things, like large indoor gatherings without masks such as a traditional Thanksgiving. Encourage others to do the same. 

Machine Learning Project

They have relaunched (hat tip: Nate Silver) after a hiatus, and in this new iteration the site is focused entirely on now-casting, to answer the question of how many people are currently infected or have been infected. 

Here is their best guess.

For the United States, their answer comes back 631k infected yesterday and 560k infected five days ago (versus 164k positive test results yesterday). That is fewer cases than I would have expected, and unfortunately if true it implies both that herd immunity is developing more slowly than I was guessing, and that the infection fatality rate is higher and remains around 0.5%. I still think the IFR is lower than that at this point, but this updates me somewhat towards fewer unnoticed infections. It's plausible we are now better at knowing who to test, and we have more tests, so even though the positive rate is going up, the percentage of cases detected might not be going down.
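A quick back-of-the-envelope on those figures (not in the original post, just arithmetic on the numbers quoted above):

```python
# Nowcast vs. confirmed, figures quoted above.
estimated_new_infections = 631_000   # model estimate for "yesterday"
confirmed_positives = 164_000        # positive test results that day

detected_fraction = confirmed_positives / estimated_new_infections
print(f"roughly {detected_fraction:.0%} of estimated infections show up as positive tests,")
print(f"i.e. about {1 / detected_fraction:.1f} true infections per confirmed case")
```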

Their guess for total infections so far is 16.9%, with numbers in the high 30s for the Dakotas and 22.3% for New York. That also seems on the low end of plausible to me.   

Europe

Not pictured for usual reasons is Belgium, among others. If you look at Belgium, you see both a giant peak and now a giant rapid decline. What they are doing is working, and working fast. Positive test percentages are still troubling everywhere else, so there is worry that late reporting is making things look better than they actually are.

Deaths are climbing, but that was already baked in before lockdowns started.

This tweet alerted me to this website that gives excellent visualizations of Covid-19 in Europe. The effects of lockdowns are unmistakable. Again, we need to worry about late reporting, but the borders tell the story. 

Sweden is no longer the control group. An odd time to end such a valuable scientific experiment, but that’s how it goes these days. I suppose there wasn’t that much more to learn.

Go Away Or I Will Taunt You a Second Time

States enact more Covid-19 rules as infections are on the rise. 

The problem is that these new rules are both looking mostly in the wrong places and beyond toothless. 

Much of Europe went into strict lockdown. I was and am still skeptical that they were right to keep schools open, but it was a real attempt that clearly was capable of working, and it seems to be working.

The new American restrictions are not a real attempt, and have no chance of working. They presumably will make a small impact, but this is not what trying looks like. Contrast this with Europe, where trying is taking place. When the United States turns this wave around, it will not primarily be due to imposed restrictions, but instead due to some combination of voluntary behavior changes and people becoming immune.

When one sees someone claiming to address a problem, one can ask how many levels of ‘try’ and ‘pretend’ are involved. 

In this case, I believe that the attempts are pretending to pretend to try to try to solve the problem. They are not trying to try, but they are pretending that they are doing something that could plausibly pretend to be trying to try. Their hope is that this symbolically translates as ‘doing something’ and prevents the creation of common knowledge that there is no attempt at all. Ideally, it will allow those who do pretend to pretend to try to try to mark those who do not do this as blameworthy when things go badly, or even allow the meta-pretenders to claim credit when things improve. Ah, the joys of living in a high-level simulacrum.

The one exception might be California, which may be trying to try or even outright trying. That would be consistent with their approach so far. 

If I don’t write a post within a few months breaking down various meta levels of trying and pretending, please remind me to do so. It’s definitely worth doing.

In the meantime, be under no illusions that we are doing anything as a nation other than giving up and letting people fend for themselves, while hoping that enough people will be successful enough that the hospital systems will remain intact. 

And again, I am not convinced that this is the wrong decision. I am especially not convinced it is the wrong decision given the public choice problems involved. We lack the capacity to do enough to solve the problem before the vaccine arrives. So what is the alternative?  

That article from CNN also introduced me to a great new line.

Our New Motto: Don’t Share Your Air and Don’t Do Stupid Things

Exactly.

That’s a motto being used by Los Angeles Mayor Eric Garcetti, and I for one am here for it. I’ve been saying ‘don’t do stupid stuff,’ but I am not good at marketing and ‘stupid things’ is clearly better. ‘Don’t share your air’ also rolls nicely off the tongue and summarizes the most important secondary point (even if it is technically logically unnecessary, since sharing your air is a stupid thing to do). 

This correctly boils things down to their essence, in the style of You Have About Five Words, provided people can connect ‘don’t share your air’ to not doing things indoors. If they can also connect it to ‘wear a mask’ that might be a reach, but it is both at least plausible and even better. 

If there was more bandwidth available, next up in my priority queue would be telling people to supplement Vitamin D, but the American people’s available bandwidth is, shall we say, not high.

I cannot emphasize enough that people doing stupid things are most of the problem. 

Our failure to contain this pandemic is not about people making careful informed trade-offs and choosing slightly too much risk slightly too often. Nor is our problem an insufficiently detailed understanding of exactly how to go from somewhat safe to absolutely safe, or exactly how many feet apart to be or which direction to face or how long to interact, or wearing the wrong kind of mask, or any other such details.

Those details matter! They especially matter to you personally if you want to get the most out of life while taking the least risk. The difference between getting those details right or wrong can easily be an order of magnitude of risk. 

But here’s the thing. If you’re even thinking about those questions, you’re not taking that much of the risk. You have already cut out most of your risk compared to your pre-pandemic behaviors. Cutting your risk further could be worthwhile for you but that’s not how we win. It is like worrying about whether you are going to contribute to climate change because you used a paper bag that was slightly too large while looking up at a coal power plant. You are not the issue here.

And let me tell you. In what might be the ultimate evergreen statement, people are doing a lot of stupid things.

There’s a whole section called Thanks for the Hypocrisy later in this post about Thanksgiving plans, where 40% plan to have a gathering of 10 or more people. 

One nurse’s tale from South Dakota, treating patients that don’t believe Covid-19 is real. A huge portion of the public continues to treat Covid-19 as not being real, or no worse than the flu.

It seems Magic players in Oklahoma are trying to gather together for large tournaments on the same day their state ran out of ICU beds? Come on, everyone, we’re better than this.

First cruise ship to set sail since the pandemic has five people test positive for Covid-19. I am Jack’s utter lack of surprise.

Masks continue to be seen as a political statement, and are slightly annoying to wear, so huge percentages of people refuse to wear them.

Indoor dining continues in most of the country because of ‘the economy’ and because we don’t want restaurants to go out of business nor can we agree to give those restaurants money to not go out of business. So this week, Maryland did something about its indoor dining… and went from 75% to 50% capacity.

The New York Times reported on a wedding that had 200 guests and the fact that they had 200 people at an indoor wedding was both barely mentioned and also excused because the 200 people were ‘socially distanced.’ 

Others think that small groups and daily routines don’t count, so they feel like they are ‘doing everything right’ (WSJ). So they gather without masks outside of their home pods like it is nothing.

Most people think that certain types of activities are ‘safe’ and ‘don’t count’ in some important sense. That’s how people instinctively think and it is also the language authorities are using. One can follow all the guidelines and still have a two-household ten-person Thanksgiving, and in most places dine indoors with others several times a week, and so forth. 

The biggest category of ‘stupid stuff’ has been identified, and authorities seem to be converging on informal gatherings of friends and family, indoors, in their homes. This is the new thing to blame, which is the wrong framing to think about almost anything, but is also probably the big reason things are out of control. People don’t think of their friends or family as risky, they let down their guard, and then they’re wrong. 

The new stupid thing is doing intertemporal substitution exactly wrong. Lots of people see that the vaccine is on the horizon, that the end is near, and they start taking more risks rather than less. 

This is of course exactly backwards. An end to the pandemic raises the value of staying safe, and it lowers the cost of staying safe. So you should be safer and take less risk. But people’s minds largely don’t work like that. I’m not sure exactly what they do think instead. One possibility is that they implicitly have an idea for how much risk they are willing to take, and now that they won’t have to ‘spend out of their budget’ for that much longer, they are free to take more risk now. Another is that they get the message ‘things are better’ and act like things are better, without processing any implications at all, and that seems closer to the central thing going on to me. But I don’t understand and insights here are appreciated. 

All I Want For Christmas are a Covid Vaccine And a PS5 But They Underpriced Them And Now They’re All Sold Out

Moderna’s vaccine is 94.5% effective in preliminary results, with 90 symptomatic cases in the control group versus 5 in the treatment group. Consensus seems to think the results are ideal and very good news.  Here is the official press release. Moderna’s vaccine is similar to Pfizer’s, but has the advantage of not requiring storage to be in super cold freezers. Hence the building of giant freezers already underway. We could be doing much better on vaccine logistics and timelines, but we also are doing some things right. 

I have not seen it pointed out explicitly by anyone, but we have far more efficacy and safety data on both vaccines than we think that we do, because they are very similar vaccines. Many did realize this enough to note that they expected the Moderna vaccine to be effective due to the results we got from Pfizer, but it goes further than that. To a large extent, we can use the results from each vaccine trial as additional evidence about the other, and gain even more confidence that both vaccines work. Remember, we haven’t had a few failed attempts and two successes. Nothing has been put in a file drawer. We have two similar success stories and zero failures. 

The reason I am not making a bigger deal about ‘distributing these vaccines yesterday’ is that my understanding is that distribution and production have distinct bottlenecks, and production is already at the maximum regulations and our limited willingness to pay extra (along with any other bottlenecks) will allow. Thus, we are moving November vaccinations into December and January, but April vaccinations stay in April.

There are concerns that states do not have enough money for vaccine distribution (Washington Post). Obviously this is insane and the federal government should give states far more money than they actually need to avoid even slight delays in distribution. But it also shouldn’t matter. I know states are in financial trouble, but the vaccine is the way to make that stop, so states should spend the money if they have to, no matter what it takes, whether or not the federal government picks up the check later. Money is fungible.

One odd thing is that projected timelines did not seem to move at all when we went from one vaccine to two. That could mean that the supply chains both have the same bottleneck, or it could mean that everyone was assuming Moderna’s vaccine would work and it was already scheduled in. Or it could mean that we are scaling up by orders of magnitude, so one doubling now actually does not change the timeline much. 

“Vaccine chief Moncef Slaoui says there will be enough vaccines to immunize 20 million Americans in December.” My understanding is now that this is from a combination of both vaccines. 

An important part of my model has been that when people become infected, they provide much more effective immunity than you would expect at random, because people aren’t infected at random. 

With the vaccine, it depends on who gets the vaccine. If we base our decisions on who is most vulnerable and/or who is most eager to get the vaccine, immunity from vaccination is going to be much less effective at providing herd immunity than one would expect from random vaccinations. Those are exactly the people who are already being careful, so them being immune won’t make as much difference as we might like. The pandemic won’t end as quickly.

If we go for those who have jobs that put them at highest risk, the opposite happens. Every time we immunize an essential worker who has not already caught Covid-19, or someone who is otherwise taking a lot of risk, we are bringing things closer to a conclusion faster. 
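A minimal sketch of why targeting matters (not in the original post; the group sizes and contact weights below are made-up assumptions, purely to illustrate the direction of the effect):

```python
# Illustrative only: two groups with assumed population shares and assumed
# relative transmission weights (how much each person spreads, on average).
population = {"high_contact": 0.2, "careful": 0.8}
contact = {"high_contact": 4.0, "careful": 0.5}

def relative_R(immune_share_by_group):
    """Transmission remaining relative to no immunity, if the given fraction of
    each group is immune (crude linear approximation)."""
    total = sum(population[g] * contact[g] for g in population)
    left = sum(population[g] * contact[g] * (1 - immune_share_by_group.get(g, 0.0))
               for g in population)
    return left / total

# Vaccinate 10% of the whole population, drawn entirely from one group or the other.
print(relative_R({"high_contact": 0.10 / population["high_contact"]}))  # ~0.67
print(relative_R({"careful": 0.10 / population["careful"]}))            # ~0.96
```

Under these assumed numbers, the same 10% of doses cuts transmission far more when it goes to the high-contact group, which is the intuition behind vaccinating essential workers first if the goal is ending the pandemic sooner.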

It looks like we are going to do a combination of these approaches. Some essential workers will be at the front of the line, and our most vulnerable will be there as well. Those who want the vaccine the most and value it the most will mostly have to wait, but will likely comprise the ‘second wave’ that I will join. 

When will things be normal again? Some experts say surprisingly quickly: Fauci cautions ‘gradual return’ to normalcy by ‘second, third’ quarter 2021. To me, that’s surprisingly quick. As an individual, I might be ready for normal life in May, but that doesn’t mean normal life is ready for me. If we are less than a year from things seeming mostly normal, I will absolutely take that result. One could plausibly respond that the average person will hear ‘vaccine available’ as ‘pandemic over’ so any realistic timeline will sound pessimistic. It is also possible that this is a case of giving an optimistic prediction, but framing it as pessimistic because Very Serious People only allow pessimistic predictions. One must constantly be cautioning.

Immunity to Covid-19 For Some, Miniature American Flags for Others

I talked about this last week, and now have found some hard data. 

A new Gallup poll finds that 58% of Americans would take a Covid-19 vaccine, up from 50% previously but lower than before vaccine safety turned into a partisan issue. That seems about right for ‘would get the vaccine if it was offered incidentally during a check-up.’ And it is far worse in other places, such as in Spain where only 24% of people want to get the vaccine right away.  My guess is that ‘willing to go out of one’s way to get it’ is lower than that, and ‘willing to put in effort to secure part of a limited supply, maybe even wait on a line’ is lower still, and that it will be even lower once everyone starts talking about the side effects. Which are mild, but still mean there’s a substantial chance you spend the next day in bed feeling bad. A lot of people aren’t down for that.

Many sources, even well-intentioned ones, are not helping matters. Joe Rogan, on the nation’s most popular podcast, openly questioned whether one day feeling bad to get the vaccine wasn’t worse than getting Covid-19 if you are healthy, because his assistant and others he knows have had mild cases they barely noticed, and he noticed that Trump is out of shape, got Covid-19 and didn’t die. 

I even see how he got to that place, and it shows how hard it is to get a good model even when one is trying, as I believe that he is. Joe Rogan’s podcast is so popular for many reasons, but in part it is because he actually tries to use reason and his experiences to build up a full-of-gears model of physical reality in a way regular people can relate to. People are starved for that. I hope he is helping inspire others to think for themselves. He clearly has some important pieces of the puzzle that the people I know mostly lack, and he clearly lacks many important pieces that the people I know have mastered. I strongly disagree with him a lot. He gets a lot of things importantly wrong, but at least he is wrong, and I have no doubt I am frequently wrong too. An exchange of knowledge would be highly valuable. 

Despite all the problems convincing people to take the vaccine, as I noted last week, I do not expect this to impact distribution of the vaccine any time soon. Half the people are plenty of people until half the people are vaccinated, at which point we will know it is safe and we will pick up another 10%, at which point we will be at or near full herd immunity and we will have had plenty of time to use other methods to convince the remaining people, including requiring vaccination to participate in various activities if that proves necessary. 

In the meantime, this only makes it easier for those of us who actively want the vaccine yesterday.

Thanks for the Hypocrisy

Government officials rise as one to urge us to limit or cancel our Thanksgiving plans. They are, of course, correct. By that time things will probably be substantially more dangerous than they are even now. Dining and talking indoors for much of the day in large groups is one of the riskiest activities in terms of Covid-19 infection. Combining it with students returning from college will make it even worse, and many gatherings include our most vulnerable. It’s not a good idea. 

If you have Thanksgiving plans outside of your pod and do not want your family to be infected with Covid-19, either everyone coming needs to quarantine for two weeks beforehand, or you need to cancel your Thanksgiving plans.  

The people do not seem to be listening. “Nearly 40% of US residents plan to participate in gatherings of 10 or more people this holiday season despite concerns over the spread of COVID-19”. Here are some other numbers:

However, 73% of respondents said they would practice social distancing during the holidays and 79% suggested that they would celebrate or gather only with people with whom they live, the data showed.

Just over 80% indicated that they would ask family and friends invited to events not to come if they had symptoms of COVID-19.

I notice I am confused and a bit boggled. That’s a lot of people who claim to be living in groups of ten or more. 

It’s also almost 20% of people not asking those with symptoms of Covid-19 not to attend. 

Even after everything that has happened, that last one blows my mind. People who are sick should stay home and not attend gatherings. This isn’t a new principle. Even if you don’t think Covid-19 exists, even if the year is let’s say 2018, if you are coughing up a storm and feel terrible, stay home. This is not hard.  

Holding the holiday at all is different, even done indoors, without masks and without social distancing. Yes, that’s crazy risky. You should not do this. 

But I understand. I had to step in to tell someone in my family, coming to my own Thanksgiving, that they had to isolate or they couldn’t attend, because it’s a damn hard thing to say to someone. That conversation can go very badly. Even I was tempted to let it happen. No one wants to be the villain who ruined Thanksgiving. It has been a long and lonely year. Thanksgiving is to many the second most important holiday of the year, second only to Christmas which is under similar threat. I totally, totally get it.

Even more than that, I don’t even think the decision is obviously wrong. From a personal perspective, the holidays are valuable to us, and who are we to tell people they aren’t worth the risk? People are choosing that risk, they have skin in the game and they are revealing their preferences to us. 

From a collective action standpoint, it would be better if we were all going to do our part to contain the virus. But it is clear at this point that we are not going to do that. How do you tell people to keep picking Stag when half the country keeps choosing Rabbit, if that’s how you view the payoff matrix? 

It’s damn hard! Here’s Germany’s recent attempt, which is worth a spoiler-free watch. On top of the obvious things, it’s worth noting that the message is to do something to improve the world rather than do something as a signal. Nor did it shame anyone. A sharp contrast with the American ads I saw in New York, and a welcome one. Not good enough to get it done, but good show. 

So you’re the nation’s politicians who want to help, and you don’t know what to do. You can’t do lockdowns or restrictions because the public won’t stand for it or listen to you. That bridge is burned. Physical action like advancing the vaccine is out of your hands. Especially if he has anything to say about it. 

I still have an important suggestion for you: Set a good example and don’t be a giant obvious hypocrite. 

As in, if you want to encourage people to be responsible and encourage everyone not to engage in the one activity we most need to stop doing, indoor dining outside one’s household, it would help to not be a giant hypocrite and hold indoor dinners for incoming congressional members of both parties. People notice. People remember. As they should. They tend to react like this. This account sums up my reaction. Canceling the event after people react to it won’t help much. 

Governor Newsom of California regrets attending a 12-person dinner at the French Laundry. More refusal to set a good example. More utter lack of skin in the game.

You might be a member of the New York City government who bragged about not limiting the size of your indoor gatherings. Which is ‘ok, fair’ but in his defense there are others who are also flouting the rules who aren’t getting called out as loudly, which isn’t fair. Fairness is important.

At least eight members of Congress tested positive this past week. Any individual person can be unlucky, but eight in one week makes it clear that good examples are most certainly not being set.

That’s only the incidents on a personal level that I noticed this week. 

The protests. The haircut. The funeral for John Lewis. The Rose Garden ceremony. The president resumed work in person while infectious. Non-existent contact tracing and notifications at the highest levels. The celebrations after the election. Indoor dining at up to 75% capacity. The list goes on. 

Maybe remember that the ‘little people’ who do the work count as people, and don’t keep saying they’re not there and no one was within six feet at least when we have video or photographic evidence.

While they close playgrounds and schools, and tell regular people not to celebrate holidays. 

This keeps happening. If there is to be any hope of convincing people to do their part, it needs to stop.

New York City Closes Schools

This story is, as usual, the embodiment of New York in the age of Covid-19.

Europe has mostly closed everything else but left schools open. 

New York City has a teachers’ union.

Unions are not known for their flexibility. When schools reopened, it was agreed that they would close again if positive test rates in the city hit 3%. 

They hit exactly 3% yesterday, so today the schools are closed.

Hours before this, a reporter asked Cuomo if New York City schools would be closed, which caused him to tell the reporter they were ‘confused’ and generally go on a tirade.

Kids can’t go to school anywhere in the city despite many areas still being at roughly 1% positive rates, but they can go to a movie theater for ‘enrichment’ instead.

De Blasio has no plan for how to reopen the schools, because “This day seemed far off, thankfully.” Isn’t it great when your false impressions save you from having to do the work of being mayor? 

These stories always put me in a strange spot, because I am not in general in favor of schools, but closing schools while allowing indoor dining to continue is everywhere and always a stunningly major league screw-up. Schools so far do not seem to be a major source of transmission in the city, and many find them invaluable. Even if you agree with me that school is terrible, remote learning as implemented by schools seems to be far worse, and seems designed to ensure kids don’t escape any of the tortures of attending school while not providing what benefits they were getting by being around other kids and occasionally even learning things. It’s almost as if the system acts to punish people rather than solve the problem.

I am confident that closing schools while leaving other things open is going to damage the kids involved, given the current state of remote learning, and enrage the parents especially those put into logistical nightmares, which is a lot of them. They will remember. 

And remember that if remote learning is making the whole class depressed, or otherwise destroying lives, that is a choice made by humans that can be undone by humans. Consider contacting the other parents in the class, standing as one voice, and saying no. Tell them what the new rules are, and dare them to fail the entire class over failing to sit there and be tortured. Or, of course, consider withdrawing from remote learning entirely, and using other methods.

In Other News

You know how you get people to give up entirely and let the pandemic run wild? One easy way is you say things like this, and tell people that even after they are personally vaccinated with a 95% effective vaccine, and wait the 14 days after the second shot, they still need to wear masks and socially distance. Not only wouldn’t I blame people for giving up if they thought that was the official Very Serious Person line, I’d think that from their perspective they were absolutely doing the right thing. 

Marginal Revolution reports that NY Times reports a 30-minute at-home Covid test has been approved, although a prescription is required because the cartel must be paid. 

A study came out in Denmark looking at the effectiveness of mask usage. It was underpowered, leaving it unable to confirm its hypothesis that mask use is highly effective, and you can guess how people are interpreting that. It looks like if you don’t use antibody tests, which in context introduce a ton of noise to an already underpowered study, you do find sufficient evidence to conclude the masks worked. This despite low power, highly uneven mask compliance (which is to be expected in the real world of course), and also everyone treating ‘what we can show with p<0.05’ as equal to ‘how effective masks are’ when evaluating the study. Plus no one can make the other person wear a mask, which is where the majority of effectiveness lies. All in all, it seems like even more strong evidence masks work, and full mask compliance would be more than sufficient to end the pandemic quickly on its own.   

Mayo Clinic has had 900 people infected over the past two weeks, with 93% of them getting infected while off the job, and the majority of the rest being infected in the break room while eating. Indoor dining is still super dangerous, and we get another data point that it is possible to take proper precautions when providing health care. If you want to do something safely badly enough, you can do it, and the professionals want it badly enough.

Dolly Parton helped fund the Moderna vaccine. Neat. No idea why anyone needed to do that, but still. Neat.

You remember that thing we heard about two weeks ago, that it seems we can mostly detect who has Covid-19 for zero marginal cost through analysis of an audio recording of a forced cough? And then everyone forgot about it, presumably because regulatory barriers mean it’s useless? It’s still there and still looks much more accurate than it needs to be in order to be the basis of a solution to this whole thing. It would still be a real, real shame if someone were to find a way to make this available for free online to anyone who wants it. A real shame! 

There’s also now a proposal to use smartwatch data, which can happen automatically for existing users. The question the paper does not yet answer is how many false positives the system would find. If that number is sufficiently low, and regulations don’t get in the way, this seems valuable. Until we get that information, nothing to see yet. But it’s more evidence for the theory that ‘Covid-19 causes a bunch of changes to most people and you can detect those changes in a lot of ways.’

Senator Rob Portman (R-OH) has announced he is part of the Janssen-Johnson and Johnson Phase 3 vaccine trial. Good for him!

Elon Musk gets tested four times for Covid-19 in one day, gets two positives and two negatives, shares news with the world with framing that “something very bogus is going on” via Twitter. As anyone who knows Bayes’ rule can figure out, he almost certainly had Covid-19. A few days later, he had to watch his latest SpaceX launch remotely. But also he put astronauts into space so let’s all go easy on him and wish him a speedy recovery. It’s all right that he has useful models of some parts of the world but not others. 
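For the curious, here is the rough shape of that Bayes calculation (not in the original post; the sensitivity, specificity, and prior below are illustrative assumptions, not the actual characteristics of the tests Musk took):

```python
# Assumed rapid-test characteristics -- illustrative numbers only.
sensitivity = 0.7    # P(positive | infected), assumed
specificity = 0.99   # P(negative | not infected), assumed
prior = 0.1          # assumed prior probability of infection

def posterior(results, prior, sens, spec):
    """Posterior P(infected) after a sequence of '+'/'-' results, assuming
    the tests are independent given infection status."""
    p_inf, p_not = prior, 1 - prior
    for r in results:
        p_inf *= sens if r == "+" else (1 - sens)
        p_not *= (1 - spec) if r == "+" else spec
    return p_inf / (p_inf + p_not)

print(posterior(["+", "+", "-", "-"], prior, sensitivity, specificity))  # ~0.98 with these assumptions
```

The asymmetry does the work: false positives are assumed to be rare, so two positives move the posterior up far more than two negatives pull it back down.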

If you had 50%+ positive test rates and death tolls to match in the ‘what it takes to get Governor Doug Burgum of North Dakota to issue a mask mandate’ pool, congratulations, you’re a winner. It can be done.

Last week Cuomo told us he was going to try and stop distribution of the vaccine to New York until Biden was president. In a rare show of bipartisan cooperation, Trump won’t deliver it until Cuomo says he’s ready. Cuomo of course responded by threatening to sue. Biden has announced he is going to appoint a ‘supply commander’ to distribute the vaccine, which could be quite the point of leverage if one wanted to use it.

Remember February? When the incoming Chief of Staff Ronald Klain was one of many telling us things like “If you want to do something useful today, go to Chinatown — buy a meal, go shopping. The virus attacks humans, not people of any ethnicity/race. Fear is hurting Chinese-American owned businesses, baselessly. Let’s fight the disease AND let’s fight prejudice” or “We don’t have a #COVIDー19 epidemic in the US but we are starting to see a fear epidemic. Kudos to @NYCMayor (and others) for standing against that”? It’s important. We need to remember.

Surgeon General backs the ‘pandemic fatigue’ framing of why containment has collapsed. 

Marginal Revolution looks back at Economics and Epidemiology. All points made here seem right to me.

Minnesota Republicans test positive for Covid-19, alert Republicans but do not alert Democrats. Failure to notify Democrats in the state legislature of Republican positive tests seems to also have happened in Pennsylvania. And Ohio. 

Thread in which a Very Serious Person points out things are bad, but also finds new evidence that antibodies are protective via a case study at a camp. Water is wet, sky blue, experts report. Which is valuable when many Very Serious People are continuously saying they are not sure.

More than 80 percent of prisoners in Carson City prison test positive for Covid-19. Reminder that a large percentage of prisoners have not even been convicted of a crime.

So this is weird, there’s a pattern of people who have had Covid-19 saying Coke no longer tastes good. How many lives will be saved? How many others improved? And presumably this wouldn’t only happen to Coke, so what is the general pattern? It would be great if this was a general awakening that artificial superstimuli were not good.

As one of those who fled New York City due to the pandemic, but who plans to return, I’ve been curious to get a good estimate for how many people left the city. My best guess was that a lot of people left Manhattan, with the majority of some richer areas emptying out, based on various anecdotal stories of empty buildings, but never found any hard data. The New York Post tallies the change-of-address forms and comes up with 244k such requests from March through July, versus 101k the previous year. So that would be an increase of 143k change-of-address requests. The Post doesn’t then multiply by household size, but I think this is an oversight, and one should increase this by at least a factor of 2.4? Given that families have greater incentive to leave than singles, probably more. Rising crime certainly isn’t helping but that didn’t happen until after June and the spike was almost entirely done by then, so that’s more the Post showing its agenda than a major real cause. I do buy that the spike in departures from the Upper West Side zip codes housing new temporary homeless shelters is not a coincidence. 
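The implied arithmetic (not in the original post; it just applies the factor of 2.4 mentioned above to the Post's figures):

```python
requests_2020, requests_2019 = 244_000, 101_000
extra_requests = requests_2020 - requests_2019   # 143,000 extra change-of-address requests
household_size = 2.4                             # assumed average; the right multiplier is uncertain

print(f"~{extra_requests:,} extra requests")
print(f"~{extra_requests * household_size:,.0f} people, if each request covers a household of {household_size}")
```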

“It is heartbreaking to see the politicization of sensible precautions. Think how quickly everyone agreed to take off their shoes in airports.” Because of one otherwise failed terrorist attack that resulted in zero casualties, and almost two decades later, in many places we are still doing it. Quite the example that got picked there. Maybe there is a reason people are skeptical when the government tells them what they need to do to stay safe. 

Conclusion

It’s awful out there. Stay safe. The vaccine is coming, but for most of us not until something like April or May.

Google estimates there are 18 million health care workers in the United States. We will have enough vaccine doses in 2020 to give to 20 million people. Adding in police and firemen would make it an even 20 million. Assuming that is indeed where we start, there won’t be much left for others, even the most vulnerable. There are 49.5 million Americans over the age of 65. 

Meanwhile, people are going to keep acting irresponsibly and doing stupid things. America is not going to do what it takes to get things under control in the next few months. The message here hasn’t changed. Time to buckle down. Will the medical system hold together? My guess is it probably will, but we are about to test it and find out. You don’t want to be part of that test. 

Don’t do stupid things.



Discuss

How do you evaluate whether a $500 donation to a project that you know well is a good idea?

19 ноября, 2020 - 15:42
Published on November 19, 2020 12:42 PM GMT

At habryka's recent office hours he made the point that it's really hard for a big grant-giving organization like the Long-Term Future Fund (LTFF) to fund certain projects. While good rationalist culture, for example, is valuable, funding culture by committee in a top-down way is problematic. The people with local knowledge of a cultural project are in a better position to judge whether or not a given rationalist culture project is valuable enough that it should be funded than a grant committee like that of the LTFF.

Given that the LTFF and OpenPhil have access to a lot of financial capital these days from very rich individuals, the scarce resource isn't primarily money but local knowledge about which projects are worthwhile. This suggests that individuals who have a small donation budget and a lot of local knowledge of individual projects are more effective when they make use of that scarce local knowledge instead of donating money to big grant-making organizations. 

While giving money to big grant makers is straightforward, giving money to small projects is harder. How should an individual who has a yearly donation budget of <$10,000 go about evaluating local projects for donating an amount like $500 to them? 



Discuss

Some AI research areas and their relevance to existential safety

19 ноября, 2020 - 06:18
Published on November 19, 2020 3:18 AM GMT

Introduction

This post is an overview of a variety of AI research areas in terms of how much I think contributing to and/or learning from those areas might help reduce AI x-risk.  By research areas I mean “AI research topics that already have groups of people working on them and writing up their results”, as opposed to research “directions” in which I’d like to see these areas “move”. 

I formed these views mostly pursuant to writing AI Research Considerations for Human Existential Safety (ARCHES).  My hope is that my assessments in this post can be helpful to students and established AI researchers who are thinking about shifting into new research areas specifically with the goal of contributing to existential safety somehow.  In these assessments, I find it important to distinguish between the following types of value:

  • The helpfulness of the area to existential safety, which I think of as a function of what services are likely to be provided as a result of research contributions to the area, and whether those services will be helpful to existential safety, versus
  • The educational value of the area for thinking about existential safety, which I think of as a function of how much a researcher motivated by existential safety might become more effective through the process of familiarizing with or contributing to that area, usually by focusing on ways the area could be used in service of existential safety.
  • The neglect of the area at various times, which is a function of how much technical progress has been made in the area relative to how much I think is needed.

Importantly:

  • The helpfulness to existential safety scores do not assume that your contributions to this area would be used only for projects with existential safety as their mission.  This can negatively impact the helpfulness of contributing to areas that are more likely to be used in ways that harm existential safety.
  • The educational value scores are not about the value of an existential-safety-motivated researcher teaching about the topic, but rather, learning about the topic.
  • The “neglect” scores are not measuring whether there is enough “buzz” around the topic, but rather, whether there has been adequate technical progress in it.

Below is a table of all the areas I considered for this post, along with the entirely subjective “scores” I’ve given them. The rest of this post can be viewed simply as an elaboration/explanation of this table:

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Out of Distribution Robustness | Zero/Single | 1/10 | 4/10 | 5/10 | 3/10 | 1/10 |
| Agent Foundations | Zero/Single | 3/10 | 8/10 | 9/10 | 8/10 | 7/10 |
| Multi-agent RL | Zero/Multi | 2/10 | 6/10 | 5/10 | 4/10 | 0/10 |
| Preference Learning | Single/Single | 1/10 | 4/10 | 5/10 | 1/10 | 0/10 |
| Side-effect Minimization | Single/Single | 4/10 | 4/10 | 6/10 | 5/10 | 4/10 |
| Human-Robot Interaction | Single/Single | 6/10 | 7/10 | 5/10 | 4/10 | 3/10 |
| Interpretability in ML | Single/Single | 8/10 | 6/10 | 8/10 | 6/10 | 2/10 |
| Fairness in ML | Multi/Single | 6/10 | 5/10 | 7/10 | 3/10 | 2/10 |
| Computational Social Choice | Multi/Single | 7/10 | 7/10 | 7/10 | 5/10 | 4/10 |
| Accountability in ML | Multi/Multi | 8/10 | 3/10 | 8/10 | 7/10 | 5/10 |

The research areas are ordered from least-socially-complex to most-socially-complex.  This roughly (though imperfectly) correlates with addressing existential safety problems of increasing importance and neglect, according to me.  Correspondingly, the second column categorizes each area according to the simplest human/AI social structure it applies to:

Zero/Single: Zero-human / Single-AI scenarios

Zero/Multi: Zero-human / Multi-AI scenarios

Single/Single: Single-human / Single-AI scenarios

Single/Multi: Single-human / Multi-AI scenarios

Multi/Single: Multi-human / Single-AI scenarios

Multi/Multi: Multi-human / Multi-AI scenarios

Epistemic status & caveats

I developed the views in this post mostly over the course of the two years I spent writing and thinking about AI Research Considerations for Human Existential Safety (ARCHES).  I make the following caveats:

  1. These views are my own, and while others may share them, I do not intend to speak in this post for any institution or group of which I am part.
  2. I am not an expert in Science, Technology, and Society (STS).  Historically there hasn’t been much focus on existential risk within STS, which is why I’m not citing much in the way of sources from STS.  However, from its name, STS as a discipline ought to be thinking a lot about AI x-risk.  I think there’s a reasonable chance of improvement on this axis over the next 2-3 years, but we’ll see.
  3. I made this post with essentially zero deference to the judgement of other researchers.  This is academically unusual, and prone to more variance in what ends up being expressed.  It might even be considered rude.  Nonetheless, I thought it might be valuable or at least interesting to stimulate conversation on this topic that is less filtered through patterns of deference to others.  My hope is that people can become less inhibited in discussing these topics if my writing isn’t too “polished”.  I might also write a more deferent and polished version of this post someday, especially if nice debates arise from this one that I want to distill into a follow-up post.

Defining our objectives

In this post, I’m going to talk about AI existential safety as distinct from both AI alignment and AI safety as technical objectives.  A number of blogs seem to treat these terms as near-synonyms (e.g., LessWrong, the Alignment Forum), and I think that is a mistake, at least when it comes to guiding technical work for existential safety.  First I’ll define these terms, and then I’ll elaborate on why I think it’s important not to conflate them.

AI existential safety (definition)

In this post, AI existential safety means “preventing AI technology from posing risks to humanity that are comparable or greater than human extinction in terms of their moral significance.”  

This is a bit more general than the definition in ARCHES.  I believe this definition is fairly consistent with Bostrom’s usage of the term “existential risk”, and will have reasonable staying power as the term “AI existential safety” becomes more popular, because it directly addresses the question “What does this term have to do with existence?”.

AI safety (definition)

AI safety generally means getting AI systems to avoid risks, of which existential safety is an extreme special case with unique challenges.  This usage is consistent with normal everyday usage of the term “safety” (dictionary.com/browse/safety), and will have reasonable staying power as the term “AI safety” becomes (even) more popular.  AI safety includes safety for self-driving cars as well as for superintelligences, including issues that these topics do and do not share in common.

AI ethics (definition)

AI ethics generally refers to principles that AI developers and systems should follow.  The “should” here creates a space for debate, whereby many people and institutions can try to impose their values on what principles become accepted.  Often this means AI ethics discussions become debates about edge cases that people disagree about instead of collaborations on what they agree about.  On the other hand, if there is a principle that all or most debates about AI ethics would agree on or take as a premise, that principle becomes somewhat easier to enforce.

AI governance (definition)

AI governance generally refers to identifying and enforcing norms for AI developers and AI systems themselves to follow.  The question of which principles should be enforced often opens up debates about safety and ethics.  Governance debates are a bit more action-oriented than purely ethical debates, such that more effort is focussed on enforcing agreeable norms relative to debating about disagreeable norms.  Thus, AI governance, as an area of human discourse, is engaged with the problem of aligning the development and deployment of AI technologies with broadly agreeable human values.  Whether AI governance is engaged with this problem well or poorly is, of course, a matter of debate.

AI alignment (definition)

AI alignment usually means “Getting an AI system to {try | succeed} to do what a human person or institution wants it to do”. The inclusion of “try” or “succeed” respectively creates a distinction between intent alignment and impact alignment.   This usage is consistent with normal everyday usage of the term “alignment” (dictionary.com/browse/alignment) as used to refer to alignment of values between agents, and is therefore relatively unlikely to undergo definition-drift as the term “AI alignment” becomes more popular.  For instance, 

  • (2002) “Alignment” was used this way in 2002 by Daniel Shapiro and Ross Shachter, in their AAAI conference paper User/Agent Value Alignment, the first paper to introduce the concept of alignment into AI research.  This work was not motivated by existential safety as far as I know, and is not cited in any of the more recent literature on “AI alignment” motivated by existential safety, though I think it got off to a reasonably good start in defining user/agent value alignment.
  • (2014) “Alignment” was used this way in the technical problems described by Nate Soares and Benya Fallenstein in Aligning Superintelligence with Human Interests: A Technical Research Agenda.  While the authors’ motivation is clearly to serve the interests of all humanity, the technical problems outlined are all about impact alignment in my opinion, with the possible exception of what they call “Vingean Reflection” (which is necessary for a subagent of society thinking about society).
  • (2018) “Alignment” is used this way by Paul Christiano in his post Clarifying AI Alignment, which is focussed on intent alignment.

A broader meaning of “AI alignment” that is not used here

There is another, different usage of “AI alignment”, which refers to ensuring that AI technology is used and developed in ways that are broadly aligned with human values.  I think this is an important objective that is deserving of a name to call more technical attention to it, and perhaps this is the spirit in which the “AI alignment forum” is so-titled.  However, the term “AI alignment” already has poor staying-power for referring to this objective in technical discourse outside of a relatively cloistered community, for two reasons:

  1. As described above, “alignment” already has a relatively clear technical meaning that AI researchers have already gravitated towards interpreting “alignment” to mean, that is also consistent with natural language meaning of the term “alignment”, and
  2. AI governance, at least in democratic states, is basically already about this broader problem.  If one wishes to talk about AI governance that is beneficial to most or all humans, “humanitarian AI governance” is much clearer and more likely to stick than “AI alignment”.

Perhaps “global alignment”, “civilizational alignment”, or “universal AI alignment” would make sense to distinguish this concept from the narrower meaning that alignment usually takes on in technical settings.  In any case, for the duration of this post, I am using “alignment” to refer to its narrower, technically prevalent meaning.

Distinguishing our objectives

As promised, I will now elaborate on why it’s important not to conflate the objectives above.  Some people might feel that these arguments are about how important these concepts are, but I’m mainly trying to argue about how importantly different they are.  By analogy: while knives and forks are both important tools for dining, they are not usable interchangeably.

Safety vs existential safety (distinction)

“Safety” is not robustly usable as a synonym for “existential safety”.  It is true that AI existential safety is literally a special case of AI safety, for the simple reason that avoiding existential risk is a special case of avoiding risk.  And, it may seem useful for coalition-building purposes to unite people under the phrase “AI safety” as a broadly agreeable objective.  However, I think we should avoid declaring to ourselves or others that “AI safety” will or should always be interpreted as meaning “AI existential safety”, for several reasons:

  1. Using these terms as synonyms will have very little staying power as AI safety research becomes (even) more popular.
  2. AI existential safety is deserving of direct attention that is not filtered through a lens of discourse that confuses it with self-driving car safety.
  3. AI safety in general is deserving of attention as a broadly agreeable principle around which people can form alliances and share ideas.

Alignment vs existential safety (distinction)

Some people tend to use these terms as near-synonyms; however, I think this usage has some important problems:

  1. Using “alignment” and “existential safety” as synonyms will have poor staying-power as the term “AI alignment” becomes more popular.  Conflating them will offend both the people who want to talk about existential safety (because they think it is more important and “obviously what we should be talking about”) as well as the people who want to talk about AI alignment (because they think it is more important and “obviously what we should be talking about”).
  2. AI alignment refers to a cluster of technically well-defined problems that are important to work on for numerous reasons, and deserving of a name that does not secretly mean “preventing human extinction” or similar.
  3. AI existential safety (I claim) also refers to a technically well-definable problem that is important to work on, and deserving of a name that does not secretly mean “getting systems to do what the user is asking”.
  4. AI alignment is not trivially helpful to existential safety, and efforts to make it helpful require a certain amount of societal-scale steering to guide them.  If we treat these terms as synonyms, we impoverish our collective awareness of ways in which AI alignment solutions could pose novel problems for existential safety.

This last point gets its own section.

AI alignment is inadequate for AI existential safety

Around 50% of my motivation for writing this post is my concern that progress in AI alignment, which is usually focused on “single/single” interactions (i.e., alignment for a single human stakeholder and a single AI system), is inadequate for ensuring existential safety for advancing AI technologies.  Indeed, among problems I can currently see in the world that I might have some ability to influence, addressing this issue is currently one of my top priorities.

The reason for my concern here is pretty simple to state, via the following two diagrams:

Of course, understanding and designing useful and modular single/single interactions is a good first step toward understanding multi/multi interactions, and many people (including myself) who think about AI alignment are thinking about it as a stepping stone to understanding the broader societal-scale objective of ensuring existential safety.  

However, this pattern mirrors the situation AI capabilities research was following before safety, ethics, and alignment began surging in popularity.  Consider that most AI (construed to include ML) researchers are developing AI capabilities as stepping stones toward understanding and deploying those capabilities in safe and value-aligned applications for human users.  Despite this, over the past decade there has been a growing sense among AI researchers that capabilities research has not been sufficiently forward-looking in terms of anticipating its role in society, including the need for safety, ethics, and alignment work.  This general concern can be seen emanating not only from AGI-safety-oriented groups like those at DeepMind, OpenAI, MIRI, and in academia, but also AI-ethics-oriented groups as well, such as the ACM Future of Computing Academy:

https://acm-fca.org/2018/03/29/negativeimpacts/

Just as folks interested in AI safety and ethics needed to start thinking beyond capabilities, folks interested in AI existential safety need to start thinking beyond alignment.  The next section describes what I think this means for technical work.

Anticipating, legitimizing and fulfilling governance demands

The main way I can see present-day technical research benefitting existential safety is by anticipating, legitimizing and fulfilling governance demands for AI technology that will arise over the next 10-30 years.  In short, there often needs to be some amount of traction on a technical area before it’s politically viable for governing bodies to demand that institutions apply and improve upon solutions in those areas.  Here’s what I mean in more detail:

By governance demands, I’m referring to social and political pressures to ensure AI technologies will produce or avoid certain societal-scale effects.  Governance demands include pressures like “AI technology should be fair”, “AI technology should not degrade civic integrity”, or “AI technology should not lead to human extinction.”  For instance, Twitter’s recent public decision to maintain a civic integrity policy can be viewed as a response to governance demand from its own employees and surrounding civic society.

Governance demand is distinct from consumer demand, and it yields a different kind of transaction when the demand is met.  In particular, when a tech company fulfills a governance demand, the company legitimizes that demand by providing evidence that it is possible to fulfill.  This might require the company to break ranks with other technology companies who deny that the demand is technologically achievable.  

By legitimizing governance demands, I mean making it easier to establish common knowledge that a governance demand is likely to become a legal or professional standard.  But how can technical research legitimize demands from a non-technical audience?

The answer is to genuinely demonstrate in advance that the governance demands are feasible to meet.  Passing a given professional standard or legislation usually requires the demands in it to be “reasonable” in terms of appearing to be technologically achievable.  Thus, computer scientists can help legitimize a governance demand by anticipating the demand in advance, and beginning to publish solutions for it.  My position here is not that the solutions should be exaggerated in their completeness, even if that will increase ‘legitimacy’; I argue only that we should focus energy on finding solutions that, if communicated broadly and truthfully, will genuinely raise confidence that important governance demands are feasible.  (Without this ethic against exaggeration, common knowledge in the legitimacy of legitimacy itself is degraded, which is bad, so we shouldn’t exaggerate.)

This kind of work can make a big difference to the future.  If the algorithmic techniques needed to meet a given governance demand are 10 years of research away from discovery---as opposed to just 1 year---then it’s easier for large companies to intentionally or inadvertently maintain a narrative that the demand is unfulfillable and therefore illegitimate.  Conversely, if the algorithmic techniques to fulfill the demand already exist, it’s a bit harder (though still possible) to deny the legitimacy of the demand.  Thus, CS researchers can legitimize certain demands in advance, by beginning to prepare solutions for them.

I think this is the most important kind of work a computer scientist can do in service of existential safety.  For instance, I view ML fairness and interpretability research as responding to existing governance demand, which legitimizes the cause of AI governance itself, which is hugely important.  Furthermore, I view computational social choice research as addressing an upcoming governance demand, which is even more important.

My hope in writing this post is that some of the readers here will start trying to anticipate AI governance demands that will arise over the next 10-30 years.  In doing so, we can begin to think about technical problems and solutions that could genuinely legitimize and fulfill those demands when they arise, with a focus on demands whose fulfillment can help stabilize society in ways that mitigate existential risks.

Research Areas

Alright, let’s talk about some research!

Out of distribution robustness (OODR)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Out of Distribution Robustness | Zero/Single | 1/10 | 4/10 | 5/10 | 3/10 | 1/10 |

This area of research is concerned with avoiding risks that arise from systems interacting with contexts and environments that are changing significantly over time, such as from training time to testing time, from testing time to deployment time, or from controlled deployments to uncontrolled deployments.
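
To make the train-to-test shift concrete, here is a minimal, self-contained toy example. The data-generating process, the spurious-feature setup, and the hand-rolled logistic regression are my own illustrative choices, not anything prescribed by this post: a classifier that leans on a feature that is only spuriously predictive at training time degrades when that correlation disappears at deployment.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    """Toy data: the label depends on a 'causal' feature; a second, 'spurious'
    feature agrees with the label at the given rate (illustrative setup only)."""
    y = rng.integers(0, 2, size=n)
    causal = y + 0.5 * rng.normal(size=n)
    agree = rng.random(n) < spurious_corr
    spurious = np.where(agree, y, 1 - y) + 0.5 * rng.normal(size=n)
    return np.column_stack([causal, spurious]), y

# Training distribution: the spurious feature is almost perfectly predictive.
X_tr, y_tr = make_data(5000, spurious_corr=0.95)
# Deployment distribution: that correlation is gone.
X_te, y_te = make_data(5000, spurious_corr=0.50)

# Plain logistic regression trained by gradient descent (no external libraries).
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))
    w -= lr * X_tr.T @ (p - y_tr) / len(y_tr)
    b -= lr * float(np.mean(p - y_tr))

def accuracy(X, y):
    return float((((X @ w + b) > 0).astype(int) == y).mean())

print("train-distribution accuracy:  ", accuracy(X_tr, y_tr))  # high
print("shifted-distribution accuracy:", accuracy(X_te, y_te))  # noticeably lower
```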

OODR (un)helpfulness to existential safety:  

Contributions to OODR research are not particularly helpful to existential safety in my opinion, for a combination of two reasons:

  1. Progress in OODR will mostly be used to help roll out more AI technologies into active deployment more quickly, and
  2. Research in this area usually does not involve deep or lengthy reflections about the structure of society and human values and interactions, which I think makes this field sort of collectively blind to the consequences of the technologies it will help build.

I think this area would be more helpful if it were more attentive to the structure of the multi-agent context that AI systems will be in.  Professor Tom Dietterich has made some attempts to shift thinking on robustness to be more attentive to the structure of robust human institutions, which I think is a good step:

Unfortunately, the above paper has only 8 citations at the time of writing (very few for AI/ML), and there does not seem to be much else in the way of publications that address societal-scale or even institutional-scale robustness.

OODR educational value:

Studying and contributing to OODR research is of moderate educational value for people thinking about x-risk, in my opinion.  Speaking for myself, it helps me think about how society as a whole is receiving a changing distribution of inputs from its environment (which society itself is creating).  As human society changes, the inputs to AI technologies will change, and we want the existence of human society to be robust to those changes.  I don’t think most researchers in this area think about it in that way, but that doesn’t mean you can’t.

OODR neglect:  

Robustness to changing environments has never been a particularly neglected concept in the history of automation, and it is not likely to ever become neglected, because myopic commercial incentives push so strongly in favor of progress on it.  Specifically, robustness of AI systems is essential for tech companies to be able to roll out AI-based products and services, so there is no lack of incentive for the tech industry to work on robustness.  In reinforcement learning specifically, robustness has been somewhat neglected, although less so now than in 2015, partly thanks to AI safety (broadly construed) taking off.  I think by 2030 this area will be even less neglected, even in RL. 

OODR exemplars:

Recent exemplars of high value to existential safety, according to me:

Recent exemplars of high educational value, according to me:

Agent foundations (AF)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Agent Foundations | Zero/Single | 3/10 | 8/10 | 9/10 | 8/10 | 7/10 |

This area is concerned with developing and investigating fundamental definitions and theorems pertaining to the concept of agency.  This often includes work in areas such as decision theory, game theory, and bounded rationality.  I’m going to write more for this section because I know more about it and think it’s pretty important to “get right”.

AF (un)helpfulness to existential safety:  

Contributions to agent foundations research are key to the foundations of AI safety and ethics, but are also potentially misusable.  Thus, arbitrary contributions to this area are not necessarily helpful, while targeted contributions aimed at addressing real-world ethical problems could be extremely helpful.  Here is why I believe this:

I view agent foundations work as looking very closely at the fundamental building blocks of society, i.e., agents and their decisions.  It’s important to understand agents and their basic operations well, because we’re probably going to produce (or allow) a very large number of them to exist/occur.  For instance, imagine any of the following AI-related operations happening at least 1,000,000 times (a modest number given the current world population):

  1. A human being delegates a task to an AI system to perform, thereby ceding some control over the world to the AI system.
  2. An AI system makes a decision that might yield important consequences for society, and acts on it.
  3. A company deploys an AI system into a new context where it might have important side effects.
  4. An AI system builds or upgrades another AI system (possibly itself) and deploys it.
  5. An AI system interacts with another AI system, possibly yielding externalities for society.
  6. An hour passes where AI technology is exerting more control over the state of the Earth than humans are.

In order to be just 55% sure that the result of these 1,000,000 operations will be safe, on average (on a log scale) we need to be at least 99.99994% sure that each instance of the operation is safe.  Similarly, for any accumulable quantity of “societal destruction” (such as risk, pollution, or resource exhaustion), in order to be sure that these operations will not yield “100 units” of societal destruction, we need each operation on average to produce at most “0.0001 units” of destruction.*

(*Would-be-footnote: Incidentally, the main reason I think OODR research is educationally valuable is that it can eventually help with applying agent foundations research to societal-scale safety.  Specifically: how can we know if one of the operations (a)-(f) above is safe to perform 1,000,000 times, given that it was safe the first 1,000 times we applied it in a controlled setting, but the setting is changing over time?  This is a special case of an OODR question.)
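
For concreteness, here is the arithmetic behind the two thresholds above, spelled out (this just re-derives the figures already given; it adds no new assumptions):

\[
p^{10^{6}} \ge 0.55 \;\Longrightarrow\; p \ge 0.55^{1/10^{6}} = e^{\ln(0.55)/10^{6}} \approx 1 - 6\times 10^{-7} \approx 99.99994\%
\]

\[
\frac{100 \text{ units of destruction}}{10^{6} \text{ operations}} = 10^{-4} = 0.0001 \text{ units per operation, on average}
\]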

Unfortunately, understanding the building blocks of society can also allow the creation of potent societal forces that would harm society.  For instance, understanding human decision-making extremely well might help advertising companies to control public opinion to an unreasonable degree (which arguably has already happened, even with today’s rudimentary agent models), or it might enable the construction of a super-decision-making system that is misaligned with human existence.   

That said, I don’t think this means you have to be super careful about information security around agent foundations work, because in general it’s not easy to communicate fundamental theoretical results in research, let alone by accident. 

Rather, my recommendation for maximizing the positive value of work in this area is to apply the insights you get from it to areas that make it easier to represent societal-scale moral values in AI.  E.g., I think applications of agent foundations  results to interpretability, fairness, computational social choice, and accountability are probably net good, whereas applications to speed up arbitrary ML capabilities are not obviously good.

AF educational value:

Studying and contributing to agent foundations research has the highest educational value for thinking about x-risk among the research areas listed here, in my opinion.  The reason is that agent foundations research does the best job of questioning potentially faulty assumptions underpinning our approach to existential safety.  In particular, I think our understanding of how to safely integrate AI capabilities with society is increasingly contingent on our understanding of agent foundations work as defining the building blocks of society.

AF neglect:

This area is extremely neglected in my opinion.  I think around 50% of the progress in this area, worldwide, happens at MIRI, which has a relatively small staff of agent foundations researchers.  While MIRI has grown over the past 5 years, agent foundations work in academia hasn’t grown much, and I don’t expect it to grow much by default (though perhaps posts like this might change that default).

AF exemplars:

Below are recent exemplars of agent foundations work that I think is of relatively high value to existential safety, mostly via their educational value.  The work is mostly from 

Multi-agent reinforcement learning (MARL)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Multi-agent RL | Zero/Multi | 2/10 | 6/10 | 5/10 | 4/10 | 0/10 |

MARL is concerned with training multiple agents to interact with each other and solve problems using reinforcement learning.  There are a few varieties to be aware of:

  • Cooperative vs competitive vs adversarial tasks: do the agents all share a single objective, or separate objectives that are imperfectly aligned, or completely opposed (zero-sum) objectives?
  • Centralized training vs decentralized training: Is there a centralized process that observes the agents and controls how they learn, or is there a separate (private) learning process for each agent?
  • Communicative vs non-communicative: Is there a special channel the agents can use to generate observations for each other that are otherwise inconsequential, or are all observations generated in the course of consequential actions?

I think the most interesting MARL research involves decentralized training for competitive objectives in communicative environments, because this set-up is the most representative of how AI systems from diverse human institutions are likely to interact.
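
As a toy illustration of the decentralized-training, imperfectly-aligned-objectives setting, here is a minimal sketch of two independent Q-learners in a repeated 2x2 game. The payoff matrix, learning rule, and hyperparameters are my own illustrative choices and are not taken from this post:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 2x2 general-sum game: payoffs[a0, a1] = (reward to agent 0, reward to agent 1).
# Objectives are imperfectly aligned (a prisoner's-dilemma-like structure).
payoffs = np.array([[(3, 3), (0, 4)],
                    [(4, 0), (1, 1)]], dtype=float)

n_actions = 2
q = [np.zeros(n_actions), np.zeros(n_actions)]  # one private Q-table per agent
alpha, eps = 0.1, 0.1                            # learning rate, exploration rate

for step in range(5000):
    # Decentralized action selection: each agent consults only its own Q-values.
    greedy = (int(np.argmax(q[0])), int(np.argmax(q[1])))
    actions = [a if rng.random() > eps else int(rng.integers(n_actions)) for a in greedy]
    rewards = payoffs[actions[0], actions[1]]
    # Decentralized updates: each agent treats the other agent as part of the environment.
    for i in range(2):
        q[i][actions[i]] += alpha * (rewards[i] - q[i][actions[i]])

print("Agent 0 Q-values:", q[0])
print("Agent 1 Q-values:", q[1])
# Typically both agents end up preferring defection (action 1), even though
# mutual cooperation (action 0) would give both a higher payoff.
```

Even in this tiny example, the decentrally trained agents typically settle into mutual defection, which is one concrete version of the cooperation failures discussed in the next subsection.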

MARL (un)helpfulness to existential safety: 

Contributions to MARL research are mostly not very helpful to existential safety in my opinion, because MARL’s most likely use case will be to help companies to deploy fleets of rapidly interacting machines that might pose risks to human society.  The MARL projects with the greatest potential to help are probably those that find ways to achieve cooperation between decentrally trained agents in a competitive task environment, because of its potential to minimize destructive conflicts between fleets of AI systems that cause collateral damage to humanity.  That said, even this area of research risks making it easier for fleets of machines to cooperate and/or collude at the exclusion of humans, increasing the risk of humans becoming gradually disenfranchised and perhaps replaced entirely by machines that are better and faster at cooperation than humans.

MARL educational value: 

I think MARL has a high educational value, because it helps researchers to observe directly how difficult it is to get multi-agent systems to behave well.  I think most of the existential risk from AI over the next decades and centuries comes from the incredible complexity of behaviors possible from multi-agent systems, and from underestimating that complexity before it takes hold in the real world and produces unexpected negative side effects for humanity.

MARL neglect: 

MARL was somewhat neglected 5 years ago, but has picked up a lot.  I suspect MARL will keep growing in popularity because of its value as a source of curricula for learning algorithms.  I don’t think it is likely to become more civic-minded, unless arguments along the lines of this post lead to a shift of thinking in the field.

MARL exemplars:

Recent exemplars of high educational value, according to me:

Preference learning (PL)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Preference Learning | Single/Single | 1/10 | 4/10 | 5/10 | 1/10 | 0/10 |

This area is concerned with learning about human preferences in a form usable for guiding the policies of artificial agents.  In an RL (reinforcement learning) setting, preference learning is often called reward learning, because the learned preferences take the form of a reward function for training an RL system.
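
As a concrete (and deliberately simplified) sketch of what “learning a reward function from preferences” can look like, the snippet below fits a linear reward model to pairwise comparisons using a Bradley-Terry-style logistic loss. The features, simulated judgments, and optimizer are entirely illustrative and are not drawn from this post:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each outcome is described by a feature vector; the "true" reward is a hidden
# linear function of those features (purely illustrative).
d, n_pairs = 5, 2000
w_true = rng.normal(size=d)
xa = rng.normal(size=(n_pairs, d))
xb = rng.normal(size=(n_pairs, d))

# Simulated human judgments: outcome A is preferred to outcome B with
# probability sigmoid(r(A) - r(B)) -- a Bradley-Terry / logistic choice model.
prefs = (rng.random(n_pairs) < sigmoid(xa @ w_true - xb @ w_true)).astype(float)

# Fit a reward model w by gradient descent on the logistic (cross-entropy) loss.
w, lr = np.zeros(d), 0.05
for _ in range(500):
    p = sigmoid(xa @ w - xb @ w)                  # predicted preference probabilities
    grad = (xa - xb).T @ (p - prefs) / n_pairs
    w -= lr * grad

# The learned w can now be used as a reward function for training an RL policy.
cos = float(w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))
print("cosine similarity between learned and true reward weights:", round(cos, 3))
```

Real reward-learning pipelines are usually far more elaborate (neural reward models, comparisons over whole trajectories), but the basic shape of the problem is the same.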

PL (un)helpfulness to existential safety:

Contributions to preference learning are not particularly helpful to existential safety in my opinion, because their most likely use case is for modeling human consumers just well enough to create products they want to use and/or advertisements they want to click on.  Such advancements will be helpful to rolling out usable tech products and platforms more quickly, but not particularly helpful to existential safety.* 

Preference learning is of course helpful to AI alignment, i.e., the problem of getting an AI system to do something a human wants.  Please refer back to the sections on Defining our objectives and Distinguishing our objectives for an elaboration of how this is not the same as AI existential safety.  In any case, I see AI alignment in turn as having two main potential applications to existential safety:

  1. AI alignment is useful as a metaphor for thinking about how to align the global effects of AI technology with human existence, a major concern for AI governance at a global scale, and
  2. AI alignment solutions could be used directly to govern powerful AI technologies designed specifically to make the world safer.

While many researchers interested in AI alignment are motivated by (a) or (b), I find these pathways of impact problematic.  Specifically, 

  • (a) elides the complexities of multi-agent interactions I think are likely to arise in most realistic futures, and I think the most difficult to resolve existential risks arise from those interactions.
  • (b) is essentially aiming to take over the world in the name of making it safer, which is not generally considered the kind of thing we should be encouraging lots of people to do.

Moreover, I believe contributions to AI alignment are also generally unhelpful to existential safety, for the same reasons as preference learning.  Specifically, progress in AI alignment hastens the pace at which high-powered AI systems will be rolled out into active deployment, shortening society’s headway for establishing international treaties governing the use of AI technologies.

Thus, the existential safety value of AI alignment research in its current technical formulations—and preference learning as a subproblem of it—remains educational in my view.*

(*Would-be-footnote: I hope no one will be too offended by this view.  I did have some trepidation about expressing it on the “alignment” forum, but I think I should voice these concerns anyway, for the following reason. In 2011, after some months of reflection on a presentation by Andrew Ng, I came to believe that deep learning was probably going to take off, and that, contrary to Ng’s opinion, this would trigger a need for a lot of AI alignment work in order to make the technology safe.  This feeling of worry is what triggered me to cofound CFAR and start helping to build a community that thinks more critically about the future.  I currently have a similar feeling of worry toward preference learning and AI alignment, i.e., that it is going to take off and trigger a need for a lot more “AI civility” work that seems redundant or “too soon to think about” for a lot of AI alignment researchers today, the same way that AI researchers said it was “too soon to think about” AI alignment.  To the extent that I think I was right to be worried about AI progress kicking off in the decade following 2011, I think I’m right to be worried again now about preference learning and AI alignment (in its narrow and socially-simplistic technical formulations) taking off in the 2020’s and 2030’s.)

PL educational value: 

Studying and making contributions to preference learning is of moderate educational value for thinking about existential safety in my opinion.  The reason is this: if we want machines to respect human preferences—including our preference to continue existing—we may need powerful machine intelligences to understand our preferences in a form they can act on.  Of course, being understood by a powerful machine is not necessarily a good thing.  But if the machine is going to do good things for you, it will probably need to understand what “good for you” means.  In other words, understanding preference learning can help with AI alignment research, which can help with existential safety.  And if existential safety is your goal, you can try to target your use of preference learning concepts and methods toward that goal.

PL neglect: 

Preference learning has always been crucial to the advertising industry, and as such it has not been neglected in recent years.  For the same reason, it’s also not likely to become neglected.  Its application to reinforcement learning is somewhat new, however, because until recently there was much less active research in reinforcement learning.  In other words, recent interest in reward learning is mainly a function of increased interest in reinforcement learning, rather than increased interest in preference learning.  If new learning paradigms supersede reinforcement learning, preference learning for those paradigms will not be far behind.

(This is not a popular opinion; I apologize if I have offended anyone who believes that progress in preference learning will reduce existential risk, and I certainly welcome debate on the topic.)

PL exemplars:

Recent works of significant educational value, according to me:

Human-robot interaction (HRI)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Human-Robot Interaction | Single/Single | 6/10 | 7/10 | 5/10 | 4/10 | 3/10 |

HRI research is concerned with designing and optimizing patterns of interaction between humans and machines—usually actual physical robots, but not always.

HRI helpfulness to existential safety:

On net, I think AI/ML would be better for the world if most of its researchers pivoted from general AI/ML into HRI, simply because it would force more AI/ML researchers to more frequently think about real-life humans and their desires, values, and vulnerabilities.  Moreover, I think it reasonable (as in, >1% likely) that such a pivot might actually happen if, say, 100 more researchers make this their goal.

For this reason, I think contributions to this area today are pretty solidly good for existential safety, although not perfectly so: HRI research can also be used to deceive humans, which can degrade societal-scale honesty norms, and I’ve seen HRI research targeting precisely that.  However, my model of readers of this blog is that they’d be unlikely to contribute to those parts of HRI research, such that I feel pretty solidly about recommending contributions to HRI.

HRI educational value:

I think HRI work is of unusually high educational value for thinking about existential safety, even among other topics in this post.  The reason is that, by working with robots, HRI work is forced to grapple with high-dimensional and continuous state spaces and action spaces that are too complex for the human subjects involved to consciously model.  This, to me, crucially mirrors the relationship between future AI technology and human society: humanity, collectively, will likely be unable to consciously grasp the full breadth of states and actions that our AI technologies are transforming and undertaking for us.  I think many AI researchers outside of robotics are mostly blind to this difficulty, which on its own is an argument in favor of more AI researchers working in robotics.  The beauty of HRI is that it also explicitly and continually thinks about real human beings, which I think is an important mental skill to practice if you want to protect humanity collectively from existential disasters.

HRI neglect: 

A neglect score for this area was uniquely difficult for me to specify.  On one hand, HRI is a relatively established and vibrant area of research compared with some of the more nascent areas covered in this post.  On the other hand, as mentioned, I’d eventually like to see the entirety of AI/ML as a field pivoting toward HRI work, which means it is still very neglected compared to where I want to see it.  Furthermore, I think such a pivot is actually reasonable to achieve over the next 20-30 years.  Further still, I think industrial incentives might eventually support this pivot, perhaps on a similar timescale.  

So: if the main reason you care about neglect is that you are looking to produce a strong founder effect, you should probably discount my numerical neglect scores for this area, given that it’s not particularly “small” on an absolute scale compared to the other areas here.  By that metric, I’d have given something more like {2015:4/10; 2020:3/10; 2030:2/10}.  On the other hand, if you’re an AI/ML researcher looking to “do the right thing” by switching to an area that pretty much everyone should switch into, you definitely have my “doing the right thing” assessment if you switch into this area, which is why I’ve given it somewhat higher neglect scores.

HRI exemplars:

Side-effect minimization (SEM)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Side-effect Minimization | Single/Single | 4/10 | 4/10 | 6/10 | 5/10 | 4/10 |

SEM research is concerned with developing domain-general methods for making AI systems less likely to produce side effects, especially negative side effects, in the course of pursuing an objective or task.
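
To make “side-effect minimization” slightly more concrete, here is a schematic sketch of one common shape such methods take: penalizing deviation from a baseline state (for example, the state that an “inaction” policy would have reached). The specific distance measure, baseline, and coefficient below are illustrative choices of mine, not something this post prescribes:

```python
# Schematic sketch only: the distance measure, baseline, and coefficient are
# design choices for illustration, not prescriptions from this post.
def shaped_reward(task_reward, state, baseline_state, penalty_coeff=1.0):
    """Task reward minus a penalty for how far the world has drifted from a
    baseline state (e.g., the state an 'inaction' policy would have reached)."""
    deviation = sum(abs(s - b) for s, b in zip(state, baseline_state))
    return task_reward - penalty_coeff * deviation

# Example: the agent earned task reward 1.0 but perturbed two features of the
# environment relative to the inaction baseline, so the shaped reward is negative.
print(shaped_reward(1.0, state=(3, 5, 2), baseline_state=(3, 4, 1)))  # -> -1.0
```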

SEM helpfulness to existential safety:

I think this area has two obvious applications to safety-in-general:

  1. (“accidents”) preventing an AI agent from “messing up” when performing a task for its primary stakeholder(s), and
  2. (“externalities”) preventing an AI system from generating problems for persons other than its primary stakeholders, either
    1. (“unilateral externalities”) when the system generates externalities through its unilateral actions, or
    2. (“multilateral externalities”) when the externalities are generated through the interaction of an AI system with another entity, such as a non-stakeholder or another AI system.

I think the application to externalities is more important and valuable than the application to accidents, because I think externalities are (even) harder to detect and avoid than accidents.  Moreover, I think multilateral externalities are (even!) harder to avoid than unilateral externalities.  

Currently, SEM research is focussed mostly on accidents, which is why I’ve only given it a moderate score on the helpfulness scale.  Conceptually, it does make sense to focus on accidents first, then unilateral externalities, and then multilateral externalities, because of the increasing difficulty in addressing them.  

However, the need to address multilateral externalities will arise very quickly after unilateral externalities are addressed well enough to roll out legally admissible products, because most of our legal systems have an easier time defining and punishing negative outcomes that have a responsible party.  I don’t believe this is a quirk of human legal systems: when two imperfectly aligned agents interact, they complexify each other’s environment in a way that consumes more cognitive resources than interacting with a non-agentic environment. (This is why MARL and self-play are seen as powerful curricula for learning.)  Thus, there is less cognitive “slack” to think about non-stakeholders in a multi-agent setting than in a single-agent setting.  

For this reason, I think work that makes it easy for AI systems and their designers to achieve common knowledge around how the systems should avoid producing externalities is very valuable.

SEM educational value:

I think SEM research thus far is of moderate educational value, mainly just to kickstart your thinking about side effects.

SEM neglect:

Domain-general side-effect minimization for AI is a relatively new area of research, and is still somewhat neglected.  Moreover, I suspect it will remain neglected, because of the aforementioned tendency for our legal system to pay too little attention to multilateral externalities, a key source of negative side effects for society.

SEM exemplars:

Recent exemplars of value to existential safety, mostly via starting to think about the generalized concept of side effects at all:

Interpretability in ML (IntML)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Interpretability in ML | Single/Single | 8/10 | 6/10 | 8/10 | 6/10 | 2/10 |

Interpretability research is concerned with making the reasoning and decisions of AI systems more interpretable to humans.  Interpretability is closely related to transparency and explainability.  Not all authors treat these three concepts as distinct; however, when a useful distinction is drawn between them, I think it often looks something like this:

  • a system is “transparent” if it is easy for human users or developers to observe and track important parameters of its internal state;
  • a system is “explainable” if useful explanations of its reasoning can be produced after the fact.
  • a system is “interpretable” if its reasoning is structured in a manner that does not require additional engineering work to produce accurate human-legible explanations.

In other words, interpretable systems are systems with the property that transparency is adequate for explainability: when we look inside them, we find they are structured in a manner that does not require much additional explanation.  I see Professor Cynthia Rudin as the primary advocate for this distinguished notion of interpretability, and I find it to be an important concept to distinguish.
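
As a toy illustration of the distinction, consider a model whose parameters are already a human-legible explanation, so that transparency alone suffices. The features, data, and model below are invented purely for illustration and are not drawn from this post or from Rudin’s work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented example: predict a repayment score from three named features.
features = ["income", "debt", "years_employed"]
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=200)

# Ordinary least squares; the fitted coefficients themselves are the explanation,
# so no separate post-hoc explanation step is needed.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, c in zip(features, coef):
    print(f"{name}: {c:+.2f}")  # roughly income: +2.00, debt: -1.50, years_employed: +0.50
```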

IntML helpfulness to existential safety:

I think interpretability research contributes to existential safety in a fairly direct way on the margin today.  Specifically, progress in interpretability will

  • decrease the degree to which human AI developers will end up misjudging the properties of the systems they build,
  • increase the degree to which systems and their designers can be held accountable for the principles those systems embody, perhaps even before those principles have a chance to manifest in significant negative societal-scale consequences, and
  • potentially increase the degree to which competing institutions and nations can establish cooperation and international treaties governing AI-heavy operations.

I believe this last point may turn out to be the most important application of interpretability work.  Specifically, I think institutions that use a lot of AI technology (including but not limited to powerful autonomous AI systems) could become opaque to one another in a manner that hinders cooperation between and governance of those systems.  By contrast, a degree of transparency between entities can facilitate cooperative behavior, a phenomenon which has been borne out in some of the agent foundations work listed above, specifically:

In other words, I think interpretability research can enable technologies that legitimize and fulfill AI governance demands, narrowing the gap between what policy makers will wish for and what technologists will agree is possible.

IntML educational value:

I think interpretability research is of moderately high educational value for thinking about existential safety, because some research in this area is somewhat surprising in terms of showing ways to maintain interpretability without sacrificing much in the way of performance.  This can change our expectations about how society can and should be structured to maintain existential safety, by changing the degree of interpretability we can and should expect from AI-heavy institutions and systems.

IntML neglect:

I think IntML is fairly neglected today relative to its value.  However, over the coming decade, I think there will be opportunities for companies to speed up their development workflows by improving the interpretability of systems to their developers.  In fact, I think for many companies interpretability is going to be a crucial bottleneck for advancing their product development.  These developments won’t be my favorite applications of interpretability, and I might eventually become less excited about contributions to interpretability if all of the work seems oriented on commercial or militarized objectives instead of civic responsibilities.  But in any case, I think getting involved with interpretability research today is a pretty robustly safe and valuable career move for any up and coming AI researchers, especially if they do their work with an eye toward existential safety.

IntML exemplars:

Recent exemplars of high value to existential safety:

Fairness in ML (FairML)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Fairness in ML | Multi/Single | 6/10 | 5/10 | 7/10 | 3/10 | 2/10 |

Fairness research in machine learning is typically concerned with altering or constraining learning systems to make sure their decisions are “fair” according to a variety of definitions of fairness.
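
As one concrete example of what a fairness definition looks like in code, here is a minimal check of demographic parity (just one definition among many; the decisions and group labels below are invented for illustration):

```python
import numpy as np

# Invented example: binary decisions (1 = approve) for individuals in two groups.
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
group     = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Demographic parity compares approval rates across groups.
rate_0 = decisions[group == 0].mean()
rate_1 = decisions[group == 1].mean()
print(f"approval rate, group 0: {rate_0:.2f}")
print(f"approval rate, group 1: {rate_1:.2f}")
print(f"demographic parity gap: {abs(rate_0 - rate_1):.2f}")
```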

FairML helpfulness to existential safety:

My hope for FairML as a field contributing to existential safety is fourfold:

  1. (societal-scale thinking) Fairness comprises one or more human values that exist in service of society as a whole, and which are currently difficult to encode algorithmically, especially in a form that will garner unchallenged consensus.  Getting more researchers to think in the framing “How do I encode a value that will serve society as a whole in a broadly agreeable way” is good for big-picture thinking and hence for society-scale safety problems.
  2. (social context awareness) FairML gets researchers to “take off their blinders” to the complexity of society surrounding them and their inventions.  I think this trend is gradually giving AI/ML researchers a greater sense of social and civic responsibility, which I think reduces existential risk from AI/ML.
  3. (sensitivity to unfair uses of power)  Simply put, it’s unfair to place all of humanity at risk without giving all of humanity a chance to weigh in on that risk.  More focus within CS on fairness as a human value could help alleviate this risk.  Specifically, fairness debates often trigger redistributions of resources in a more equitable manner, thus working against the over-centralization of power within a given group.  I have some hope that fairness considerations will work against the premature deployment of powerful AI/ML systems that would lead to a hyper-centralization of power over the world (and hence would pose acute global risks by being a single point of failure).
  4. (Fulfilling and legitimizing governance demands) Fairness research can be used to fulfill and legitimize AI governance demands, narrowing the gap between what policy makers wish for and what technologists agree is possible.  This process makes AI as a field more amenable to governance, thereby improving existential safety.

FairML educational value:

I think FairML research is of moderate educational value for thinking about existential safety, mainly via the opportunities it creates for thinking about the points in the section on helpfulness above.  If the field were more mature, I would assign it a higher educational value.  

I should also flag that most work in FairML has not been done with existential safety in mind.  Thus, I’m very much hoping that more people who care about existential safety will learn about FairML and begin thinking about how principles of fairness can be leveraged to ensure societal-scale safety in the not-too-distant future.

FairML neglect:

FairML is not a particularly neglected area at the moment because there is a lot of excitement about it, and I think it will continue to grow.  However, it was relatively neglected 5 years ago, so there is still a lot of room for new ideas in the space.  Also, as mentioned, thinking in FairML is not particularly oriented toward existential safety, so I think research on fairness in service of societal-scale safety is quite neglected in my opinion.  

FairML exemplars:

Recent exemplars of high value to existential safety, mostly via attention to the problem of difficult-to-codify societal-scale values:

Computational Social Choice (CSC)

| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
| --- | --- | --- | --- | --- | --- | --- |
| Computational Social Choice | Multi/Single | 7/10 | 7/10 | 7/10 | 5/10 | 4/10 |

Computational social choice research is concerned with using algorithms to model and implement group-level decisions using individual-scale information and behavior as inputs.  I view CSC as a natural next step in the evolution of social choice theory that is more attentive to the implementation details of both agents and their environments.  In my conception, CSC comprises subservient topics in mechanism design and algorithmic game theory, even if researchers in those areas don’t consider themselves to be working in computational social choice.
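
For readers new to the area, here is a minimal example of the kind of object CSC studies: a voting rule that turns individual rankings into a group-level decision. I use the Borda rule and invented ballots here purely for illustration; nothing in this post singles them out:

```python
from collections import defaultdict

# Invented example: three voters each rank three options, best first.
ballots = [
    ["optionA", "optionB", "optionC"],
    ["optionB", "optionC", "optionA"],
    ["optionB", "optionA", "optionC"],
]

def borda(ballots):
    """Aggregate individual rankings into group-level scores via the Borda rule:
    with m options, an option in position i (0-indexed) earns m - 1 - i points."""
    m = len(ballots[0])
    scores = defaultdict(int)
    for ranking in ballots:
        for position, option in enumerate(ranking):
            scores[option] += m - 1 - position
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

print(borda(ballots))  # {'optionB': 5, 'optionA': 3, 'optionC': 1}
```

Much of CSC concerns the properties of rules like this (strategy-proofness, fairness, computational tractability) when applied at scale.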

CSC helpfulness to existential safety:

In short, computational social choice research will be necessary to legitimize and fulfill governance demands for technology companies (automated and human-run companies alike) to ensure AI technologies are beneficial to and controllable by human society.  The process of succeeding or failing to legitimize such demands will lead to improving and refining what I like to call the algorithmic social contract: whatever broadly agreeable set of principles (if any) algorithms are expected to obey in relation to human society.

In 2018, I considered writing an article drawing more attention to the importance of developing an algorithmic social contract, but found this point had already been made quite eloquently by Iyad Rahwan in the following paper, which I highly recommend:

Computational social choice methods in their current form are certainly far from providing adequate and complete formulations of an algorithmic social contract.   See the following article for arguments against tunnel-vision on computational social choice as a complete solution to societal-scale AI ethics:

Notwithstanding this concern, what follows is a somewhat detailed forecast of how I think computational social choice research will still have a crucial role to play in developing the algorithmic social contract throughout the development of individually-alignable transformative AI technologies, which I’ll call “the alignment revolution”.

First, once technology companies begin to develop individually-alignable transformative AI capabilities, there will be strong economic, social, and political pressures for their developers to sell those capabilities rather than hoard them.  Specifically:

  • (economic pressure) Selling capabilities immediately garners resources in the form of money and information from the purchasers and users of the capabilities;
  • (social pressure) Hoarding capabilities could be seen as anti-social relative to distributing them more broadly through sales or free services;
  • (sociopolitical pressure) Selling capabilities allows society to become aware that those capabilities exist, enabling a smoother transition to embracing those capabilities.  This creates a broadly agreeable concrete moral argument against capability hoarding, which could become politically relevant.
  • (political pressure) Political elites will be happier if technical elites “share” their capabilities with the rest of the economy rather than hoarding them.

Second, for the above reasons, I expect individually-alignable transformative AI capabilities to be distributed fairly broadly once they exist, creating an “alignment revolution” arising from those capabilities.  (It’s possible I’m wrong about this, and for that reason I also welcome research on how to align non-distributed alignment capabilities; that’s just not where most of my chips lie, and not where the rest of this argument will focus.)

Third, unless humanity collectively works very hard to maintain a degree of simplicity and legibility in the overall structure of society*, this “alignment revolution” will greatly complexify our environment to a point of much greater incomprehensibility and illegibility than even today’s world.  This, in turn, will impoverish humanity’s collective ability to keep abreast of important international developments, as well as our ability to hold the international economy accountable for maintaining our happiness and existence.

(*Would-be-footnote: I have some reasons to believe that perhaps we can and should work harder to make the global structure of society more legible and accountable to human wellbeing, but that is a topic for another article.)

Fourth, in such a world, algorithms will be needed to hold the aggregate global behavior of algorithms accountable to human wellbeing, because things will be happening too quickly for humans to monitor.  In short, an “algorithmic government” will be needed to govern “algorithmic society”.  Some might argue this is not strictly necessary: in the absence of a mathematically codified algorithmic social contract, humans could in principle coordinate to cease or slow down the use of these powerful new alignment technologies, in order to give ourselves more time to adjust to and govern their use.  However, for all our successes in innovating laws and governments, I do not believe current human legal norms are quite developed enough to stably manage a global economy empowered with individually-alignable transformative AI capabilities.

Fifth, I do think our current global legal norms are much better than what many  computer scientists naively proffer as replacements for them.  My hope is that more resources and influence will slowly flow toward the areas of computer science most in touch with the nuances and complexities of codifying important societal-scale values.  In my opinion, this work is mostly concentrated in and around computational social choice, to some extent mechanism design, and morally adjacent yet conceptually nascent areas of ML research such as fairness and interpretability.  

While there is currently an increasing flurry of (well-deserved) activity in fairness and interpretability research, computational social choice is somewhat more mature, and has a lot for these younger fields to learn from.  This is why I think CSC work is crucial to existential safety: it is the area of computer science most tailored to evoke reflection on the global structure of society, and the most mature in doing so.

So what does all this have to do with existential safety?  Unfortunately, while CSC is significantly more mature as a field than interpretable ML or fair ML, it is still far from ready to fulfill governance demand at the ever-increasing speed and scale needed to ensure existential safety in the wake of individually-alignable transformative AI technologies.  Moreover, I think punting these questions to future AI systems to solve for us is a terrible idea, because doing so impoverishes our ability to sanity-check whether those AI systems are giving us reasonable answers to our questions about social choice.  So, on the margin I think contributions to CSC theory are highly valuable, especially by persons thinking about existential safety as the objective of their research.

CSC educational value:

Learning about CSC is necessary for contributions to CSC, which I think are currently needed to ensure existentially safe societal-scale norms for aligned AI systems to follow after “the alignment revolution” if it happens.  So, I think CSC is highly valuable to learn about, with the caveat that most work in CSC has not been done with existential safety in mind.  Thus, I’m very much hoping that more people who care about existential safety will learn about and begin contributing to CSC in ways that steer CSC toward issues of societal-scale safety.

CSC neglect:

As mentioned above, I think CSC is still far from ready to fulfill governance demands at the ever-increasing speed and scale that will be needed to ensure existential safety in the wake of “the alignment revolution”.  That said, I do think over the next 10 years CSC will become both more imminently necessary and more popular, as more pressure falls upon technology companies to make societal-scale decisions.  CSC will become still more necessary and popular as more humans and human institutions become augmented with powerful aligned AI capabilities that might “change the game” that our civilization is playing.  I expect such advancements to raise increasingly deep and urgent questions about the principles on which our civilization is built, that will need technical answers in order to be fully resolved in ways that maintain existential safety.

CSC exemplars:

CSC exemplars of particular value and relevance to existential safety, mostly via their attention to formalisms for how to structure societal-scale decisions:

Accountability in ML (AccML)

Existing Research Area: Accountability in ML
Social Application: Multi/Multi
Helpfulness to Existential Safety: 8/10
Educational Value: 3/10
2015 Neglect: 8/10
2020 Neglect: 7/10
2030 Neglect: 5/10

Accountability in ML (AccML) is aimed at making it easier to hold persons or institutions accountable for the effects of ML systems.  Accountability depends on transparency and explainability for evaluating the principles by which a harm or mistake occurs, but it is not subsumed by these objectives.

AccML helpfulness to existential safety:

The relevance of accountability to existential safety is mainly via the principle of accountability gaining more traction in governing the technology industry.  In summary, the high level points I believe in this area are the following, which are argued for in more detail after the list:

  1. Tech companies are currently “black boxes” to outside society, in that they can develop and implement (almost) whatever they want within the confines of privately owned laboratories (and other “secure” systems), and some of the things they develop or implement in private settings could pose significant harms to society.
  2. Soon (or already), society needs to become less permissive of tech companies developing highly potent algorithms, even in settings that would currently be considered “private”, similar to the way we treat pharmaceutical companies developing highly potent biological specimens.
  3. Points #1 and #2 mirror the way in which ML systems themselves are black boxes even to their creators, which fortunately is making some ML researchers uncomfortable enough to start holding conferences on accountability in ML.
  4. More researchers getting involved in the task of defining and monitoring accountability can help tech company employees and regulators to reflect on the principle of accountability and whether tech companies themselves should be more subject to it at various scales (e.g., their software should be more accountable to its users and developers, their developers and users should be more accountable to the public, their executives should be more accountable to governments and civic society, etc.).
  5. In futures where transformative AI technology is used to provide widespread services to many agents simultaneously (e.g., “Comprehensive AI services” scenarios), progress on defining and monitoring accountability can help “infuse” those services with a greater degree of accountability and hence safety to the rest of the world.

What follows is my narrative for how and why I believe the five points above.

At present, society is structured such that it is possible for a technology company to amass a huge amount of data and computing resources, and as long as their activities are kept “private”, they are free to use those resources to experiment with developing potentially misaligned and highly potent AI technologies.  For instance, if a tech company tomorrow develops any of the following potentially highly potent technologies within a privately owned ML lab, there are no publicly mandated regulations regarding how they should handle or experiment with them:

  • misaligned superintelligences
  • fake news generators
  • powerful human behavior prediction and control tools
  • … any algorithm whatsoever

Moreover, there are virtually no publicly mandated regulations against knowingly or intentionally developing any of these artifacts within the confines of a privately owned lab, despite the fact that the mere existence of such an artifact poses a threat to society.  This is the sense in which tech companies are “black boxes” to society, and potentially harmful as such.

(That’s point #1.)

Contrast this situation with the strict guidelines that pharmaceutical companies are required to adhere to in their management of pathogens.  First, it is simply illegal for most companies to knowingly develop synthetic viruses, unless they are certified to do so by demonstrating a certain capacity for safe handling of the resulting artifacts.  Second, conditional on having been authorized to develop viruses, companies are required to follow standardized safety protocols.  Third, companies are subject to third-party audits to ensure compliance with these safety protocols, and are not simply trusted to follow them without question.

Nothing like this is true in the tech industry, because historically, algorithms have been viewed as less potent societal-scale risks than viruses.  Indeed, present-day accountability norms in tech would allow an arbitrary level of disparity to develop between

  • the potency (in terms of potential impact) of algorithms developed in privately owned laboratories, and
  • the preparedness of the rest of society to handle those impacts if the algorithms were released (such as by accident, harmful intent, or poor judgement).

This is a mistake, and an increasingly untenable position as the power of AI and ML technology increases.  In particular, a number of technology companies are intentionally trying to build artificial general intelligence, an artifact which, if released, would be much more potent than most viruses.  These companies do in fact have safety researchers working internally to think about how to be safe and whether to release things.  But contrast this again with pharmaceuticals.  It just won’t fly for a pharmaceutical company to say “Don’t worry, we don’t plan to release it; we’ll just make up our own rules for how to be privately safe with it.”  Eventually, we should probably stop accepting this position from tech companies as well.

(That’s point #2.)

Fortunately, even some researchers and developers are starting to become uncomfortable with “black boxes” playing important and consequential roles in society, as evidenced by the recent increase in attention on both accountability and interpretability in service of it, for instance:

This kind of discomfort both fuels and is fueled by decreasing levels of blind faith in the benefits of technology in general.  Signs of this broader trend include:

Together, these trends indicate a decreasing level of blind faith in the addition of novel technologies to society, both in the form of black-box tech products, and black-box tech companies.  

(That’s point #3.)

The European General Data Protection Regulation (GDPR) is a very good step for regulating how tech companies relate with the public.  I say this knowing that GDPR is far from perfect.  The reason it’s still extremely valuable is that it has initialized the variable defining humanity’s collective bargaining position (at least within Europe, and replicated to some extent by the CCPA) for controlling how tech companies use data.  That variable can now be amended and hence improved upon without first having to ask the question “Are we even going to try to regulate how tech companies use data?”  For a while, it wasn’t clear any action would ever be taken on this front, outside of specific domains like healthcare and finance.

However, while GDPR has defined a slope for regulating the use of data, we also need accountability for private uses of computing.  As AlphaZero demonstrates, data-free computing alone is sufficient to develop super-human strategic competence in a well-specified domain.

When will it be time to disallow arbitrary private uses of computing resources, irrespective of their data sources?  Is it time already?  My opinions on this are outside the scope of what I intend to argue for in this post.  But whenever the time comes to develop and enforce such accountability, it will probably be easier to do so if researchers and developers have spent more time thinking about what accountability is, what purposes are served by various versions of accountability, and how to achieve those kinds of accountability in both fully-automated and semi-automated systems.

(That’s point #4.)

And, even if this research yields no “transfer learning” from the awareness that «black box tech products are insufficiently accountable» to «black box tech companies are insufficiently accountable», at least automated approaches to accountability will have a role to play if we end up in a future with large numbers of agents making use of AI-mediated services, such as in the “Comprehensive AI Services” model of the future.  Specifically, 

  • individual actors in a CAIS economy should be accountable to the principle of not privately developing highly potent technologies without adhering to publicly legitimized safety procedures, and
  • systems for reflecting on and updating accountability structures can be used to detect and remediate problematic behaviors in multi-agent systems, including behaviors that could yield existential risks from distributed systems (e.g., extreme resource consumption or pollution effects).

(That’s point #5.)

AccML educational value:

Unfortunately, technical work in this area is highly undeveloped, which is why I have assigned this area a relatively low educational value.  I hope this does not trigger people to avoid contributing to it.

AccML neglect:

Correspondingly, this area is highly neglected relative to where I’d like it to be, on top of being very small in terms of the amount of technical work at its core.

AccML exemplars:

Recent examples of writing in AccML that I think are of particular value to existential safety include:

Conclusion

Thanks for reading!  I hope this post has been helpful to your thinking about the value of a variety of research areas for existential safety, or at the very least, your model of my thinking.  As a reminder, these opinions are my own, and are not intended to represent any institution of which I am a part.

Reflections on scope & omissions

This post has been about:

  • Research, not individuals. Some readers might be interested in the question “What about so-and-so’s work at such-and-such institution?”  I think that’s a fair question, but I prefer this post to be about ideas, not individual people.  The reason is that I want to say both positive and negative things about each area, whereas I’m not prepared to write up public statements of positive and negative judgements about people (e.g., “Such-and-such is not going to succeed in their approach”, or “So-and-so seems fundamentally misguided about X”.)
  • Areas, not directions. This post is an appraisal of active areas of research---topics with groups of people already working on them writing up their findings.  It’s primarily not an appraisal of potential directions---ways I think areas of research could change or be significantly improved (although I do sometimes comment on directions I’d like to see each area taking).  For instance, I think intent alignment is an interesting topic, but the current paucity of publicly available technical writing on it makes it difficult to critique.  As such, I think of intent alignment as a “direction” that AI alignment research could be taken in, rather than an “area”.


Discuss

Sunzi's《Methods of War》- On War

19 ноября, 2020 - 06:06
Published on November 19, 2020 3:06 AM GMT

This is a translation of Chapter 2 of The Art of War by Sunzi.

孙子曰:凡用兵之法,驰车千驷,革车千乘,带甲十万,千里馈粮,内外之费,宾客之用,胶漆之材,车甲之奉,日费千金,然后十万之师举矣。

The ordinary methods of war demand:

  • 1,000 teams of 4 horses each,
  • 1,000 wagons,
  • 100,000 shields,
  • provisions to march 1,000 miles,
  • domestic and foreign expenses,
  • hospitality for guests,
  • construction materials for siege weapons,
  • tanks[1],
  • salaries

…and an army of 100,000 soldiers.

其用战也胜,久则钝兵挫锐,攻城则力屈,久暴师则国用不足。

A long war is an expensive war.

夫钝兵挫锐,屈力殚货,则诸侯乘其弊而起,虽有智者,不能善其后矣。

An expensive war will cause your vassals to rebel against you.

故兵闻拙速,未睹巧之久也。夫兵久而国利者,未之有也。

There is no such thing as a beneficial protracted war.

故不尽知用兵之害者,则不能尽知用兵之利也。

If you do not understand the costs of war then you do not know which wars are worthwhile to fight.

善用兵者,役不再籍,粮不三载,取用于国,因粮于敌,故军食可足也。

Do not conscript troops more than once. Do not resupply your army with grain more than twice. Take what you need from the enemy. The enemy has ample grain and an army of troops.

国之贫于师者远输,远输则百姓贫;

Resupplying an army over long distances impoverishes a country.

近师者贵卖,贵卖则百姓财竭,财竭则急于丘役。

Prices soar in wartime. Levying the peasantry under such circumstances will impoverish them while extracting only forced labor.

力屈财殚,中原内虚于家,百姓之费,十去其七;

The central plains will go unfarmed. Seven tenths of the peasantry's labor will be wasted.

公家之费,破军罢马,甲胄矢弩,戟楯蔽橹,丘牛大车,十去其六。

Supplying an army out of the public purse slows the army down. Horses sicken. Shields split. Oxen tire. Six tenths are wasted.

故智将务食于敌,食敌一钟,当吾二十钟;萁秆一石,当吾二十石。

The wise general eats the enemy's food. A captured bowl of enemy food is worth twenty bowls of your own. A captured ton of enemy grain is worth twenty tons of your own.

故杀敌者,怒也;取敌之利者,货也。车战得车十乘以上,赏其先得者,而更其旌旗,车杂而乘之,卒善而养之,是谓胜敌而益强。

Let your troops kill the enemy in anger, plunder the enemy in greed. A captured enemy tank[2] is worth no fewer than ten of your own. Reward your first soldier to capture one. Replace its flag. Mix it in among your own.

A good soldier steals victory from the enemy.

故兵贵胜,不贵久。故知兵之将,生民之司命,国家安危之主也。

A valuable victory is a quick victory. A general who, understanding this, issues orders to the people—thereupon is the fate of a state determined.

  1. Technically, the word used is "armored vehicle". ↩︎

  2. Technically, the word used is "war vehicle". ↩︎



Discuss

Inner Alignment in Salt-Starved Rats

19 ноября, 2020 - 05:40
Published on November 19, 2020 2:40 AM GMT

Introduction: The Dead Sea Salt Experiment

In this 2014 paper by Mike Robinson and Kent Berridge at University of Michigan (see also this more theoretical follow-up discussion by Berridge and Peter Dayan), rats were raised in an environment where they were well-nourished, and in particular, where they were never salt-deprived—not once in their life. The rats were sometimes put into a test cage with a lever which, if pressed, would trigger a device to spray ridiculously salty water directly into their mouth. The rats pressed this lever once or twice, were disgusted and repulsed by the extreme salt taste, and quickly learned not to press the lever again. One of the rats went so far as to stay tight against the opposite wall—as far from the lever as possible!

Then the experimenters made the rats feel severely salt-deprived, by depriving them of salt. Haha, just kidding! They made the rats feel severely salt-deprived by injecting the rats with a pair of chemicals that are known to induce the sensation of severe salt-deprivation. Ah, the wonders of modern science!

...And wouldn't you know it, almost instantly upon injection, the rats changed their behavior! When shown the lever, they now went right over to that lever and jumped on it and gnawed at it, obviously desperate for that super-salty water.

The end.

Aren't you impressed? Aren’t you floored? You should be!!! I don’t think any standard ML algorithm would be able to do what these rats just did!

Think about it:

  • Is this Reinforcement Learning? No. RL would look like the rats randomly stumbling upon the behavior of “pressing the lever when salt-deprived”, finding it rewarding, and then adopting that as a goal via “credit assignment”. That’s not what happened. While the rats were nibbling at the lever, they had never in their life had an experience where pressing the lever led to anything other than an utterly repulsive experience. And they had never in their life had an experience where they were salt-deprived, tasted something extremely salty, and found it gratifying. I mean, they were clearly trying to press that lever—this is a foresighted plan we're talking about—but that plan does not seem to have been reinforced by any experience in their life.
  • Is this Imitation Learning? Obviously not; the rats had never seen any other rat press any lever for any reason.
  • Is this an innate, hardwired, stimulus-response behavior? No, the connection between a lever and saltwater was an arbitrary, learned connection. (I didn't mention it, but the researchers also played a distinctive sound each time the lever appeared. Not sure how important that is. But anyway, that connection is arbitrary and learned, too.)

So what’s the algorithm here? How did their brains know that this was a good plan? That’s the subject of this post.

What does this have to do with inner alignment? What is inner alignment anyway? Why should we care about any of this?

With apologies to the regulars here who already know all this, the so-called “inner alignment problem” occurs when you, a programmer, build an intelligent, foresighted, goal-seeking agent. You want it to be trying to achieve a certain goal, like maybe “do whatever I, the programmer, want you to do” or something. The inner alignment problem is: how do you ensure that the agent you programmed is actually trying to pursue that goal? (Meanwhile, the “outer alignment problem” is about choosing a good goal in the first place.) The inner alignment problem is obviously an important safety issue, and will become increasingly important as our AI systems get more powerful in the future.

(See my earlier post mesa-optimizers vs “steered optimizers” for specifics about how I frame the inner alignment problem in the context of brain-like algorithms.)

Now, for the rats, there’s an evolutionarily-adaptive goal of "when in a salt-deprived state, try to eat salt". The genome is “trying” to install that goal in the rat’s brain. And apparently, it worked! That goal was installed! And remarkably, that goal was installed even before that situation was ever encountered! So it’s worth studying this example—perhaps we can learn from it!

Before we get going on that, one more boring but necessary thing:

Aside: Obligatory post-replication-crisis discussion

The dead sea salt experiment strikes me as trustworthy. Pretty much all the rats—and for key aspects literally every tested rat—displayed an obvious qualitative behavioral change almost instantaneously upon injection. There were sensible tests with control levers and with control rats. The authors seem to have tested exactly one hypothesis, and it's a hypothesis that was a priori plausible and interesting. And so on. I can't assess every aspect of the experiment, but from what I see, I believe this experiment, and I'm taking its results at face value. Please do comment if you see anything questionable.

Outline of the rest of the post

Next I'll go through my hypothesis for how the rat brain works its magic here. Actually, I've come up with three variants of this hypothesis over the past year or so, and I’ll talk through all of them, in chronological order. Then I’ll speculate briefly on other possible explanations.

My hypothesis for how the rat brain did what it did

The overall story

As I discussed here, my starting-point assumption is that the rat brain has a “neocortex subsystem” (really the neocortex, hippocampus, parts of thalamus and basal ganglia, maybe other things too). The neocortex subsystem takes inputs, builds a predictive model from scratch, and then chooses thoughts and actions that maximize reward. The reward, in turn, is issued by a different subsystem of the brain that I’ll call “subcortex”.

To grossly oversimplify the “neocortex builds a predictive model” part of that, let’s just say for present purposes that the neocortex subsystem memorizes patterns in the inputs, and then patterns in the patterns, and so on.

To grossly oversimplify the “neocortex chooses thoughts and actions that maximize reward” part, let’s just say for present purposes that different parts of the predictive model are associated with different reward predictions, the reward predictions are updated by a TD learning system that has something to do with dopamine and the basal ganglia, and parts of the model that predict higher reward are favored while parts of the model that predict lower reward are pushed out of mind.

Since the “predictive model” part is invoked for the “reward-maximization” part, we can say that the neocortex does model-based RL.

(Aside: It's sometimes claimed in the literature that brains do both model-based and model-free RL. I disagree that this is a fundamental distinction; I think "model-free" = "model-based with a dead-simple model". See my old comment here.)
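(For readers who like to see the moving parts, here is a minimal tabular TD(0) sketch of the kind of reward-prediction update I have in mind. The state names, learning rate, and reward values are mine and purely illustrative; nothing in the brain is literally a lookup table.)

```python
# Minimal TD(0) value-update sketch (illustrative only).
values = {"see_lever": 0.0, "press_lever": 0.0, "taste_salt": 0.0, "end": 0.0}
alpha, gamma = 0.1, 0.9  # learning rate and discount factor (arbitrary choices)

def td_update(state, next_state, reward):
    """Nudge the value estimate of `state` toward reward + discounted next value."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error

# One (hypothetical) unpleasant episode: see lever -> press lever -> disgusting salt taste.
td_update("taste_salt", "end", reward=-1.0)    # strongly negative experience
td_update("press_lever", "taste_salt", reward=0.0)
td_update("see_lever", "press_lever", reward=0.0)

# After updates like these, states leading toward the salt taste inherit some of
# its negative value -- the sense in which low-reward parts of the model get
# "pushed out of mind".
print(values)
```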

Why is this important? Because that brings us to imagination! The neocortex can activate parts of the predictive model not just to anticipate what is about to happen, but also to imagine what may happen, and (relatedly) to remember what has happened.

Now we get a crucial ingredient: I hypothesize that the subcortex somehow knows when the neocortex is imagining the taste of salt. How? This is the part where I have three versions of the story, which I’ll go through shortly. For now, let’s just assume that there is a wire going into the subcortex, and when it’s firing, that means the neocortex is activating the parts of the predictive model that correspond (semantically) to tasting salt.

Basic setup. The subcortex has an incoming signal that tells it that the neocortex is imagining / expecting / remembering the taste of salt. I’ll talk about several possible sources of this signal (here marked “???”) in the next section. Then the subcortex has a hardwired circuit that, whenever the rat is salt-deprived, issues a reward to the neocortex for activating this signal. The neocortex now finds it pleasing to imagine walking over and drinking the saltwater, and it does so!

And once we have that, the last ingredient is simple: The subcortex has an innate, hardwired circuit that says “If the neocortex is imagining tasting salt, and I am currently salt-deprived, then send a reward to the neocortex.”
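In pseudocode, that hardwired piece is tiny—something like the sketch below, where the “imagining salt” signal is whatever one of the upcoming hypotheses produces (the names are mine, and this is a cartoon, not anatomy):

```python
def subcortex_salt_circuit(imagining_salt_taste: bool, salt_deprived: bool) -> float:
    """Innate rule: reward the neocortex for thinking about salt,
    but only when the body currently needs salt.  (Illustrative sketch.)"""
    return 1.0 if (imagining_salt_taste and salt_deprived) else 0.0
```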

OK! So now the experiment begins. The rat is salt-deprived, and it sees the lever appear. That naturally evokes its previous memory of tasting salt, and that thought is rewarded! When the rat imagines walking over and nibbling the lever, it finds that to be a very pleasing (high-reward-prediction) thought indeed! So it goes and does it!

Hypothesis 1 for the “imagining taste of salt” signal: The neocortex API enables outputting a prediction for any given input channel

This was my first theory, I guess from last year. As argued by the “predictive coding” people, Jeff Hawkins, Yann LeCun, and many others, the neocortex is constantly predicting what input signals it will receive next, and updating its models when the predictions are wrong. This suggests that it should be possible to stick an arbitrary input line into the neocortex, and then pull out a signal carrying the neocortex’s predictions for that input line. (It would look like a slightly-earlier copy of the input line, with sporadic errors for when the neocortex is surprised.) I can imagine, for example, that if you put an input signal into cortical mini-column #592843 layer 4, then you look at a certain neuron in the same mini-column, you find those predictions.

If this is the case, then the rest is pretty straightforward. The genome wires the salt taste bud signal to wherever in the neocortex, pulls out the corresponding prediction, and we're done! For the reason described above, that line will also fire when merely imagining salt taste.

Commentary on hypothesis 1: I have mixed feelings.

On the one hand, I haven’t really come across any independent evidence that this mechanism exists. And, having learned more about the nitty-gritty of neocortex algorithms (the outputs come from layer 5, blah blah blah), I don’t think the neocortex outputs carry this type of data.

On the other hand, I have a strong prior belief that if there are ten ways for the brain to do a certain calculation, and each is biologically and computationally plausible without dramatic architectural change, the brain will do all ten! I mean, there is a predictive signal for each input—it has to be there somewhere! I don’t currently see any reason that this signal couldn’t be extracted from the neocortex.

So anyway, all things considered, I don’t put much weight on this hypothesis, but I also won’t strongly reject it.

With that, let’s move on to the later ideas that I like better.

Hypothesis 2 for the “neocortex is imagining the taste of salt” signal: The neocortex is rewarded for “communicating its thoughts”

This was my second guess, I guess dating to several months ago.

The neocortex subsystem has a bunch of output lines for motor control and whatever else, and it has a special output line S (S for salt).

Meanwhile, the subcortex sends rewards under various circumstances, and one of those things is that the RL system is rewarded for sending a signal into S whenever salt is tasted. (The subcortex knows when salt is tasted, because it gets a copy of that same input.)

So now, as the rat lives its life, it stumbles upon the behavior of outputting a signal into S when eating a bite of saltier-than-usual food. This is reinforced, and gradually becomes routine.

The rest is as before: when the rat imagines a salty taste, it reuses the same model. We did it!

Commentary on hypothesis 2: A minor problem (from the point-of-view of evolution) is that it would take a while for the neocortex to learn to send a signal into S when eating salt. Maybe that’s OK.

A much bigger potential problem is that the neocortex could learn a pattern where it sends a signal into S when tasting salt, and also learns a different pattern where it sends a signal into S whenever salt-deprived, whether thinking about salt or not. This pattern would, after all, be rewarded, and I can’t immediately see how to stop it from developing.

So I’m pretty skeptical about this hypothesis now.

Hypothesis 3 for the “neocortex is imagining the taste of salt” signal (my favorite!): Sorta an “interpretability” approach, probably involving the amygdala

This one comes out of my last post, Supervised Learning of Outputs in the Brain. Now we have a separate brain module that I labeled “supervised learning algorithm”, and which I suspect is primarily located in the amygdala. This module does supervised learning: the salt signal (from the taste buds) functions as the supervisory signal, and a random assortment of neurons in the neocortex subsystem (describing latent variables in the neocortex’s predictive model) function as the inputs to the learned model. Then the module learns which patterns in the inputs tend to reliably predict that salt is about to be tasted. Having done that, when it sees those patterns reoccur, that’s our signal that the neocortex is probably expecting the taste of salt … but as described above, it will also see those same patterns when the neocortex is merely imagining or remembering the taste of salt. So we have our signal!

Commentary on Hypothesis 3: There’s a lot I really like about this. It seems to at-least-vaguely match various things I’ve seen in the literature about the functionality and connectivity of the amygdala. It makes a lot of sense from a design perspective—the patterns would be learned quickly and reliably, etc., as far as I can tell. I find it satisfyingly obvious and natural (in retrospect). So I would put this forward as my favorite hypothesis.

It also transfers in an obvious way to AGI programming, where it would correspond to something like an automated "interpretability" module that tries to make sense of the AGI's latent variables by correlating them with some other labeled properties of the AGI's inputs, and then rewarding the AGI for "thinking about the right things" (according to the interpretability module's output), which in turn helps turn those thoughts into the AGI's goals.

(Is this a good design idea that AGI programmers should adopt? I don't know, but I find it interesting, and at least worthy of further thought. I don't recall coming across this idea before in the context of inner alignment.)
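As a concrete (and heavily simplified) illustration of what such an “interpretability module” could look like in ML terms, here is a sketch in which a supervised model is trained to predict a labeled input property (salt about to be tasted) from an agent’s latent activations, and is then run on those same activations during “imagination”. All names, the synthetic data, and the choice of logistic regression are mine for the sketch; this is not a claim about amygdala anatomy or about any existing AGI design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend these are recordings of latent variables in the "neocortex" model:
# 1000 timesteps x 50 latent units, plus a supervisory label for whether
# salt was tasted shortly afterwards.  (All data here is synthetic.)
latents = rng.normal(size=(1000, 50))
salt_soon = (latents[:, 3] + latents[:, 17] > 1.0).astype(int)  # hidden "salt" pattern

# The "supervised learning module" (amygdala stand-in) learns which latent
# patterns reliably predict the salt-taste signal.
salt_detector = LogisticRegression(max_iter=1000).fit(latents, salt_soon)

# Later, the same detector fires when those patterns recur -- whether the agent
# is genuinely about to taste salt or merely imagining / remembering it.
imagined_state = rng.normal(size=(1, 50))
imagined_state[0, 3] = imagined_state[0, 17] = 2.0   # "thinking about salt"
p_salt = salt_detector.predict_proba(imagined_state)[0, 1]

# Combine with the hardwired rule from earlier in the post:
salt_deprived = True
reward = 1.0 if (salt_deprived and p_salt > 0.5) else 0.0
print(p_salt, reward)
```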

What would other possible explanations for the rat experiment look like?

The theoretical follow-up by Dayan & Berridge is worth reading, but I don’t think they propose any real answers, just lots of literature and interesting ideas at a somewhat-more-vague level.

Next: What would Steven Pinker say? I don’t know, but maybe it’s a worthwhile exercise for me to at least try. Well, first, I think he would reject the idea that there's a “neocortex subsystem”. And I think he would more generally reject the idea that there is any interesting question along the lines of "how does the reward system know that the rat is thinking about salt?". Of course I want to pose that question, because I come from a perspective of “things need to be learned from scratch” (again see My Computational Framework for the Brain). But Pinker would not be coming from that perspective. I think he wants to assume that a comparatively elaborate world-modeling infrastructure is already in place, having been hardcoded by the genome. So maybe he would say there's a built-in “diet module” which can model and understand food, taste, satiety, etc., and he would say there's a built-in “navigation module” which can plan a route to walk over to the lever, and he would say there's a built-in “3D modeling module” which can make sense of the room and lever, etc. etc.

OK, now that possibly-strawman-Steven-Pinker has had his say in the previous paragraph, I can respond. I don't think this is so far off as a description of the calculations done by an adult brain. In ML we talk about “how the learning algorithm works” (SGD, BatchNorm, etc.), and separately (and much less frequently!) we talk about “how the trained model works” (OpenAI Microscope, etc.). I want to put all that infrastructure in the previous paragraph at the "trained model" level, not the "learning algorithm" level. Why? First, because I think there’s pretty good evidence for cortical uniformity. Second—and I know this sounds stupid—because I personally am unable to imagine how this setup would work in detail. How exactly do you insert learned content into the innate framework? How exactly do you interface the different modules with each other? And so on. Obviously, yes I know, it’s possible that answers exist, even if I can’t figure them out. But that’s where I’m at right now.



Discuss

Misalignment and misuse: whose values are manifest?

19 ноября, 2020 - 03:10
Published on November 19, 2020 12:10 AM GMT

By Katja Grace, Nov 18 2020, Crossposted from world spirit sock puppet.

AI related disasters are often categorized as involving misaligned AI, or misuse, or accident. Where:

  • misuse means the bad outcomes were wanted by the people involved,
  • misalignment means the bad outcomes were wanted by AI (and not by its human creators), and
  • accident means that the bad outcomes were not wanted by those in power but happened anyway due to error.

In thinking about specific scenarios, these concepts seem less helpful.

I think a likely scenario leading to bad outcomes is that AI can be made which gives a set of people things they want, at the expense of future or distant resources that the relevant people do not care about or do not own.

For example, consider autonomous business strategizing AI systems that are profitable additions to many companies, but in the long run accrue resources and influence and really just want certain businesses to nominally succeed, resulting in a worthless future. Suppose Bob is considering whether to get a business strategizing AI for his business. It will make the difference between his business thriving and struggling, which will change his life. He suspects that within several hundred years, if this sort of thing continues, the AI systems will control everything. Bob probably doesn’t hesitate, in the way that businesses don’t hesitate to use gas vehicles even if the people involved genuinely think that climate change will be a massive catastrophe in hundreds of years.

When the business strategizing AI systems finally plough all of the resources in the universe into a host of thriving 21st Century businesses, was this misuse or misalignment or accident? The strange new values that were satisfied were those of the AI systems, but the entire outcome only happened because people like Bob chose it knowingly (let’s say). Bob liked it more than the long glorious human future where his business was less good. That sounds like misuse. Yet also in a system of many people, letting this decision fall to Bob may well have been an accident on the part of others, such as the technology’s makers or legislators.

Outcomes are the result of the interplay of choices, driven by different values. Thus it isn’t necessarily sensical to think of them as flowing from one entity’s values or another’s. Here, AI technology created a better option for both Bob and some newly-minted misaligned AI values that it also created—‘Bob has a great business, AI gets the future’—and that option was worse for the rest of the world. They chose it together, and the choice needed both Bob to be a misuser and the AI to be misaligned. But this isn’t a weird corner case, this is a natural way for the future to be destroyed in an economy.

Thanks to Joe Carlsmith for conversation leading to this post.



Discuss

2020 Election: Prediction Markets versus Polling/Modeling Assessment and Postmortem

19 ноября, 2020 - 02:00
Published on November 18, 2020 11:00 PM GMT

Moderation/Commenting: I tried to minimize it, but this post necessarily involves some politics. See note at end of post for comment norms. I hope for this to be the last post I make that has to say anything about the election.

Previously: Evaluating Predictions in Hindsight, PredictIt: Presidential Market is Increasingly Wrong

There have been several posts evaluating FiveThirtyEight’s model and comparing its predictions to those of prediction markets. I believe that those evaluations so far have left out a lot of key information and considerations, and thus been highly overly generous to prediction markets and overly critical of the polls and models. General Twitter consensus and other reactions seem to be doing a similar thing. This post gives my perspective.

The summary is:

  1. The market’s movements over time still look crazy before the election, they don’t look great during the election, and they look really, really terrible after the election.
  2. The market’s predictions for safe states were very underconfident, and it looks quite stupid there, to the extent that one cares about that.
  3. According to the Easy Mode evaluation methods, Nate’s model defeats the market.
  4. The market’s predictions on states relative to each other were mostly copied from the polls slash Nate Silver’s model, which was very good at predicting the order, and the odds of an electoral college versus popular vote split at various margins of the popular vote.
  5. The market’s claim to being better is that they gave Trump a higher chance of winning, implying similar to what Nate predicted if the final popular vote was Biden +4, which is about where it will land.
  6. My model of the market says that it was not doing the thing we want to give it credit for, but rather representing the clash between two different reality tunnels, and the prices over time only make sense in that context.
  7. The market priced in a small but sane chance of a stolen election before the election, but seems to have overestimated that chance after the election and not adjusted it over time.

We will go over the methods given in Evaluating Predictions in Hindsight for using Easy Mode, then ask remaining questions in Hard Mode.

First, we need to mention the big thing that most analysis is ignoring. Estimates were not made once at midnight before election day. Estimates were continuous by all parties until then, and then continued to be continuous for prediction markets until now, and informally and incompletely continued to be given by modelers and other traditional sources. 

Predictions Over Time versus Predictions Once

Even when we look at Easy Mode, we have to decide which method we are using to judge predictions. Are we evaluating only on the morning of the election? Or are we evaluating versus the continuous path of predictions over time? Are we taking the sum of the accuracy of predictions at each point, or are we evaluating whether the changes in probability made sense? 

One method asks, does Nate Silver’s 90% chance for Biden to win, and his probabilities of Biden to win various states, look better than the prediction market’s roughly 60%, together with its chances for Trump to win various states? 

The other method takes that as one moment in time, and one data point in a bigger picture. That second method seems better to me.

It seems especially better to look at the whole timeline when trying to evaluate what is effectively a single data point. We need to think about the logic behind the predictions. Otherwise, a stopped clock will frequently look smart. 

And even more so than we saw last time, the market is looking an awful lot like a largely stopped clock.

FiveThirtyEight’s slash Nate Silver’s predictions are consistent and make sense of the data over time. Over the course of the year, Biden increases his polling lead, and time passes, while events seem to go relatively poorly for Trump versus the range of possible outcomes. 

You can certainly argue that 70% for Biden on June 1 was too high, and perhaps put it as low as 50%. But given we started at 70%, the movements from there seem highly sensible. If there was a bias in the polls, it seems logical to presume that bias was there in June. Movement after that point clearly favored Biden, and given Biden was already the favorite, that makes Biden a very large favorite. 

As I laid out previously, when we compare this to prediction markets, we find a market that treated day after day of good things for Biden and bad things for Trump, in a world in which Trump was already the underdog, as not relevant to the probability that Trump would win the election. Day after day, the price barely moved, and Trump was consistently a small underdog. If you start Trump at 38% and more or less keep him there from July 20 all the way to November 3, that’s not a prediction that makes any sense if you interpret it as an analysis of the available data. That’s something else.

Now we get to extend that into election night and beyond. 

During election night, we saw wild swings to Trump. There should definitely have been movement towards Trump as it became clear things would be closer than the polls predicted, but the markets clearly overreacted to early returns. Part of that was that they focused on Florida, and didn’t properly appreciate how much of the swing was in Miami-Dade and therefore was unlikely to fully carry over elsewhere. But a lot of it was basic forgetting about how blue versus red shifts worked in various places, and typical market night-of overreaction. You could get 3:1 on Biden at one point, and could get 2:1 for hours. 

When things were swinging back during the early morning hours, and Biden was an underdog despite clearly taking the lead in Wisconsin, it was clear that the market was acting bonkers. Any reasonable understanding of how the ballots were coming in would make Biden a large favorite to win Michigan, Wisconsin, Pennsylvania and Nevada, and the favorite in Arizona, with Georgia still up for grabs. Basic electoral math says that Biden was a large favorite by that point.

One could defend the market by saying that Trump would find a way to stop the count or use the courts, or something, in many worlds. That didn’t happen, and in hindsight seems like it was basically never going to happen, but it’s hard to make a good evaluation of how likely we should have considered that possible future given what was known. 

What you can’t defend is that the market was trading at non-zero prices continuously after the election was already over. Even after the networks called the election a day or two later than necessary, the markets didn’t collapse, including at BetFair where the market will be evaluated on projected winners rather than who gets electoral college votes. Other places graded the bets, and Trump supporters went into an uproar, and new markets on who would be president in 2021 were created in several places. None of this was a good look.

As I write this, Trump is still being given over a 10% chance of winning by PredictIt, and a 5.5% chance by BetFair. This is obvious nonsense. The majority of the money I’ve bet on the election, I wagered after the election was over.

Before the election, markets gave an answer that didn’t change when new information was available. After the election, they went nuts. You can’t evaluate markets properly if you neglect these facts.

Easy Mode Evaluations

Method One: Money Talks, Bull**** Walks

I bet against the prediction markets. I made money. Others in the rationalist sphere also made money. Some made money big, including multiple six figure wins. Money talks, bull**** walks.

Yes, I am assuming the markets all get graded for Biden, because the election is over. Free money is still available if you want it.

You would have lost money betting on Florida or North Carolina, or on Texas which I heard discussed, but according to the model’s own odds, those were clearly worse bets than betting on the election itself or on relatively safe states. There were better opportunities, and mostly people bet on the better opportunities. 

I also made money betting on New York, California, Maryland, Wisconsin and Michigan. 

Method Two: Trading Simulation, Where Virtual Money Talks

Method Three: The Green Knight Test

Normally we let models bet into markets and not vice versa. Seth Burn observed correctly that Nate Silver was talking some major trash and evaluating markets as having no information, so he advanced them straight to The Green Knight Test. That's where you bet into the market, but the market also gets to bet into your fair values. That makes it a fair fight.

On the top line, Nate obviously won, and given liquidity, that’s where most of the money was. We can then see what happens if we treat all states equal to the presidency and enforce Kelly betting. Using Seth’s methodology we get this as his Green Knight test, where the market bets into Silver’s odds and Silver bets into the market’s odds:

STATE | Nate Risks | Nate Wins | Outcome
AK    | 2.56%      | 16.96%    | -2.56%
AZ    | 29.49%     | 16.96%    | 16.96%
FL    | 49.21%     | 42.10%    | -49.21%
GA    | 29.71%     | 29.50%    | 29.50%
IA    | 9.94%      | 14.03%    | -9.94%
ME    | 3.17%      | 0.39%     | -3.17%
MI    | 205.84%    | 23.45%    | 23.45%
MN    | 200.16%    | 20.03%    | 20.03%
MT    | 7.66%      | 65.51%    | -7.66%
NC    | 31.58%     | 20.90%    | -31.58%
NH    | 68.99%     | 12.32%    | 12.32%
NM    | 128.52%    | 10.28%    | 10.28%
NV    | 83.15%     | 15.63%    | 15.63%
OH    | 20.71%     | 34.64%    | -20.71%
PA    | 107.33%    | 31.84%    | 31.84%
TX    | 15.14%     | 33.84%    | -15.14%
WI    | 168.20%    | 22.70%    | 22.70%

Which adds up to a total of +42.7% for Nate Silver, soundly defeating the market. If Nate had been allowed to use market odds only, he would have won a resounding victory.
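For readers who want the mechanics, here is a minimal sketch of the standard Kelly fraction for a single binary bet placed at someone else’s price. This is only an illustration of the principle; Seth’s actual methodology (two-way bets between Nate and the market, with the state-level stakes shown above) involves further conventions, so the numbers below are not meant to reproduce his table, and the example probabilities are made up.

```python
def kelly_fraction(p_mine: float, p_market: float) -> float:
    """Fraction of bankroll to stake on an event I assign probability p_mine,
    when the market price implies probability p_market.
    Standard Kelly for a binary bet: f = (p - q) / (1 - q), floored at zero."""
    return max(0.0, (p_mine - p_market) / (1.0 - p_market))

def profit_if_win(fraction: float, p_market: float) -> float:
    """Net profit (as a fraction of bankroll) if a bet at price p_market wins."""
    return fraction * (1.0 - p_market) / p_market

# Hypothetical example: I think Biden wins a state 95% of the time,
# while the market sells "Biden yes" at 80 cents.
f = kelly_fraction(0.95, 0.80)        # stake 0.75 of bankroll
print(f, profit_if_win(f, 0.80))      # wins 0.1875 of bankroll if correct
```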

Also note that this excludes the non-close states. Those states are a massacre. PredictIt let me buy New York for Biden at 93% odds, which Nate Silver evaluated as over 99%, and so on. Nate Silver wins all of the wagers not listed, no exceptions. 

On the presidential level, Nate Silver passes the Green Knight Test with flying colors.

How about for the Senate, which Seth also provides?

Race           | Nate Risks | Nate Wins | Result
AK-Dem Senate  | 5.34%      | 20.59%    | -5.34%
AL-Dem Senate  | 6.03%      | 36.75%    | -6.03%
AZ-Dem Senate  | 4.84%      | 1.21%     | 1.21%
CO-GOP Senate  | 0.93%      | 5.79%     | -0.93%
IA-Dem Senate  | 3.86%      | 3.58%     | -3.86%
KS-Dem Senate  | 2.60%      | 9.54%     | -2.60%
ME-GOP Senate  | 8.76%      | 15.45%    | 15.45%
MI-Dem Senate  | 42.78%     | 13.43%    | 13.43%
MN-Dem Senate  | 82.02%     | 11.01%    | 11.01%
MS-Dem Senate  | 2.49%      | 17.67%    | -2.49%
MT-Dem Senate  | 1.73%      | 3.44%     | -1.73%
NC-Dem Senate  | 9.30%      | 5.50%     | -9.30%
TX-GOP Senate  | 0.44%      | 0.05%     | 0.05%
Result So Far  |            |           | 8.87%
GA-Dem Special | 40.85%     | 35.97%    |

Georgia’s special election is a special case and a very strange election, where I think Nate made a mistake. Other than that, these disagreements were relatively small, and Nate once again comes out ahead even in a Green Knight test.

Passing a Green Knight test is not easy. You know what’s even harder? Passing a Green Knight test when the opponent took your predictions as a huge consideration when setting their odds.

And again, were Nate to go around picking off free money in the not-close races, he looks even better. Nate wins again.

Method Four: Log Likelihood (Effectively Similar to Brier Score)

Predicting the headline result with higher confidence scores better than lower confidence, and the model’s higher confidence in a lot of small states helps a ton if they count. So evaluating either the main prediction, or all predictions together, Nate wins again, despite the election being closer than expected.

If you evaluate only on the ‘swing’ states, for the right definition of swing, it can be argued that the market finally wins one. That’s kind of a metric chosen to be maximally generous to the market if you’re looking at binary outcomes only.

It’s also worth noting that the market mostly knew which states were the swing states because of the polling and people like Nate Silver. Then the market adjusted the odds in those states towards Trump. Effectively this is a proxy for ‘the election went off Biden +4.5 rather than Biden +7.8’ rather than anything else.
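For concreteness, here is a small sketch of the scoring this method relies on: log loss and Brier score over a list of (forecast probability, outcome) pairs. The forecast numbers below are made up for illustration; they are not the actual model or market probabilities.

```python
import math

def log_loss(forecasts):
    """Average negative log likelihood; lower is better."""
    return -sum(math.log(p if outcome else 1.0 - p)
                for p, outcome in forecasts) / len(forecasts)

def brier(forecasts):
    """Average squared error of the probability; lower is better."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Two hypothetical forecasters on the same three events (1 = it happened):
model  = [(0.90, 1), (0.97, 1), (0.30, 0)]
market = [(0.62, 1), (0.85, 1), (0.45, 0)]

print(log_loss(model), brier(model))    # ~0.16, ~0.03: confident and right scores well
print(log_loss(market), brier(market))  # ~0.41, ~0.12: underconfident forecasts score worse
```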

Method Five: Calibration Testing

Nate claims that historically his models are well-calibrated or even a bit underconfident. The market’s predictions come out looking very under confident, especially if you look at the 90%+ part of the distribution. Nate wins big.

Method Six: The One Mistake Rule

We talked about this above. The market has made several absurd predictions. It gives Trump chances in deep blue states, and Biden chances in deep red states. It gives Trump major chances after the election is over. The market fails to adjust over time prior to the election. The market gave a 20%+ chance of a popular vote versus electoral college split, which is a proxy prediction of absurdly low variance in vote shares. The market made a lot of other small absurd predictions.

Nate Silver got some predictions wrong, but can you point to anything and say it was clearly absurd or unreasonable? The Georgia special senate election is a weird special case, but that would be an isolated error and time will tell. Other than that, I can’t find a clear mistake. Maybe he was overconfident in Florida, but the difference there was Miami-Dade and that was largely a surprise, plus Nate is running a hands-off model, and has to be evaluated in light of that. 

Once again, seems to me like Nate wins big.

Hard Mode

We’ve already covered many of the issues we need to consider in pure hard mode.

The remaining five questions are:

  1. What about the Taleb position that things like elections are basically unpredictable, so the odds should almost always be very close to 50/50?
  2. Is there still a chance Trump stays in office? If he did, would bets on Trump win?
  3. Was the market predicting the election would often be stolen? If so, was that prediction reasonable?
  4. Which was the better prediction for the mean result, that the popular vote margin would be about 4%, or that the popular vote margin would be 7.8%?
  5. What dynamics slash thinking were causing the market to make its prediction? Should we expect these dynamics to give good results in the future?

One way to see that the market and Nate strongly agreed on the ordering of the states was to look at the market on which state would be the tipping point. Throughout the process, that market not only broadly agreed with Nate, it agreed with changes in Nate’s model, and roughly on the distribution of lower probability tipping point states. 

In light of that fact, we can broadly say that markets mostly gave credit to the polls for giving vote shares in various states relative to other states. All they were asserting was a systematic bias, or the potential thereof, or the existence of factors not covered by the polls (e.g. cheating, stealing, voter suppression, additional turnout considerations, etc, all by either or both sides). 

Thus, I think these five questions sum up our remaining disagreements.

Aren’t Elections Inherently Unpredictable And Thus Always Close to 50/50?

Nope. Nonsense. Gibberish. Long tails are important and skin in the game is a vital concept, and Taleb has taught me more than all but a handful of people, but this position is some combination of nonsense and recency bias, and a pure failure to do Bayesian reasoning. 

If you have evidence that suggests one side is more likely to win, that side is more likely to win. The position is silly, and models have done much, much better than you would do predicting 50/50 every time. Yes, elections have recently been in some sense reliably close, but if you had the 2008 or 1996  elections remotely close, you made a terrible prediction. If you had the 2012 election as close in probabilistic terms going into election day, you made another terrible prediction. The prediction markets were being really dumb in 2008 and 2012 (and as far as I know mostly didn’t exist in 1996). 

Even when something is close that doesn’t make it 50/50. An election that is 70/30 to go either way despite all the polling data, such as the 2016 election, is pretty darn close! So is a football game that’s 70/30 at the start, or a chess game, or a court case, or a battle in a war, or a fistfight at a bar. Most fights, even fights that happen, aren’t close. That’s a lot of why most potential fights are avoided.

That doesn’t mean that Taleb hasn’t made valid critiques of Silver’s model in the past. His criticisms of the 2016 model were in large part correct and important. In particular, he pointed to the absurdly high probabilities given by the model at various points during the summer. With so much time left, I think the model was clearly overconfident when it gave Clinton chances in the mid-high 80s, and panicked too much when it made Trump a favorite on July 30. Nine days later Clinton was at 87% to win. Really? 

I’m not sure if that has been fixed, or if there was much less volatility in this year’s polls, or both. This year, the movements seemed logical. 

Is There Still a Chance Trump Stays in Office? Would Wagers Win If He Did?

There’s always some chance. Until the day a person gives up power, there is some probability they will find a way to retain that power. Never underestimate the man with the nuclear codes who continues to claim he should keep them, and who is also commander in chief of the armed forces, and who a large minority of the population stands behind.

But every day, what slim chances remain get slimmer. Lawsuits get thrown out and dropped, and evidence of fraud fails to materialize. Pushback gets stronger. More and more results are certified and more and more people acknowledge what happened, shutting down potential avenues. State legislatures are making it clear they are not going to throw out the votes and appoint different electors, but even if they did, those electors would lack the safe harbor provision, so they’d need approval from the House that they wouldn’t get, and that results in Acting President Pelosi (or Biden, if they make Biden the new Speaker of the House in anticipation).

Everything that has happened has reinforced my model of Trump as someone who is likely to never admit defeat, and who claims that everything is rigged against him (just like the Iowa Caucuses in 2016, and the general election in 2016 which he won), but who never had a viable plan to do anything about it. Sure, lawsuits will be filed, but they’re about creating a narrative, nothing more. 

It wasn’t a 0% plan, for two reasons. One is that thinking of Trump as having a plan is a category error. There are no goals, only systems that suggest actions. The other reason is that it had a non-zero chance of success. Convince enough Republican voters, put enough pressure on lawmakers at various levels, and hope they one by one fall in line and ignore the election results. To some extent it did happen, it could have happened in full, and if it had, who knows what happens next. Or alternatively, someone else could have come up with an actual tangible plan. Or maybe even someone could have found real fraud. 

But at this point, all of that is vanishingly unlikely and getting less likely every day, and it seems even Trump has de facto acknowledged that Biden has won. 

I’ve actually looked into this stuff in some detail, largely so I could sleep better at night, but also to better inform my real actions. It’s over. I will reiterate, as I did on my previous post, that I am open to discussing in more detail privately, if there are things worth discussing, as I have done with several people to help me understand and game out the situation.

What would happen if Trump pulled this off, however unlikely it may seem now? That depends on how he does it, and what rules operate on a given website.

BetFair’s rules seem clear to me, in that they are based on projected winners. Biden has won, no matter what happens. Bets should pay out. That’s why many markets have already paid out; presumably BetFair is holding off to avoid trouble rather than for any real reason. But if Trump did somehow stay in office, then they are still going to have a big problem, because both sides will say they won.

PredictIt’s rules are effectively ‘it’s our website and we’ll decide it however we want to.’ So it would presumably depend on the exact method. State elections refer to the ‘popular vote’ in those states, so if those votes got ignored by the legislature, it’s hard to say that Trump won those markets. Plenty of ambiguity there all around.

I feel very comfortable grading these wagers now.

Was the market predicting the election would often be stolen? If so, was that prediction reasonable?

Certainly that is what the market is predicting now and what it has been predicting since at least November 5. Otherwise the prices don’t make sense. So the question is, was that something it recently discovered after the election, when things were unexpectedly on edge? Or was it priced in to a large extent all along?

It was priced in to some extent because some participants adjusted their probabilities accordingly. Certainly I priced it in when deciding what my fair values were, and I chose states to bet on with that in mind (and did that somewhat poorly, in hindsight the right places were more like Minnesota). But it couldn’t have been priced in much, because the relative prices of the different states didn’t reflect it. In particular, Pennsylvania was always the obvious place to run a steal, since it was the likely tipping point, had a Republican legislature, existing legal disputes and a lot of rhetoric about Philadelphia already in place. But Pennsylvania was exactly where it should have been relative to other states.

The counterargument is that the market assigned a 20% chance of a split between electoral college and popular vote, and presumably this would be how you get that answer?

My guess is that worries about theft moved the market on the order of 2%, which then expanded to 5%-10% as things proved close, which is roughly consistent, but also proved to be higher than it needed to be. But did we have the information to know that? Were such efforts essentially doomed to fail? 

It’s hard to say, even in hindsight. I wagered as if there was a substantial chance of this, and now consider my estimate to have been too high, but also perhaps emotionally necessary. An obviously stolen election would have been much worse for my life than an apparent clean victory, no matter the outcome on who won.

Which was the better prediction for the mean result, that the popular vote margin would be about 4%, or that the popular vote margin would be 7.8%?

This is the real crux of it all to many people.

We now know that the margin will be something around 4%-5%, and the margin in the electoral college around 0.6%. FiveThirtyEight’s model thought the break-even point for Biden being a favorite in the electoral college was a popular vote win of around 3.5%. That seems like a great prediction now, and we have no reason to think the market substantially disagreed.

Thus, it mostly comes down to this question. Assume the market was not pricing in much theft or fraud, and thus expected Trump to only lose the popular vote by an average of 4%. Was that a good prediction?

A lot of people are saying it was a good prediction because it happened. That’s suggestive, but no. That’s not how this works. 

There were stories about why we should expect Trump to beat his polls, sure. But there were also stories of why Biden should expect to beat his polls instead. 

One theory is ‘they messed up in 2016 so they’ll mess up again the same way.’ This is especially popular in hindsight. I do not think that is fair at all. This result is well within the margin of error. Once all the votes get counted, we are talking about a 3-4% polling error, which is well within historical norms. And when we look at details, the stories about why Trump obviously beat his polls don’t seem right to me at all.

Trump’s turnout operation did a great job relative to expectations, and Biden’s day-of operation did not do as well. Good, but that’s relative to expectations. Polls have to account for Democrats doing lots of mail ballots and Republicans going for election day, and it’s hard to get right. When you get it wrong, you can be wrong in either direction, and also it’s possible that the error was that not all mail ballots were counted. The Covid-19 situation was pretty unique, and trying to correct for that is hard. Nate has speculated that perhaps different types of people were at home more or less often this time, and that wasn’t corrected for sufficiently, and that had interesting effects on error. That’s not something people talked about pre-election anywhere I noticed, even in hindsight.

Shy Trump voters! Every time a populist right-wing candidate with views the media finds unacceptable runs, we have to debunk this, and note that the historical pattern is for such candidates not to beat their polls. In addition, if this was a shy voter effect, then you would expect blue state Trump voters to be shy (social disapproval) but red state Trump voters to not be as shy (social approval). Thus, you’d expect that the bluer the state, the more Trump would beat his polls, if what was happening was that Biden supporters were scaring Trump supporters into keeping things secret. Instead we had the opposite. Trump outperformed his polls more in red areas. 

Historically, polls were accurate in 2018, and Obama beat his polls in 2012, and so on. It goes both ways. 

You could also view this as Republicans and undecideds inevitably coming home to Trump late, and the model underestimating the amount of tightening that would happen. That seems like a reasonable argument, but not big enough to account for predicting outright a 4% margin.

Nate’s explanation is that 90% chance of victory is exactly because a ‘normal’ polling error, like the one we saw, would still likely mean Biden wins, and that’s exactly what happened. 

Does it feel right in hindsight that Trump only lost by 4%? Sure. But suppose he had lost by 12%. Would that have felt right too? Would we have a different story to tell? I think strongly yes.

So why does this feel so different than that?

I think it’s the sum of several different things.

One is that people set expectations for polls too high. A 4% miss, even with the correct winner, is no longer acceptable with stakes and tensions so high, even though it is roughly expected.

Two is that the blue shift meant that the error looked much bigger than it was. The narratives get set long before we get accurate counts. There’s plenty of blame to go around on this, as California and New York take forever to count ballots. 

Three is that we now have distinct reality tunnels, so a lot of people thought Trump would obviously win regardless of the evidence, either because they supported Trump (MAGA!) or because they were still traumatized by 2016 and assumed he’d somehow win again. Whereas others had never met a Trump supporter and couldn’t imagine how anyone could be one, and assumed Biden victory, so everyone was of course standing by to say how terrible the polls were.

Four is that polls involve numbers and are run by nerds who are low status, so we are always looking to tear such things down when they dare to make bold claims. 

Five is that polls are being evaluated, as I’ve emphasized throughout, against a polls plus humans hybrid. They are not being evaluated against people who don’t look at polls. That’s not a fair comparison.

All of this leads into the final question, which to me is the key thing to observe.

What dynamics slash thinking were causing the market to make its prediction? Should we expect these dynamics to give good results in the future?

I think the best model is to consider several distinct groups. Here are some of them.

We have the gamblers who are just having fun and doing random stuff.

We have the partisans. There are Democrats and Republicans who will back their candidate no matter what because they are convinced they will win. They don’t do math or anything like that. Yes, the market overall was freaking huge, but with only one market every four years, partisans can wager big in the dark and not go bankrupt ever.

We have the hedgers who are looking for a form of insurance against a win by the wrong party.

We have the modelers who are betting largely on the polls and models.

We have the sharps who are trying to figure things out based on all available information, potentially including institutional traders like hedge funds.

We have the campaigners who bet money to move the market to give the impression their candidate is winning.

One big actor is enough to move the market a lot, and campaigning in this way is an efficient thing to do even with this year’s deeper market. 

My experience suggests that most sharps have much bigger and better and more liquid markets to worry about, especially given the annoyance of moving money to gambling sites. They mostly prefer to let the market be a source of information, and sometimes trade privately among themselves, often at prices not that close to the market. 

If the sharps and modelers had been in charge, prices would move more over time. They exist, but there are not that many of them and they are mostly size limited. 

We see this a lot on major events, as I’ve noted before, like the Super Bowl or the World Cup. If you are on the ball, you’ll bet somewhat more on a big event, but you won’t bet that much more on it than on other things that have similarly crazy prices. So the amount of smart money does not scale up that much. Whereas the dumb money, especially the partisans and gamblers, come out of the woodwork and massively scale up. 

What I think essentially was happening was that we had partisans making dumb bets on Trump, other partisans making dumb bets on Biden (which were profitable, but for dumb reasons), and then modelers having enough money to move things somewhat but not all that much until election night.

Then after election night, the people who are buying the whole ‘Trump won!’ line see a chance to buy Trump cheap and do so, combined with most people who would bet on Biden already having bet what they’re willing to wager, and there needing to be a 10:1 or higher ratio of dollars wagered to balance out. So the price stays crazy.

Similar dynamics create the longshot bias problem, and the ‘adds to far more than 100%’ problem that allows pure free money, which are very real issues with these markets.
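To make the ‘adds to far more than 100%’ point concrete, here is a minimal sketch of the arithmetic (my own illustration, ignoring fees, withdrawal costs and capital lockup, all of which matter on real sites):

```python
# Toy calculation of the free money available when "Yes" prices on mutually
# exclusive outcomes add up to more than $1. Buying "No" on every outcome
# locks in the overround, before fees.

def overround_profit(yes_prices):
    cost = sum(1 - p for p in yes_prices)   # one "No" share in each outcome
    payout = len(yes_prices) - 1            # exactly one outcome resolves Yes,
                                            # so every other "No" share pays $1
    return payout - cost                    # = sum(yes_prices) - 1

# Hypothetical two-candidate market quoted at 62% and 48%:
print(round(overround_profit([0.62, 0.48]), 4))  # 0.1 of guaranteed profit per set
```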

Conclusion

Who comes out ahead? If you want to know what will happen in 2024, or in any other election, where should you look? To the market, or to the models? Can you make money betting in prediction markets using models?

My conclusion is that the FiveThirtyEight model comes out better than the markets on most metrics. The market implicitly predicted an accurate final popular vote count, on election morning, but that’s largely a coincidence in multiple ways. Markets still have value and should be part of how you anticipate the future, but until their legal structures improve, they should not be your primary source, and you shouldn’t hesitate to make money off them where it is legal. 

If you literally can only look at one number, then any model will lag when events happen, which is a problem. There will also be events that don’t fit into neat patterns, where the market is the tool you have, and it will be much better than nothing. Those are the biggest advantages of markets.

If I had to choose one source for 2024, and I knew there had been no major developments in the past week, I would take FiveThirtyEight’s model over the prediction markets. 

If I was allowed to look at both the models and markets, and also at the news for context, I would combine all sources. If the election is still far away, I would give more weight to the market, while correcting for its biases. The closer to election day, the more I would trust the model over the market. In the final week, I expect the market mainly to indicate who the favorite is, but not to tell me much about the degree to which they are favored.

If FiveThirtyEight gives either side 90%, and the market gives that side 62%, in 2024, my fair will again be on the order of 80%, depending on how seriously we need to worry about the integrity of the process at that point. 

The more interesting scenario for 2024 is FiveThirtyEight is 62% on one side and the market is 62% on the other, and there is no obvious information outside of FiveThirtyEight’s model that caused that divergence. My gut tells me that I have that one about 50% each way.
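For what it’s worth, a simple equal-weight blend in log-odds space (a back-of-the-envelope construction of mine, not a formula from this post) reproduces both of those gut numbers:

```python
import math

# Rough sketch: combine a model probability and a market probability by
# averaging their log-odds. Equal weights are an assumption; in practice the
# weights should reflect how much you trust each source at that point in time.

def logit(p):
    return math.log(p / (1 - p))

def blend(p_model, p_market, w_model=0.5):
    z = w_model * logit(p_model) + (1 - w_model) * logit(p_market)
    return 1 / (1 + math.exp(-z))

print(round(blend(0.90, 0.62), 2))  # model 90%, market 62% -> ~0.79
print(round(blend(0.62, 0.38), 2))  # model 62%, market 62% the other way -> 0.5
```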

A scenario I very much do not expect is the opposite of the first one. What if the market says 90% for one candidate, and FiveThirtyEight says 62%, with no obvious reason for the disagreement? My brain says ‘that won’t happen’ but what if it does? Prediction markets tend to be underconfident, and now they’re making a super strong statement despite structural reasons that make it very hard to do that. But the polls somehow are pretty close. My actual answer is “I would assume the election is likely not going to be free and fair.” Another possibility is someone is pumping a ton of money in to manipulate the market, or one fanatic has gone nuts, but I think that getting things much above 70% when the situation doesn’t call for it would get stupidly expensive.

In all these cases I’d also be looking at the path both sources took to get to that point. 

I think that you can definitely make money betting with models into prediction markets. You can also make money betting without models into prediction markets! Sell things that are trading at 5%-15% and don’t seem likely to increase further, and sell things that have gotten less likely without the price moving. Sell things that seem silly when the odds add up to over 110%. Rinse and repeat. But you can make even more if you let models inform you.
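As a toy version of the ‘sell things trading at 5%-15%’ arithmetic (fees and the cost of tied-up capital are ignored here, and the 4% fair value is a made-up number):

```python
# Expected value of selling an overpriced longshot, i.e. buying the "No" side
# of a contract whose "Yes" price looks too high relative to your fair value.

def no_side_ev(market_yes_price, fair_yes_prob):
    cost = 1 - market_yes_price           # price of one "No" share
    expected_payout = 1 - fair_yes_prob   # "No" pays $1 unless Yes happens
    return expected_payout - cost

ev = no_side_ev(market_yes_price=0.10, fair_yes_prob=0.04)
print(f"EV per share: ${ev:.2f} (~{ev / 0.90:.1%} return on capital at risk)")
```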

I’d love to see prediction markets improve. If the market had been fully legal and trading on the NYSE, I’d expect vastly different behavior the whole way through. Until then, we need to accept these markets for what they are. When 2024 (or perhaps even 2022) rolls around, you may wish to be ready.

Moderation Note: This topic needs to be discussed by those interested in markets, prediction markets, predictions, calibration and probability, but inevitably involves politics. In order to evaluate prediction markets this election, we need to talk about areas where reality tunnels radically differ, because both things those tunnels disagree about and also the tunnels themselves are at the core of what happened. If you are discussing this on LessWrong, your comments will show up in recent discussion. I’d like to keep such comments out of recent discussion, so to avoid that, if you feel the need for your comment to be political, please make the comment on the Don’t Worry About the Vase version of the post. For this post and this post only, I’m going to allow election-relevant political statements and claims on the original version, to the extent that they seem necessary to explore the issues in question. For the LessWrong comment section, the norms against politics will be strictly enforced, so stick to probability and modeling. 



Discuss

The ethics of AI for the Routledge Encyclopedia of Philosophy

November 18, 2020 - 20:55
Published on November 18, 2020 5:55 PM GMT

I've been tasked by the Routledge Encyclopedia of Philosophy to write their entry on the ethics of AI.

I'll be starting the literature reviews and similar in the coming weeks. Could you draw my attention to any aspect of AI ethics (including the history of AI ethics) that you think is important and deserves to be covered?

Cheers!



Discuss

The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables

November 18, 2020 - 20:47
Published on November 18, 2020 5:47 PM GMT

An AI actively trying to figure out what I want might show me snapshots of different possible worlds and ask me to rank them. Of course, I do not have the processing power to examine entire worlds; all I can really do is look at some pictures or video or descriptions. The AI might show me a bunch of pictures from one world in which a genocide is quietly taking place in some obscure third-world nation, and another in which no such genocide takes place. Unless the AI already considers that distinction important enough to draw my attention to it, I probably won’t notice it from the pictures, and I’ll rank those worlds similarly - even though I’d prefer the one without the genocide. Even if the AI does happen to show me some mass graves (probably secondhand, e.g. in pictures of news broadcasts), and I rank them low, it may just learn that I prefer my genocides under-the-radar.

The obvious point of such an example is that an AI should optimize for the real-world things I value, not just my estimates of those things. I don't just want to think my values are satisfied, I want them to actually be satisfied. Unfortunately, this poses a conceptual difficulty: what if I value the happiness of ghosts? I don't just want to think ghosts are happy, I want ghosts to actually be happy. What, then, should the AI do if there are no ghosts?

Human "values" are defined within the context of humans' world-models, and don't necessarily make any sense at all outside of the model (i.e. in the real world). Trying to talk about my values "actually being satisfied" is a type error.

Some points to emphasize here:

  • My values are not just a function of my sense data, they are a function of the state of the whole world, including parts I can't see - e.g. I value the happiness of people I will never meet.
  • I cannot actually figure out or process the state of the whole world
  • … therefore, my values are a function of things I do not know and will not ever know - e.g. whether someone I will never encounter is happy right now
  • This isn’t just a limited processing problem; I do not have enough data to figure out all these things I value, even in principle.
  • This isn’t just a problem of not enough data, it’s a problem of what kind of data. My values depend on what’s going on “inside” of things which look the same - e.g. whether a smiling face is actually a rictus grin
  • This isn’t just a problem of needing sufficiently low-level data. The things I care about are still ultimately high-level things, like humans or trees or cars. While the things I value are in principle a function of low-level world state, I don’t directly care about molecules.
  • Some of the things I value may not actually exist - I may simply be wrong about which high-level things inhabit our world.
  • I care about the actual state of things in the world, not my own estimate of the state - i.e. if the AI tricks me into thinking things are great (whether intentional trickery or not), that does not make things great.

These features make it rather difficult to “point” to values - it’s not just hard to formally specify values, it’s hard to even give a way to learn values. It’s hard to say what it is we’re supposed to be learning at all. What, exactly, are the inputs to my value-function? It seems like:

  • Inputs to values are not complete low-level world states (since people had values before we knew what quantum fields were, and still have values despite not knowing the full state of the world), but…
  • I value the actual state of the world rather than my own estimate of the world-state (i.e. I want other people to actually be happy, not just look-to-me like they’re happy).

How can both of those intuitions seem true simultaneously? How can the inputs to my values-function be the actual state of the world, but also high-level objects which may not even exist? What things in the low-level physical world are those “high-level objects” pointing to?

If I want to talk about "actually satisfying my values" separate from my own estimate of my values, then I need some way to say what the values-relevant pieces of my world model are "pointing to" in the real world.

I think this problem - the “pointers to values” problem, and the “pointers” problem more generally - is the primary conceptual barrier to alignment right now. This includes alignment of both “principled” and “prosaic” AI. The one major exception is pure human-mimicking AI, which suffers from a mostly-unrelated set of problems (largely stemming from the shortcomings of humans, especially groups of humans).

I have yet to see this problem explained, by itself, in a way that I’m satisfied by. I’m stealing the name from some of Abram’s posts, and I think he’s pointing to the same thing I am, but I’m not 100% sure.

The goal of this post is to demonstrate what the problem looks like for a (relatively) simple Bayesian-utility-maximizing agent, and what challenges it leads to. This has the drawback of defining things only within one particular model, but the advantage of showing how a bunch of nominally-different failure modes all follow from the same root problem: utility is a function of latent variables. We’ll look at some specific alignment strategies, and see how and why they fail in this simple model.

One thing I hope people will take away from this: it’s not the “values” part that’s conceptually difficult, it’s the “pointers” part.

The Setup

We have a Bayesian expected-utility-maximizing agent, as a theoretical stand-in for a human. The agent’s world-model is a causal DAG over variables X, and it chooses actions X_a = x*_a to maximize E[u(X)|do(X_a = x*_a)] - i.e. it’s using standard causal decision theory. We will assume the agent has a full-blown Cartesian boundary, so we don’t need to worry about embeddedness and all that. In short, this is a textbook-standard causal-reasoning agent.

One catch: the agent’s world-model uses the sorts of tricks in Writing Causal Models Like We Write Programs, so the world-model can represent a very large world without ever explicitly evaluating probabilities of every variable in the world-model. Submodels are expanded lazily when they’re needed. You can still conceptually think of this as a standard causal DAG, it’s just that the model is lazily evaluated.
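To make the lazy-evaluation idea concrete, here is a minimal sketch of such a model (a toy structure assumed for illustration, not the formalism from the linked post), in which a variable is only ever computed if some query actually depends on it:

```python
import random

# A lazily evaluated causal model: each variable is a function of its parents,
# and values are only expanded when a query needs them.

class LazyCausalModel:
    def __init__(self, structural_functions):
        # name -> (list of parent names, function of parent values)
        self.funcs = structural_functions
        self.cache = {}          # values expanded so far in this "world"
        self.interventions = {}  # do()-style overrides

    def do(self, **overrides):
        self.interventions.update(overrides)
        self.cache.clear()
        return self

    def value(self, name):
        # Recursively expand only the ancestors this query depends on.
        if name in self.interventions:
            return self.interventions[name]
        if name not in self.cache:
            parents, f = self.funcs[name]
            self.cache[name] = f(*[self.value(p) for p in parents])
        return self.cache[name]

# The agent's utility could depend on "distant_person_happy" even though
# everyday queries never force that latent variable to be evaluated.
model = LazyCausalModel({
    "weather":              ([], lambda: random.choice(["sun", "rain"])),
    "distant_person_happy": ([], lambda: random.random() < 0.8),
    "my_mood":              (["weather"], lambda w: 1.0 if w == "sun" else 0.3),
})

print(model.value("my_mood"))                    # expands "weather" only
print(model.do(weather="sun").value("my_mood"))  # 1.0 under the intervention
```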

In particular, thinking of this agent as a human, this means that our human can value the happiness of someone they’ve never met, never thought about, and don’t know exists. The utility u(X) can be a function of variables which the agent will never compute, because the agent never needs to fully compute u in order to maximize it - it just needs to know how u changes as a function of the variables influenced by its actions.

Key assumption: most of the variables in the agent’s world-model are not observables. Drawing the analogy to humans: most of the things in our world-models are not raw photon counts in our eyes or raw vibration frequencies/intensities in our ears. Our world-models include things like trees and rocks and cars, objects whose existence and properties are inferred from the raw sense data. Even lower-level objects, like atoms and molecules, are latent variables; the raw data from our eyes and ears does not include the exact positions of atoms in a tree. The raw sense data itself is not sufficient to fully determine the values of the latent variables, in general; even a perfect Bayesian reasoner cannot deduce the true position of every atom in a tree from a video feed.

Now, the basic problem: our agent’s utility function is mostly a function of latent variables. Human values are mostly a function of rocks and trees and cars and other humans and the like, not the raw photon counts hitting our eyeballs. Human values are over inferred variables, not over sense data.

Furthermore, human values are over the “true” values of the latents, not our estimates - e.g. I want other people to actually be happy, not just to look-to-me like they’re happy. Ultimately, E[u(X)] is the agent’s estimate of its own utility (thus the expectation), and the agent may not ever know the “true” value of its own utility - i.e. I may prefer that someone who went missing ten years ago lives out a happy life, but I may never find out whether that happened. On the other hand, it’s not clear that there’s a meaningful sense in which any “true” utility-value exists at all, since the agent’s latents may not correspond to anything physical - e.g. a human may value the happiness of ghosts, which is tricky if ghosts don’t exist in the real world.

On top of all that, some of those variables are implicit in the model’s lazy data structure and the agent will never think about them at all. I can value the happiness of people I do not know and will never encounter or even think about.

So, if an AI is to help optimize for E[u(X)], then it’s optimizing for something which is a function of latent variables in the agent’s model. Those latent variables:

  • May not correspond to any particular variables in the AI’s world-model and/or the physical world
  • May not be estimated by the agent at all (because lazy evaluation)
  • May not be determined by the agent’s observed data

… and of course the agent’s model might just not be very good, in terms of predictive power.

As usual, neither we (the system’s designers) nor the AI will have direct access to the model; we/it will only see the agent’s behavior (i.e. input/output) and possibly a low-level system in which the agent is embedded. The agent itself may have some introspective access, but not full or perfectly reliable introspection.

Despite all that, we want to optimize for the agent’s utility, not just the agent’s estimate of its utility. Otherwise we run into wireheading-like problems, problems with the agent’s world model having poor predictive power, etc. But the agent’s utility is a function of latents which may not be well-defined at all outside the context of the agent’s estimator (a.k.a. world-model). How can we optimize for the agent’s “true” utility, not just an estimate, when the agent’s utility function is defined as a function of latents which may not correspond to anything outside of the agent’s estimator?

The Pointers Problem

We can now define the pointers problem - not only “pointers to values”, but the problem of pointers more generally. The problem: what functions of what variables (if any) in the environment and/or another world-model correspond to the latent variables in the agent’s world-model? And what does that “correspondence” even mean - how do we turn it into an objective for the AI, or some other concrete thing outside the agent’s own head?

Why call this the “pointers” problem? Well, let’s take the agent’s perspective, and think about what its algorithm feels like from the inside. From inside the agent’s mind, it doesn’t feel like those latent variables are latent variables in a model. It feels like those latent variables are real things out in the world which the agent can learn about. The latent variables feel like “pointers” to real-world objects and their properties. But what are the referents of these pointers? What are the real-world things (if any) to which they’re pointing? That’s the pointers problem.

Is it even solvable? Definitely not always - there probably is no real-world referent for e.g. the human concept of a ghost. Similarly, I can have a concept of a perpetual motion machine, despite the likely-impossibility of any such thing existing. Between abstraction and lazy evaluation, latent variables in an agent’s world-model may not correspond to anything in the world.

That said, it sure seems like at least some latent variables do correspond to structures in the world. The concept of “tree” points to a pattern which occurs in many places on Earth. Even an alien or AI with radically different world-model could recognize that repeating pattern, realize that examining one tree probably yields information about other trees, etc. The pattern has predictive power, and predictive power is not just a figment of the agent’s world-model.

So we’d like to know both (a) when a latent variable corresponds to something in the world (or another world model) at all, and (b) what it corresponds to. We’d like to solve this in a way which (probably among other use-cases) lets the AI treat the things-corresponding-to-latents as the inputs to the utility function it’s supposed to learn and optimize.

To the extent that human values are a function of latent variables in humans’ world-models, this seems like a necessary step not only for an AI to learn human values, but even just to define what it means for an AI to learn human values. What does it mean to “learn” a function of some other agent’s latent variables, without necessarily adopting that agent’s world-model? If the AI doesn’t have some notion of what the other agent’s latent variables even “are”, then it’s not meaningful to learn a function of those variables. It would be like an AI “learning” to imitate grep, but without having any access to string or text data, and without the AI itself having any interface which would accept strings or text.

Pointer-Related Maladies

Let’s look at some example symptoms which can arise from failure to solve specific aspects of the pointers problem.

Genocide Under-The-Radar

Let’s go back to the opening example: an AI shows us pictures from different possible worlds and asks us to rank them. The AI doesn’t really understand yet what things we care about, so it doesn’t intentionally draw our attention to certain things a human might consider relevant - like mass graves. Maybe we see a few mass-grave pictures from some possible worlds (probably in pictures from news sources, since that’s how such information mostly spreads), and we rank those low, but there are many other worlds where we just don’t notice the problem from the pictures the AI shows us. In the end, the AI decides that we mostly care about avoiding worlds where mass graves appear in the news - i.e. we prefer that mass killings stay under the radar.

How does this failure fit in our utility-function-of-latents picture?

This is mainly a failure to distinguish between the agent’s estimate of its own utility E[u(X)], and the “real” value of the agent’s utility u(X) (insofar as such a thing exists). The AI optimizes for our estimate, but does not give us enough data to very accurately estimate our utility in each world - indeed, it’s unlikely that a human could even handle that much information. So, it ends up optimizing for factors which bias our estimate - e.g. the availability of information about bad things.

Note that this intuitive explanation assumes a solution to the pointers problem: it only makes sense to the extent that there’s a “real” value of u(X) from which the “estimate” can diverge.
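As a crude numerical illustration of that estimate-versus-“real”-value gap (entirely a toy construction with made-up probabilities), consider an AI scoring two policies by the human’s estimate when the human only sees whether bad events were reported:

```python
import random

# The human's ranking can only depend on observations ("was it reported?"),
# while true utility depends on the latent fact ("did it happen?").

random.seed(0)

def sample_world(suppress_reporting):
    bad = random.random() < 0.5
    reported = bad and not suppress_reporting and random.random() < 0.9
    return {"bad": bad, "reported": reported}

def true_utility(world):
    return 0.0 if world["bad"] else 1.0

def human_estimate(world):
    return 0.0 if world["reported"] else 1.0

for suppress in (False, True):
    worlds = [sample_world(suppress) for _ in range(10_000)]
    est = sum(human_estimate(w) for w in worlds) / len(worlds)
    real = sum(true_utility(w) for w in worlds) / len(worlds)
    print(f"suppress_reporting={suppress}: estimated u ~ {est:.2f}, true u ~ {real:.2f}")

# Scoring by the human's estimate prefers suppression (~1.00 vs ~0.55),
# even though true utility stays at ~0.50 either way.
```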

Not-So-Easy Wireheading Problems

The under-the-radar genocide problem looks roughly like a typical wireheading problem, so we should try a roughly-typical wireheading solution: rather than the AI showing world-pictures, it should just tell us what actions it could take, and ask us to rank actions directly.

If we were ideal Bayesian reasoners with accurate world models and infinite compute, and knew exactly where the AI’s actions fit in our world model, then this might work. Unfortunately, the failure of any of those assumptions breaks the approach:

  • We don’t have the processing power to predict all the impacts of the AI’s actions
  • Our world models may not be accurate enough to correctly predict the impact of the AI’s actions, even if we had enough processing power
  • The AI’s actions may not even fit neatly into our world model - e.g. even the idea of genetic engineering might not fit the world-model of premodern human thinkers

Mathematically, we’re trying to optimize E[u(X)|do(X_AI = x*_AI)], i.e. optimize expected utility given the AI’s actions. Note that this is necessarily an expectation under the human’s model, since that’s the only context in which u(X) is well-defined. In order for that to work out well, we need to be able to fully evaluate that estimate (sufficient processing power), we need the estimate to be accurate (sufficient predictive power), and we need X_AI to be defined within the model in the first place.

The question of whether our world-models are sufficiently accurate is particularly hairy here, since accuracy is usually only defined in terms of how well we estimate our sense-data. But the accuracy we care about here is how well we “estimate” the values of latent variables and u(X). What does that even mean, when the latent variables may not correspond to anything in the world?

People I Will Never Meet

“Human values cannot be determined from human behavior” seems almost old-hat at this point, but it’s worth taking a moment to highlight just how underdetermined values are from behavior. It’s not just that humans have biases of one kind or another, or that revealed preferences diverge from stated preferences. Even in our perfect Bayesian utility-maximizer, utility is severely underdetermined from behavior, because the agent does not have perfect estimates of its latent variables. Behavior depends only on the agent’s estimate, so it cannot account for “error” in the agent’s estimates of latent variable values, nor can it tell us about how the agent values variables which are not coupled to its own choices.

The happiness of people I will never interact with is a good example of this. There may be people in the world whose happiness will not ever be significantly influenced by my choices. Presumably, then, my choices cannot tell us about how much I value such peoples’ happiness. And yet, I do value it.

“Misspecified” Models

In Latent Variables and Model Misspecification, jsteinhardt talks about “misspecification” of latent variables in the AI’s model. His argument is that things like the “value function” are latent variables in the AI’s world-model, and are therefore potentially very sensitive to misspecification of the AI’s model.

In fact, I think the problem is more severe than that.

The value function’s inputs are latent variables in the human’s model, and are therefore sensitive to misspecification in the human’s model. If the human’s model does not match reality well, then their latent variables will be something wonky and not correspond to anything in the world. And AI designers do not get to pick the human’s model. These wonky variables, not corresponding to anything in the world, are a baked-in part of the problem, unavoidable even in principle. Even if the AI’s world model were “perfectly specified”, it would either be a bad representation of the world (in which case predictive power becomes an issue) or a bad representation of the human’s model (in which case those wonky latents aren’t defined).

The AI can’t model the world well with the human’s model, but the latents on which human values depend aren’t well-defined outside the human’s model. Rock and a hard place.

Takeaway

Within the context of a Bayesian utility-maximizer (representing a human), utility/values are a function of latent variables in the agent’s model. That’s a problem, because those latent variables do not necessarily correspond to anything in the environment, and even when they do, we don’t have a good way to say what they correspond to.

So, an AI trying to help the agent is stuck: if the AI uses the human’s world-model, then it may just be wrong outright (in predictive terms). But if the AI doesn’t use the human’s world-model, then the latents on which the utility function depends may not be defined at all.

Thus, the pointers problem, in the Bayesian context: figure out which things in the world (if any) correspond to the latent variables in a model. What do latent variables in my model “point to” in the real world?



Discuss

Normativity

18 ноября, 2020 - 19:52
Published on November 18, 2020 4:52 PM GMT

Now that I've written Learning Normativity, I have some more clarity around the concept of "normativity" I was trying to get at, and want to write about it more directly. Whereas that post was more oriented toward the machine learning side of things, this post is more oriented toward the philosophical side. However, it is still relevant to the research direction, and I'll mention some issues relevant to value learning and other alignment approaches.

How can we talk about what you "should" do?

A Highly Dependent Concept

Now, obviously, what you should do depends on your goals. We can (at least as a rough first model) encode this as a utility function (but see my objection).

What you should do also depends on what's the case. Or, really, it depends on what you believe is the case, since that's what you have to go on.

Since we also have uncertainty about values (and we're interested in building machines which should have value uncertainty as well, in order to do value learning), we have to talk about beliefs-about-goals, too. (Or beliefs about utility functions, or however it ends up getting formalized.) This includes moral uncertainty.

Even worse, we have a lot of uncertainty about decision theory -- that is, we have uncertainty about how to take all of this uncertainty we have, and make it into decisions. Now, ideally, decision theory is not something the normatively correct thing depends on, like all the previous points, but rather is a framework for finding the normatively correct thing given all of those things. However, as long as we're uncertain about decision theory, we have to take that uncertainty as input too -- so, if decision theory is to give advice to realistic agents who are themselves uncertain about decision theory, decision theory also takes decision-theoretic uncertainty as an input. (In the best case, this makes bad decision theories capable of self-improvement.)

Clearly, we can be uncertain about how that is supposed to work.

By now you might get the idea. "Should" depends on some necessary information (let's call them the "givens"). But for each set of givens you claim is complete, there can be reasonable doubt about how to use those givens to determine the output. So we can create meta-level givens about how to use those givens.

Rather than stopping at some finite level, such as learning the human utility function, I'm claiming that we should learn all the levels. This is what I mean by "normativity" -- the information at all the meta-levels, which we would get if we were to unpack "should" forever. I'm putting this out there as my guess at the right type signature for human values.
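Since I'm framing this as a guess at a type signature, here is one purely illustrative way the recursive structure could be written down; the field names are my own placeholders, not a worked-out proposal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Givens:
    """One level of the hierarchy: what you value, what you believe, and how
    you think those should be turned into decisions -- plus, optionally, a
    further level of givens expressing uncertainty about this one."""
    values: object           # e.g. a (distribution over) utility function(s)
    beliefs: object          # e.g. a (distribution over) world-model(s)
    decision_theory: object  # how to combine the two into a choice
    meta: Optional["Givens"] = None  # givens about how to use these givens

# "Learning all the levels" then means: rather than fixing values, beliefs,
# or decision_theory at some level once and for all, keep every field, at
# every depth of `meta`, open to learning and revision.
```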

I'm not mainly excited about this because I'm especially excited about including moral uncertainty or uncertainty about the correct decision theory into a friendly AI -- or because I think those are going to be particularly huge failure modes which we need to avert. Rather, I'm excited about this because it is the first time I've felt like I've had any handles at all for getting basic alignment problems right (wireheading, human manipulation, goodharting, ontological crisis) without a feeling that things are obviously going to blow up in some other way. 

Normative vs Descriptive Reasoning

At this stage you might accuse me of committing the "turtles all the way down" fallacy. In Passing The Recursive Buck, Eliezer describes the error of accidentally positing an infinite hierarchy of explanations:

The general antipattern at work might be called "Passing the Recursive Buck". 

[...]

How do you stop a recursive buck from passing?

You use the counter-pattern:  The Recursive Buck Stops Here.

But how do you apply this counter-pattern?

You use the recursive buck-stopping trick.

And what does it take to execute this trick?

Recursive buck stopping talent.

And how do you develop this talent?

Get a lot of practice stopping recursive bucks.

Ahem.

However, in Where Recursive Justification Hits Rock Bottom, Eliezer discusses a kind of infinite-recursion reasoning applied to normative matters. He says:

But I would nonetheless emphasize the difference between saying:

"Here is this assumption I cannot justify, which must be simply taken, and not further examined."

Versus saying:

"Here the inquiry continues to examine this assumption, with the full force of my present intelligence—as opposed to the full force of something else, like a random number generator or a magic 8-ball—even though my present intelligence happens to be founded on this assumption."

Still... wouldn't it be nice if we could examine the problem of how much to trust our brains without using our current intelligence?  Wouldn't it be nice if we could examine the problem of how to think, without using our current grasp of rationality?

When you phrase it that way, it starts looking like the answer might be "No".

So, with respect to normative questions, such as what to believe, or how to reason, we can and (to some extent) should keep unpacking reasons forever -- every assumption is subject to further scrutiny, and as a practical matter we have quite a bit of uncertainty about meta-level things such as our values, how to think about our values, etc.

This is true despite the fact that with respect to the descriptive questions, the recursive buck must stop somewhere. Taking a descriptive stance, my values and beliefs live in my neurons. From this perspective, "human logic" is not some advanced logic which logicians may discover some day, but rather, just the set of arguments humans actually respond to. Again quoting another Eliezer article:

The phrase that once came into my mind to describe this requirement, is that a mind must be created already in motion.  There is no argument so compelling that it will give dynamics to a static thing.  There is no computer program so persuasive that you can run it on a rock.

So in a descriptive sense the ground truth about your values is just what you would actually do in situations, or some information about the reward systems in your brain, or something resembling that. In a descriptive sense the ground truth about human logic is just the sum total of facts about which arguments humans will accept.

But in a normative sense, there is no ground truth for human values; instead, we have an updating process which can change its mind about any particular thing; and that updating process itself is not the ground truth, but rather has beliefs (which can change) about what makes an updating process legitimate. Quoting from the relevant section of Radical Probabilism:

The radical probabilist does not trust whatever they believe next. Rather, the radical probabilist has a concept of virtuous epistemic process, and is willing to believe the next output of such a process. Disruptions to the epistemic process do not get this sort of trust without reason.

I worry that many approaches to value learning attempt to learn a descriptive notion of human values, rather than the normative notion. This means stopping at some specific proxy, such as what humans say their values are, or what humans reveal their preferences to be through action, rather than leaving the proxy flexible and trying to learn it as well, while also maintaining uncertainty about how to learn, and so on.

I've mentioned "uncertainty" a lot while trying to unpack my hierarchical notion of normativity. This is partly because I want to insist that we have "uncertainty at every level of the hierarchy", but also because uncertainty is itself a notion to which normativity applies, and thus, generates new levels of the hierarchy.

Normative Beliefs

Just as one might argue that logic should be based on a specific set of axioms, with specific deduction rules (and a specific sequent calculus, etc), one might similarly argue that uncertainty should be managed by a specific probability theory (such as the Kolmogorov axioms), with a specific kind of prior (such as a description-length prior), and specific update rules (such as Bayes' Rule), etc.

This general approach -- that we set up our bedrock assumptions from which to proceed -- is called "foundationalism".

I claim that we can't keep strictly to Bayes' Rule -- not if we want to model highly-capable systems in general, not if we want to describe human reasoning, and not if we want to capture (the normative) human values. Instead, how to update in a specific instance is a more complex matter which agents must figure out.

I claim that the Kolmogorov axioms don't tell us how to reason -- we need more than an uncomputable ideal; we also need advice about what to do in our boundedly-rational situation.

And, finally, I claim that length-based priors such as the Solomonoff prior are malign -- description length seems to be a really important heuristic, but there are other criteria which we want to judge hypotheses by.

So, overall, I'm claiming that a normative theory of belief is a lot more complex than Solomonoff would have you believe. Things that once seemed objectively true now look like rules of thumb. This means the question of normatively correct behavior is wide open even in the simple case of trying to predict what comes next in a sequence.

Now, Logical Induction addresses all three of these points (at least, giving us progress on all three fronts). We could take the lesson to be: we just had to go "one level higher", setting up a system like logical induction which learns how to probabilistically reason. Now we are at the right level for foundationalism. Logical induction, not classical probability theory, is the right principle for codifying correct reasoning.

Or, if not logical induction, perhaps the next meta-level will turn out to be the right one?

But what if we don't have to find a foundational level?

I've updated to a kind of quasi-anti-foundationalist position. I'm not against finding a strong foundation in principle (and indeed, I think it's a useful project!), but I'm saying that as a matter of fact, we have a lot of uncertainty, and it sure would be nice to have a normative theory which allowed us to account for that (a kind of afoundationalist normative theory -- not anti-foundationalist, but not strictly foundationalist, either). This should still be a strong formal theory, but one which requires weaker assumptions than usual (in much the same way reasoning about the world via probability theory requires weaker assumptions than reasoning about the world via pure logic).

Stopping at ℵ0

My main objection to anti-foundationalist positions is that they're just giving up; they don't answer questions or offer insight. Perhaps that's a lack of understanding on my part. (I haven't tried that hard to understand anti-foundationalist positions.) But I still feel that way.

So, rather than give up, I want to provide a framework which holds across meta-levels (as I discussed in Learning Normativity).

This would be a framework in which an agent can balance uncertainty at all the levels, without dogmatic foundational beliefs at any level.

Doesn't this just create a new infinite meta-level, above all of the finite meta-levels?

A mathematical analogy would be to say that I'm going for "cardinal infinity" rather than "ordinal infinity". The first ordinal infinity is ω, which is greater than all finite numbers. But ω is less than ω+1. So building something at "level ω" would indeed be "just another meta-level" which could be surpassed by level ω+1, which could be surpassed by ω+2, and so on.

Cardinal infinities, on the other hand, don't work like that. The first infinite cardinal is ℵ0, but ℵ0+1=ℵ0 -- we can't get bigger by adding one. This is the sort of meta-level I want: a meta-level which also oversees itself in some sense, so that we aren't just creating a new level at which problems can arise.

This is what I meant by "collapsing the meta-levels" in Learning Normativity. The finite levels might still exist, but there's a level at which everything can be put together.

Still, even so, isn't this still a "foundation" at some level?

Well, yes and no. It should be a framework in which a very broad range of reasoning could be supported, while also making some rationality assumptions. In this sense it would be a theory of rationality purporting to "explain" (ie categorize/organize) all rational reasoning (with a particular, but broad, notion of rational). In this sense it seems not so different from other foundational theories.

On the other hand, this would be something more provisional by design -- something which would "get out of the way" of a real foundation if one arrived. It would seek to make far fewer claims overall than is usual for a foundationalist theory.

What's the hierarchy?

So far, I've been pretty vague about the actual hierarchy, aside from giving examples and talking about "meta-levels".

The ℵ0 analogy brings to mind a linear hierarchy, with a first level and a series of higher and higher levels. Each next level does something like "handling uncertainty about the previous level".

However, my recursive quantilization proposal created a branching hierarchy. This is because the building block for that hierarchy required several inputs.

I think the exact form of the hierarchy is a matter for specific proposals. But I do think some specific levels ought to exist:

  • Object-level values.
  • Information about value-learning, which helps update the object-level values.
  • Object-level beliefs.
  • Generic information about what distinguishes a good hypothesis. This includes Occam's razor as well as information about what makes a hypothesis malign.

Normative Values

It's difficult to believe humans have a utility function.

It's easier to believe humans have expectations on propositions, but this still falls apart at the seams (EG, not all propositions are explicitly represented in my head at a given moment, it'll be difficult to define exactly which neural signals are the expectations, etc).

We can try to define values as what we would think if we had a really long time to consider the question; but this has its own problems, such as humans going crazy or experiencing value drift if they think for too long.

We can try to define values as what a human would think after an hour, if that human had access to HCH; but this relies on the limited ability of a human to use HCH to accelerate philosophical progress.

Imagine a value-learning system where you don't have to give any solid definition of what it is for humans to have values, but rather, can give a number of proxies, point to flaws in the proxies, give feedback on how to reason about those flaws, and so on. The system would try to generalize all of this reasoning, to figure out what the thing being pointed at could be.

We could describe humans deliberating under ideal conditions, point out issues with humans getting old, discuss what it might mean for those humans to go crazy or experience value drift, examine how the system is reasoning about all of this and give feedback, discuss what it would mean for those humans to reason well or poorly, ...

We could never entirely pin down the concept of human values, but at some point, the system would be reasoning so much like us (or rather, so much like we would want to reason) that this wouldn't be a concern.

Comparison to Other Approaches

This is most directly an approach for solving meta-philosophy.

Obviously, the direction indicated in this post has a lot in common with Paul-style approaches. My outside view is that this is me reasoning my way around to a Paul-ish position. However, my inside view still has significant differences, which I haven't fully articulated for myself yet.



Discuss

What would a world of widespread statistical numeracy look like?

18 ноября, 2020 - 19:23
Published on November 18, 2020 4:23 PM GMT

I was just reading the thread comparing covid and tobacco, and it made me start wondering about the effect of statistical numeracy in general.

Personally, I have a lot of room for improvement when it comes to these skills (but at least I am aware of this). I do regularly notice the difference in my impression when someone talks about a 3x increase vs a 300% increase, or 1/1000 vs 0.1%, etc.; and I often make a quick conversion in my head when it's convenient. I also know a few mortality stats by heart which I can use to very roughly benchmark certain claims I hear about risk and safety.
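For concreteness, the conversions I mean are simple one-liners; the numbers below are just the ones mentioned above, with a made-up baseline for illustration.

```python
# "3x" versus "a 300% increase": close in feel, not identical in meaning.
baseline = 10  # hypothetical starting quantity
print(baseline * 3)                # 30 -- 3x the baseline (a 200% increase)
print(baseline * (1 + 300 / 100))  # 40 -- a 300% increase over the baseline

# "1/1000" versus "0.1%": the same quantity in two framings.
print(1 / 1000)         # 0.001
print(f"{1/1000:.1%}")  # 0.1%
```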

Frequently, when I practice this minimal numeracy, it is accompanied by a sense of futility. When the stakes mostly involve policy-making or group action, my own statistical literacy may be inactionable--it may make basically no difference to my life or the world. What matters instead is what sorts of political messages resonate with voters, or what sorts of heuristics will catch on, etc.

So to sharpen my question, suppose you went back in time 20 years, magically caused the whole world to be much more numerate, and then just lived normally for the next 20 years. What about this world, if anything, would be drastically different from our own world?

(For those who want to get serious about the hypothetical: Let's say that in this alternate world, a typical high school graduate in the US has been trained in the habits of mind outlined in the second paragraph. Let's also say that numeracy and literacy track closely--so in any given country, you would be just as surprised to witness base-rate neglect as you would be to witness an inability to read road signs. Feel free to ask for more details or to tweak the hypothetical yourself.)
 



Discuss

Should we postpone AGI until we reach safety?

18 ноября, 2020 - 18:43
Published on November 18, 2020 3:43 PM GMT

Should we postpone AGI, until its risks have fallen below a certain level, thereby applying the precautionary principle? And if so, would setting up policy be a promising way to achieve this?

As I argued here in the comments, I think calling for a precautionary-principle policy, notably towards political decision makers, would be a good idea. I've had a great LW and telephone discussion about this with Daniel Kokotajlo, who disagrees, with the arguments below. I think it is valuable to make our lines of reasoning explicit and sharpen them through discussion, which is why I'm promoting them to a post.

Assuming AGI in a relevant time span, there are two ways in which humanity can decrease x-risk to acceptable levels: 1) AI alignment, consisting both of technical AI safety and reasonable values alignment, and 2) AGI postponement until 1 can be achieved with sufficient existential safety (this may be anywhere between soon and never).

Since it is uncertain whether we can achieve 1 at all, and also whether we can achieve it in time assuming we can achieve it in principle, we should aim to achieve 2 as well. The main reason is that this could lead to a significant reduction of total existential risk if successful, and we don't know much about how hard it is, so it could well be worthwhile. Neither companies nor academics are incentivized to limit or postpone AGI development in accordance with the precautionary principle. Therefore, I think we need a different body calling for this, and I think states make sense. As a positive side effect, pressure from possible postponement on companies and academia would incentivize them to invest significantly more in alignment, thereby again reducing existential risk.

Daniel disagreed with me mostly because of three reasons:

  1. Calling for AGI postponement until safety has been achieved, he thinks, would alienate the AI community from the AI safety community and that would hurt the chances of achieving AI safety.
  2. It would be rather difficult to get useful regulations passed, because influencing governments is generally hard and influencing them in the right ways is even harder.
  3. Restricting AGI research while allowing computer hardware progress, other AI progress, etc. to continue should mean that we are making the eventual takeoff faster, by increasing hardware overhang and other kinds of overhang. A faster takeoff is probably more dangerous.

On the first argument, I replied that I think a non-AGI safety group could do this, and therefore not hurt the principally unrelated AGI safety efforts. Such a group could even call for reduction of existential risk in general, further decoupling the two efforts. Also, even if there were a small adverse effect, I think it would be outweighed by the positive effect of incentivizing corporations and academia to fund more AI safety (since this is now also stimulated by regulation). Daniel said that if this were really true, which we could establish for example by researching respondents' behaviour, it could change his mind.

I agree with the second counterargument, but if the gain would be large assuming success (which I think is true), and the effort uncertain (which I think is also true), I think exploring the option makes sense.

The third counterargument could be a good one. I think it would be less relevant if we are currently already heading for a fast take-off (which I really don't know). I think this argument requires more thought.

I'm curious about others' opinions on the matter. Do you also think postponing AGI until we reach safety would be a good idea? How could this be achieved? If you disagree, could you explicitly point out which part of the reasoning you agree with (if any), and where your opinion differs?



Discuss

Sunzi's《Methods of War》- Introduction

18 ноября, 2020 - 11:23
Published on November 18, 2020 8:23 AM GMT

This is a translation of the first chapter of The Art of War by Sunzi. No English sources were used. The original text and many of the interpretations herein come from 古诗文网.

孙子曰:兵者,国之大事,死生之地,存亡之道,不可不察也。

War determines the life and death of troops and the existence or destruction of a country. It cannot be ignored.

故经之以五事,校之以计,而索其情:一曰道,二曰天,三曰地,四曰将、五曰法。

Five aspects are of paramount importance:

  1. Dao
  2. Heaven
  3. Earth
  4. Generalship
  5. Method

道者,令民与上同意也,故可以与之死,可以与之生,而不畏危。

"Dao" concerns alignment. Your side must be unified. By dying together, living together, you shall be unafraid.

天者,阴阳,寒暑、时制也。

"Heaven" concerns timing, yin and yang, winter and summer.

地者,远近、险易、广狭、死生也。

"Earth" concerns the near and far, impassable and passable, open fields and choke points, death and life.

将者,智、信、仁、勇、严也。

"Generalship" is a matter of wisdom, fidelity, benevolence, bravery and severity.

法者,曲制、官道、主用也。

"Method" concerns tactics, doctrine and organization.

凡此五者,将莫不闻,知之者胜,不知者不胜。

A commander must not ignore these five aspects. Understanding them brings victory. Lack of understanding does not bring victory.

故校之以计,而索其情,曰:主孰有道?将孰有能?天地孰得?法令孰行?兵众孰强?士卒孰练?赏罚孰明?

Ask yourself: Are ruler and subjects aligned? Is the general capable? Heaven (climate) and Earth (geography) in your favor? Methods effective? Troops strong? Trained? Rewards and punishments enlightened?

吾以此知胜负矣。将听吾计,用之必胜,留之;将不听吾计,用之必败,去之。计利以听,乃为之势,以佐其外。势者,因利而制权也。

These things determine victory and defeat.

兵者,诡道也。故能而示之不能,用而示之不用,近而示之远,远而示之近;利而诱之,乱而取之,实而备之,强而避之,怒而挠之,卑而骄之,佚而劳之,亲而离之。攻其无备,出其不意。此兵家之胜,不可先传也。

The art of war depends on local conditions. The near informs you about the far. The far informs you about the near.

  • If the enemy is clever then tempt.
  • If the enemy is disordered then raid.
  • If the enemy is capable then prepare.
  • If the enemy is mighty then run.
  • If the enemy is angry then provoke.
  • If the enemy is humble then be arrogant.
  • If the enemy is dissolute then toil honorably.

Attack where the enemy is unprepared. Do what is least expected. But do not forget the five aspects. They are of primary importance.

夫未战而庙算胜者,得算多也;未战而庙算不胜者,得算少也。多算胜,少算不胜,而况于无算乎?吾以此观之,胜负见矣。

A war cannot be won without lots of equipment. This facet of war too must be examined.



Discuss

Comparative Advantage Intuition

18 ноября, 2020 - 04:23
Published on November 18, 2020 1:23 AM GMT

On an intuitive level, how much should comparative advantage be taken into account when deciding on a life path?

It seems unrealistic to analyze every factor involved to maximize utility, but without doing that, there is a wide range of extents to which comparative advantage can be favored or disfavored compared to focusing on more important issues.

There are, I'm sure, other factors that don't instantly spring to mind that affect this significantly, like how easily a given person develops new skills. I am also asking because I don't know what all of those factors are.



Discuss

Thoughts on Voting Methods

17 ноября, 2020 - 23:23
Published on November 17, 2020 8:23 PM GMT

I've been nerd-sniped by voting theory recently. This post is a fairly disorganized set of thoughts.

Condorcet Isn't Utilitarian

The Condorcet criterion doesn't make very much sense to me. My impression is that a good chunk of hard-core theorists think of this as one of the most important criteria for a voting method to satisfy. (I'm not really sure if that's true.)

What the Condorcet criterion says is: if a candidate would win pairwise elections against each other candidate, they should win the whole election.

Here's my counterexample.

Consider an election with four candidates, and three major parties. The three major parties are at each other's throats. If one of them wins, they will enact laws which plunder the living daylights out of the losing parties, transferring wealth to their supporters.

The fourth candidate will plunder everyone and keep all the wealth. However, the fourth candidate is slightly worse at plundering than the other three.

We can model this scenario with just three voters for simplicity. Here are the voter utilities for the different candidates:

Candidates:    A     B     C     D
Voter 1      100     0     0     1
Voter 2        0   100     0     1
Voter 3        0     0   100     1

D would beat everyone in a head-to-head election. But D is the worst option from a utilitarian standpoint!! Furthermore, I think I endorse the utilitarian judgement here. This is an election with only terrible options, but out of those terrible options, D is the worst.
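As a sanity check, here is a minimal sketch (using the utilities from the table above) that computes both the pairwise Condorcet winner and the utilitarian totals:

```python
# Utilities from the table above: three partisan voters; candidate D gives
# everyone a trickle of utility but has the lowest total.
utilities = {
    1: {"A": 100, "B": 0, "C": 0, "D": 1},
    2: {"A": 0, "B": 100, "C": 0, "D": 1},
    3: {"A": 0, "B": 0, "C": 100, "D": 1},
}
candidates = ["A", "B", "C", "D"]

def beats(x, y):
    """True if more voters strictly prefer x to y in a head-to-head race."""
    x_votes = sum(1 for u in utilities.values() if u[x] > u[y])
    y_votes = sum(1 for u in utilities.values() if u[y] > u[x])
    return x_votes > y_votes

condorcet_winners = [c for c in candidates
                     if all(beats(c, other) for other in candidates if other != c)]
totals = {c: sum(u[c] for u in utilities.values()) for c in candidates}

print("Condorcet winner:", condorcet_winners)  # ['D']
print("Utilitarian totals:", totals)           # D has the lowest total (3)
```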

VSE Isn't Everything

VSE (Voter Satisfaction Efficiency) is basically a way of calculating a utilitarian score for an election method, based on simulating a large number of elections. This is great! I think we should basically look at VSE first, as a way of evaluating proposed systems, and secondarily evaluate formal properties (such as the Condorcet criterion, or preferably, others that make more sense) as a way of determining how robust the system is to crazy scenarios.
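As I understand it, VSE is usually defined as (average achieved utility minus the average utility of a random winner) divided by (the average utility of the ideal winner minus that of a random winner). The sketch below is a toy version of that calculation for honest plurality voting; the voter model is deliberately crude, so the exact number shouldn't be taken seriously.

```python
import random

def honest_plurality(utils):
    """Each voter votes for their single favorite; most votes wins."""
    votes = [0] * len(utils[0])
    for voter in utils:
        votes[voter.index(max(voter))] += 1
    return votes.index(max(votes))

def toy_vse(method, n_voters=99, n_candidates=5, n_elections=2000):
    """Toy VSE estimate: compare the method's winner against the
    utilitarian-best winner and a uniformly random winner."""
    chosen = best = rand = 0.0
    for _ in range(n_elections):
        # i.i.d. uniform utilities; a real VSE study uses richer voter models
        # (spatial/clustered preferences) and strategic voting.
        utils = [[random.random() for _ in range(n_candidates)]
                 for _ in range(n_voters)]
        totals = [sum(v[c] for v in utils) for c in range(n_candidates)]
        chosen += totals[method(utils)]
        best += max(totals)
        rand += sum(totals) / n_candidates
    return (chosen - rand) / (best - rand)

print(round(toy_vse(honest_plurality), 3))  # a number between 0 and 1; don't read much into the exact value
```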

But I'm also somewhat dissatisfied with VSE; I think there might be better ways of calculating statistical scores for voting methods.

Candidate Options Matter

As we saw in the example for Condorcet, an election can't give very good results if all the candidates are awful, no matter how good the voting method.

Voting Methods Influence Candidate Selection

Some voting methods, specifically plurality (aka first-past-the-post) and instant runoff voting, are known to create incentive dynamics which encourage two-party systems to eventually emerge.

In order to model this, we would need to simulate many rounds of elections, with candidates (/political parties) responding to the incentives placed upon them for re-election. VSE instead simulates many independent elections, with randomly selected candidates.

Candidate Selection Systems Should Be Part of the Question

Furthermore, even if we ignore the previous point and restrict our attention to single elections, it seems really important to model the selection of candidates. Randomly selected candidates will be much different from those selected by the Republican and Democratic parties. These party-selected candidates will probably be much better, in fact -- both parties know that they have to select a candidate who has broad appeal.

Furthermore, this would allow us to try and design better candidate selection methods.

I admit that this would be a distraction if the goal is just to score voting methods in the abstract. But if the goal is to actually implement better systems, then modeling candidate selection seems pretty important.

Utilitarianism Isn't Friendly

Suppose I modify the example from the beginning, to make the fourth candidate significantly worse at plundering the electorate:

Candidates:    A     B     C     D
Voter 1      100     0     0    33
Voter 2        0   100     0    33
Voter 3        0     0   100    33

Candidate D is still the utilitarian-worst candidate, by 1 utilon. But now (at least for me), the Condorcet-winner idea starts to have some appeal: D is a good compromise candidate.

We don't just want a voting method to optimize total utility. We also want it to discourage unfair outcomes in some sense. I can think of two different ways to formalize this:

  • Discourage wealth transfers. This is the more libertarian/conservative way of thinking about it. Candidates A, B, and C are bad because they take wealth from one person and give it to another person. This encourages rent-seeking behavior through regulatory capture.
  • Encouraging equitable outcomes. A different way of thinking of it is that candidates A, B, and C are terrible because they create a large amount of inequality. This could be formalized by maximizing the product of utilities in the population rather than the sum, in keeping with Nash bargaining theory. Or, more extreme, we could maximize the minimum (in keeping with Rawls).

These two perspectives are ultimately incompatible, but the point is, VSE doesn't capture either of them. Because of this, it allows some very nasty dynamics to be counted as high VSE.

Obviously, the Condorcet criterion does capture this -- but, like maximizing the minimum voter's utility, I would say it strays too far from utilitarianism.
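To make the contrast concrete, here is a small sketch scoring the modified example above under three aggregation rules -- total utility, product of utilities (Nash), and minimum utility (Rawls). Only the sum still ranks D last; the other two rank D first.

```python
from math import prod

# Utilities from the modified table above.
utilities = {
    "A": (100, 0, 0),
    "B": (0, 100, 0),
    "C": (0, 0, 100),
    "D": (33, 33, 33),
}

rules = {
    "sum (utilitarian)": sum,
    "product (Nash)": prod,
    "min (Rawls)": min,
}

for name, rule in rules.items():
    scores = {c: rule(u) for c, u in utilities.items()}
    print(name, scores, "best:", max(scores, key=scores.get))
# sum: A/B/C score 100 and D scores 99, so D comes last.
# product and min: A/B/C score 0 while D scores 35937 / 33, so D comes first.
```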

Selectorate Theory

This subsection and those that follow are based on reading The Dictator's Handbook by Bruce Bueno de Mesquita. (You might also want to check out The Logic of Political Survival, which I believe is a more formal version of selectorate theory.) For a short summary, see The Rules for Rulers video by CPG Grey.

The basic idea is that rulers do whatever it takes to stay in power. This means satisfying a number of key supporters, while maintaining personal control of the resources needed to maintain that satisfaction. If the number of supporters a ruler needs to satisfy is smaller, the government is more autocratic; if it is larger, the government is more democratic. This is a spectrum, with the life of the average citizen getting worse as we slide down the scale from democracy to autocracy.

Bruce Bueno de Mesquita claims that the size of the selectorate is the most important variable for governance. I claim that VSE does little to capture this variable.

Cutting the Pie

Imagine the classic pie-cutting problem: there are N people and 1 pie to share between them. Players must decide on a pie-cutting strategy by plurality vote.

There is one "fair" solution, namely to cut a 1/N piece for each player. But the point of this game is that there are many other equilibria, and none of them are stable under collusion.

If the vote would otherwise go to the fair solution, then half-plus-one of the people could get together and say "Let's all vote to split the pie just between us!". 

But if that happened, then slightly more than half of that group could conspire together to split the pie just between them. And so on.

This is the pull toward autocracy: coalitions can increase their per-member rewards by reducing the number of coalition members.
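Here is a minimal sketch of that pull, under the simplifying assumption that a winning coalition splits the pie evenly among its members (and ignoring whether each shrunken coalition could still actually carry the vote): each majority-of-the-majority is better off per member.

```python
def shrink(coalition_size):
    """Smallest strict majority of the current coalition (the new conspirators)."""
    return coalition_size // 2 + 1

n = 100        # voters sharing a pie of size 1
coalition = n  # start from the 'fair' everyone-included split
for _ in range(4):
    print(f"coalition of {coalition:3d}: per-member share = {1 / coalition:.4f}")
    coalition = shrink(coalition)
# coalition of 100: per-member share = 0.0100
# coalition of  51: per-member share = 0.0196
# coalition of  26: per-member share = 0.0385
# coalition of  14: per-member share = 0.0714
```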

Note that VSE is unable to see a problem here, because of its utilitarian foundation. By definition, a pie-cutting problem results in the same total utility no matter what (and, the same average utility) -- even if the winner wins on a tiny coalition.

VSE's failure to capture this also goes back to its failure to capture the problem of poor options on ballots. If the fair pie-cut was always on the ballot, then a coalition of less than 50% should never be able to win. (This is of course not a guarantee with plurality, but we know plurality is bad.)

Growing the Pie

Of course, the size of the pie is not really fixed. A government can enact good policies to grow the size of the pie, which means more for everyone, or at least more for those in power.

Bruce Bueno de Mesquita points out that the same public goods which grow the economy make revolution easier. Growing the pie is not worth the risk for autocracies. The more autocratic a government, the less such resources it will provide. The more democratic it is, the more it will provide. Growing the pie is the only way a 100% democracy can provide wealth to its constituents, and is still quite appealing to even moderately democratic governments. (He even cites research suggesting that between states within the early USA, significant economic differences can be largely explained by differences in the state governments. The effective amount of support needed to win in state elections in the early USA differed greatly. These differences explain the later economic success of the northern states better than several other hypotheses. See Chapter 10 of The Dictator's Handbook.)

Bruce Bueno de Mesquita argues that this is the reason that democracy and autocracy are each more or less stable. A large coalition has a tendency to promote further democratization, as growing the coalition has a tendency to grow the pie further. A small coalition has no such incentive, and instead has a tendency to contract further.

VSE can, of course, capture the idea that growing the pie is good. But I worry that by failing to capture winning coalition size, it fails to encourage this in the long term.

How can we define the size of the winning coalition for election methods in general, and define modifications of VSE which take selectorate theory into account?



Discuss

How the Moderna vaccine works, and a note about mRNA vaccines

17 ноября, 2020 - 20:22
Published on November 17, 2020 5:22 PM GMT

Epistemic status: Pretty confident I have it right. I'm not an expert, but I'm asking for feedback from experts, and changes would be added here.

The Moderna vaccine contains a piece of code (mRNA) which asks your body to create a protein that is very similar to the SARS-CoV-2 spike protein (which the virus uses to attach to your cells), but modified in such a way that it doesn't change shape when it touches the ACE-2 receptor (another protein on the surface of your cells). That's because the easiest place for an antibody to attach would otherwise get hidden. Your body then says, "Hey, that's something new! Let's attack it!" and tries out a bunch of different things.

The good thing about the vaccine is that there are no viral particles. The only thing it contains is the code (plus other code that doesn't get used to generate anything, but is useful in keeping the rest of the code stable). And, unlike the Pfizer vaccine, it's not self-propagating. In other words, the amount of RNA you get in your two shots is all that's needed, and your body doesn't make extra strands of RNA that would ask for more spike proteins. And it gets a better immune response than a natural COVID infection, which is awesome.

Side effects are minimal. Fewer than 2% of people get a fever, and most don't even have a headache. Just some muscle pain etc. near the injection site.

(The Pfizer vaccine works differently, but it's a similar idea. RNA code to produce the antigen to get an antibody response.)

These are the first ever mRNA vaccines. I've been following Moderna for a few years, and they were working on a MERS vaccine in the past. Four days after the SARS-CoV-2 genome was uploaded, they'd decided which sections they wanted to use and which edits needed to be made, and they started manufacturing it (not mass production) literally the next day, and that was still in January. It's very modular, easy to change and modify, and most labs would be able to make their own vaccines with machines they already have.



Discuss

Comparing Covid and Tobacco

17 ноября, 2020 - 19:13
Published on November 17, 2020 4:13 PM GMT

Tobacco kills 5 million people every year [1]. Covid probably won't pass 5 million this year regardless of our policies or behavior. And yet, Covid has claimed far more of our scarce political attention than Tobacco. We have accepted an increase of 150 million people in global severe poverty and trillions in economic damage to prevent Covid deaths. Tobacco eradication has received far less attention. What do you think are the biggest reasons for this difference?

  1. Covid is new, people over-react to new threats.
  2. Covid affects the most politically organized and powerful world demographics: Old people in rich countries. Tobacco affects poor old people in poor countries.
  3. Covid is more tractable than Tobacco (the cost of preventing a Covid death is lower than preventing a Tobacco death). This seems unlikely, but I'm open to an argument.
  4. The tax incentives cause governments to neglect the Tobacco problem.
  5. People individually do not mind staying at home and watching Netflix. They therefore share/read/write more about Covid.
  6. The world's policy elite knows people with Covid but knows very few tobacco addicts in Lebanon or China.
  7. Policy Elite believes people can rationally decide to consume tobacco (hurt themselves) but not decide to social distance (hurt others)
  8. The Covid attention results from a massive availability cascade. Once an issue becomes available enough to the policy elite, its salience is self-reinforcing.
  9. Individuals can affect Covid, but organizing Tobacco policy NGOs for developing countries is a more complicated model.

Thoughts?

  1. WHO (World Health Organization). 2012b. "Why Tobacco Is a Public Health Priority." www.who.int/tobacco/health_priority/en/.


Discuss
