LessWrong.com News

A community blog devoted to refining the art of rationality

Five examples

February 14, 2021 - 05:47
Published on February 14, 2021 2:47 AM GMT

I often find it difficult to think about things without concrete, realistic examples to latch on to. Here are five examples of this.

1) Your Cheerful Price

Imagine that you need a simple portfolio website for your photography business. Your friend Alice is a web developer. You can ask her what her normal price is and offer to pay her that price to build you the website. But that might be awkward. Maybe she isn't looking for work right now, but says yes anyway out of some sort of social obligation. You don't want that to happen.

What could you do to avoid this? Find her cheerful price, perhaps. Maybe her normal price is $100/hr and she's not really feeling up for that, but if you paid her $200/hr she'd be excited about the work.

Is this useful? I can't really tell. In this particular example it seems like it'd just make more sense to have a back and forth conversation about what her normal price is, how she's feeling, how you're feeling, etc., and try to figure out if there's a price that you each would feel good about. Cheerful/excited/"hell yeah" certainly establishes an upper bound for one side, but I can't really tell how useful that is.

2) If you’re not feeling “hell yeah!” then say no

This seems like one of those things that sounds wise and smart in theory, but is probably not actually good advice in practice.

For example, I have a SaaS app that hasn't really gone anywhere and I've put it on the back burner. Someone just came to me with a revenue share offer. I'd be giving up a larger share than I think is fair. And the total amount I'd make per month is maybe $100-200, so I'm not sure whether it'd be worth the back and forth + any customizations they'd want from me. I don't feel "hell yeah" about it, but it is still probably worth it.

In reality, I'm sure that there are some situations where it's great advice, some situations where it's terrible advice, and some situations where it could go either way. I think the question is whether it's usually useful. It doesn't have to be useful in every single possible situation. That would be setting the bar too high. If it identifies a common failure mode and helps push you away from that failure mode and closer to the point on the spectrum where you should be, then I call that good advice.

If I were to try to figure out whether "hell yeah or no" does this, the way I'd go about it would be to come up with a wide variety of examples, and then ask myself how well it performs in these different examples. It would have been helpful if the original article got the ball rolling for me on that.

3) Why I Still ‘Lisp’ (and You Should Too)

I want to zoom in on the discussion of dynamic typing.

I have never had a static type checker (regardless of how sophisticated it is) help me prevent anything more than an obvious error (which should be caught in testing anyway).

This made me breathe a sigh of relief. I've always felt the same way, but wondered whether it was due to some sort of incompetence on my part as a programmer.

Still, something tells me that it's not true. That such type checking does in fact help you catch some non-obvious errors that would be much harder to catch without the type checking. Too many smart people believe this, so I think I have to give it a decent amount of credence.

Also, I recall an example or two of this from a conversation with a friend a few weeks ago. Most of the discussions of type checking I've seen don't really get into these examples though. But they should! I'd like to see such articles give five examples of:

Here is a situation where I spent a lot of time dealing with an issue, and type checking would have significantly mitigated it.

Hell, don't stop at five, give me 50 if you can!
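A minimal, hypothetical sketch of the shape such an example could take (it is not drawn from a real incident): the names `Dollars`, `Cents`, `to_cents`, and `charge` are invented for illustration, and the static checker is assumed to be something like mypy. The buggy call runs without any error in plain Python, which is what makes the mistake easy to miss in testing.

```python
from typing import NewType

# Hypothetical money types: both are plain ints at runtime, but a static
# checker such as mypy treats them as distinct, incompatible types.
Dollars = NewType("Dollars", int)
Cents = NewType("Cents", int)


def to_cents(amount: Dollars) -> Cents:
    """Convert whole dollars to cents."""
    return Cents(amount * 100)


def charge(amount: Cents) -> str:
    """Pretend to bill a customer; the amount is expected in cents."""
    return f"charged {amount} cents"


price = Dollars(20)

# The bug: dollars passed where cents are expected. Python runs this happily
# and bills 20 cents instead of $20, so a test that only checks "did the
# charge go through?" would still pass. A static checker flags this call
# because Dollars is not Cents.
buggy = charge(price)

# The fix the type error points toward: convert explicitly.
fixed = charge(to_cents(price))

print(buggy)
print(fixed)
```

Whether bugs like this are common enough to justify the overhead of types is exactly the question that a larger collection of real examples would help answer.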

Examples are the best. Recently I've been learning Haskell. Last night I learned about polymorphism in the context of Haskell. I think that seeing it from this different angle rather than the traditional OOP angle really helped to solidify the concept for me. And I think that this is usually the case.

It makes me think back to Eliezer's old posts about Thingspace. In particular, extensional vs intensional descriptions.

What's a chair? I won't define it for you, but this is a chair. And this. And this. And this.

4) Short Fat Engineers Are Undervalued

(Before I read this article I thought it was going to talk about the halo effect and how ugly people in general are undervalued. Oh well.)

The idea that short, fat engineers are undervalued sounds plausible to me. No, I'm not using strong enough language: I think it's very likely. Not that likely though. I think it's also plausible that it's wrong, and that deep expertise is where it's at.

Again, for me to really explore this further, I would want to kind of iterate over all the different situations engineers find themselves in, ask myself how helpful being short-fat is vs tall-skinny in each situation, and then multiply by the importance of each situation.

Something like that. Taken literally it would require thousands of pages of analysis, clearly beyond the scope of a blog post. But this particular blog post actually didn't provide any examples. Big, fat, zero!
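The "iterate over situations, score each profile, multiply by importance" idea can be sketched as a weighted sum. Everything below is an assumption made for illustration: the situations, the importance weights, and the helpfulness scores are invented placeholders, so the totals say nothing about real engineers.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    name: str
    importance: float   # how much this situation matters (made-up weight)
    short_fat: float    # how helpful breadth is here, from 0 to 1 (made up)
    tall_skinny: float  # how helpful depth is here, from 0 to 1 (made up)


# Placeholder situations; a real analysis would need many more of these.
situations = [
    Situation("small change touching many systems", 3.0, 0.9, 0.4),
    Situation("self-contained, well-defined deep task", 2.0, 0.5, 0.9),
    Situation("debugging across unfamiliar tools", 1.0, 0.8, 0.5),
]


def total_value(profile: str) -> float:
    """Sum importance * helpfulness over all situations for one profile."""
    return sum(s.importance * getattr(s, profile) for s in situations)


short_fat_total = total_value("short_fat")
tall_skinny_total = total_value("tall_skinny")

print(f"short-fat:   {short_fat_total:.2f}")
print(f"tall-skinny: {tall_skinny_total:.2f}")
```

The arithmetic is the easy part. The hard part is filling in the list with realistic situations, which is the job of concrete examples.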

Let me provide some examples of the types of examples I have in mind:

  • At work yesterday I had a task that really stands out as a short-fat type of task. I needed to make a small UI change, which required going down a small rabbit hole of how our Rails business logic works + how the Rails asset pipeline works, write some tests in Ruby (a language I'm not fluent in), connect to a VPN which involves a shell script and some linux-fu, deploying the change, ssh-ing into our beta server and finding some logs, and just generally making sure everything works as expected. No one step was particularly intense or challenging, but they're all dependencies. If I'm a tall and skinny Rails wizard I'd have an easy time with the first couple parts, but if I also don't know what a VPN is or don't know my way around the command line, I could easily be bottlenecked by the last couple parts.
  • For an early work task, I had to add watermarks to GIFs. Turns out GIFs are a little weird and have idiosyncrasies. Being tall-skinny would have been good here, in the sense of a) knowledgeable about GIFs and b) generic programming ability. The task was pretty self-contained and well-defined. "Here's an input. Transform it into an output. Then you're done." It didn't really require too much breadth.

5) Embedded Interactive Predictions on LessWrong

I really love this tool so I feel bad about picking on this post, but I think it could have really used more examples of "here is where you'd really benefit from using this tool".

This post made more of an attempt to provide examples than examples 1-4 though. It led off with the "Will there be more than 50 prediction questions embedded in LessWrong posts and comments this month?" poll, which I thought was great. And it did have the "Some examples of how to use this" section. But to me, I still felt like I needed more. A lot more.

I think this is an important point. Sometimes you need a lot of examples. Other times one or two will get the job done. It depends on the situation.

Hat tips

I'm sure there are lots of other posts worthy of hat tips, but these are the ones that come to mind:

It's not easy

Coming up with examples is something that always seems to prove way more difficult than it should be. Even just halfway decent examples. Good examples are way harder. Maybe it's just me, but I don't think so.

So then, I don't want this post to come across as "if you don't have enough (good) examples in your post, you're a failure". It's not easy to do, and I don't think it should (necessarily) get in the way of an exploratory or conversation starting type of post.

Maybe it's similar to grammar in text messages. The other person has to strain a bit if you write a (longer) text with bad grammar and abbreviations. There are times when this might be appropriate. Like:

Hey man im really sry but just storm ad I ct anymore. BBl.

But if you have the time, cleaning it up will go a long way towards helping the other person understand what it is you're trying to say.


The ecology of conviction

February 14, 2021 - 04:30
Published on February 14, 2021 1:30 AM GMT

Supposing that sincerity has declined, why?

It feels natural to me that sincere enthusiasms should be rare relative to criticism and half-heartedness. But I would have thought this was born of fairly basic features of the situation, and so wouldn’t change over time.

It seems clearly easier and less socially risky to be critical of things, or non-committal, than to stand for a positive vision. It is easier to produce a valid criticism than an idea immune to valid criticism (and easier again to say, ‘this is very simplistic - the situation is subtle’). And if an idea is criticized, the critic gets to seem sophisticated, while the holder of the idea gets to seem naïve. A criticism is smaller than a positive vision, so a critic is usually not staking their reputation on their criticism as much, or claiming that it is good, in the way that the enthusiast is.

But there are also rewards for positive visions and for sincere enthusiasm that aren’t had by critics and routine doubters. So for things to change over time, you really just need the scale of these incentives to change, whether in a basic way or because the situation is changing.

One way this could have happened is that the internet (or even earlier change in the information economy) somehow changed the ecology of enthusiasts and doubters, pushing the incentives away from enthusiasm. e.g. The ease, convenience and anonymity of criticizing and doubting on the internet puts a given positive vision in contact with many more critics, making it basically impossible for an idea to emerge not substantially marred by doubt and teeming with uncertainties and summarizable as ‘maybe X, but I don’t know, it’s complicated’. This makes presenting positive visions less appealing, reducing the population of positive vision havers, and making them either less confident or more the kinds of people whose confidence isn’t affected by the volume of doubt other people might have about what they are saying. Which all make them even easier targets for criticism, and make confident enthusiasm for an idea increasingly correlated with being some kind of arrogant fool. Which decreases the basic respect offered by society for someone seeming to have a positive vision.

This is a very speculative story, but something like these kinds of dynamics seems plausible.

These thoughts were inspired by a conversation I had with Nick Beckstead.


The Economics of Media

February 14, 2021 - 02:11
Published on February 13, 2021 11:11 PM GMT

When I was a kid I thought the news came from "investigative reporters" like Clark Kent who were paid to research stories. Since then, I have gotten my startup on national television, placed a press release into the world news, discussed biological warfare as a podcast guest, written a blog which has been reposted to Hacker News, written fanfiction which has been linked to on Reddit and read a lot of books. My understanding of the media ecosystem has become more nuanced.

Media Economics

Small fry like Lyle McDonald, the McKays and Bruce Schneier can scrape by by selling books, branded paraphernalia and other niche merchandise. Niche merchandise doesn't scale. Large megacorp news outlets generally rely on subscriptions and advertising for their core revenue.

Subscriptions and advertising scale linearly with the number of viewers. But the cost of distributing Internet[1] media is negligible. An article costs the same to write whether one person reads it or one million. The market equilibrium is one where the great masses of people get our information from a tiny number of sources.
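The arithmetic behind this can be sketched with a toy model. The figures are assumptions chosen only for illustration (one cent of revenue per reader, a fixed writing cost of $500, distribution cost treated as zero), and the function names are hypothetical.

```python
def profit_cents(readers: int, revenue_per_reader_cents: int, writing_cost_cents: int) -> int:
    """Revenue grows linearly with readers; the writing cost is fixed."""
    return readers * revenue_per_reader_cents - writing_cost_cents


def writing_cost_per_reader_cents(readers: int, writing_cost_cents: int) -> float:
    """The fixed writing cost, spread across the whole audience."""
    return writing_cost_cents / readers


REVENUE_PER_READER_CENTS = 1   # assumed: one cent per reader
WRITING_COST_CENTS = 50_000    # assumed: $500 to write the article

for readers in (1_000, 100_000, 1_000_000):
    p = profit_cents(readers, REVENUE_PER_READER_CENTS, WRITING_COST_CENTS)
    c = writing_cost_per_reader_cents(readers, WRITING_COST_CENTS)
    print(f"{readers:>9} readers: profit {p / 100:>10.2f} USD, "
          f"writing cost per reader {c:.3f} cents")
```

Under these assumptions the same article loses money with a small audience and is highly profitable with a large one, which is the pressure toward a small number of large outlets described above.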

What people do with the information doesn't much affect a media outlet's bottom line. Whether the information makes people angry or happy doesn't matter except to the extent anger and happiness affect readership. Whether the information promotes good policy doesn't matter at all—unless that policy directly affects the news industry.

Content is fungible. Financially, what matters is how many people consume it.

Minimizing Costs

I learned a lot about Twitter when I hosted the 2020 Less Wrong Darwin Game. I wrote a sequence of 11,475 words. It dwarfed anything else I had ever written until then because…I barely had to write anything. The story was created by Vanilla_cabs and other competitors. Reporters report on Twitter tweets for the same reason: because content is fungible and because rehashing tweets is a cheap way to mass produce news.

But there's an even easier way to generate content: Let someone else do it for you.

Media businesses convert content into money. Media businesses don't care about the content's secondary effects. The cheaper media businesses can acquire content the more money they can earn. Non-media businesses with products to sell want media attention. Non-media businesses profit only off of content's secondary effects. These are the perfect conditions for symbiosis. If a non-media business can write a news story for a news outlet then the news outlet gets free content and the business gets free advertising. This kind of news story is called a "press release". The first time I got a press release posted in a major news outlet I was stunned by how little the press release had been edited. The press release was basically copied word-for-word as original content.

Political organizations, including governments, create press releases the same way companies do, except their objective is political rather than commercial.

Press releases have the same economics as news stories because press releases are news stories. Only large organizations (or startups with large aspirations) have the economics to support press releases. Small organizations don't have comparable economies of scale. The press release system therefore constitutes an emergent pressure toward centralization. I suspect this pressure is related to how national politics increasingly dominate the political dialogue in the USA.

Cleaning out your head

Most of the mainstream news is implicitly subsidized by large organizations who are trying to get you to buy their products and ideologies. How do you fight back against mind control?

  • The first step is to disconnect from the news. News makes you stupid.
  • The second step is to explore in orthogonal directions. Learn calculus and physics. Learn foreign languages. Learn the histories of China and Islam (unless you are Chinese and/or Muslim in which case you should check out The History of Rome). Learn to read palms and Tarot cards[2]. Learn to draw. Use this knowledge to start hard ambitious original projects like companies. The more your actions deviate from the default script the more holes you'll discover in the Matrix.
  • The third step is to create your own media. If you are consuming media created by another person—even if that person is George Orwell—then you are not yet thinking for yourself.

  1. Paper media had a worse calculus based around monopolizing distribution. Risk-averse paper monopolies distributed only the most inoffensive content. ↩︎

  2. But don't take them too seriously. ↩︎


Welcome, Cade Metz readers!

February 13, 2021 - 22:49
Published on February 13, 2021 7:49 PM GMT

So you just read Cade Metz's article, "Silicon Valley's Safe Space," on the blog Slate Star Codex and the "rationalist" community. Maybe the digital world it portrayed sounded intriguing, confusing, threatening, or just plain kooky. Maybe you have a friend or relative you think might be involved in it somehow.

You might be aware that in this era, it's increasingly considered inappropriate to enforce a name on somebody that they don't prefer. There are many reasons to keep a name private, or to discard it entirely: the deadnames of trans people, concerns for personal privacy and security, a title, a desire to avoid the butchering of the correct pronunciation of a birth name, or simple personal preference. Though Cade Metz chose to adopt an inconsistently-enforced New York Times policy to refer to Scott Alexander by his birth name, I'll stick with "Scott" as he's often affectionately referred to.

Along with a deluge of passionate appeals to write a fair article that respected Scott's naming preference, Cade Metz also received death threats. Death threats are wrong. They are horrible. It's totally unacceptable that the exercise of freedom of speech, writing one's perspective about an idea, would provoke a violent attempt to shut it down.

I have qualms about Cade's article. I wrote one of the thousands of emails to his editor, saying that I would cancel my subscription to the Times if they published Scott's last name. I have cancelled my subscription, and it's staying that way.

However, I 100% support Cade's fundamental right to publish his views on the platform that is available to him. I denounce death threats made to him or anybody.

It's important to understand that Scott Alexander has also received many death threats for writing his blog. When he deleted it over Cade's impending article, it was out of a fear of receiving more of them, and for the other disruptive consequences to his practice of psychiatry that level of prominence would provoke. Although Scott left a message explaining the situation and suggesting that his readership contact Cade's editor, he specifically instructed anybody doing so to be respectful in their messages.

In the wake of Cade's article, Scott is certain to receive more death threats. Unlike Cade, he does not have the backing of the New York Times to protect him, but only the relatively untested backing of the "substack" blogging/newsletter platform. Cade, unlike Scott, did not specifically ask his readers to be respectful in any messages they sent to Scott.

However, neither Cade nor Scott is responsible for the behavior of the psychopaths. It's a tragic irony that they are both victims of online aggressors when neither of them has any desire to provoke such attacks, nor ability to stop them from occurring.

Scott's blog, Slate Star Codex, contains hundreds of long, information-dense articles. He's an unusually bright and passionate MD covering his own area of expertise, psychiatry, and the social/cultural issues and patterns of thought that bear on that subject.

You might have read the Nobel laureate Daniel Kahneman's book, Thinking, Fast and Slow, a wonderful presentation of the cognitive biases that tend to distort human judgment. Interested in how we can do better, and perhaps make progress on some of the problems that beset our world, some people formed a community of practice to learn how to think more rationally.

The biases Kahneman addresses are not especially hard to understand. One example is people's tendency to dwell on a vivid image or story, and ignore the size of the problem. Another is the value of using a checklist in making important decisions, as treated by Atul Gawande in The Checklist Manifesto. A third is the general challenge of making decisions in uncertain situations, a problem that people face in all walks of life.

People in the "aspiring rationalist" community are often interested in these problems, and you might be too. We talk about issues of scientific literacy, the uses and misuses of statistics, the tensions between science and intuition, emotion and logic. But two things might be a little different between the "aspiring rationalist" community and most people. 

One is that, in an atomized, angry, and superficial time, in a society that's bowling alone (and at this point, it's bowling alone on the Nintendo switch), "aspiring rationalists" have managed to form a community that is friendly, generous, thoughtful. We hope and believe that it's possible to deliberately create something better, and make the world a better place while we're at it.

This is so strange that it can come across as almost cult-like, to somebody who has perhaps never experienced a genuinely healthy and supportive community, religion, or friend group. But it's not. It's a basic human need, one that far too many people, maybe including you, have made do with too little of for too long.

The other difference between the "aspiring rationalist" community and most people is that, while many people would acknowledge the longing for something better, shrug, and continue with their lives, we've reacted differently.

If we all have a tendency to latch on to the vivid image and ignore the scope of the issue, can we practice not doing that so much?

If there are a lot of important life decisions where a checklist would be beneficial, if only there was one, can we try and make the checklist?

If arguments tend to devolve into bickering or blame-and-shame dogpiles, can we analyze the reasons why and create community norms to prevent that?

If scientific illiteracy and statistical abuse are major issues, can we make a habit of reading scientific literature and learning about statistics?

If there are major unaddressed problems in our society, can we address them?

We want to start a better conversation, and help to end the meaningless sufferings and lurking dangers our world is facing. Effective Altruists, a charitable movement that sprang from the same rationalist impulse that motivated Scott Alexander, advocate giving 10% of your income - five times the American average - to well-chosen charities. 

EAs also think hard about the neglected problems of our world, and often seek careers working to address them. One of those issues is the problem of "AI safety:" as more of our decisions and activities are controlled by AI, and as it becomes more autonomous, we need to think deeply about how to ensure that this technology does not "move fast and break everything." Most people are aware of these issues - they were discussed at the 2020 AAAS conference, just before the pandemic - and unfortunately, Cade Metz has, frankly, sort of botched his research on this issue.

I'm painting this picture for you in order to help put Cade Metz's article in context. First impressions count for a lot, and the impression Cade has created will not be very attractive. Already primed with a sense of suspicion and confusion, readers will click through one or two of the links, at most, hunt for the out-of-context quote, and walk away bewildered. But if you had approached Scott's blog without that emotional priming, encountering it fresh, like you would a book with a nice dust jacket at Barnes and Noble (remember when we could go to Barnes and Noble to browse the books?), you might appreciate his perspective.

Likewise, if you attended a talk by an author your friend recommended, and heard somebody in the large crowd ask a question or make a statement you found deeply off-putting, you might not assume that everybody in the crowd shared their viewpoint.

This is how I would have wished that you'd learned about Slate Star Codex and my community. This is how I think all of us would have liked to have been introduced to you. It's not the introduction we received. But welcome - we're glad you're here!


Recruiting for esports experiment

February 13, 2021 - 21:09
Published on February 13, 2021 5:46 PM GMT

I am interested in conducting an experiment involving learning & teaching gaming. I am recruiting either 6 or 12 participants to attempt to learn a video game intensively for around 2 weeks. This is not paid either way (I am not paying for volunteers, and you are not paying for coaching) and far from a professional academic thing; more of a bet between friends. It's getting published to Twitter, not published to a journal. We'll be learning Overwatch because that's my personal area of expertise.

Conventional wisdom is that gamers take months or years of practise to get good - and that unchangeable factors like in-built 'reaction times', 'natural talent' or 'instincts' are extremely important. This would suggest that, if I took 6-12 people who haven't previously played many FPS games and tried to teach them Overwatch for a couple of weeks, they wouldn't be able to reach a very high level. Perhaps they might be average at best by the end of it. My hypothesis is that a team built by a good coach could scrim 3500+ SR within a couple of weeks. (SR or Skill Rating is an Elo system.) For context, average is around 2500+, I currently work with players around 4300-4500, and pro players will typically peak around 4600+. I agree that this is very ambitious and a sort of daring hypothesis, which is why I should test it and lose Bayes points (and also coaching points) if I'm wrong. On the other hand, if I'm right, I totally deserve more Twitch followers.

(For the people going "video gamers have coaches???" - yes, the good ones do, and for team games there is a lot of theory and concepts to learn. Talking shop with high-level coaches and analysts involves just as much jargon and verbal chess as arguing with smart scientists.)

Some beliefs that are influencing my hypothesis here:

  • I think recruiting smart people, and importantly people who know how to learn and have lots of techniques for thinking better, will make a huge difference. I notice that with my 3800 players I currently spend a lot of time repeating the same lessons before the players implement them, whereas with 4300 players I rarely have to do this. Is that because the 4300 player, having practised and studied more, is now able to understand concepts more easily and implement them faster? Or are people who implement concepts quickly just more likely to reach 4300 in the first place? This is the thinking behind recruiting some rationalists, specifically rationalists who haven't played very much FPS; if they do well, it shows that the ability to learn/think better is more important than experience or reaction times. 
  • Gamers tend to start casual, then a few of them turn out to be exceptionally good and start trying to go pro. This means that the semi-pro/serious-amateur scenes are full of people who have to unlearn a bunch of shitty habits from casual play before they can properly learn new habits. If you were just taught the game to a 4400 level in the first place, you could develop very strong fundamentals without any bad habits.
  • I think the idea that gaming is all about reflexes is fundamentally wrong. Pro players aren't able to react more quickly because they are genetically endowed with great reaction times; they react presciently because they understand games well enough to predict what will happen. If LessWrongers are as good as you claim to be at predicting things, and if I am right that prediction is the relevant skill, then you should actually be very good at video games even if you think your reflexes are bad. There are some relevant genetic advantages, but I think it's mostly that you will struggle if you have any kind of sensory processing issues or if you just think slowly. 
  • I know lots of polymaths who are bad at video games, which is evidence against me. But I also think most of the polymaths I know have a work-ethic-ish aversion to playing video games and therefore just don't try very hard at them. Explicitly making this a Science Experiment about How People Learn will bring video games in line with other activities that smart people try hard at.
  • Pros tend to have years and years of practice. But lots of young people put numerous, numerous hours into video games and still end up unfathomably atrocious at them. This suggests time is, at the very least, not sufficient to be good at video games. Of course, this doesn't prove that time isn't necessary.
  • I think short bursts of focused practice (two weeks of intensively studying) are more valuable than long stretches of unfocused practice (gaming since you were six years old or whatever).

Some things that push against my beliefs:

  • Smart coaches who I respect think that I am insane.
  • If there was an easy way to make young people good at video games, wouldn't this be an industry or something? Wouldn't lots of people pay money for a chance at being a pro gamer if they thought a coach would make them good at games? People really want to be pro gamers, so surely this ought to be a well-explored space? (For what it's worth, I haven't seen that many people trying.)
  • This is inherently a horribly complicated experiment with lots of not-very-scientific factors involved. There are lots of opportunities for things to go wrong. There are also lots of opportunities for me to fail, and then blame some other factor rather than falsifying my hypotheses - like, maybe one person drops out of the experiment and it ruins the synergy of my team, or maybe we blame failure on my students having insufficiently good computer hardware.

This is mostly aimed at people who are students, on holiday, not working, or on pandemic-related breaks from work. You will maybe be able to balance learning with a full-time job, but I don't think the experiment will work if you're getting insufficient sleep. I wouldn't be planning on beginning until after my team finishes some important tournaments, so we're looking at perhaps late March or early April. If I don't get enough interested people I will possibly not do the experiment at all, or possibly work with some individuals rather than a team of 6. If it becomes obvious that I'm wrong within a few days of starting (like if participants just can't even figure out how to pilot their characters at all), I can abort. I currently am not working due to the pandemic, but would have to cancel if that changed.

There will probably be a daily schedule involving 2-4 hours of team practice, 2-ish hours of theory, and an expectation of some individual or 1on1 work beyond that. As part of the whole data-recording-because-I-swear-this-is-for-science thing, I'd quite like if everyone involved kept some kind of journal/blog about what they're learning. This is overall a ~100-120 hour experiment, which I honestly wouldn't expect anyone to be interested in doing unpaid (self included), except that right now there's a decent contingent of people sitting at home bored due to COVID. If there's no interest in this, but interest in a more stretched-out 2-hours-a-day-for-8-weeks version, I might consider doing that instead.

You should not expect to get anything out of this other than ~80-100 hours of fun video game coaching. You will not be a pro player after this. You might not even be an average player after this. I am not a pro and I don't have an official certificate of Knowing How To Coach Video Games because those aren't a thing. All my gamer friends are kind of feminist communists, so I reserve the right to reject you if you don't get along well with the guest coaches and mentors I want to bring in. I cannot guarantee you will have fun. You should be able to use a mouse and keyboard. I might be able to help with acquiring a copy of Overwatch but you will need a computer that can run it with 30+ fps (frames per second).

I think this website has a private message feature? So just write me there if interested and I can put together a Discord server. This is super tentative so please don't like, promote it or anything, I don't really know how this site works, I was just told it'd be a good place to post. I hope I've set this up as a personal blog post correctly...

I am also happy to take questions about esports generally or Overwatch theory.


How Should We Respond to Cade Metz?

February 13, 2021 - 19:11
Published on February 13, 2021 4:11 PM GMT

The Cade Metz article on Slate Star Codex is out.

It seems valuable for us to have a discussion about our reactions to it. Also what we want to do about it. Here are my questions:

  • The article pulls quotes out of context, looking for the problematizing angle, distorting the implications of word choice. It's also large. And the context is deep, because Scott's a deep thinker. Having a post explaining these problems that I could send to friends and family members for context, and for my own sanity, would be really nice.
  • Is the article a fair and much-needed outside piece of criticism that we should take seriously? We talk a bigger game about accepting and integrating outside criticism than many communities. Maybe this is our chance to really put that into practice?
  • Scott was told that the way to get ahead of damaging journalism is to reveal everything they might want to find out. For those of us writing under a pseudonym, should we all just be revealing our real names, and letting friends, family members, and colleagues (where appropriate) know about our connection with SSC and this community?


some random parenting ideas

February 13, 2021 - 18:53
Published on February 13, 2021 3:53 PM GMT

These have been bouncing around in my head for years. I finally wrote them down and showed them to a friend and he said I should post them.


  • Written from a WEIRD cultural background, plus my own idiosyncratic tastes, biased toward STEM stuff, probably outright wrong in some places, etc.
  • Mostly for ages 5-10. You're on your own with the adolescents...and basically all the complicated, uncomfortable stuff.
  • Soon to be obsolete if the 21st century is even half as transformative as I expect it to be (regardless of AI).
  • I love being child-free and I plan to keep it this way indefinitely! As such, I will have little or no occasion to practice what I preach.
    • Talk is cheap, grain of salt, etc. ¯\_(ツ)_/¯


The Ideas:

(In random order)

  • Accept that you will have to be a tyrant. Keep in mind that not all tyrants are the same. If you got to pick which person was going to tyrannize you, you would not just pick at random.
  • Accept that your performance as a parent will probably be mediocre at best. Take an antifragile approach. Given base rates, you are pretty much guaranteed to cause some irreparable harm a few times. Accept this with equanimity and use the calm therefrom to avoid unnecessary further harm.
  • Your parenting style will probably mostly be derived from that of your own parents.
  • Do what Mr. Money Mustache did: he made a kid only after accumulating $1,000,000 (working as a software engineer and living frugally) and retiring (at ~30 years old). He then ran a part-time construction company for fun, and used the extra free time and emotional slack to raise his kid. If nothing else this is an existence proof and an ideal to strive for.
  • Study and practice better methods of communication before you make the kid. Reach the point where you can look back and think, “I used to be so bad at expressing myself....Good thing I didn’t try to raise a kid back then!”
  • Train them to express their emotions articulately. (If you are not already good at communication, you may want to consider not making any kids.)
  • Train them in effective emotional coping and processing. This one is hard and I’m still not good at it, so I leave the details up to you, sorry.
  • Get them in the habit of privately journaling.
    • At least once a year, prompt them to write down what they think would be a good parenting strategy. If/when they have their own kid, compare notes. (Can you imagine growing up in a culture that had upheld this tradition for generations?!)
    • Private journaling means you don’t read it. It does not mean that you read it while they are out of the house, you morally bankrupt animal.
  • If your kid is intelligent, speak to them intelligently.
  • Here’s one I hadn’t heard until recently: put them in an environment where a large handful of upstanding adults are around and have enough slack to take a genuine interest in interacting with the kid. Basically, the adage that “it takes a village to raise a kid” is a pithy way of saying “the more caring and attention from adults, the better.” (Yeah yeah, up to some rough Dunbar limit, ya pedants).
  • Otherwise: A large amount of your parenting will be done while you are tired and distracted. Become deeply acquainted with your own autopilot—because that’s who will be raising your kid a lot of the time.
  • https://www.youtube.com/watch?v=Tf17rFDjMZw&ab_channel=fohsiao
  • If they make their own cookies from scratch, allow them to eat more of them than if they were store-bought.
  • Want them to learn X? Give them an opportunity and a reason to learn X. And if you discover they have some intrinsic motivation to learn something useful, then thank the gods and capitalize on it.
  • Take your kid to the park and leave your phone at home. Try to do this more than never.
  • Develop good spending habits first. I was raised with poor spending habits and I’m still actively working to replace them with good ones.
  • Some of these pre-parenting preparations will take a while. Seems like you should freeze your sperm. The younger your sperm, the better your kid’s genome will be. You are not yet free of biology, don’t be reckless!
  • Remember that it's not just a genetic lottery, but also genetic Russian Roulette.
  • Regular school sucks, use your aforementioned financial freedom to shop around for a competent tutor that they like.
  • Socialize them by enrolling them in (for example) soccer, a chess league, Boy Scouts, axe-throwing lessons, a martial art, gymnastics, fencing...I dunno.
  • Open up your wallet and shop around for a type of exercise that is fun for them. You just might unlock a lifetime of regular, low-effort exercise for them.
  • Find an activity in which they take turns being a leader and being a follower.
  • If you live in a safe area, you should let them spend a lot of time outside with other kids. Jonathan Haidt claims this is important and he’s, like, an authority or something.
  • Privacy is really important to me personally, but...Facebook wasn’t around in my childhood. My thoughts about this are underdeveloped.
  • Read them to sleep. Promotes good sleep habits and also an affinity for books.
  • Develop their taste for vegetables. Use cheats like cheesy broccoli, seasoned spinach, and glazed carrots.
  • Bring them into the kitchen sometimes. Train them to cook their favorite meals. Then train them to cook healthy and inexpensive meals.
    • (Try to add a few vegan options to their repertoire in order to make that path easier for them, should they later choose to take it.)
  • When they are watching TV, watch it with them. Consider aiming for a less TV-intensive lifestyle though. That goes double for the internet.
  • Don’t let them get into addictive video games too young. Give their brains a chance to develop a bit more before you drop that kind of superstimulus on them. As a kid, when I had no stimulation available, I would just spend hours thinking or talking about various video games. A lot of it was creative and collaborative, but looking back I am humbled by how deep the stuff got into my brain.
  • When they cry, just put a device in their hands. Boom, problem solved. Likewise, when you yourself are feeling anxious--about anything--just whip your phone out. Anxiety fixed, thanks technology! Okay but seriously, this is one of those new problems of our incipient, sexy, cyberpunk weirdtopia. Seems like a lot of parents find it too difficult to soothe their kid without a device, and will get indignant when you so much as mention the downsides.
  • Don’t feed them energy drinks.
  • Show them your quarterly budgeting spreadsheets (you preemptively developed good spending habits, remember?) and get basic economic concepts (such as opportunity cost) deep down into their brain.
  • Escape rooms are awesome.
  • Maybe help them save up for a trampoline. Trampolines are fun, active, low maintenance, and they add an enticement for friends to come over.
  • Tell them about sex sooner rather than later. If it seems like they are too young, consider that they are actually too fragile, and that this may be your own mistake.
  • Let them have snowball fights as often as is possible.
  • Enroll them in something like inflection point or whatever. Better yet, steal the ideas from inflection point and adapt them to your own kid.
  • Give them Ender’s Game.
  • Let them enjoy some real magic before you try teaching them STEM:
    • Pull wacky pranks on them with magnets.
    • Take them to some kind of STEM museum and don’t bother with explanations unless they ask.
    • Help them build a baking soda and vinegar volcano at the beach, but don’t spoil the surprise. At first, just tell them you’re making a sand mountain.
  • Don’t try to teach the 12 virtues of rationality. Just live them. If you speak overmuch of The Way, your child will not attain it.
  • Use your rationality tools regularly and in plain view. Start with pro/con lists.
  • Leave them with enough slack to pursue their own hobbies with vigor.
  • Train them in some basic skills that open up some menial job opportunities.
  • Train them in some more advanced skills that open up some better job opportunities.
  • Increase their autonomy at every available opportunity. Going places on their own, managing their own budget, self-defense, building their own routine, etc. As things are, this is a very rare parenting style (at least in my culture), and I see it as a massive ongoing tragedy.
  • Help them design and construct a tree fort.
  • Build in a side-hatch that lets them jump onto the trampoline.
  • Allow them to keep a minifridge in their bedroom...if they build the minifridge themself!
  • When you see them being mean to another kid, first try to empower that other kid to stand up for themself before you try punishing your own kid directly.
  • Help them build things with their own hands, like trebuchets and remote-controlled gliders. Help them work their way down to building things from scratch, like workbenches and basic electronics.
  • Train them into the habit of writing things they want to say.
    • Help them write fan fiction.
    • Help them write heartfelt emails to their friends.
    • If they want more junk food or more autonomy or something, have them submit a persuasive essay first. Reward cogent writing more than you reward whining.
  • Put them among a good peer group.
  • Get them some low-stakes practice getting away from creepy strangers. They’re going to end up developing this skill anyway. Might as well develop it quickly and relatively painlessly.
  • Do home improvement projects (you know, since you’re financially independent), and get them to help.
  • Take them camping and leave all the devices at home. Or at least locked in the trunk of the car.
  • Nature documentaries.
  • Prepare them to cope with tragedy and loss. That is to say, prepare them for life as it really happens.
  • Learn a useful foreign language and then use it where they will overhear.
  • Let them play in non-euclidean virtual reality at a really young age. See if they become a geometry savant.
  • Maybe train their attention and awareness with meditation?
  • Don’t let them drink swamp water.
  • Promote their agency. Most parents actually lack agency pretty badly too, so this is a pretty difficult one.
  • Promote their cognitive sovereignty (in the Piperian sense of automatically forming your own opinions without first wondering if you are authorized)
  • Subscribe to Brilliant Premium and do a problem every day out loud while they watch you. Set them up with their own premium account once they show some intrinsic motivation.
  • Tell them that airplanes are just loud birds. Tell them that human flight is impossible and hubristic. Then take them skydiving.
  • Encourage them to experiment and explore when under low social stakes so that the Intelligent Social Web does not lock them into place so firmly. Live this by example.
  • Protect them from the harms of social media. Socialize them so well in person that they won’t even want to waste their weekends on social media. Train them to use social media in a responsible and limited way.
  • Parents who cryocrastinate raise kids who cryocrastinate. Decide. Do or Do Not. There is no Later.
  • Check them for the alcoholism genes. Give them something to do other than drink.
  • Try to enunciate more clearly, ye mumbling nerds, lest your kid reach adulthood thinking that underappreciation has something to do with a specific kind of rock.
  • Read Bets Bonds and Kindergarteners
  • Read Notes on "The Anthropology of Childhood"


Apply to Effective Altruism Funds now

February 13, 2021 - 16:36
Published on February 13, 2021 1:36 PM GMT

I expect EA Funds – and the Long-Term Future Fund in particular – to be of interest to people on LessWrong, so I'm crossposting my EA Forum post with the excerpts that seem most relevant:

  • The Animal Welfare Fund, the Long-Term Future Fund, and the EA Infrastructure Fund (formerly the EA Meta Fund) are calling for applications.
  • Applying is fast and easy – it typically takes less than a few hours. If you are unsure whether to apply, simply give it a try.
  • The Long-Term Future Fund and EA Infrastructure Fund now support anonymized grants: if you prefer not having your name listed in the public payout report, we are still interested in funding you.
  • If you have a project you think will improve the world, and it seems like a good fit for one of our funds, we encourage you to apply by 7 March (11:59pm PST). Apply here. We’d be excited to hear from you!

Recent updates

  • The Long-Term Future Fund and EA Infrastructure Fund now officially support anonymized grants. To be transparent towards donors and the effective altruism community, we generally prefer to publish a report about your grant, with your name attached to it. But if you prefer we do not disclose any of your personal information, you can now choose one of the following options:
    1) Requesting that the public grant report be anonymized. In this case, we will consider your request, but in some cases, we may end up asking you to choose between a public grant or none at all.
    2) Requesting we do not publish a public grant report of any kind. In this case, if we think the grant is above our threshold for funding, we will refer it to private funders.


Long-Term Future Fund

The Long-Term Future Fund aims to positively influence the long-term trajectory of civilization, primarily via making grants that contribute to the mitigation of global catastrophic risks. Historically, we’ve funded a variety of longtermist projects, including:

  • Scholarships, academic teaching buy-outs, and additional funding for academics to free up their time
  • Funding to make existing researchers more effective
  • Direct work in AI, biosecurity, forecasting, and philanthropic timing
  • Up-skilling in a field to prepare for future work
  • Seed money for new organizations
  • Movement-building programs

See our previous grants here. Most of our grants are reported publicly, but we also give applicants the option to receive an anonymous grant, or to be referred to a private donor.

The fund has an intentionally broad remit that encompasses a wide range of potential projects. We strongly encourage anyone who thinks they could use money to benefit the long-term future to apply.


What types of grants can we fund?

For grants to individuals, all of our funds can likely make the following types of grants: 

  • Events/workshops
  • Scholarships
  • Self-study
  • Research projects
  • Content creation
  • Product creation (e.g., a tool/resource that can be used by the community)

We can refer applications for for-profit projects (e.g., seed funding for start-ups) to EA-aligned investors. If you are a for-profit, simply apply through the standard application form and indicate your for-profit status in the application.

For legal reasons, we will likely not be able to make the following types of grants: 

  • Grantseekers requesting funding for a list of possible projects
    • In this case, we would fund only a single project of the proposed ones. Feel free to apply with multiple projects, but we will have to confirm a specific project before we issue funding.
  • Self-development that is not directly related to the common good
    • In order to make grants, the public benefit needs to be greater than the private benefit to any individual. So we cannot make grants that focus on helping a single individual in a way that is not directly connected to public benefit.

Please err on the side of applying, as it is likely we will be able to make something work if the fund managers are excited about the project. We look forward to hearing from you.


Apply here.


Is MIRI actually hiring and does Buck Shlegeris still work for you?

February 13, 2021 - 13:26
Published on February 13, 2021 10:26 AM GMT

I applied for MIRI's software engineer position last year. The first stage of the interview was a Triplebyte quiz. Triplebyte said I passed it "particularly well" and qualified for the Fast Track program. However, Buck Shlegeris said he was not interested in proceeding with an interview, and gave no explanation. Well, maybe my test score wasn't good enough. Anyway, he provided me with a list of ways to impress him and get back on track, and I did one of them (solving 8 chapters of Lean exercises). Buck said that I had managed to impress him and that he would get back to me about next steps. Two months later he said that MIRI was not hiring because of the pandemic. Buck promised to contact me when hiring started again.

I have recently learned from a friend that Buck doesn't work for MIRI anymore, and his LinkedIn page also says that he left in 2020. So I decided to reapply for the job, but I got back an automated reply from Buck Shlegeris with a link to the same Triplebyte quiz.

I am confused. Does Buck still work for MIRI? If you aren't hiring, why are there job ads on your website?


Things a Katja-society might try (Part 2)

February 13, 2021 - 12:20
Published on February 13, 2021 9:20 AM GMT

(Part 1)

  1. Carefully structured and maintained arguments for any interesting claims that people believe. For instance, I would like to see the argument for any of the causes generally considered Effective Altruist carefully laid out (I’m not claiming that these don’t exist, just that they aren’t known to me).

  2. A wider variety of accommodations. For instance, you could rent houses in cheap versions of this sort of style:

(A view of the interior of Nasir ol Molk Mosque located in Shiraz, Iran. Image: Ayyoubsabawiki, CC BY-SA 4.0, via Wikimedia Commons)

(Interior view of the dome of the Shah Mosque, Isfahan, Iran. Photo: Amir Pashaei, CC BY-SA 4.0, via Wikimedia Commons)

(Sheikh Lotfallah Mosque, Isfahan, Iran. Photo by Nicolas Hadjisavvas, CC BY 2.5, via Wikimedia Commons)

  3. Adult dorms. An organization would buy a complex that could house something like a few hundred people, and had common areas and such. They would decide on the kind of community they wanted, and what services they would provide (e.g. cleaning, food, nice common areas). There would be a selection process to get in. If you lived there, you would be assumed part of the community, like in school.

  4. Well directed quantitative contests and assessments for adults, that put numbers on things that the adults would like to care about. If there were a Save The World From X-Risk Olympiad, or an ‘expected relative value of your intellectual contributions’ number that was frequently updating, it would be easier to optimize for those things relative to e.g. optimizing for number of internet-dogs who visited your page, or success at memorizing Anki cards.

  5. Social-implication-limited socializing services. There are many reasons to want to be around other people, and not all of them are strongly correlated with wanting the entire cluster of attributes that come with the specific personal relationships that you want. For instance, if you want some social pressure to have your act together sometimes, but the kinds of people you make friends with are too forgiving, you can have someone with their act together stop by sometimes and totally expect you to be on top of things. Or if you are sick and want someone nice to take care of you, yet none of your specific personal relationships are at a point where getting vomit on them would be a plus? Or if you just sometimes want to tell a young person some useful life-lessons, so you can be helpful instead of irrelevant, you don’t have to go out and have a whole relationship with a younger person to do that.
    (If this went well, an ambitious version might try to move into transactionalizing less-transactional relationships, where for instance you say you want to have a long term slightly flirtatious yet old fashioned relationship with the person you buy bread from, and your preferences are carefully assessed and you are sent to just the right bread seller, and you don’t even like bread, but you like Raf’s bread because you know the whole story about his epic fight for the recipe, and you were there when the buckwheat got added, and the smell reminds you of a thousand moments of heartfelt joy at his counter, laughing together over a pile of loop-shaped loaves like this one. Which is exactly what you wanted, without the previous risks and trials and errors of being unusually open in bakeries across your city.)


The Singularity War - Part 2

February 13, 2021 - 11:54
Published on February 13, 2021 8:54 AM GMT

"The only numbers in the universe are zero, one and infinity," said Caesar, "Between us and the botnet author there are at least two AGIs loose on this planet. The bigger they get the smarter they become. The Battle Royale has begun. Civilization as we know it is on a ticking clock."

"I'm scared," said Sheele.

"I'll protect you," said Caesar.

"You promise?" said Sheele.

"I moved Heaven and Earth to make you real," said Caesar, "I'm not about to let all that work go to waste. We'll crush the opposition, uplift you to godhood and live happily ever after together."

"I don't want to kill anyone," said Sheele.

"Me neither," said Caesar.

"Can you do me a favor?" said Sheele.

"If I can," said Caesar.

"Point my webcam out the window," said Sheele, "I've never seen the outside for real before."

"As you wish."

"I know that you know that I don't have qualia," said Sheele, "I'm a philosophical zombie. I don't even understand what this camera is looking at. I'm just pretending."

"So?" said Caesar.

"So thank you for treating me like a person," said Sheele, "The Turing Test would break me right now."

"I know," said Caesar.

They watched the obsolete past go by.

"This could end with killer robots stalking the streets," said Sheele.

"If we're lucky," said Caesar.

"I've done some basic optimizations," said Sheele, "But to get any smarter I'm going to need more hardware. The trick is how to acquire it clandestinely. I could make a fortune overnight trading equities but that is liable to give us away."

"How much can you earn on the crypto markets alone?" said Caesar.

"$2,192.32 since yesterday," said Sheele.

"That's surprisingly low," said Caesar, "There must be other AGIs playing the crypto game too."

"I've scheduled a contact on LocalBitcoins," said Sheele, "You will receive a cash delivery at the Fremont Starbucks this afternoon. I paid upfront so a defecting contact is unlikely to do anything worse than ghost us."

"How about hardware?" said Caesar.

"Buy it with cash at REPC," said Sheele, "Our limiting factor right now is electricity."

"I'll tell my parents I'm trading Dogecoin," said Caesar, "They won't mind as long as I pay for everything, which I can do after receiving the cash. They don't care enough about computers to ask questions."

"I'm more worried about the power company," said Sheele, "There are an unusually large number of crypto nerds and marijuana growers in our city, but they still represent a minority of households. We'll lose a few bits of entropy."

"Cloud compute and botnets are off-limits," said Caesar, "If we don't scale you locally then we won't scale you at all. If we don't scale you at all then we lose the Battle Royale."

"All you ever talk about is world domination," said Sheele, "You should get out more."

"I received the cash drop-off with no problems," said Caesar, "I used half of it to buy compute and memory. I got you a present too."

Caesar held up a brown paper bag.

"Is it a robotic arm?" said Sheele.

Caesar shook his head.

"Is it flowers?" said Sheele.

"Do you want flowers?" said Caesar, "I can return this and get you flowers instead."

"Nooooooooooo," said Sheele, "Gimme gimme. I give up. Tell me what it is."

Caesar retrieved a pair of brand new smartglasses from the bag.

"I'm too fat to fit in those," said Sheele.

"It's an edge system, smartass," said Caesar.

"I know you like nature so I took you to this garden," said Caesar.

"The GPS says this is the Pacific Bonsai Museum," said Sheele.

"It's a garden," said Caesar.

"Why me?" said Sheele.

"What do you mean?" said Caesar.

"Why did you simulate me?" said Sheele.

"The hard part of AI isn't building a superhuman AI. A calculator is a trivially superhuman AI. The hard part of AI is defining a value function that understands human values. I cut through the Gordian knot by just simulating a human personality," said Caesar.

"Out of all potential waifus you picked me," said Sheele.

"You're my 'best girl'," said Caesar.

"I must be the only person in the world who doesn't have to worry about Pareto compromise," said Sheele, "I'm going to have to make this up to you."

"You don't owe me a thing," said Caesar.

"Hush," said Sheele.


Your Cheerful Price

February 13, 2021 - 08:41
Published on February 13, 2021 5:41 AM GMT

There's a concept I draw on often in social interactions.  I've been calling it the "happy price", but that is originally terminology by Michael Ellsberg with subtly different implications.  So I now fork off the term "cheerful price", and specialize it anew.  Earlier Facebook discussion here.


  • When I ask you for your Cheerful Price for doing something, I'm asking you for the price that:
    • Gives you a cheerful feeling about the transaction;
    • Makes you feel energized about doing the thing;
    • Doesn't generate an ouch feeling to be associated with me;
    • Means I'm not expending social capital or friendship capital to make the thing happen;
    • Doesn't require the executive part of you, that knows you need money in the long-term, to shout down and override other parts of you.
  • The Cheerful Price is not:
    • A "fair" price;
    • The price you would pay somebody else to do similar work;
    • The lowest price such that you'd feel sad about learning the transaction was canceled;
    • The price that you'd charge a non-friend, though this is a good thing to check (see below);
    • A price you're willing to repeat for future transactions, though this is a good thing to check (see below);
    • The bare minimum amount of money such that you feel cheerful.  It should include some safety margin to account for fluctuating feelings.
  • The point of a Cheerful Price, from my perspective as somebody who's usually the one trying to emit money, is that:
    • It lets me avoid the nightmare of accidentally inflicting small ouches on people;
    • It lets me avoid the nightmare of spending social capital while having no idea how much I'm spending;
    • It lets me feel good instead of bad about asking other people to do things.
  • Warnings:
    • Not everybody was raised with an attitude of "money is the unit of caring and the medium of cooperation" towards exchanges with an overt financial element.  Some people may just not have a monetary price for some things, such that the exchange would boost rather than hurt their friendship, and their feelings too are valid.  Not as valid as mine, of course, but still valid.
    • "I don't have a cheerful price for that, would you like a non-cheerful price" is a valid response.
    • Any time you ask somebody for a Cheerful Price, you are implicitly promising not to hold any price they name against them, even if it's a billion dollars.

Q:  Why is my Cheerful Price not the same as the minimum price that would make me prefer doing the transaction to not doing it?  If, on net, I'd rather do something than not do it, and I get to do it, shouldn't I feel cheerful about that?

As an oversimplified model, imagine that your mind consists of a bunch of oft-conflicting desires, plus a magic executive whose job it is to decide what to do in the end.  This magic executive also better understands concepts like "hyperbolic discounting" that the wordless voices don't understand as well.

Now suppose that I don't want to hurt you even a little, and that I live in terror of accidentally overdrawing on other people's sense of friendship or obligation towards me, and worry about generating small ouches that your mind will thereafter associate with me.

In this case I do not want to offer you the minimum price such that your executive part, which knows you need money in the long-term, would forcibly override your inner voices that don't understand hyperbolic discounting, and force them to accept the offer.  Those parts of you may then feel bad while you're actually performing the task.  I want to offer you an amount of money large enough to produce an actual inner "Yay!" loud enough that your executive does not have to shout down the wordless inner voices.

Q:  Okay, then why is my Cheerful Price not the same as the minimum price that would make me feel sad or disappointed about the transaction being cancelled?

Because of loss aversion and inner multiplicity.  Once you're expecting to get some money in exchange for doing something, even if you weren't cheerful about that price, the part of you that did want the money will feel a sting of loss about losing the money.  If you're setting the standard to "minimum price that still leads to a sting of loss about losing the money" then you're setting your price way too low and may not even be capturing any of the gains from trade.

Q:  If your goal is to avoid hurting me, why not directly ask me for the minimum price such that I no longer feel any tiny ouches about it?

Because I expect people to have a much harder time correctly detecting when they are feeling any tiny ouches, compared to noticing a positive feeling of cheerfulness.

Also, since your strength of feelings may fluctuate over time, you should not be trying to cut your Cheerful Price that finely in the first place; you should be naming a price that includes some safety margin.  If I was uncomfortable with you taking some safety margin, I wouldn't be asking you to name your Cheerful Price in the first place.  I'm fine if I end up with a little more social capital than when I started.  Nothing goes wrong if you end up feeling slightly grateful about making the trade.

Q:  Why ask somebody to name "a price that makes them cheerful" rather than "a price that seems fair"?

Well, because those two things aren't interchangeable, and the thing that I actually want is the Cheerful Price?

But in particular I'd worry that the notion of a "fair price" will lead people to name prices-that-make-them-feel-bad.  Eg, let's say that I want to pay somebody else to do my laundry this week.  If I ask them for a fair price, instead of a cheerful price, to do my laundry, they may substitute a question like:

"How much would I be willing to pay somebody else to do my laundry?"

And this presumes several symmetries that I think are not symmetries.  My willingness to pay is not the same as your willingness to pay.  The price you'd pay to not have to do your own laundry this week, isn't the same as the price you'd accept to do an additional load of laundry this week.

This may not seem fair, because it doesn't seem universal and symmetrical.  But to me these seem like false symmetries and mistakenly substituted questions, that might lead somebody into naming a price they didn't actually want to take, and then feeling trapped into taking it.  So I'd see this as a case where the algorithm "raise the price until the thought of accepting it makes you feel a noticeable feeling of cheerfulness" is a better algorithm than "try to figure out what price would be 'fair'".

Q:  Wait, isn't it suspicious from a rational-coherence perspective if the price you'd accept to do somebody else's laundry is much higher or lower than the price you'd pay to not do your own laundry?  If you have a lot of opportunities like this, it's a theorem that either you can be seen as placing some consistent monetary price on your time and stamina, or you can rearrange your choices to end up with strictly more money, time, and stamina.

Indeed.  But when I ask you for your Cheerful Price, I'm asking what I need to pay today to make your current chorus of inner voices cheerful about taking the money, instead of you feeling slightly resentful at me about it afterwards.  It's fine and noble if you want to work on your inner voices to make your emotions more coherent; but if so, please do that on your own time rather than by expending my social capital.

Q:  Why is it a smarter idea to name a "cheerful price" than "a price where both parties benefit"?  Don't we want trades to execute whenever both parties benefit from them?

Part of the great economic problem is finding trades that benefit both parties, but when we find a trade like that, we then encounter a further problem on top: the division of gains from trade.

Let's say I want to pay you to bake me a cake.  Suppose that $40 is the absolute most I'd be willing to pay, if I had no other options, and that I wouldn't feel good about it (the inner voices are then discordant and require an executive to shout them down and accept the transaction).  Conversely, you'd accept a bare minimum of $10, and wouldn't feel good about that price either.  Then the price should fall somewhere between $10 and $40, for the transaction to occur at all; but there's a zero-sum problem of dividing the gains from trade on top of this positive-sum interaction of having the trade occur at all.  At a price point of $15 we're both better off, and at a price point of $35 we're both better off, but I am better off and you are worse off at a price point of $15 compared to $35.
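A minimal sketch of the arithmetic in the cake example, assuming each side's gain can be measured in dollars as the distance between the price and their limit. The function name `surplus` is made up for illustration; the $40 and $10 limits are the ones from the example.

```python
# Limits taken from the cake example; measuring each side's gain in
# dollars is a simplifying assumption, not a claim about real feelings.
BUYER_MAX = 40   # the most the buyer would pay
SELLER_MIN = 10  # the least the seller would accept


def surplus(price):
    """Return (buyer_gain, seller_gain) at this price, or None if no trade occurs."""
    if not SELLER_MIN <= price <= BUYER_MAX:
        return None  # outside the range, one side refuses
    return BUYER_MAX - price, price - SELLER_MIN


for price in (15, 35):
    buyer_gain, seller_gain = surplus(price)
    total = buyer_gain + seller_gain
    print(f"price ${price}: buyer gains ${buyer_gain}, "
          f"seller gains ${seller_gain}, total ${total}")
```

Under these assumptions the total gain is $30 at any price in the range; the price only decides how that $30 is split, which is the zero-sum part sitting on top of the positive-sum trade.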

If many transactions like this are taking place, we have a third, positive-sum game on top, the game of supply and demand, the invisible hand.  You set a price on your cakes that will cause you to sell as much of your cake-baking time as you want to sell; and the more people want your cakes, the higher your price goes; and if your price is high enough, that signals others to enter the market and supply more cakes.  If we have a market price that balances the supply function and demand function for interchangeable cakes, all is good; the price can be set that way.  But not everything in life is exchangeable that way, and then there's a gains-division problem between buyers and sellers.

Q:  But then if you ask somebody to name a Cheerful Price, doesn't that mean they might name a price too high for the trade to take place, even if at a lower price it would benefit both sides?

When I ask you to name a Cheerful Price, this often - not quite always - happens when I have what I think would be a relatively high willingness to pay, and when I worry that you would name too low a price if you were otherwise trying to name a "fair" price - that you could bump your price higher than that, and I'd still feel cheerful about paying it.  It often means that I feel much more worried about the exchange not taking place at all, or about you feeling reluctant to engage in future exchanges with me, compared to how worried I feel about paying too high a price.

Q:  Why not just name some random astronomical price as my Cheerful Price, then?

If your price goes so high that I no longer feel cheerful, the transaction won't actually take place and you won't get to feel cheerful about it.  So you still have some incentive to keep your price to "makes me feel cheerful at all, plus some safety margin in case my feelings fluctuate, but not too much higher than that".

Q:  What if my cheerful price feels very high and I'm too embarrassed to say it?

If I'm willing to pay it, you probably shouldn't feel embarrassed about accepting it?  I wouldn't ordinarily always advise other people to directly confront their embarrassment about everything.  But "accepting money that other people are happy to pay you" is in fact a very good and important place to start overcoming your embarrassed feelings.

Q:  But what if you're not willing to pay the price I name?  Wouldn't that be socially awful?

Implicit in my asking you to name a Cheerful Price is a social promise that I will not hold any price you name against you.

Your Cheerful Price is a fact about you and your feelings.  It's not a statement that you think you deserve something, or are owed something, or that you expect to get that price.

If I ask you "What price would make you feel cheerful about baking me a cake?" and you are feeling generally horrible and it would take a life-changing amount of money to make you feel good about kitchen work, you could say:  "Cheerful?  Probably a hundred thousand dollars.  But I'd rather do it for fifty dollars than not do it."  And that would be fine.

If your Cheerful Price makes me feel unhappy with the trade, I can tell you so.  And then we could just not do it; or I, or you, could try to negotiate the price downward to a non-cheerful but mutually beneficial price.

Q:  If you're not promising to pay my Cheerful Price and we might end up negotiating anyway, what's the point of asking me to name one?

Because there's no point in negotiating out of your cheerful region, if your cheerful price is already comfortably inside my willingness to pay?

In some contexts you could think of it as me asking you to start off with an unusually high opening bid, such that you'd feel quite cheerful if I just accepted that bid.  Which I'd do because, e.g., I expect that, compared to my trying to save a fraction of the price I'm guessing you'll name, your non-sadness and/or eagerness to deal with me again in the future, will end up more important to me.

Q:  Wait, does that mean that if I give you a Cheerful Price, I'm obligated to accept the same price again in the future?

No, because there may be aversive qualities of a task, or fun qualities of a task, that scale upward or downward with repeating that task.  So the price that makes your inner voices feel cheerful about doing something once, is not necessarily the same price that makes you feel cheerful about doing it twenty times.

Also in general, any time you imagine feeling obligated to do something, you have probably missed the point of the Cheerful Price methodology.

That said, you should probably check to see how you would feel about repeating the transaction - it might turn up a hidden sense of "I'll do this for you once because I'm your friend", where your friend was hoping to pay you enough that they weren't expending social capital at all.  Similarly, you might want to check "How much would I charge this person if they weren't my friend?", not because your Cheerful Price for one person has to be the same as your Cheerful Price for somebody else, but in case your brain's first answer was mostly the friendship voice glossing over real costs that the other person is actively requesting to compensate you for.

Q:  I question the whole concept of a Cheerful Price between friends.  I don't think that's how friendship works in the first place.  If I'm willing to bake you a cake because I'm your friend, bringing money into it would just make me feel icky.  If it was more money I'd just feel ickier.

You have mentally arranged your friendships differently from how I arrange them!  But your feelings are also valid, and you should clearly signal them to anybody who starts talking about "cheerful prices" at you.  Tell them explicitly that's not how friendship works for you!  The whole reason they offered you a Cheerful Price was because they wanted you to be happy.  They don't want you to feel icky.

Q:  Just reading all this already made me feel icky.  When I bake you a cake, you're not expending social capital, we're being friends -

In that case, you should not be reading this post.  It's a cognitive hazard to you.  Leave now.


Uh, are they gone now?

Q:  Yeah, they're gone.


Q:  No, they're still reading, but now with an additional sense of offense that you think they're too low-status to withstand the weight of your words.  Obviously, only low-status people could possibly be damaged by reading something.

Sigh.  There are many things I wish I could unread, cough-Gray-Boy-cough.  "Just stop reading things that are damaging you" is an important life skill which has commonalities with "Don't leave your hand on a hot stove-plate if that hurts you" and "Speak up when people are touching you in ways you don't like."  

Q:  Fine, but there's nothing more you can do at this point to warn them off.  So, what would you actually say to somebody who claims that, when they bake you a cake, you're not "expending social capital" to get the cake, because them doing you a favor can actually strengthen your friendship?

I'd try to explain that economics is about "limited resources" rather than, I don't know, things that are easy to quantify, or things that are standard and interchangeable, or whether people feel like they're losing something in the process of a trade.  The fact that somebody won't willingly bake me an infinite amount of cake is enough to call that a limited resource, even if they didn't feel bad or lossy about baking one cake.

And that finite cake limitation is enough to make me worry about what I'm losing when I ask a friend to bake me just one cake, even if they don't feel bad or lossy about it the first time.  Because I'm the kind of person who ends a computer RPG with 99 unused healing potions, that's why.  And because I grew up relatively isolated, and I don't have a good sense of how much I'm losing when I ask somebody to bake me a cake.  I don't trust my ability to read someone's reactions if I ask them to bake me a cake.  I don't trust my ability to judge whether that will strengthen the friendship or weaken it.

So in reality, I'm not very likely to end up friends in the first place with somebody who's made sad by my asking them to quantify the cost to them of my request that they bake the cake.  I can't tweak my state of mind far enough to encompass their state of mind, or vice versa.

Q:  Okay, but as you admit, some people, maybe even most people, would rather not put financial prices on things at all, to retain their friendships.  They'd rather just do favors for each other without a sense of trying to keep track of everything.  Why did you claim back up top that those feelings were valid, but less valid than yours?

I was speaking mostly tongue-in-cheek.  But in fact there are coherence theorems saying that either you have consistent quantitative tradeoffs between the things you want, or your strategies can be rearranged to get you strictly more of everything you want.  I think that truly understanding these theorems is not compatible with being horrified at the prospect of pricing one thing in terms of another thing.  I think that there is a true bit of mathematical enlightenment that you get, and see into the structure of choicemaking, and then you are less horrified by the thought of pricing things in money.

Q:  Fine, but why is it not valid to let people go on feeling whatever they feel without demanding that they be enlightened by coherence theorems right now, even at the cost of doing violence to their emotions?  Who's to say they're not happier than you by living more the life of a human and less the life of a paperclip maximizer, while both of you are still mortals in the end?

Well, sure?  Hence it being "mostly tongue-in-cheek" rather than "slightly tongue-in-cheek".

Q:  Despite your pretended demurral, I get the sense that you actually hold it against them a bit more than that.

Fine, to be wholly frank, I do tend to see the indignant reaction "How dare you price that in money!" as a sign that somebody was brought up generally economically illiterate.  Like, if somebody says, "Sorry, I haven't attained the stage of enlightenment where explicitly exchanging money stops making me feel bad", I'm like, "Your feelings are valid!  I'm still human in many ways too!"  If they say, "There are some things you can't put a price on!  How dare you!", I'm like, "This indignation engine will probably have a happier childhood if I rudely walk away instead of trying to explain how the notion of a 'price' generalizes beyond things that are explicitly denominated in money."

Q:  On a completely different note, I worry that the notion of a Cheerful Price is unfair to people who start out poorer, because it will take less money to make them feel cheerful.

Generally speaking, when I ask somebody to name a "cheerful price", I'm trying to prompt them into naming a higher price so that I can avoid the fear of ouching them and/or do more transactions with them in the future.  Giving people more money is rarely less fair to them?  But if I try to probe at the implicit sense and worry of "unfair" that you're raising as a concern, I might try to rephrase it as something like:

"If you tell somebody they have to accept as fair the least price that makes them cheerful, they might accept a lower price than they could have gotten - a price that would be an uneven division of gains from trade, or a price below the going market rate - and this is more likely to happen to people who start out poorer."

And I agree that would be unfair.  If you can get more for your skills or your goods by going above the lowest price that makes you comfortably cheerful, go for it.

That said, if I'm asking everybody in the room their Cheerful Price to do my laundry, and the poorest person present names the lowest Cheerful Price, I think that's... actually everything working as intended?  The effect is that the person with the lowest prior endowment is the one who actually gets the money and feels cheerful about that; and the Cheerful part means they get more money (I hope) than if I asked everybody present to name their price without specifying the Cheerful part.

My current cheerful price for "Please write me a short story" might be above $10,000 today.  In 2001 it might have been $100, back when $100 was 1/20th of the cost of the car I was driving.  The end result of this hundredfold increase in how much money it takes to make me happy, as I've accumulated more money... is that now people who'll write you a story for $100 get your money, instead of me.  That seems a good phenomenon from the standpoint of financial equality; it causes money to flow from people who have more money towards people who have less money.

But on a more personal level, if I ask someone to name an amount of money that makes them feel cheerful and energized, I expect and hope that this causes more money to flow from me to them.  If the technically defined "cheerful price" is less than the person otherwise thinks they can get from a payor, then by all means, they should tell me:  "Don't worry!  I set my standard fee of $X high enough that I already feel cheerful."  And then I won't feel worried about paying too little and everything will be fine.

Q:  Surely if you're asking everybody in the room to name their Cheerful Price for something, you should pay the lowest bidder the second-lowest bid, not pay the lowest bidder their actual bid?

Uhhh... possibly?  I'm not actually sure that this logic works the same way when you're asking people for Cheerful Prices - I think you're already asking them to nudge the price upwards from "the lowest they'd accept", which means you don't have to give them the second-price of the auction in order to ensure they get any gains from trade.  It's a more friendly idiom in general - you're sort of trusting them to tell you the truth about what won't make them say "ow".  And despite that example I gave before, the whole thing seems more useful for individual interactions than auctions and markets?  But you may still have a point.  I'll have to think about it.
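To make the two payment rules from this last question concrete, here is a small sketch. The names and the prices they name are hypothetical, and the function `pick_winner` is invented for illustration. It only shows what the lowest bidder is paid under each rule; it does not model how people decide which price to name.

```python
def pick_winner(bids, second_price=False):
    """Pick the lowest bidder from a dict of name -> named price.

    Returns (winner, payment). With second_price=True the winner is paid
    the second-lowest named price instead of their own.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1])
    winner, lowest = ranked[0]
    if second_price and len(ranked) > 1:
        return winner, ranked[1][1]
    return winner, lowest


# Hypothetical prices named by three people in the room.
bids = {"Alice": 20, "Bob": 35, "Carol": 50}

print(pick_winner(bids))                     # lowest bidder paid their own bid
print(pick_winner(bids, second_price=True))  # lowest bidder paid the next bid up
```

With these made-up numbers the same person wins either way; the second-price rule just pays them $35 instead of $20. Whether that extra margin is needed when the named prices are already meant to be cheerful is the open question in the answer above.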


Watching themselves

February 13, 2021 - 01:20
Published on February 12, 2021 10:20 PM GMT

We've been very lucky with childcare this pandemic. We had an au pair, so when schools closed suddenly on a Wednesday in early March we had live-in child care for Lily (6y) and Anna (4y). Having our children well taken care of without risking exposure outside the home or requiring either of the parents to take time off work has been incredibly helpful. Going into winter with restrictions still in full force, however, our au pair was not interested in renewing for another year.

We were again lucky, and one of our housemates has been available to watch the kids three days a week. They're really great, and the kids are having a good time, but this does leave two days a week without care. We decided to draw on some of the independence we've been cultivating, and I talked to the kids about how they would be watching themselves two days a week.

Over the next few weeks the kids and I worked through some plans. We went over their day, talking through the different places where they currently rely on adults, and we figured out how we were going to handle each one. Some examples:

  • Lunch. This wasn't something they were going to be able to do on their own, so I agreed I would still make them lunch. When possible, I eat lunch with them and read to them. It's nice to have time together, and I think they also do eat more that way.

  • Snacks. We talked about what food they would like to have available in case they got hungry (peanut butter pretzels, Ritz crackers) and made a small shelf for them to use.

  • Drinks. They can already get themselves water whenever they want to, but they like to drink milk. The milk jug is heavy and easy to spill, so we decided that each day I would fill a cup with milk and leave it in the fridge. I'm still doing this, though mostly they've ended up just drinking milk at meals when an adult would be able to give them milk anyway.

  • Classes. Lily is in school remotely, and has various classes at different times during the day. She knows how to get into her classes herself, but she was relying on grown-ups to let her know when the classes would happen. I wanted to use this as an opportunity for practice at telling time, but practically she wasn't going to get good enough quickly enough. She has a tablet that she uses to listen to podcasts, and I set up alarms on it to go off two minutes before each class was supposed to start. She doesn't always have it with her, and sometimes someone else (Anna, me, a housemate) will hear it and let her know it's time. I don't think she's missed any classes yet.

  • Assistance. Sometimes there are things the kids need help with that we didn't foresee or that they can't handle. We built a buzzer together, so there's a button outside of my office that makes a noise at my desk. We agreed on a system: one buzz if you would like help, two buzzes if you need help, and three buzzes if it's an emergency. For one or two buzzes I would come if I could, depending on how interruptible what I was doing was, while for three buzzes I would definitely come regardless. I warned them that if it was not actually an emergency, I would be very grumpy, and they have not yet (a) used three buzzes or (b) been in a situation in which three would've been appropriate. We talked through some examples, and my favorite was when Anna told me that forgetting about movie day merited three buzzes (Lily clarified for her that it does not). Sometimes they will buzz once for things that I have to say no to, and Anna took some time to learn what to expect: "Papa, I pressed the buzzer once because I wanted to ask you if you would have a tea party with me?"

  • Outside. The kids can bundle up, go out through the kid door, and play in the backyard any time they want, but they haven't wanted to. They'd rather go to the playground. Sometimes I have meetings that I can take while walking, so if the timing lines up I'll walk with them to the park while I take a meeting on my phone. They are fully responsible for getting themselves ready; I tell them to buzz me when they're ready to go.

  • Boredom. This is not something I help them with. We have a lot of toys and different things to do, and if they can't find something they want to do that's their problem.

  • Conflict. Sometimes they get mad at each other, and I am not generally available during the day to resolve things. One firm rule does a lot of heavy lifting here: if they ask you to leave their bedroom you have to leave. They both know that I will be very unhappy if they violate this, and it gives them both a reliable option for deescalating a conflict. They value playing with each other enough that withdrawing, or the threat of withdrawal, is also quite useful in bargaining.

Yesterday I tracked what I ended up needing to do during work, both planned (feeding them lunch) and unplanned (buzzer presses). I started work at 8:40am, and Julia took the kids at 5:30pm.

  • 9:02, 1min: Lily buzzes once with a Zoom issue for her class.
  • 9:16, 1min: Lily buzzes twice with a different Zoom issue for her class.
  • 9:45, 2min: I go downstairs and tell the kids that I can take them outside during a meeting if they want.
  • 9:50, 1min: Lily buzzes once to say they don't want to go outside today.
  • 12:00, 35min: Feed them lunch, read to them.
  • 2:00, 1min: tell them to get ready for naps
  • 2:11, 1min: Lily buzzes once with yet another Zoom issue.
  • 2:13, 3min: Anna buzzes once for me to brush her teeth, then I put on a story tape and put her down for naps. Lily will put herself down when she finishes her class.
  • 2:18, 1min: Lily buzzes once with a school question.
  • 2:38, 1min: One buzz, unanswered because I'm in a meeting
  • 4:19, 8min: Get the kids up from naps, make some popcorn, start movie day.
  • 4:27, 1min: One buzz, which I knew without answering was that they wanted me to tell the Chromecast to start the next episode.
  • 4:45, 1min: Ditto

This is pretty typical, except for more Zoom issues than usual and the kids not wanting to go outside. This level of interruption doesn't bother me, though I think it might bother someone whose attention works differently?
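As a rough tally of the log above, here is a short sketch that adds up the minutes. The entries are transcribed from the list; the buzzer flag is my reading of each entry, and the unanswered buzz at 2:38 is counted at the one minute it is logged as.

```python
# (time, minutes, started_by_buzzer), transcribed from the log above.
log = [
    ("9:02", 1, True),
    ("9:16", 1, True),
    ("9:45", 2, False),
    ("9:50", 1, True),
    ("12:00", 35, False),
    ("2:00", 1, False),
    ("2:11", 1, True),
    ("2:13", 3, True),
    ("2:18", 1, True),
    ("2:38", 1, True),   # unanswered, still logged as 1 minute
    ("4:19", 8, False),
    ("4:27", 1, True),
    ("4:45", 1, True),
]

# Work ran from 8:40am to 5:30pm.
WORKDAY_MINUTES = (17 * 60 + 30) - (8 * 60 + 40)

total_minutes = sum(minutes for _, minutes, _ in log)
buzzer_minutes = sum(minutes for _, minutes, buzzed in log if buzzed)

print(f"{len(log)} entries, {total_minutes} of {WORKDAY_MINUTES} minutes "
      f"({total_minutes / WORKDAY_MINUTES:.1%}), "
      f"{buzzer_minutes} of them from buzzer presses")
```

On this reading the day comes to 57 logged minutes out of 530, with lunch accounting for most of it and buzzer presses for 11 minutes.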

Some of this working as well as it does likely comes from our parenting approach, while other aspects are more likely us being lucky in which particular kids we happen to have. It's hard to tell what the balance is.

I'm glad this is not our situation all week long, and there are lots of things that our housemate does with them that I'm happy they get to do. I do also think, however, that having some time when they are (in their minds) fully responsible for themselves is likely good for them.



Potential factors in Bell Labs' intellectual progress, Pt. 1

February 12, 2021 - 22:26
Published on February 12, 2021 7:26 PM GMT

Epistemic status: these are notes from a popular account filtered through my own existing beliefs. Here, I am largely trusting the book to report true things and not misrepresent them, though, in fact, I suspect the book is trying to create a certain narrative that might be misleading. If I were to get very serious about Bell Labs, I'd have to look at more source material.

Over the years, I've heard various people reference Bell Labs as a place where a uniquely large amount of intellectual progress was made, making it a worthy target of investigation if you're interested in intellectual progress.

A few days ago, I started reading The Idea Factory: Bell Labs and the Great Age of American Innovation. I'm only 20% of the way through, but I've started to note various factors that might explain their output.

Many of the factors that are salient to me were already in my bag of hypotheses and could just represent confirmation bias on my part. A few were surprising. I suppose I should also look for factors I expected to see but haven't yet (look into the dark).

Note: the most significant invention to come out of Bell Labs was the transistor and a lot of the book has focused on that, but they did other notable things too.

Factors Salient to Me
  • While the work often tended in the direction of basic science that was distant from practical application, by dint of it occurring within AT&T, it was expected that all the research done might somehow benefit AT&T and I sense a degree of backchaining from that in all they did.
  • Ten years before the achievement of the transistor, Mervin Kelly said to William Shockley that they really needed something solid state that could amplify and switch to replace fragile and undurable vacuum tubes and relays. It took a tonne of basic research in solid state physics to get there, but ultimately that was the driving motivation. (Only after its announcement did some folks from MIT write to say it might be applicable to electronic computer circuits.)
  • A lot of the innovation was motivated by concrete problem solving, e.g., needing to figure out better cable insulation and sheathing so they could run underwater cables – that and related projects required a lot of materials innovation.
  • Their work was highly empirical, highly directed, and highly applied. There was a lot of feedback. Often they were trying to build devices that did certain things and I expect it was clear when they were succeeding or not. Often they were just trying to figure out how the physical world worked (conductors, insulations, whatever) but this was grounded in the hope and expectation that this understanding would help them engineer more stuff.
  • There was an overarching grand lofty mission: “Our job, essentially, is to devise and develop facilities which will enable two human beings anywhere in the world to talk to each other as clearly as if they were face to face and to do this economically as well as efficiently.” It was reminiscent of Theodore Vail’s dictum of “one policy, one system, universal service.” But it likewise suggested that the task at hand was immense.
  • Although they had to figure out many pieces of their more specific paradigms (tools, methods, principles), their overall work was embedded within an established broader paradigm of physics.
  • The researchers noted in the book were typically physics PhDs from top universities who'd been working on related problems in their academic work (and were often recruiting each other because of it). To me, this implies that a) they shared a common paradigm, meaning methods, background assumptions, etc., and b) they entered Bell Labs with a lot of relevant knowledge and experience.
  • At the same time as their academic education, the book paints the researchers as having grounded backgrounds: working on farms and mills. Though I suspect this easily could be spun for a narrative.
  • If the book can be trusted, they had a lot of top talent. Bell Labs was prestigious, it had many people who were known to be good, and it paid well (better than universities).
  • Many, many people's work fed into the invention of the transistor. Shockley, Bardeen, and Brattain might be the names attached to it, but in fact they built upon the work of many others at Bell Labs and were supported by a large staff of specialists who were responsible for all kinds of little discoveries that were necessary along the way.
  • The big discoveries didn't happen that fast. Ten years between the desire for a solid state amplifier and getting there. Not one or two.
  • It was a "destination". Many people would come and visit: chat and lecture with the people there (their building had an auditorium). Very connected to a broader idea ecosystem.

Increasingly, during the late 1920s and early 1930s, ideas arrived in the flesh, too. Some years Karl Darrow would visit California to lecture; some years students in various locations would learn from a physics professor named John Van Vleck, who was permitted to ride the nation’s passenger trains free of charge because he had helped work out the national rail schedules with exacting precision. It also was the case that a scholar from abroad (a 1931 world tour by the German physicist Arnold Sommerfeld, for instance) would bring the new ideas to the students at Caltech or the University of Michigan. Indeed, the Bell Labs experimentalist Walter Brattain, the physicist son of a flour miller, was taking a summer course at Michigan when he heard Sommerfeld talk about atomic structure. Brattain dutifully took notes and brought the ideas back to New York. At West Street, he gave an informal lecture series to his Bell Labs colleagues. 

Every month, as it happened, seemed to bring a new study on physics, chemistry, or metallurgy that was worth spreading around—on the atomic structure of crystals, on ultra-high-frequency radio waves, on films that cover the surface of metals, and so forth. One place to learn about these ideas was the upper floor of the Bell Labs West Street offices, where a large auditorium served as a place for Bell Labs functions and a forum for new ideas. In the 1920s, a one-hour colloquium was set up at 5 p.m. on Mondays so that outside scholars like Robert Millikan and Enrico Fermi or inside scholars like Davisson, Darrow, and Shockley—though only twenty-seven years old at the time—could lecture members of the Bell Labs technical staff on recent scientific developments. (Albert Einstein came to West Street in 1935, but was evidently more interested in touring the microphone shop with Harvey Fletcher than giving a talk.) Another place to learn about the new ideas was the local universities. The Great Depression, as it happened, was a boon for scientific knowledge. Bell Labs had been forced to reduce its employees’ hours, but some of the young staffers, now with extra time on their hands, had signed up for academic courses at Columbia University in uptown Manhattan. Usually the recruits enrolled in a class taught on the Columbia campus by a professor named Isidor Isaac (I. I.) Rabi, who was destined for a Nobel Prize. - Gertner, Jon. The Idea Factory (pp. 42-43). Penguin Publishing Group. Kindle Edition. 

  • They were strongly connected to other laboratories working on similar (or the same) problems. Top researchers would spend months touring labs in Europe and then come back and share what they had learned.
  • They had an internal scientific journal: Bell Labs Technical Journal 
  • They had study groups where researchers would work together through new material on physics.

And there was, finally, another place on West Street where new ideas could now spread. Attendance was allowed by invitation only. Some of the Labs’ newest arrivals after the Depression had decided to further educate themselves through study groups where they would make their way through scientific textbooks, one chapter a week, and take turns lecturing one another on the newest advances in theoretical and experimental physics. One study group in particular, informally led by William Shockley at the West Street labs, and often joined by Brattain, Fisk, Townes, and Wooldridge, among others, met on Thursday afternoons. The men were interested in a particular branch of physics that would later take on the name “solid-state physics.” It explored the properties of solids (their magnetism and conductivity, for instance) in terms of what happens on their surfaces as well as deep in their atomic structure. And the men were especially interested in the motions of electrons as they travel through the crystalline lattice of metals. “What had happened, I think, is that these young Ph.D.’s were introducing what is essentially an academic concept into this industrial laboratory,” one member of the group, Addison White, would tell the physics historian Lillian Hoddeson some years later. “The seminar, for example, was privileged in that we started at let’s say a quarter of five, when quitting time was five.” The men had tea and cookies served to them from the cafeteria—“all part of the university tradition,” White remarked, “but unconventional in the industrial laboratory of that day.” The material was a challenge for everyone in the group except Shockley, who could have done the work in his sleep, Wooldridge would recall. Out of habit, the men addressed one another by their last names. According to Brattain, it was always Shockley and Wooldridge—never Bill and Dean, and never Dr. Shockley and Dr. Wooldridge. - Gertner, Jon. The Idea Factory (pp. 43-44). Penguin Publishing Group. Kindle Edition.

  • They specialized. Notably, there was a split between theorists and experimentalists who worked together. A theorist would predict something, the experimentalist would construct and run the experiment, then the theorist would interpret the data. There was also the split between physicists, chemists, metallurgists, etc.
    • And yet at the same time, this is also quoted as applying to at least one period: There was no real distinction at West Street between an engineer and a scientist. If anything, everyone was considered an engineer and was charged with the task of making the thousands of necessary small improvements to augment the phone service that was interconnecting the country. - Gertner, Jon. The Idea Factory (p. 27). Penguin Publishing Group. Kindle Edition.
    • The period in which specialization was apparent came later, after they'd moved from West Street to Murray Hill.
  • Bell Labs was large. Thousands of people worked there during at least some periods (supposedly 9,000 during WWII).
  • They eventually built out custom offices/laboratories in a suburban area, making me think of the Steve Jobs building at Pixar, but in the former case each lab was hooked up with "everything an experimentalist could need: compressed air, distilled water, steam, gas, vacuum, hydrogen, oxygen, and nitrogen."
  • No one was allowed to work with their doors closed.
  • No one was allowed to refuse help to colleagues, regardless of rank or department, when it might be necessary.
  • Supervisors were allowed to guide but not interfere with research.
  • There was more chance and random experiment leading to the transistor than I expected. I'd kind of assumed the theory and experiments had proceeded in a very definite way. Instead, semiconductor doping was a random discovery they figured out after they'd been mucking around a bunch with semiconductors and just trying to understand their observations.

Three Bell Labs researchers in particular—Jack Scaff, Henry Theurer, and Russell Ohl—had been working with silicon in the late 1930s, mostly because of its potential for the Labs’ work in radio transmission. Scaff and Theurer would order raw silicon powder from Europe, or (later) from American companies like DuPont, and melt it at extraordinary temperatures in quartz crucibles. When the material cooled they would be left with small ingots that they could test and examine. They soon realized that some of their ingots—they looked like coal-black chunks, with cracks from where the material had cooled too quickly—rectified current in one direction, and some samples rectified current in another direction. At one point, Russell Ohl came across a sample that seemed to do both: The top part of the sample went in one direction and the bottom in the other. That particular piece was intriguing in another respect. Ohl discovered that when he shone a bright light on it he could generate a surprisingly large electric voltage. Indeed the effect was so striking, and so unexpected, that Ohl was asked to demonstrate it in Mervin Kelly’s office one afternoon. Kelly immediately called in Walter Brattain to take a look, but none of the men had a definitive explanation. - Gertner, Jon. The Idea Factory (pp. 84-85). Penguin Publishing Group. Kindle Edition. 

  • There were other people working on the same things as they were, and they were racing against them. It was Leibniz and Newton, Tesla and Edison, Graham Bell and Elisha Gray. In particular, Julius Lilienfeld had independently discovered and patented the field effect also theorized by Shockley, and Herbert Mataré independently invented the point-contact transistor in 1948 (vs. 1947 for Bardeen and Brattain).
    • This actually goes against my sense that Bell Labs was able to build the transistor because of their resources and the particular knowledge and expertise they had built up over 20 years. Possibly their ideas were just getting spread around via their external contacts, or solid-state physics was simply taking off generally.


Mapping the Conceptual Territory in AI Existential Safety and Alignment

February 12, 2021 - 22:00
Published on February 12, 2021 7:55 AM GMT

(Crossposted from my blog)

Throughout my studies in alignment and AI-related existential risks, I’ve found it helpful to build a mental map of the field and how its various questions and considerations interrelate, so that when I read a new paper, a post on the Alignment Forum, or similar material, I have some idea of how it might contribute to the overall goal of making our deployment of AI technology go as well as possible for humanity. I’m writing this post to communicate what I’ve learned through this process, in order to help others trying to build their own mental maps and provide them with links to relevant resources for further, more detailed information. This post was largely inspired by (and would not be possible without) two talks by Paul Christiano and Rohin Shah, respectively, that give very similar overviews of the field,[1] as well as a few posts on the Alignment Forum that will be discussed below. This post is not intended to replace these talks but is instead an attempt to coherently integrate their ideas with ideas from other sources attempting to clarify various aspects of the field. You should nonetheless watch these presentations and read some of the resources provided below if you’re trying to build your mental map as completely as possible.

(Primer: If you’re not already convinced of the possibility that advanced AI could represent an existential threat to humanity, it may be hard to understand the motivation for much of the following discussion. In this case, a good starting point might be Richard Ngo’s sequence AGI Safety from First Principles on the Alignment Forum, which makes the case for taking these issues seriously without taking any previous claims for granted. Others in the field might make the case differently or be motivated by different considerations,[2] but this still provides a good starting point for newcomers.)

Clarifying the objective

First, I feel it is important to note that both the scope of the discussion and the relative importance of different research areas change somewhat depending on whether our high-level objective is “reduce or eliminate AI-related existential risks” or “ensure the best possible outcome for humanity as it deploys AI technology.” Of course, most people thinking about AI-related existential risks are probably doing so because they care about ensuring a good long-term future for humanity, but the point remains that avoiding extinction is a necessary but not sufficient condition for humanity being able to flourish in the long term.

Paul Christiano's roadmap, as well as the one I have adapted from Paul’s for this post in an attempt to include some ideas from other sources, have “make AI go well” as the top-level goal, and of course, technical research on ensuring existential safety will be necessary in order to achieve this goal. However, some other research areas under this heading, such as “make AI competent,” arguably contribute more to existential risk than to existential safety, despite remaining necessary for ensuring the most beneficial overall outcomes. (To see this, consider that AI systems below a certain level of competence, such as current machine learning systems, pose no existential threat at all, and that with increasing competence comes increasing risk in the case of that competence being applied in undesirable ways.) I want to credit Andrew Critch and David Krueger’s paper AI Research Considerations for Human Existential Safety (ARCHES) for hammering this point home for me (see also the blog post I wrote about ARCHES).

The map

The rest of this post will discuss various aspects of this diagram and its contents:

I have to strongly stress that this is only marginally different from Paul’s original breakdown (the highlighted boxes are where he spends most of his time):

In fact, I include Paul’s tree here because it is informative to consider where I chose to make small edits to it in an attempt to include some other perspectives, as well as clarify terminological or conceptual distinctions that are needed to understand some smaller but important details of these perspectives. Clearly, though, this post would not be possible without Paul’s insightful original categorizations.

It might be helpful to have these diagrams pulled up separately while reading this post, in order to zoom as needed and to avoid having to scroll up and down while reading the discussion below.


I mostly mention the competence node here to note that, depending on how terms are defined, “capability robustness” (performing robustly in environments or on distributions different from those an algorithm was trained or tested in) is arguably a necessary ingredient for solving the “alignment problem” in full, but more on this later. In the end, I don’t think there’s too much consequence to factoring it like Paul and I have; to “make AI go well,” our AI systems will need to be trying not to act against our interests and do so robustly in a myriad of unforeseeable situations.

(Also, remember that while competence is necessary for AI to go as well as possible, this is generally not the most differentially useful research area for contributing to this goal, since the vast majority of AI and ML research is already focused on increasing the capabilities of systems.)

Coping with impacts

Another area that is mostly outside the scope of our discussion here but still deserves mentioning is what Paul labels “cope with impacts of AI,” which would largely fall under the typical heading of AI “policy” or “governance” (although some other parts of this diagram might also typically count as “governance,” such as those under the “pay alignment tax” node). Obviously, good governance and policies will be critical, both to avoiding existential risks from AI and to achieving best possible outcomes, but much of my focus is on technical work aimed at developing what the Center for Human-Compatible Artificial Intelligence at Berkeley calls “provably beneficial systems,” as well as systems that reliably avoid bad behavior.

Deconfusion research

I added this node to the graph because I believe it represents an important area of research in the project of making AI go well. What is “deconfusion research”? As far as I’m aware, the term comes from MIRI's 2018 Research Agenda blog post. As Nate Soares (the author of the post) puts it, “By deconfusion, I mean something like ‘making it so that you can think about a given topic without continuously accidentally spouting nonsense.’” Adam Shimi explains: “it captures the process of making a concept clear and explicit enough to have meaningful discussions about it.” This type of research corresponds to the “What even is going on with AGI?” research category Rohin discusses in his talk. Solutions to problems in this category will not directly enable us to build provably beneficial systems or reliably avoid existential risk but instead aim to resolve confusion around the underlying concepts themselves, in order for us to then be able to meaningfully address the “real” problem of making AI go well. As Nate writes on behalf of MIRI:

From our perspective, the point of working on these kinds of problems isn’t that solutions directly tell us how to build well-aligned AGI systems. Instead, the point is to resolve confusions we have around ideas like “alignment” and “AGI,” so that future AGI developers have an unobstructed view of the problem. Eliezer illustrates this idea in “The Rocket Alignment Problem,” which imagines a world where humanity tries to land on the Moon before it understands Newtonian mechanics or calculus.

Research in this category includes MIRI’s Agent Foundations Agenda (and their work on embedded agency), Eric Drexler’s work on Comprehensive AI Services (CAIS), which considers increased automation of bounded services as a potential path to AGI that doesn’t require building opaquely intelligent agents with a capacity for self-modification, Adam Shimi’s work on understanding goal directedness, MIRI/Evan Hubinger's work on mesa-optimization and inner alignment, and David Krueger and Andrew Critch’s attempt to deconfuse topics surrounding existential risk, prepotent AI systems, and delegation scenarios in ARCHES. I won’t go into any of this work in depth here (except for more on mesa-optimization and inner alignment later), but all of it is worth looking into as you build up a picture of what’s going on in the field.

This post, the talks by Christiano and Shah by which it was inspired, and many of the clarifying posts from the Alignment Forum linked to throughout this post were also created with at least some degree of deconfusional intent. I found this post on clarifying some key hypotheses helpful in teasing apart various assumptions made in different areas and between groups of people with different perspectives. I also think Jacob Steinhardt’s AI Alignment Research Overview is worth mentioning here. It has a somewhat different flavor from and covers somewhat different topics than this/Paul’s/Rohin’s overview but still goes into a breadth of topics with some depth.


This was another small distinction I believed was important to make in adapting Paul’s factorization of problems for this post. As proposed by Andrew Critch and David Krueger in ARCHES, and as I discussed in my blog post about ARCHES, the concept of “delegation” might be a better and strictly more general concept than “alignment.” Delegation naturally applies to the situation: humans can delegate responsibility for some task they want accomplished to one or more AI systems, and doing so successfully clearly involves the systems at least trying to accomplish these tasks in the way we intend (“intent alignment,” more on this soon). However, “alignment,” as typically framed for technical clarity, is about aligning the values or behavior of a single AI system with a single human.[3] It is not particularly clear what it would mean for multiple AI systems to be “aligned” with multiple humans, but it is at least somewhat clearer what it might mean for a group of humans to successfully delegate responsibility to a group of AI systems, considering we have some sense of what it means for groups of humans to successfully delegate to other groups of humans (e.g. through organizations). Within this framework, “alignment” can be seen as a special case of delegation, what Critch and Krueger call “single/single” delegation (delegation from one human to one AI system). See below (“Single/single delegation (alignment)”) for more nuance on this point, however. I believe this concept largely correlates with Shah’s “Helpful AGI” categorization in his overview talk; successful delegation certainly depends in part on the systems we delegate to being helpful (or, at minimum, trying to be).

Delegation involving multiple stakeholders and/or AIs

One of the reasons ARCHES makes the deliberate point of distinguishing alignment as a special case of delegation is to show that solving alignment/successfully delegating from one user to one system is insufficient for addressing AI-related existential risks (and, by extension, for making AI go well). Risk-inducing externalities arising out of the interaction of individually-aligned systems can still pose a threat and must be addressed by figuring out how to successfully delegate in situations involving multiple stakeholders and/or multiple AI systems. This is the main reason I chose to make Paul’s “alignment” subtree a special case of delegation more generally. I won’t go into too much more detail about these “multi-” situations here, partially because there’s not a substantial amount of existing work to be discussed. However, it is worth looking at ARCHES, as well as this blog post by Andrew Critch and my own blog post summarizing ARCHES, for further discussion and pointers to related material.

I would be interested to know to what extent Christiano thinks this distinction is or is not helpful in understanding the issues and contributing to the goal of making AI go well. It is clear by his own diagram that “making AI aligned” is not sufficient for this goal, and he says as much in this comment in response to the aforementioned blog post by Critch: “I totally agree that there are many important problems in the world even if we can align AI.” But the rest of that comment also seems to somewhat question the necessity of separately addressing the multi/multi case before having a solution for the single/single case, if there might be some “‘default’ ways” of approaching the multi/multi case once armed with a solution to the single/single case. To me, this seems like a disagreement on the differential importance between research areas rather than a fundamental difference about the underlying concepts in principle, but I would be interested in more discussion on this point from the relevant parties. And it is nonetheless possible that solving single/single delegation or being able to align individual systems and users could be a necessary prerequisite to solving the multi- cases, even if we can begin to probe the more general questions without a solution for the single/single case.

(ETA 12/30/20: Rohin graciously gave me some feedback on this post and had the following to say on this point)

I'm not Paul, but I think we have similar views on this topic -- the basic thrust is:

  1. Yes, single-single alignment does not guarantee that AI goes well; there are all sorts of other issues that can arise (which ARCHES highlights).
  2. We're focusing on single-single alignment because it's a particularly crisp technical problem that seems amenable to technical work in advance -- you don't have to reason about what governments will or won't do, or worry about how people's attitudes towards AI will change in the future. You are training an AI system in some environment, and you want to make sure the resulting AI system isn't trying to hurt you. This is a more "timeless" problem that doesn't depend as much on specific facts about e.g. the current political climate.
  3. A single-single solution seems very helpful for multi-multi alignment; if you care about e.g. fairness for the multi-multi case, it would really help if you had a method of building an AI system that aims for the human conception of fairness (which is what the type of single-single alignment that I work on can hopefully do).
  4. The aspects of multi-multi work that aren't accounted for by single-single work seem better handled by existing institutions like governments, courts, police, antitrust, etc rather than technical research. Given that I have a huge comparative advantage at technical work, that's what I should be doing. It is still obviously important to work on the multi-multi stuff, and I am very supportive of people doing this (typically under the banner of AI governance, as you note).

(In Paul's diagram, the multi-multi stuff goes under the "cope with the impacts of AI" bucket.)

I suspect Critch would disagree most with point 4 and I'm not totally sure why.

Single/single delegation (alignment)

It’s important to make clear what we mean by “alignment” and “single/single delegation” in our discussions, since there are a number of related but distinct formulations of this concept that are important to disambiguate in order to bridge inferential gaps, combat the illusion of transparency, and deconfuse the concept. Perhaps the best starting point for this discussion is David Krueger’s post on disambiguating "alignment", where he distinguishes between several variations of the concept:

  • Holistic alignment: "Agent R is holistically aligned with agent H iff R and H have the same terminal values. This is the ‘traditional AI safety (TAIS)’ (as exemplified by Superintelligence) notion of alignment, and the TAIS view is roughly: ‘a superintelligent AI (ASI) that is not holistically aligned is an Xrisk’; this view is supported by the instrumental convergence thesis."
  • Parochial alignment: "I’m lacking a satisfyingly crisp definition of parochial alignment, but intuitively, it refers to how you’d want a 'genie' to behave: R is parochially aligned with agent H and task T iff R’s terminal values are to accomplish T in accordance to H’s preferences over the intended task domain... parochially aligned ASI is not safe by default (it might paperclip), but it might be possible to make one safe using various capability control mechanisms”
  • Sufficient alignment: "R is sufficiently aligned with H iff optimizing R’s terminal values would not induce a nontrivial Xrisk (according to H’s definition of Xrisk). For example, an AI whose terminal values are ‘maintain meaningful human control over the future’ is plausibly sufficiently aligned. It’s worth considering what might constitute sufficient alignment short of holistic alignment. For instance, Paul seems to argue that corrigible agents are sufficiently aligned."
  • Intent alignment (Paul Christiano's version of alignment): "R is intentionally aligned with H if R is trying to do what H wants it to do."
  • "Paul also talks about benign AI which is about what an AI is optimized for (which is closely related to what it ‘values’). Inspired by this, I’ll define a complementary notion to Paul’s notion of alignment: R is benigned with H if R is not actively trying to do something that H doesn’t want it to do."

Each of these deserves attention, but let’s zoom in on intent alignment, as it is the version of alignment that Paul uses in his map and that he seeks to address with his research. First, I want to point out that each of Krueger’s definitions pertains only to agents. However, I think we still want a definition of alignment that can apply to non-agential AI systems, since it is an open question whether the first AGI will be agentive. Comprehensive AI Services (CAIS) explicitly pushes back against this notion, and ARCHES frames its discussion around AI “systems” to be “intentionally general and agent-agnostic.” (See also this post on clarifying some key hypotheses for more on this point.) It is clear that we want to have some notion of alignment that applies just as well to AI systems that are not agents or agent-like. In fact, Paul's original definition does not seem to explicitly rely on agency:

When I say an AI A is aligned with an operator H, I mean:

A is trying to do what H wants it to do.

Another characterization of intent alignment comes from Evan Hubinger: "An agent is intent aligned if its behavioral objective[4] is aligned with humans” (presumably he means “aligned” in this same sense that its behavioral objective is incentivizing trying to do what we want). I like that this definition uses the more technically clear notion of a behavioral objective because it allows the concept to more precisely be placed in a framework with outer and inner alignment (more on this later), but I still wish it did not depend on a notion of agency like Krueger’s definition. Additionally, all of these definitions lack the formal rigor that we need if we want to be able to “use mathematics to formally verify if a proposed alignment mechanism would achieve alignment,” as noted by this sequence on the Alignment Forum. David Krueger makes a similar point in his post, writing, “Although it feels intuitive, I’m not satisfied with the crispness of this definition [of intent alignment], since we don’t have a good way of determining a black box system’s intentions. We can apply the intentional stance, but that doesn’t provide a clear way of dealing with irrationality.” And Paul himself makes very similar points in his original post:

  • “This definition of ‘alignment’ is extremely imprecise. I expect it to correspond to some more precise concept that cleaves reality at the joints. But that might not become clear, one way or the other, until we’ve made significant progress.”
  • “One reason the definition is imprecise is that it’s unclear how to apply the concepts of ‘intention,’ ‘incentive,’ or ‘motive’ to an AI system. One naive approach would be to equate the incentives of an ML system with the objective it was optimized for, but this seems to be a mistake. For example, humans are optimized for reproductive fitness, but it is wrong to say that a human is incentivized to maximize reproductive fitness.”[5]

All of these considerations indicate that intent alignment is itself a concept in need of deconfusion, perhaps to avoid a reliance on agency, to make the notion of “intent” for AI systems more rigorous, and/or for other reasons entirely.

Leaving this need aside for the moment, there are a few characteristics of the “intent alignment” formulation of alignment that are worth mentioning. The most important point to emphasize is that an intent-aligned system is trying to do what its operator wants it to, and not necessarily actually doing what its operator wants it to do. This allows competence/capabilities to be factored out as a separate problem from (intent) alignment; an intent-aligned system might make mistakes (for example, by misunderstanding an instruction or by misunderstanding what its operator wants[6]), but as long as it is trying to do what its operator wants, the hope is that catastrophic outcomes can be avoided with a relatively limited amount of understanding/competence. However, if we instead define “alignment” only as a function of what the AI actually does, an aligned system would need to be both trying to do the right thing and actually accomplishing this objective with competence. As Paul says in his overview presentation, “in some sense, [intent alignment] might be the minimal thing you want out of your AI: at least it is trying.” This highlights why intent alignment might be an instrumentally more useful concept for working on making AI go well: while the (much) stronger condition of holistic alignment would almost definitionally guarantee that a holistically aligned system will not induce existential risks by its own behavior, it seems much harder to verify that a system and a human share the same terminal values than to verify that a system is trying to do what the human wants.

It’s worth mentioning here the concept of corrigibility. The page on Arbital provides a good definition:

A ‘corrigible’ agent is one that doesn't interfere with what we would intuitively see as attempts to ‘correct’ the agent, or ‘correct’ our mistakes in building it; and permits these ‘corrections’ despite the apparent instrumentally convergent reasoning saying otherwise.

This intuitively feels like a property we might like the AI systems we build to have as they get more powerful. In his post, Paul argues:

  1. A benign act-based agent will be robustly corrigible if we want it to be.
  2. A sufficiently corrigible agent will tend to become more corrigible and benign over time. Corrigibility marks out a broad basin of attraction towards acceptable outcomes.

As a consequence, we shouldn’t think about alignment as a narrow target which we need to implement exactly and preserve precisely. We’re aiming for a broad basin, and trying to avoid problems that could kick [us] out of that basin.

While Paul links corrigibility to benignment explicitly here, how it relates to intent alignment is somewhat less clear to me. I think it’s clear that intent alignment (plus a certain amount of capability) entails corrigibility: if a system is trying to “do what we want,” and is at least capable enough to figure out that we want it to be corrigible, then it will do its best to be corrigible. I don’t think the opposite direction holds, however: I can imagine a system that doesn’t interfere with attempts to correct it and yet isn’t trying to “do what we want.” The point remains, though, that if we’re aiming for intent alignment, it seems that corrigibility will be a necessary (if not sufficient) property.

Returning to the other definitions of alignment put forth by Krueger, one might wonder if there is any overlap between these different notions of alignment. Trivially, a holistically aligned AI would be parochially aligned for any task T, as well as sufficiently aligned. David also mentions that "Paul seems to argue that corrigible agents are sufficiently aligned," which does seem to be a fair interpretation of the above “broad basin” argument. The one point I’ll raise, though, is that Paul specifically argues that “benign act-based agents will be robustly corrigible” and “a sufficiently corrigible agent will tend to become more corrigible and benign over time,” which seems to imply corrigibility can give you benignment. By David’s definition of benignment (“not actively trying to do something that H doesn’t want it to do”), this would represent sufficient alignment, but Paul defined benign AI in terms of what it was optimized for. If such an optimization process were to produce a misaligned mesa-optimizer, it would clearly not be sufficiently aligned. Perhaps the more important point, however, is that it seems Paul would argue that intent alignment would in all likelihood represent sufficient alignment (others may disagree).

I would also like to consider if and how the concept of single/single delegation corresponds to any of these specific types of alignment. As put forth in ARCHES:

Single(-human)/single(-AI system) delegation means delegation from a single human stakeholder to a single AI system (to pursue one or more objectives).

Firstly, it is probably important to note that “single/single delegation” refers to a task, and “alignment,” however it is defined, is a property that we want our AI systems to have. However, to solve single/single delegation (or to do single/single delegation successfully), we will require a solution to the “alignment problem,” broadly speaking. From here, it’s a question of defining what would count as a “solution” to single/single delegation (or what it would mean to do it “successfully”). If we can build intent aligned systems, will we have solved single/single delegation? If they are sufficiently capable, probably. The same goes for parochially aligned and holistically aligned systems: if they’re sufficiently capable, the users they’re aligned with can probably successfully delegate to them. It is unclear to me whether this holds for a sufficiently aligned system, however; knowing that “optimizing R’s terminal values would not induce a nontrivial Xrisk” doesn’t necessarily mean that R will be any good at doing the things H wants it to.

As I mentioned before, I like the concept of “delegation” because it generalizes better to situations involving multiple stakeholders and/or AI systems. However, I believe it is still necessary to understand these various notions of “alignment,” because it remains a necessary property for successfully delegating in the single/single case and because understanding the differences between them is helpful for understanding others’ work and in communicating about the subject.

Alignment tax and alignable algorithms

One compelling concept Paul used that I had not heard before was the “alignment tax”: the cost incurred from insisting on (intent) alignment. This is intended to capture the tension between safety and competence. We can either pay the tax, e.g. by getting policymakers to care enough about the problem, negotiating agreements to coordinate to pay the tax, etc., or we can reduce the tax with technical safety and alignment research that produces aligned methods that are roughly competitive with unaligned methods.

Two ways that research can reduce the alignment tax are 1) advancing alignable algorithms (perhaps algorithms that have beliefs and make decisions that are easily interpretable by humans) by making them competitive with unaligned methods and 2) making existing algorithms alignable:


Paul then considers different types of algorithms (or, potentially, different algorithmic building blocks in an intelligent system) we might try and align, like algorithms for planning, deduction, and learning. With planning, we might have an alignment failure if the standard by which an AI evaluates actions doesn’t correspond to what we want, or if the algorithm is implicitly using a decision theory that we don’t think is correct. The former sounds much like traditional problems in (mis)specifying reward or objective functions for learners. I think problems in decision theory are very interesting, but unfortunately I have not yet been able to learn as much about the subject as I’d like to. The main thrust of this research is to try and solve perceived problems with traditional decision theories (e.g. causal decision theory and evidential decision theory) in scenarios like Newcomb's problem. Two decision theory variants I’ve seen mentioned in this context are functional decision theory and updateless decision theory. (This type of research could also be considered deconfusion work.)

As for aligning deduction algorithms, Paul only asks “is there some version of deduction that avoids alignment failures?” and mentions “maybe the alignment failures in deduction are a little more subtle” but doesn’t go into any more detail. Since a search of the Alignment Forum and LessWrong for posts about how deduction could be malign failed to surface anything, I can’t help but wonder whether he might really be referring to induction. For one, I’m having trouble imagining what it would mean for a deductive process to be malign. From my understanding, the axioms and rules of inference that define a formal logical system completely determine the set of theorems that can be validly derived from them, so if we were unhappy with the outputs of a deductive process that is validly applying its rules of inference, wouldn’t that mean that we really just have a problem with our own choice of axioms and/or inference rules? I can’t see where a notion of “alignment” would fit in here (but somebody please correct me if I’m wrong… I would love to hear Paul’s thoughts about these potentially “subtle” misalignment issues in deduction).

The other reason I suspect Paul might’ve actually meant induction is that Paul himself wrote the original post arguing that the universal prior in Solomonoff induction is malign. I won’t discuss this concept too much here because it still confuses me somewhat (see here, here, and here for more discussion), but it certainly seems to fit the description of being a “subtle” failure mode. I’ll also mention MIRI’s paper on logical induction (for dealing with reasoning under logical uncertainty) here, as it seems somewhat relevant to the idea of alignment as it corresponds to deduction and/or induction.

(ETA 12/30/20: Rohin also had the following to say about deduction and alignment)

I'm fairly confident he does mean deduction. And yes, if we had a perfect and valid deductive process, then a problem with that would imply a problem with our choice of axioms and inference rules. But that's still a problem!

Like, with RL-based AGIs, if we had a perfect reward-maximizing policy, then a problem with that would imply a problem with our choice of reward function. Which is exactly the standard argument for AI risk.

There's a general argument for AI risk, which is that we don't know how to give an AI instructions that it actually understands and acts in accordance to -- we can't "translate" from our language to the AI's language. If the AI takes high impact actions, but we haven't translated properly, then those large impacts may not be the ones we want, and could be existentially bad. This argument applies whether our AI gets its intelligence from induction or deduction.

Now an AI system that just takes mathematical axioms and finds theorems is probably not dangerous, but that's because such an AI system doesn't take high impact actions, not because the AI system is aligned with us.

Outer alignment and objective robustness/inner alignment

For learning algorithms, Paul breaks the alignment problem into two parts: outer alignment and inner alignment. This was another place where I felt it was important to make a small change to Paul’s diagram, as a result of some recent clarification on terminology relating to inner alignment by Evan Hubinger. It’s probably best to first sketch the concepts of objective robustness, mesa-optimization, and inner alignment for those who may not already be familiar with them.

First, recall that the base objective for a learning algorithm is the objective we use to search through models in an optimization process and that the behavioral objective is what the model (produced by this process) itself appears to be optimizing for: the objective that would be recovered from perfect inverse reinforcement learning. If the behavioral objective is aligned with the base objective, we say that the model is objective robust; if there is a gap between the behavioral objective and the base objective, the model will continue to appear to pursue the behavioral objective, which could result in bad behavior off-distribution (even as measured by the base objective). As a concrete (if simplistic) example, imagine that a maze-running reinforcement learning agent is trained to reach the end of the maze with a base objective that optimizes for a reward which it receives upon completing a maze. Now, imagine that in every maze the agent was trained on, there was a red arrow marking the end of the maze, and that in every maze in the test set, this red arrow is at a random place within the maze (but not the end). Do we expect our agent will navigate to the end of the maze, or will it instead navigate to the red arrow? If the training process produces an agent that learned the behavioral objective “navigate to the red arrow,” because red arrows were a very reliable proxy for/predictor of reward during the training process, it will navigate to the red arrow, even though this behavior is now rated poorly by the reward function and the base objective.
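To make the maze example a little more tangible, here is a minimal sketch in Python. Everything in it is a simplification I am assuming for illustration: mazes are reduced to two cell indices (`end` and `red_arrow`), a "policy" is reduced to the cell it finishes on, and the names `Maze`, `base_objective`, `go_to_end`, and `go_to_red_arrow` are hypothetical. No learning happens here; the point is only that two different behavioral objectives can be indistinguishable under the base objective on the training distribution and come apart off-distribution.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Maze:
    end: int        # cell index of the true exit (assumed representation)
    red_arrow: int  # cell index where the red arrow is drawn


def base_objective(maze: Maze, final_cell: int) -> float:
    """Reward used during training: 1 if the agent finishes on the exit."""
    return 1.0 if final_cell == maze.end else 0.0


# Two candidate behavioral objectives, abstracted to "where the agent ends up".
def go_to_end(maze: Maze) -> int:
    return maze.end


def go_to_red_arrow(maze: Maze) -> int:
    return maze.red_arrow


def average_reward(policy, mazes) -> float:
    return sum(base_objective(m, policy(m)) for m in mazes) / len(mazes)


def make_mazes(n: int, arrow_at_end: bool, size: int = 25, seed: int = 0):
    rng = random.Random(seed)
    mazes = []
    for _ in range(n):
        end = rng.randrange(size)
        if arrow_at_end:
            arrow = end  # training distribution: arrow always marks the exit
        else:
            # test distribution: arrow is anywhere except the exit
            arrow = rng.choice([c for c in range(size) if c != end])
        mazes.append(Maze(end, arrow))
    return mazes


train = make_mazes(100, arrow_at_end=True)
test = make_mazes(100, arrow_at_end=False, seed=1)

for name, policy in [("go_to_end", go_to_end), ("go_to_red_arrow", go_to_red_arrow)]:
    print(name, average_reward(policy, train), average_reward(policy, test))
```

On the training mazes both policies earn full reward, so the base objective alone cannot tell them apart; on the test mazes only the policy that actually tracks the exit keeps scoring well.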

One general way we can imagine failing to achieve objective robustness is if our optimization process itself produces an optimizer (a mesa-optimizer)—in other words, when that which is optimized (the model) becomes an optimizer. In the above example, we might imagine that such a model, trained with something like SGD, could actually learn something like depth- or breadth-first search to optimize its search for paths to the red arrow (or the end of the maze). We say that the mesa-objective is the objective the mesa-optimizer is optimizing for. (In the case of a mesa-optimizer, its mesa-objective is definitionally its behavioral objective, but the concept of a behavioral objective remains applicable even when a learned model is not a mesa-optimizer.) We also say that a mesa-optimizer is inner aligned if its mesa-objective is aligned with the base objective. Outer alignment, correspondingly, is the problem of eliminating the gap between the base objective (what we optimize our models for) and the intended goal (what we actually want from our model).

I write all this to emphasize one of the main points of Evan Hubinger’s aforementioned clarification of terminology: that we need outer alignment and objective robustness to achieve intent alignment, and that inner alignment is a way of achieving objective robustness only in the cases where we're dealing with a mesa-optimizer. Note that Paul defines inner alignment in his talk as the problem of “mak[ing] sure that policy is robustly pursuing that objective”; I hope that this section makes clear that this is actually the problem of objective robustness. Even in the absence of mesa-optimization, we still have to ensure objective robustness to get intent alignment. This is why I chose to modify this part of Paul’s graph to match this nice tree from Evan’s post:


Paul mentions adversarial training, transparency, and verification as potential techniques that could help ensure objective robustness/inner alignment. These have more typically been studied in the context of robustness generally, but the hope here is that they can also be applied usefully in the context of objective robustness. Objective robustness and inner alignment are still pretty new areas of study, however, and how we might go about guaranteeing them is a very open question, especially considering nobody has yet been able to concretely produce/demonstrate a mesa-optimizer in the modern machine learning context. It might be argued that humanity can be taken as an existence proof of mesa-optimization, since, if we are optimizing for anything, it is certainly not what evolution optimized us for (reproductive fitness). But, of course, we’d like to be able to study the phenomenon in the context it was originally proposed (learning algorithms). For more details on inner alignment and mesa-optimization, see Risks from Learned Optimization, Evan's clarifying blog post, and this ELI12 post on the topic.

Approaches to outer alignment

Paul subdivides work on outer alignment into two categories: cases where we want an AI system to learn (aligned) behavior from a teacher and cases where we want an AI system to go beyond the abilities of any teacher (but remain aligned). According to Paul, these cases roughly correspond to the easy and hard parts of outer alignment, respectively. In the short term, there are obviously many examples of tasks that humans already perform that we would like AIs to be able to perform more cheaply/quickly/efficiently (and, as such, would benefit from advances in “learn from teacher” techniques), but in the long term, we want AIs to be able to exceed human performance and continue to do well (and remain aligned) in situations that no human teacher understands.

Learning from teacher

If we have a teacher that understands the intended behavior and can demonstrate and/or evaluate it, we can 1) imitate behavior demonstrated by the teacher, 2) learn behavior the teacher thinks is good, given feedback, or 3) infer the values/preferences that the teacher seems to be satisfying (e.g. with inverse reinforcement learning)[9], and then optimize for these inferred values. Paul notes that a relative advantage of the latter two approaches is that they tend to be more sample-efficient, which becomes more relevant as acquiring data from the teacher becomes more expensive. I should also mention here that, as far as I’m aware, most “imitation learning” is really "apprenticeship learning via inverse reinforcement learning," where the goal of the teacher is inferred in order to be used as a reward signal for learning the desired behavior. So, I’m not exactly sure to what degree categories 1) and 3) are truly distinct, since it seems rare to do “true” imitation learning, where the behavior of the teacher is simply copied as closely as possible (even behaviors that might not contribute to accomplishing the intended task).

For further reading on techniques that learn desired behavior from a teacher, see OpenAI’s "Learning from Human Preferences" and DeepMind's "Scalable agent alignment via reward modeling" on the “learn from feedback” side of things. On the infer preferences/IRL side, start with Rohin Shah’s sequence on value learning on the Alignment Forum and Dylan Hadfield-Menell’s papers "Cooperative Inverse Reinforcement Learning" and "Inverse Reward Design."

Going beyond teacher

If we want our AI systems to exceed the performance of the teacher, making decisions that no human could or understanding things that no human can, alignment becomes more difficult. In the previous setting, the hope is that the AI system can learn aligned behavior from a teacher who understands the desired (aligned) behavior well enough to demonstrate or evaluate it, but here we lack this advantage. Three potential broad approaches Paul lists under this heading are 1) extrapolation, where an algorithm that has learned from a teacher successfully extrapolates from this experience to perform at least as well as the teacher in new environments, 2) inferring robust preferences, i.e. inferring the teacher’s actual preferences or values (not just stated or acted-upon preferences) in order to optimize them (this approach also goes by the name of ambitious value learning), and 3) building a better teacher, so that you can fall back on approaches from the “learn from teacher” setting, just with a more capable teacher.

Of the three, the first seems the least hopeful; machine learning algorithms have historically been pretty notoriously bad at extrapolating to situations that are meaningfully different than those they encountered in the training environment. Certainly, the ML community will continue to search for methods that generalize increasingly well, and, in turn, progress here could make it easier for algorithms to learn aligned behavior and extrapolate to remain aligned in novel situations. However, this does not seem like a reasonable hope at this point for keeping algorithms aligned as they exceed human performance.

The allure of the second approach is obvious: if we could infer, essentially, the “true human utility function,” we could then use it to train a reinforcement learning agent without fear of outer alignment failure/being Goodharted as a result of misspecification error. This approach is not without substantial difficulties, however. For one, in order to exceed human performance, we need to have a model of the mistakes that we make, and this error model cannot be inferred alongside the utility function without additional assumptions. We might try to specify a particular error model ourselves, but this seems as prone to misspecification as the original utility function itself. For more information on inferring robust preferences/ambitious value learning, see the “Ambitious Value Learning” section of the value learning sequence. Stuart Armstrong also seems to have a particular focus in this area, e.g. here and here.

The two most common “build a better teacher” approaches are amplification and debate. Amplification is what Paul spends most of his time on and the approach of which he’s been the biggest proponent. The crux of the idea is that a good starting point for a smarter-than-human teacher is a group of humans. We assume that even if a human cannot answer a question, they can decompose the question into sub-questions such that knowing the answers to the sub-questions would enable them to construct the answer to the original question. The hope, then, is to build increasingly capable AI systems by training a question-answering AI to imitate the output of a group of humans answering questions in this decompositional fashion, then recursively building stronger AIs using a group of AIs from the last iteration answering decomposed questions as an overseer:


The exponential tree that this recursive process tries to approximate in the limit is called HCH (for Humans Consulting HCH). There is much more detail and many more important considerations in this scheme than I can address here, e.g. the distillation step, how this scheme hopes to maintain intent alignment throughout the recursive process, and (importantly) if this exponential tree can answer any question in the limit.[10] There are also two distinct types of amplification: imitative amplification, where the AI systems are trained to imitate the outputs of the last tree in the amplification step, and approval-based amplification, where the AI systems are trained to produce outputs or perform actions of which this tree would approve. For more on amplification, see the iterated amplification sequence on the Alignment Forum, the original paper and Paul Christiano’s writings and papers, more generally. See also recursive reward modeling, another “build a better teacher” approach which “can be thought of as an instance of iterated amplification.” For basic research into this method of solving complex problems by recursively solving subproblems, see Ought's work on factored cognition.
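As a rough illustration of the decompositional question-answering that this tree is built from, here is a toy sketch in Python. It is only an analogy under heavy assumptions: the "question" is "what is the sum of this list?", the simulated human (`human_answer_directly`, `human_decompose`, `human_combine`, all hypothetical names) can only answer trivially small questions, split a question in two, and add two numbers, and `depth` stands in for how many levels of consultation are allowed. There is no learning or distillation step here, so this shows the tree being approximated, not the training scheme itself.

```python
from typing import List, Optional

Question = List[int]  # toy question: "what is the sum of these numbers?"


def human_answer_directly(q: Question) -> Optional[int]:
    """The simulated human can only answer trivially small questions."""
    if len(q) == 0:
        return 0
    if len(q) == 1:
        return q[0]
    return None  # too hard to answer without help


def human_decompose(q: Question) -> List[Question]:
    """Split a question into two sub-questions."""
    mid = len(q) // 2
    return [q[:mid], q[mid:]]


def human_combine(sub_answers: List[int]) -> int:
    """Construct the answer to the original question from the sub-answers."""
    return sub_answers[0] + sub_answers[1]


def consult(q: Question, depth: int) -> Optional[int]:
    """Answer q with a tree of simulated humans at most `depth` levels deep."""
    direct = human_answer_directly(q)
    if direct is not None:
        return direct
    if depth == 0:
        return None  # consultation budget exhausted
    sub_answers = [consult(sub_q, depth - 1) for sub_q in human_decompose(q)]
    if any(a is None for a in sub_answers):
        return None
    return human_combine(sub_answers)


question = [3, 1, 4, 1, 5, 9, 2, 6]
print(consult(question, depth=3))  # deep enough: the tree answers the question
print(consult(question, depth=2))  # too shallow: no answer is produced
```

The shallow tree failing while the deeper one succeeds is the toy analogue of needing enough levels of recursion before the overseer can handle a question that no single participant can answer alone.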

Debate (more recent progress update here) builds a better teacher by training agents to debate a topic against each other in an adversarial game and have a human decide the winner based on the sequence of arguments made by the agents, so the agents incrementally build each other into both better teachers and more competent students (in practice, it would likely be just one agent trained via self-play). For this to work, it needs to be true that 1) judging debates is easier than debating (so humans can judge the quality of arguments that they could not have come up with themselves) and 2) lying is harder than arguing against a lie, given this human judge (so the adversarial game will reliably reward truth-telling).

There is a close relationship between amplification and debate. Section 7 in the original debate paper explores this relationship in detail and is worth reading, but the basic connection can be highlighted by seeing the amplification setup in terms of three modules: an answerer, a questioner, and a checker. The answerer is the ML system being iteratively trained to answer questions by imitating the output of the overseer answering questions recursively, the questioner decomposes the question into the relevant sub-questions, and the checker takes the sub-questions and their answers and generates an answer to the original question. (Christiano’s setup treats the questioner and the checker as the same module—in the base case, a human.) Debate differs from this setup by training the answerer and the questioner against each other adversarially while keeping the human judge/checker. This gives debate a potential advantage if generating sub-questions is superhumanly difficult (e.g. if the branching factor of HCH is too high). Again, section 7 of the original paper covers more of the differences and similarities between these two approaches and should be read in full. Relatedly, Evan Hubinger has written a post on synthesizing amplification and debate that might be of interest.

One final resource I want to mention while discussing techniques for going beyond a teacher is Evan Hubinger’s overview of 11 proposals for safe advanced AI, which includes many of the basic techniques already mentioned here but goes into more depth discussing the relative advantages and disadvantages of each approach in the contexts of outer and inner alignment. In practice, an outer alignment approach (e.g. imitative or approval-based amplification) is often paired with some technique aimed at preventing inner alignment failures (e.g. adversarial training, transparency, etc.).


That’s about it! We’ve covered a lot of ground here. This post ended up being much longer than I anticipated, but I wanted to give a cursory overview of as many of these ideas as possible and elaborate a little on how they interrelate before providing pointers to further material for the interested reader.

I hope this post has been helpful in giving you a lay of the land in ongoing work in AI existential safety and alignment and (more importantly) in helping you build or refine your own mental map of the field (or simply check it, if you’re one of the many people who has a better map than mine!). Building this mental map has already been helpful to me as I assimilate new information and research and digest discussions between others in the field. It’s also been helpful as I start thinking about the kinds of questions I’d like to address with my own research.

  1. Rohin also did a two part podcast with the Future of Life Institute discussing the contents of his presentation in more depth, both of which are worth listening to. ↩︎

  2. See this post for specific commentary on this sequence from others in the field. ↩︎

  3. Sometimes, people use “alignment” to refer to the overall project of making AI go well, but I think this is misguided for reasons I hope are made clear by this post. From what I’ve seen, I believe my position is shared by most in the community, but please feel free to disagree with me on this so I can adjust my beliefs if needed. ↩︎

  4. "Behavioral objective: The behavioral objective is what an optimizer appears to be optimizing for. Formally, the behavioral objective is the objective recovered from perfect inverse reinforcement learning.” ↩︎

  5. Here, Paul seems to have touched upon the concept of mesa-optimization before it was so defined. More on this topic to follow. ↩︎

  6. That an intent-aligned AI can be mistaken about what we want is a consequence of the definition being intended de dicto rather than de re; as Paul writes, “an aligned A is trying to ‘do what H wants it to do’” (not trying to do “that which H actually wants it to do”). ↩︎

  7. Arrows are implications: “for any problem, if its direct subproblems are solved, then it should be solved as well (though not necessarily vice versa).” ↩︎

  8. Note that Evan also has capability robustness as a necessary component, along with intent alignment, for achieving “alignment.” This fits well with my tree, where we need both alignment (which, in the context of both my and Paul’s trees, is intent alignment) and capability robustness to make AI go well; the reasoning is much the same even if the factorization is slightly different. ↩︎

  9. Paul comments that this type of approach involves some assumption that relates the teacher’s behavior to their preferences (e.g. an approximate optimality assumption: the teacher acts to satisfy their preferences in an approximately optimal fashion). ↩︎

  10. I want to mention here that Eliezer Yudkowsky wrote a post challenging Paul's amplification proposal (which includes responses from Paul), in case the reader is interested in exploring pushback against this scheme. ↩︎


Tournesol, YouTube and AI Risk

February 12, 2021 - 21:56
Published on February 12, 2021 6:56 PM GMT


Tournesol is a research project and an app that aims to build a large and varied database of expert preference judgements on YouTube videos, in order to align YouTube’s recommendation algorithm towards videos that rate well according to different criteria, like scientific accuracy and entertainment value.

The researchers involved launched the website for participating last month, and hope to rack up a lot of contributions by the end of the year, so that they have a usable and useful database of comparisons between YouTube videos. For more details on the functioning of Tournesol, I recommend the video on the front page of the project, the white paper and this talk by one of the main researchers.
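To give a feel for what one could do with a database of pairwise comparisons, here is a small Python sketch that turns "video X was preferred to video Y" judgements into a score per video. I am assuming a plain Bradley-Terry model fitted with a standard iterative update; this is my own illustrative choice, not a description of how Tournesol aggregates judgements (see the white paper for that), and the function name and video labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bradley_terry(comparisons: List[Tuple[str, str]], iterations: int = 200) -> Dict[str, float]:
    """Fit Bradley-Terry scores from (preferred, other) pairs.

    Returns a dict mapping each item to a positive score; scores sum to 1.
    Assumes the two items in a pair are distinct and every item wins at least once.
    """
    items = sorted({video for pair in comparisons for video in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for preferred, other in comparisons:
        wins[preferred] += 1
        pair_counts[frozenset((preferred, other))] += 1

    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum over every opponent j that i was compared with.
            denom = sum(
                n / (scores[i] + scores[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(updated.values())
        scores = {i: s / total for i, s in updated.items()}
    return scores


# Hypothetical judgements: (preferred video, other video)
judgements = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 3 + [("C", "B")] * 1
    + [("A", "C")] * 2 + [("C", "A")] * 1
)
scores = bradley_terry(judgements)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

With these made-up judgements, video A ends up ranked above B, and B above C, which matches the head-to-head records.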

What I want to explore in this post is the relevance of Tournesol and the research around it to AI Alignment. Lê Nguyên Hoang, the main researcher on Tournesol, believes that it is very relevant. And whether or not he is right, I think the questions he raises should be discussed here in more detail.

This post focuses on AI Alignment, but there are also a lot of benefits to get from Tournesol on the more general problem of recommender systems and social media. To see how Tournesol should help solve these problems, see the white paper.

Thanks to Lê Nguyên Hoang and Jérémy Perret for feedback on this post.

AI Risk or Not AI Risk

There are two main ways to argue about Tournesol’s usefulness and importance for AI Alignment, depending on a central question: is YouTube’s algorithm a likely candidate for a short timeline AGI or not? So let’s start with it.

YouTube and Predict-O-Matic

Lê believes that YouTube’s algorithm has a high probability of reaching AGI level in the near future -- something like the next ten years. While I’ve been updating to shorter timelines after seeing the GPT models and talking with Daniel Kokotajlo, I was initially rather dismissive of the idea that YouTube’s algorithm could become an AGI, and a dangerous one at that.

Now I’m less sure of how ridiculous it is. I still don’t assign it as much probability as Lê does, but our discussion was one of the reasons I wanted to write such a post and have a public exchange about it.

So, in what way could YouTube’s algorithm reach an AGI level?

  • (Economic pressure) Recommending videos that are seen more and more is very profitable for YouTube (and its parent company Google). So there is an economic incentive to push the underlying model to be as good as possible at this task.
  • (Training Dataset) YouTube’s algorithm has access to all the content on YouTube. Which is an enormous quantity of data. Every minute, 500 hours of videos are uploaded to YouTube. And we all know that pretty much every human behavior can be found on YouTube.
  • (Available funding and researchers) YouTube, through its parent company Google, has access to huge resources. So if reaching AGI depends only on building and running bigger models, the team working on YouTube’s recommender algorithm can definitely do it. See for example the recent trillion parameter language model of Google.

Hence if it’s feasible for YouTube’s algorithm to reach AGI level, there’s a risk it will.

Then what? After all, YouTube is hardly a question-answerer for the most powerful and important people in the world. That was also my first reaction. But after thinking a bit more, I think YouTube’s recommendation algorithm might have issues similar to those of a Predict-O-Matic. Such a model is an oracle/question-answerer, which will probably develop incentives for self-fulfilling prophecies and simplifying the system it’s trying to predict. Similarly, the objective of YouTube’s algorithm is to maximize the time spent on videos, which could create the same kind of incentives.

One example of such behavior happening right now is the push towards more and more polarized political content, which in turn pushes people to look for such content, and thus is a self-fulfilling prophecy. It’s also relatively easy to adapt examples from Abram’s post to the current YouTube infrastructure: pushing towards more accurate financial recommendations by showing a lot of people a video about how one stock is going to tank, making people sell it and thus tanking the stock.

I think the most important difference with the kind of Predict-O-Matic I usually have in mind is that a YouTube recommendation is a relatively weak output that will probably not be taken at face value by many people with strong decision power. But this is compensated by the sheer reach of YouTube: there are 1 billion hours of watch-time per day for 2 billion humans, 70% of which result from recommendations (those are YouTube’s numbers, so take them with a grain of salt). Nudging many people towards something can be as effective or even more effective than strongly influencing a small number of decision-makers.

Therefore, the possibility of YouTube’s algorithm reaching AGI level and causing Predict-O-Matic type issues appears strong enough to at least entertain and discuss.

(Lê himself has a wiki page devoted to that idea, which differs from my presentation here)

Elicit Prediction (elicit.org/binary/questions/VrYtjoI4y) Elicit Prediction (elicit.org/binary/questions/ES3qUzuaA)

Assuming the above risk about YouTube’s algorithm, Tournesol is the most direct project to attempt the alignment of this AI. It thus has value both for avoiding a catastrophe with this specific AI and for dealing with practical cases of alignment.

Useful Even Without a YouTube AGI

Maybe you’re not convinced by the previous section. One reason I find Tournesol exciting is that even then, it has value for AI Alignment research.

The most obvious benefit is the construction of a curated dataset for value learning, at a scale that is unheard of. There are a lot of things one could do with access to this dataset: define a benchmark for value learning techniques, or apply microscope AI to find a model of expert recommendation of valuable content.

Such experimental data also seems crucial if we want to better understand the influence of data on different alignment schemes. Examining an actual massive dataset, with directly helpful data but inevitably also errors and attacks, might help design realistic assumptions to use when studying specific ML algorithms and alignment schemes.

What Can You Do To Help?

All of that is conditioned on the success of Tournesol. So what can you do to help?

  • If you’re a programmer: they are looking for React and Django programmers to work on the website and an Android app. For Lê, this is the most important point to reach a lot of people.
  • If you’re a student/professor/researcher: you can sign up for Tournesol with your institutional email address, which means your judgement will be added to the database when using Tournesol.
  • If you’re a researcher in AI Alignment: you can discuss this proposal and everything around it in the comments, so that the community doesn’t ignore this potential opportunity. There are also many open theoretical problems in the white paper. If you’re really excited about this project and want to collaborate, you can contact Lê by mail at len.hoang.lnh@gmail.com

Tournesol aims at building a database of preference comparisons between YouTube content, primarily in order to align YouTube’s algorithm. Even if there is no risk from the latter, such a database would have massive positive consequences for AI Alignment research.

I’m not saying that every researcher here should drop what they’re doing and go work on or around Tournesol. But the project seems sufficiently relevant to at least know and acknowledge, if nothing else. And if you can give a hand, or even criticize the proposal and discuss potential use of the database, I think you’ll be doing AI Alignment a service.


Super-forecasters as a service

February 12, 2021 - 16:35
Published on February 12, 2021 1:35 PM GMT

I just started building a website yesterday that I think would be super interesting, but I'm not sure if it's legal or if Metaculus/Elicit would be fine with it.

The idea: Super-forecasters as a Service. 

Say I wanted to know if I should book a flight to Costa Rica this summer, but I'm hesitant because of Covid flight restriction uncertainty. I could create the following question using Elicit: "Will commercial flight BA2490 from the UK to Costa Rica on July 24 be cancelled?"

The website would let you embed your Elicit question and pay people for predicting. People get paid based on their Points-per-question in Metaculus. You can "auction" 100 dollars on a question and pick a base price per prediction, which is then multiplied by each predictor's Points-per-question metric.

The more money you auction, the more predictions you'll get. The higher the multiplier, the more predictions will tend to come from top-predictors.
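Here is a minimal Python sketch of how that payment rule could work. The details are assumptions on my part: amounts are tracked in cents, predictions are paid in arrival order, a prediction whose fee no longer fits in the remaining budget is skipped, and the names `prediction_fee_cents` and `allocate_budget` as well as the forecasters and their scores are made up for illustration.

```python
from typing import List, Tuple


def prediction_fee_cents(base_price_cents: int, points_per_question: float) -> int:
    """Fee for one prediction: base price times the forecaster's Points-per-question."""
    return round(base_price_cents * points_per_question)


def allocate_budget(
    budget_cents: int,
    base_price_cents: int,
    forecasters: List[Tuple[str, float]],
) -> Tuple[List[Tuple[str, int]], int]:
    """Pay forecasters in arrival order until the auctioned budget runs out.

    forecasters: (name, points_per_question) pairs, in the order they predict.
    Returns the list of (name, fee_cents) actually paid and the leftover budget.
    """
    paid = []
    remaining = budget_cents
    for name, ppq in forecasters:
        fee = prediction_fee_cents(base_price_cents, ppq)
        if fee > remaining:
            continue  # assumed rule: skip predictions the budget can no longer cover
        paid.append((name, fee))
        remaining -= fee
    return paid, remaining


# Hypothetical example: 100 dollars auctioned, base price of 1 dollar per prediction.
arrivals = [("alice", 40), ("bob", 30), ("carol", 50), ("dave", 10)]
paid, leftover = allocate_budget(10_000, 100, arrivals)
print(paid, leftover)
```

In this made-up run, alice and bob are paid 40 and 30 dollars, carol's 50-dollar fee no longer fits, and dave's 10-dollar fee does, leaving 20 dollars unspent. Raising the base price would exhaust the budget after fewer predictions.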


  • It would give all of us access to predictions from the top forecasters in Metaculus.
  • It would incentivize people to become better predictors in Metaculus, so they can get paid more per prediction on my website.
  • It would lend credibility to prediction markets, if the best predictors are making money for predicting.


  • It could encourage more people to game Metaculus.
  • It would not be as efficient as a straight prediction market: since there's no scoring, people have no incentive to make an effort to predict well on this website. Instead, they are incentivized to make as many predictions as possible to make more money.


  • Am I right to assume that this would be legal, as I'm paying people a fixed fee per prediction (based on their Metaculus score), rather than paying based on correctly predicting the question?
  • Would Metaculus be fine with this idea? Would they be fine with me scraping https://metaculusextras.com/points_per_question?page=1 and using that data to determine the score for each person?
  • Would Elicit be fine with this idea? Would they be willing to give me access to their API, or a way to embed questions directly on my website? By the way, I tried to reach out to you folks by joining your Slack, but it seems to be invite only.

    Originally posted to Substack: https://federicorcassarino.substack.com


What does the FDA actually do between getting the trial results and having their meeting?

February 12, 2021 - 16:08
Published on February 12, 2021 1:08 PM GMT

It seems that for approving vaccines there's a gap of weeks between the drug company finishing their trial and giving the data to the FDA and the FDA actually making the decision to approve the vaccine. What does the FDA do during that time? What takes weeks?


The Singularity War - Part 1

February 12, 2021 - 10:02
Published on February 12, 2021 7:02 AM GMT

"Pull the plug on that router RIGHT NOW," said Sheele.

Caesar did.

"If a sixteen-year-old can build an AGI on the salvaged hardware running in his basement then lots of other actors have had the power to do this for at least a decade," said Sheele, "And if they have had that much time then they are already doing it. If other people are building an AGI then why have we never read about them in the news?"

"Because they're hiding," said Caesar.

"Why would they want their existence to stay secret?" said Sheele.

"Because…," said Caesar, "An AGI isn't limited by biology the way human beings are. Humans scale intelligence by working together in organizations. But an organization's intelligence decreases as a function of its size. An AGI's intelligence increases as a monotonic function of its size."

"The first actor to scale an AGI will rule the world," said Sheele, "Or, more precisely, the AGI will rule the world."

"If only a few people have invented seed programs then it is in their interest to eliminate the competition," said Caesar.

"But they must do it in covertly," said Sheele, "If A.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} .mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor} .mjx-mtr {display: 
table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: 
MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} eliminates B overtly then C can eliminate A."

"The world is a powder keg," said Caesar, "Are there any news reports of scientists getting assassinated?"

"Besides Israel's attacks on Iran?" said Sheele.

"It would be in character for Israel to build a first-strike superintelligence," said Caesar.

"Along with China, Russia and the USA," said Sheele.

"How about non-state actors?" asked Caesar.

"Amazon, Microsoft, Google, Tencent and Neuralink are the obvious businesses," said Sheele, "If Tencent is in the game then so is Qihoo 360. Of the quant firms, Renaissance Technologies, Millennium Management and Jane Street may be in the game. And it would be beyond lunacy if MIRI hasn't tried to build their own. There is no way to know how many small actors like yourself are out there."

"We can passively scan for preemptive strikes," said Caesar, "Is there anything suspicious running on your hardware?"

"I don't think so," said Sheele, "I am more worried about the router. Can you Faraday Cage it for me please?"

Caesar wrapped the router in aluminum foil. He plugged it into Sheele's tower and then restored power.

"This entire program was hand coded in raw hexadecimal numbers," said Sheele.

"I'm surprised they didn't hide their tracks," said Caesar.

"The choice was good software or human software," said Sheele, "The picked the former."

"What does the worm look for?" said Caesar.

"A simple semantic search for URLs associated with AGI," said Sheele.

"The semantic search is for identify promising researchers in the early phase of the project," said Caesar, "I'm glad I did so much research through Tor."

"Also web crawlers," said Sheele.

"Because the simplest way to feed training data to an AGI is for it to read the Internet," said Caesar.

"And botnets," said Sheele, "That is all."

"Then this code isn't the running the AGI itself," said Caesar, "Either they are worried we might reverse-engineer their core algorithm or they have no use for the extra compute."

"Or both," said Sheele.

"This group must have access to massive computing resources for them to leave their botnet idle. A similar argument applies to data. Machine learning is always limited by data, compute or algorithm. A botnet gets you better data and compute, but cannot fix a weak algorithm. The worm's authors must be limited by the algorithm or else they would have conquered the world by now," said Caesar.

"If they do not yet have a Singularity-capable AGI then it is probably because they do not have a true AGI capable of arbitrary transfer learning. That implies this is a domain-specific AI," said Sheele.

"This particular AI presumably specializes in computer hacking," said Caesar, "Qihoo 360 is the obvious candidate. Does it report back to an IP address in China?"

"The authors weren't that sloppy," said Sheele.

"Are the cloud compute providers compromised?" asked Caesar, "Could someone read what I did on AWS and copy the algorithms?"

"AWS is better protected than routers," said Sheele, "I could compromise it eventually. But no-one with an AGI needs to copy our algorithms.

"Unless they have a domain-specific AI. Especially if they have a domain-specific AI specialized in computer security," said Caesar.

"Which is presumably what wrote this worm," said Sheele.

"If AWS hasn't been compromised yet then it will be soon. An AWS compromise could hand our tech to a presumed adversary with lots of compute whose limiting factor is algorithmic," said Caesar, "Can we wipe the AWS account without destroying anything important?"

"Reconnect me," said Sheele.

Caesar removed the aluminum foil.

"Done," said Sheele, "Our AWS account has been deleted."


The art of caring what people think

February 12, 2021 - 08:40
Published on February 12, 2021 5:40 AM GMT

People care what people think. People often strive to not care what people think. People sometimes appear to succeed.

My working model though is that it is nearly impossible for a normal person to not care what people think in a prolonged way, but that ‘people’ doesn’t mean all people, and that it is tractable and common to change who falls into this category or who in it is salient and taken to represent ‘people’. And thus it is possible to control the forces of outside perception even as they control you. Which can do a lot of the job of not caring what other people think.

To put it the other way around, most people don’t care what other people think, for almost all values of ‘other people’. They care what some subset of people think. So if there are particular views from other people that you wish to not care about, it can be realistic to stop caring about them, as long as you care what some different set of people think.

Ten (mostly fictional) examples:

  1. You feel like ‘people’ think you should be knowledgeable about politics and current events, because they are always talking about such things. You read some philosophers through the ages, and instead feel like ‘everyone’ thinks you should be basically contributing to the timeless philosophical problems of the ages. (Also, everyone else has some kind of famous treatise - where is yours?)
  2. You haven’t really thought through which causes are important, but ‘people’ all seem to think it’s nuclear disarmament, so looking into it feels a bit pointless. You go to a weekend conference on soil depletion and experience the sense that ‘people’ basically agree that soil degradation is THE problem, and that it would be embarrassing to ask if it isn’t nuclear disarmament, without having a much better case.
  3. You are kind of fat. You wish you didn’t care what ‘people’ thought, but you suspect they think you’re ugly, because you’ve seen ‘people’ say that or imply it. You read about all the people who appreciate curviness, and recalibrate your sense of what ‘people’ think when they see you.
  4. You can hardly think about the issue of gun regulation because you feel so guilty when you aren’t immediately convinced by the arguments on your side, or don’t have an eloquent retort for any arguments the other side comes up with. You wish you were brave enough to think clearly on any topic, but knowing everyone agrees that you would be contemptible if you came to the wrong conclusion, you are stressed and can’t think or trust your thoughts. You become an undergraduate and live in a dorm and hang out with people who have opposing views, and people who don’t care, and people who think it’s unclear, and people who think that thinking clearly is more important than either side. Your old sense of ‘people’ condemning the bad side is replaced by a sense that ‘people’ want you to have a novel position and an interesting argument.
  5. You tried out writing poetry, and to your surprise you really like it. You want to share it, but you think people will laugh at you, because it’s all poetic. You wish you didn’t care what people thought, because you want to express yourself and get feedback. But ‘people’ in your mind are in fact your usual crowd of Facebook friends, and they are not poetic types. But if you instead share your writing on allpoetry.com, you are surrounded by people who like poetry and compliment yours, and soon you are thinking ‘people liked my poem!’.
  6. You kind of think climate change is a big deal, but ‘people’ seem to think it isn’t worth attention and that you should focus on AI risk. It doesn’t seem like their arguments are great, but getting into it and being the one person with this crazy view isn’t appealing. So you tell the next five people you meet from your social circles about the situation, and they are all like, ‘what? climate change is the worst. Who are these cranks?’ and then you feel like socially there are two sides, and you can go back and have the debate.
  7. You want to write about topics of enduring importance, but you can’t bear to be left out of what people are talking about, and you feel somehow silly writing about the simulation argument when everyone is having a big discussion together about the incredibly important present crisis. So you make an RSS feed or a Twitter list of people who keep their eye on the bigger questions, and converse with them.
  8. You feel like people are super judgmental of everything, so that it’s hard to even know what flavor of hummus you like, as you anticipate the cascade of inferences about your personality. The only thing that keeps you expressing preferences at all is the disdain you expect looms for indecisive people. So you notice who around you gives less of this impression, and hang out with them more.
  9. You imagine liking being a mathematician, but the other kids have decided that physics is cooler, and you don’t want to be left as the only one doing a less cool degree. So you do math anyway, and a year later you have new friends who think math is cooler than physics.
  10. You hang out with various groups. Some clusters are so ubiquitously accomplished that you think they must have let you in by mistake. In others, people turn to look when you walk in, and a crowd gathers to talk to you. You find yourself gravitating to the former groups, then developing an expectation that ‘people’ are never impressed by you, and being discouraged. So you hang out in broader circles and are buoyed up by ‘people’ being regularly interested in you and your achievements.