# LessWrong.com News

*Address:* https://www.lesswrong.com/

*Updated:* 1 hour 5 minutes ago

### FB/Discord Style Reacts

For the past year I've wanted LessWrong to include something like Discord, Facebook or Slashdot style reactions.

*Facebook Style* means "there's a few key reactions that people use"

*Discord Style* means "there's nigh-infinite reactions and you can add more, but there still end up being a few commonly used defaults."

*Slashdot Style* means "after upvoting or downvoting, you have the option of clicking a button that clarifies why you upvoted or downvoted."

Of these, I'm most excited for Discord-Style. But I think any of them would be improvements (if done well).

Habryka recently wrote a shortform comment on this subject. My own thoughts come in a few different frames.

#### Separating Enthusiasm from Approval

**Boos/Yays vs 'approve/disapprove'**

Empirically, people want to cheer for their causes, boo causes they dislike, signal their social allegiance and try to ensure the Overton window moves in the direction they want. I don't think you can really fight this. But you can nudge people to disentangle this from "what gets attentional allocation on a site about rationality."

I think it's important that when you see a comment you like, and you feel the impulse to go "yeah! good point! go team!" the first impulse you have, the first button available and exciting to click, is a button that *doesn't* send any signals about how that comment should be sorted, and doesn't aggregate into an overall user-score you can check (that, for good or for ill, people will tend to associate with social status)

**Other things from 'approve/disapprove'**

Boos/yays aren't the only thing I'm worried about. Ideally, I want LessWrong to reward good thinking over things like being funny, or exciting. (Being funny and exciting should still get rewarded, but no amount of clever injokes should add up to something greater than "wrote an actually useful, insightful point.")

**"Viscerally Fun but Low Signal Buttons" should be easy to access. "Higher Signal" buttons should require more effort and thought.**

With both of the above in mind, I think it's important that "Yay", or "Funny" buttons should be the first, most obvious thing to click on. They should feel satisfying to click, and you shouldn't feel motivated to click more things if that's the only reason you were upvoting.

The buttons that send more important signals should require a bit of extra effort, and force you to at least notice some cognitive dissonance if you're upvoting people just because they're on your side.

#### Social Entanglement, Epistemic Entanglement and Common Knowledge

One react someone expressed interest in was a simple "acknowledged." Votes are totally anonymous, and that means if you want someone to know that you have read a thing, you have to actually comment, which is moderately high effort and takes up a lot of vertical space on the page. Whether someone has read a thing is fairly important information about how to continue a conversation.

By default, on many social-media platforms, likes are public. They were also public on the old Intelligent Agent Foundations Forum (and I think probably on Arbital, although not sure offhand).

This does two things, which I have mixed feelings about.

One is social entanglement. Visibly liking each other's comments is part of the process by which people build social trust and alliances. I think there's reason to be cautious about LessWrong directly facilitating that.

Another is clarity on *who believes what*, and whose judgment you trust. When you're building a serious, complex idea, it's actually important who understands what concepts, who thinks different concepts are important. There are people I *do* in fact trust more intellectually than others, and it's higher signal to know that one of them liked a post than some rando. It's also more informative when I know that multiple people I trust disagree.

My current best guess is that it's best for the voting on LessWrong to be anonymous, but for reactions to display usernames on hover-over. It might or might not be feasible or desirable (from a UI complexity standpoint) to let people choose whether to react publicly. But I can imagine changing my mind about this.
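The split proposed here (anonymous votes that affect sorting, named reactions that don't) can be sketched as a small data model. This is only an illustration under assumptions: the `Comment` class, its method names, and the usernames are hypothetical, not anything LessWrong actually implements.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    # Votes are kept only as a running total, so no voter identity is stored.
    score: int = 0
    # Reactions map a reaction name to the set of usernames who used it.
    reactions: dict = field(default_factory=dict)

    def vote(self, delta: int) -> None:
        """Anonymous vote: changes the sort score, records no username."""
        self.score += delta

    def react(self, username: str, name: str) -> None:
        """Public reaction: records who reacted, never touches the score."""
        self.reactions.setdefault(name, set()).add(username)

    def hover_text(self, name: str) -> str:
        """Usernames to display when hovering over a reaction."""
        return ", ".join(sorted(self.reactions.get(name, set())))


comment = Comment()
comment.vote(+1)                          # affects sorting, stays anonymous
comment.react("alice", "acknowledged")    # hypothetical usernames
comment.react("bob", "acknowledged")
print(comment.score)                      # 1
print(comment.hover_text("acknowledged")) # alice, bob
```

Keeping the two stores separate means a "yay" click can never leak into the score that drives sorting.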

#### Making it lower effort to give feedback

Receiving a downvote without explanation sucks. Some people complain about this – "can't you provide reasons for your downvotes?" Well, no. Trivial inconveniences matter. If you force people to provide information and figure out how to articulate what's wrong with something, people will probably just stop giving feedback rather than actually providing reasons.

Not only does this require figuring out how to write a comment, it opens up a line of engagement that you might have to put *even more* effort into defending.

[this is an empirical claim, it's perhaps worth the experiment of requiring downvotes to always require reasons, but I'm not optimistic about it].

But I think there are some fairly common reasons why a comment gets downvoted, that could at least make it lower-effort to give feedback:

- "This comment seemed a bit confused"
- "This comment seemed to be rounding things off in an oversimplified way"
- "This comment seems wrong in ways that have previously been explored at length on LessWrong"
- "This comment seems mean spirited."
- "This comment seemed to be acting in bad faith"

It's also nice to improve the reward signal for particularly good actions:

- "This comment was particularly clear"
- "This comment made special effort to be rigorous and credible."
- "This comment actually changed my mind about something."
- "This comment made special effort to be charitable"

You'll notice some issues, comparing the above feedbacks to Facebook Reacts.

Facebook reacts are "haha!" "love!" "sad!" "anger!" "wow!"

Everyone knows what those mean. Everyone knows that everyone else knows what those mean. They are very short words. They are (due to millennia of evolution, genetic and cultural) conceptually simple.

"This comment seems to be rounding things off in an oversimplified way" is a less common concept. It's more complicated. And if you simplified it slightly so that the button said "Oversimplified"... that would... actually be an oversimplified button. It's important that I'm not just saying "yo this comment was oversimplified", but rather that it seemed (probably) to be making a subtle error.

I think this is really important. I think something LessWrong needs to do is make nuanced critiques *easier to chunk*. This is pretty tricky, since, well, the whole point of nuances is that they're *nuanced.*

A rationalist friend once commented, in non-rationalist circles, that when they tried to say "I agree with your point but I think this particular part has a logical error", they would often have people... just completely fail to parse that. It wasn't in their schema at all.

On LessWrong, we have some shared context where we mostly all understand not to just have Arguments Be Soldiers and whatnot. Our schema includes Local Validity. But there are many important, key concepts that still take a lot more effort to express than "yay/boo" or "haha!"

And the thing is... it's not like "Love" is a simple concept. When someone clicks 'Love' on one of my Facebook posts, there is a fairly rich wave of senses I get (depending on my post, and depending on my relationship with the person in question). When someone posts about their pet dying and I click 'Love', there's this whole shared context about how we're both human and we know what it is to lose people and my heart goes out to them and my chest tenses slightly and there's... just a whole lot going on.

Still, I'm able to chunk that complexity into a concept called "Love", and it's easily available for me to access.

There's a potential longterm vision for LessWrong – maybe not the right vision, but possible – where part of what we're doing here is distilling concepts down so thoroughly that a single word can communicate a lot of nuance.

Language real estate is limited, and I'm not sure which concepts make the most sense to distill in such a way. There's also certainly room for this to fail, where instead of being able to more-easily-express nuanced concepts it ends up destroying nuance.

Facebook has cheapened the word "friend", and that's important. But... I also have an impression of it having made it *easier for me to express love*, in a way that so far seems net positive.

It feels exciting to me to imagine one day living in a world where *"this changed my mind"* or *"this was well thought out even though I disagree"* feel like basic, obvious concepts that are important enough to be communicated with a single word.

### Conditions for Mesa-Optimization


*This is the second of five posts in the Mesa-Optimization Sequence based on the upcoming MIRI paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper, with the full paper set to be published on the MIRI blog with the release of the last post in the sequence.*

In this post, we consider how the following two components of a particular machine learning system might influence whether it will produce a mesa-optimizer:

- **The task:** The training distribution and base objective function.
- **The base optimizer:** The machine learning algorithm and model architecture.

We deliberately choose to present theoretical considerations for why mesa-optimization may or may not occur rather than provide concrete examples. Mesa-optimization is a phenomenon that we believe will occur mainly in machine learning systems that are more advanced than those that exist today.[1] Thus, an attempt to induce mesa-optimization in a *current* machine learning system would likely require us to use an artificial setup specifically designed to induce mesa-optimization. Moreover, the limited interpretability of neural networks, combined with the fact that there is no general and precise definition of “optimizer,” means that it would be hard to evaluate whether a given model is a mesa-optimizer.

2.1. The task

Some tasks benefit from mesa-optimizers more than others. For example, tic-tac-toe can be perfectly solved by simple rules. Thus, a base optimizer has no need to generate a mesa-optimizer to solve tic-tac-toe, since a simple learned algorithm implementing the rules for perfect play will do. Human survival in the savanna, by contrast, did seem to benefit from mesa-optimization. Below, we discuss the properties of tasks that may influence the likelihood of mesa-optimization.

**Better generalization through search.** To be able to consistently achieve a certain level of performance in an environment, we hypothesize that there will always have to be some minimum amount of optimization power that must be applied to find a policy that performs that well.

To see this, we can think of optimization power as being measured in terms of the number of times the optimizer is able to divide the search space in half—that is, the number of bits of information provided.(9) After these divisions, there will be some remaining space of policies that the optimizer is unable to distinguish between. Then, to ensure that all policies in the remaining space have some minimum level of performance—to provide a performance lower bound[2] —will always require the original space to be divided some minimum number of times—that is, there will always have to be some minimum bits of optimization power applied.
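As a small illustration of the "bits" framing above, the number of halvings needed to shrink a search space can be computed directly. This is a sketch only: the function name and the policy counts are made-up assumptions, and real policy spaces are rarely countable this cleanly.

```python
import math


def bits_of_optimization(total_policies: int, remaining_policies: int) -> float:
    """Number of times the search space must be halved to shrink it
    from total_policies down to remaining_policies."""
    return math.log2(total_policies / remaining_policies)


# Hypothetical numbers: narrowing 1024 candidate policies down to 16
# indistinguishable ones takes 6 halvings, i.e. 6 bits.
print(bits_of_optimization(1024, 16))  # 6.0
```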

However, there are two distinct levels at which this optimization power could be expended: the base optimizer could expend optimization power selecting a highly-tuned learned algorithm, or the learned algorithm could itself expend optimization power selecting highly-tuned actions.

As a mesa-optimizer is just a learned algorithm that itself performs optimization, the degree to which mesa-optimizers will be incentivized in machine learning systems is likely to be dependent on which of these levels it is more advantageous for the system to perform optimization. For many current machine learning models, where we expend vastly more computational resources training the model than running it, it seems generally favorable for most of the optimization work to be done by the base optimizer, with the resulting learned algorithm being simply a network of highly-tuned heuristics rather than a mesa-optimizer.

We are already encountering some problems, however—Go, Chess, and Shogi, for example—for which this approach does not scale. Indeed, our best current algorithms for those tasks involve explicitly making an optimizer (hard-coded Monte-Carlo tree search with learned heuristics) that does optimization work on the level of the learned algorithm rather than having all the optimization work done by the base optimizer.(10) Arguably, this sort of task is only adequately solvable this way—if it were possible to train a straightforward DQN agent to perform well at Chess, it plausibly would *have* to learn to internally perform something like a tree search, producing a mesa-optimizer.

We hypothesize that the attractiveness of search in these domains is due to the diverse, branching nature of these environments. This is because search—that is, optimization—tends to be good at generalizing across diverse environments, as it gets to individually determine the best action for each individual task instance. There is a general distinction along these lines between optimization work done on the level of the learned algorithm and that done on the level of the base optimizer: the learned algorithm only has to determine the best action for a given task instance, whereas the base optimizer has to design heuristics that will hold regardless of what task instance the learned algorithm encounters. Furthermore, a mesa-optimizer can immediately optimize its actions in novel situations, whereas the base optimizer can only change the mesa-optimizer's policy by modifying it ex-post. Thus, for environments that are diverse enough that most task instances are likely to be completely novel, search allows the mesa-optimizer to adjust for that new task instance immediately.
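The contrast between heuristics fixed ahead of time and search performed on the instance at hand can be sketched in a few lines. Everything here is a toy assumption chosen for illustration (the `score` function, the lookup table, the instance numbering); it is not a claim about how any real learned algorithm is structured.

```python
def heuristic_policy(table, instance, default_action=0):
    """Fixed lookup tuned ahead of time; falls back to a default
    action on instances it was never tuned for."""
    return table.get(instance, default_action)


def search_policy(instance, actions, score):
    """Optimizes at runtime: evaluates every action on the given instance."""
    return max(actions, key=lambda action: score(instance, action))


def score(instance, action):
    # Hypothetical task: the best action for instance n is n % 5.
    return -abs(action - instance % 5)


actions = range(5)
table = {n: n % 5 for n in range(10)}  # "training" only covered instances 0..9
novel_instance = 13

print(heuristic_policy(table, novel_instance))        # 0 (default, not tuned)
print(search_policy(novel_instance, actions, score))  # 3 (found by search)
```

The lookup table does fine on the instances it was tuned for, but only the search policy adapts to an instance outside that range without being modified first.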

For example, consider reinforcement learning in a diverse environment, such as one that directly involves interacting with the real world. We can think of a diverse environment as requiring a very large amount of computation to figure out good policies before conditioning on the specifics of an individual instance, but only a much smaller amount of computation to figure out a good policy once the specific instance of the environment is known. We can model this observation as follows.

Suppose an environment is composed of $N$ different instances, each of which requires a completely distinct policy to succeed in.[3] Let $P$ be the optimization power (measured in bits(9)) applied by the base optimizer, which should be approximately proportional to the number of training steps. Then, let $x$ be the optimization power applied by the learned algorithm in each environment instance and $f(x)$ the total amount of optimization power the base optimizer must put in to get a learned algorithm capable of performing that amount of optimization.[4] We will assume that the rest of the base optimizer's optimization power, $P - f(x)$, goes into tuning the learned algorithm's policy. Since the base optimizer has to distribute its tuning across all $N$ task instances, the amount of optimization power it will be able to contribute to each instance will be $\frac{P - f(x)}{N}$, under the previous assumption that each instance requires a completely distinct policy. On the other hand, since the learned algorithm does all of its optimization at runtime, it can direct all of it into the given task instance, making its contribution to the total for each instance simply $x$.[5]

Thus, if we assume that, for a given P, the base optimizer will select the value of x that maximizes the minimum level of performance, and thus the total optimization power applied to each instance, we get[6]

$$x = \operatorname{argmax}_{x} \; \frac{P - f(x)}{N} + x.$$

As one moves to more and more diverse environments (that is, as $N$ increases), this model suggests that $x$ will dominate $\frac{P - f(x)}{N}$, implying that mesa-optimization will become more and more favorable. Of course, this is simply a toy model, as it makes many questionable simplifying assumptions. Nevertheless, it sketches an argument for a pull towards mesa-optimization in sufficiently diverse environments.
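To make the toy model concrete, here is a minimal numerical sketch. The post does not specify the cost curve f, so the quadratic `quadratic_cost` below is purely an assumption for illustration, as are the helper names and the grid search. For that assumed curve, setting the derivative of (P − f(x))/N + x to zero gives x = N/2 whenever that value is feasible, so the preferred amount of runtime optimization grows with N.

```python
def per_instance_power(P, N, x, f):
    """Total optimization power per instance in the toy model: (P - f(x)) / N + x."""
    return (P - f(x)) / N + x


def best_x(P, N, f, x_hi, steps=10_000):
    """Grid-search x in [0, x_hi], keeping only points that satisfy P - f(x) >= 0."""
    candidates = [x_hi * k / steps for k in range(steps + 1)]
    feasible = [x for x in candidates if f(x) <= P]
    return max(feasible, key=lambda x: per_instance_power(P, N, x, f))


def quadratic_cost(x):
    # Assumed cost curve, chosen only for illustration; the post leaves f unspecified.
    return x ** 2


if __name__ == "__main__":
    P = 10_000.0
    for N in (2, 20, 100):
        # Prints N next to the x chosen for that level of environment diversity.
        print(N, best_x(P, N, quadratic_cost, x_hi=100.0))
```

Different choices of f would shift the numbers, but any cost curve that does not itself depend on N should show a similar pull towards larger x as N grows.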

As an illustrative example, consider biological evolution. The environment of the real world is highly diverse, resulting in non-optimizer policies directly fine-tuned by evolution—those of plants, for example—having to be very simple, as evolution has to spread its optimization power across a very wide range of possible environment instances. On the other hand, animals with nervous systems can display significantly more complex policies by virtue of being able to perform their own optimization, which can be based on immediate information from their environment. This allows sufficiently advanced mesa-optimizers, such as humans, to massively outperform other species, especially in the face of novel environments, as the optimization performed internally by humans allows them to find good policies even in entirely novel environments.

**Compression of complex policies.** In some tasks, good performance requires a very complex policy. At the same time, base optimizers are generally biased in favor of selecting learned algorithms with lower complexity. Thus, all else being equal, the base optimizer will generally be incentivized to look for a highly compressed policy.

One way to find a compressed policy is to search for one that is able to use general features of the task structure to produce good behavior, rather than simply memorizing the correct output for each input. A mesa-optimizer is an example of such a policy. From the perspective of the base optimizer, a mesa-optimizer is a highly-compressed version of whatever policy it ends up implementing: instead of explicitly encoding the details of that policy in the learned algorithm, the base optimizer simply needs to encode how to search for such a policy. Furthermore, if a mesa-optimizer can determine the important features of its environment at runtime, it does not need to be given as much prior information as to what those important features are, and can thus be much simpler.

This effect is most pronounced for tasks with a broad diversity of details but common high-level features. For example, Go, Chess, and Shogi have a very large domain of possible board states, but admit a single high-level strategy for play—heuristic-guided tree search—that performs well across all board states.(10) On the other hand, a classifier trained on random noise is unlikely to benefit from compression at all.

The environment need not necessarily be too diverse for this sort of effect to appear, however, as long as the pressure for low description length is strong enough. As a simple illustrative example, consider the following task: given a maze, the learned algorithm must output a path through the maze from start to finish. If the maze is sufficiently long and complicated then the specific strategy for solving this particular maze—specifying each individual turn—will have a high description length. However, the description length of a general optimization algorithm for finding a path through an arbitrary maze is fairly small. Therefore, if the base optimizer is selecting for programs with low description length, then it might find a mesa-optimizer that can solve all mazes, despite the training environment only containing one maze.
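The maze example can be sketched in a few lines. The breadth-first search below is one possible general maze solver, not necessarily what a base optimizer would find; the grid encoding (`#` for walls) and the names `solve_maze` and `MAZE` are assumptions made for this illustration. The point is that the solver's length stays fixed however long the maze is, while an explicit list of turns grows with the maze.

```python
from collections import deque


def solve_maze(grid, start, goal):
    """Breadth-first search over open cells; returns a list of (row, col) or None."""
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}  # maps each visited cell to the cell it was reached from
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# A tiny hypothetical maze: S is the start, G the goal, # a wall.
MAZE = [
    "S.#",
    "#.#",
    "#.G",
]

if __name__ == "__main__":
    print(solve_maze(MAZE, (0, 0), (2, 2)))
```

The same function solves any other grid without modification, which is the sense in which a search procedure is a compressed description of many maze-specific policies.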

**Task restriction.** The observation that diverse environments seem to increase the probability of mesa-optimization suggests that one way of reducing the probability of mesa-optimizers might be to keep the tasks on which AI systems are trained highly restricted. Focusing on building many individual AI services which can together offer all the capabilities of a generally-intelligent system rather than a single general-purpose artificial general intelligence (AGI), for example, might be a way to accomplish this while still remaining competitive with other approaches.(11)

**Human modeling.** Another aspect of the task that might influence the likelihood of mesa-optimization is the presence of humans in the environment.(12) Since humans often act as optimizers, reasoning about humans will likely involve reasoning about optimization. A system capable of reasoning about optimization is likely also capable of reusing that same machinery to do optimization itself, resulting in a mesa-optimizer. For example, it might be the case that predicting human behavior requires instantiating a process similar to human judgment, complete with internal motives for making one decision over another.

Thus, tasks that do not benefit from human modeling seem less likely to produce mesa-optimizers than those that do. Furthermore, there are many tasks that might benefit from human modeling that don't explicitly involve modeling humans—to the extent that the training distribution is generated by humans, for example, modeling humans might enable the generation of a very good prior for that distribution.

2.2. The base optimizer

It is likely that certain features of the base optimizer will influence how likely it is to generate a mesa-optimizer. First, though we largely focus on reinforcement learning in this sequence, RL is not necessarily the only type of machine learning where mesa-optimizers could appear. For example, it seems plausible that mesa-optimizers could appear in generative adversarial networks.

Second, we hypothesize that the details of a machine learning model's architecture will have a significant effect on its tendency to implement mesa-optimization. For example, a tabular model, which independently learns the correct output for every input, will never be a mesa-optimizer. On the other hand, if a hypothetical base optimizer looks for the program with the shortest source code that solves a task, then it is more plausible that this program will itself be an optimizer.(13) However, for realistic machine learning base optimizers, it is less clear to what extent mesa-optimizers will be selected for. Thus, we discuss some factors below that might influence the likelihood of mesa-optimization one way or the other.

**Reachability.** There are many kinds of optimization algorithms that a base optimizer could implement. However, almost every training strategy currently used in machine learning uses some form of local search (such as gradient descent or even genetic algorithms). Thus, it seems plausible that the training strategy of more advanced ML systems will also fall into this category. We will call this general class of optimizers that are based on local hill-climbing *local optimization processes.*

We can then formulate a notion of *reachability,* the difficulty for the base optimizer to find any given learned algorithm, which we can analyze in the case of a local optimization process. A local optimization process might fail to find a particular learned algorithm that would perform very well on the base objective if the learned algorithm is surrounded by other algorithms that perform poorly on the base objective. For a mesa-optimizer to be produced by a local optimization process, it needs to not only perform well on the base objective, but also be *reachable;* that is, there needs to be a path through the space of learned algorithms to it that is approximately monotonically increasing. Furthermore, the degree to which the path only need be approximate—that is, the degree to which ML training procedures can escape local optima—is likely to be critical, as optimization algorithms are complex enough that it might require a significant portion of the algorithm to be present before performance gains start being realized.
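A one-dimensional caricature of reachability, under heavy simplifying assumptions: the "space of learned algorithms" is just a list of scores, and the local optimization process is greedy hill-climbing to the better neighbor. The list `SCORES` and the function `hill_climb` are hypothetical names introduced for this sketch.

```python
def hill_climb(scores, start):
    """Greedy local search: move to the best neighbor until no neighbor improves."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        best = max(neighbors, key=lambda j: scores[j])
        if scores[best] <= scores[i]:
            return i  # local optimum: no strictly better neighbor
        i = best


# Index 7 has the highest score but is surrounded by a low-scoring valley.
SCORES = [1, 2, 3, 2, 1, 0, 0, 9]

if __name__ == "__main__":
    print(hill_climb(SCORES, start=0))  # stops at the nearby local optimum
    print(hill_climb(SCORES, start=6))  # starts next to the peak and reaches it
```

Starting from index 0, the search stops at index 2 and never finds index 7, even though index 7 performs far better: it is not reachable by an increasing path from that starting point.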

**Algorithmic range.** One key factor likely to determine the reachability of mesa-optimizers is the *algorithmic range* of the learned algorithms—that is, how extensive is the set of algorithms (or how expressive is the model space) capable of being found by the base optimizer. The more extensive a model's algorithmic range, the broader the space of possible learned algorithms, and thus the more likely that it will be able to find one that is a mesa-optimizer, assuming the base optimizer is incentivized to do so. For example, architectures that explicitly give the algorithm access to a wide range of possible computations, such as recurrent neural networks or neural Turing machines,(14) seem more likely to produce mesa-optimizers.

**Inductive biases.** Another important factor is the degree to which the base optimizer is explicitly or implicitly biased in various ways. The nature of these inductive biases will contribute to the likelihood of a mesa-optimizer being selected for. One of the most important kinds of inductive bias is simplicity bias, which would almost certainly be exhibited by almost all base optimizers. We identify three ways in which simplicity bias can manifest itself:

- An explicit penalty due to parameter regularization or architectural constraints such as weight-sharing or sparse connections.
- An implicit bias due to the model architecture. For example, it has been shown that neural networks are more likely to fit a simple function to a set of training data, even when no regularization is used.(15)
- The capacity limitations of the model. The size of a model imposes a hard limit on the complexity of the functions it is able to represent. Thus, to the degree that the base optimizer is selecting based on performance, it will be driven to “squeeze out” as much performance as it can for any given model capacity, leading to a bias in favor of relatively compressed policies.

The more a base optimizer is biased towards simple solutions, the more it will be incentivized to find a compressed policy such as a mesa-optimizer.
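As a small illustration of the first item in the list above, here is a sketch of an explicit simplicity penalty in the form of L2 parameter regularization on a linear model. The model, the data, and the name `regularized_loss` are assumptions chosen for brevity; real training setups differ in many details.

```python
def regularized_loss(weights, inputs, targets, l2_strength):
    """Mean squared error of a linear model plus an L2 penalty on its weights."""
    predictions = [sum(w * x for w, x in zip(weights, row)) for row in inputs]
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    penalty = l2_strength * sum(w * w for w in weights)
    return mse + penalty


# Two redundant features, so many weight vectors fit the data exactly.
INPUTS = [[1.0, 1.0], [2.0, 2.0]]
TARGETS = [2.0, 4.0]

if __name__ == "__main__":
    small = regularized_loss([1.0, 1.0], INPUTS, TARGETS, l2_strength=0.1)
    large = regularized_loss([3.0, -1.0], INPUTS, TARGETS, l2_strength=0.1)
    print(small, large)
```

Both weight vectors fit the data perfectly, so only the penalty term separates them, and the base optimizer would prefer the one with the smaller weights.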

The base optimizer could also be biased in other ways. For example, it could be biased towards algorithms with a low time or space complexity. If we view mesa-optimization as delegation of optimization work, a time penalty on the learned algorithm is likely to incentivize the base optimizer to do more precomputation itself by pre-computing relevant heuristics rather than delegating optimization work to the learned algorithm. Thus, we hypothesize that penalizing description length will favor mesa-optimizers while penalizing time complexity will disfavor them. This suggests the following speculative conjecture: neither a minimal-depth nor minimal-size boolean circuit that solves a problem can be a mesa-optimizer.(16)

Lastly, another form of bias that might have particularly interesting effects is the presence of an information funnel. In deep learning, the base optimizer directly selects the values of individual parameters. In biological evolution, by contrast, the base optimizer selects DNA strings, which in turn produce the detailed structure of the brain only indirectly. This implicitly introduces pressure towards compressing the brain's structure. As we noted previously, this might favor the development of mesa-optimizers, as they can be seen as a way of compressing complex policies. Because deep learning algorithms currently select weights directly, there is no such information funnel. In the case of meta-learning over a machine learning model's architecture, however, an information funnel might present itself due to the fact that the meta-learner has to learn a single policy for creating architectures that is capable of solving all of the problems it is presented with.

**Statefulness.** Another factor that could make it easier for a learned algorithm to implement a complex optimization process is the ability of the learned algorithm to save and recall information. The ability to refer back to previous state enables the caching of intermediate results, increasing the architecture's algorithmic range by giving the learned algorithm significantly more time to complete its computations. Statefulness, therefore, favors mesa-optimization by decreasing the implicit penalty on time complexity imposed by enforcing all computations to end when the learned algorithm produces its output. One simple form of this might be a recurrent neural network, which can perform computations with longer time horizons due to its ability to pass intermediate activations across different time steps. An even more extreme version of this might be a network that is able to store state not just across different time steps in the same episode, but also across episodes, enabling it to do more long-term learning, though such an architecture is currently completely hypothetical.

**Hard-coded optimization.** One possible means of alleviating some of these issues might be to include hard-coded optimization where the learned algorithm provides only the objective function and not the optimization algorithm. The stronger the optimization performed explicitly, the less strong the optimization performed implicitly by the learned algorithm needs to be. For example, architectures that explicitly perform optimization that is relevant for the task—such as hard-coded Monte Carlo tree search—might decrease the benefit of mesa-optimizers by reducing the need for optimization other than that which is explicitly programmed into the system.
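A rough sketch of this division of labor, assuming a deterministic toy environment: the search procedure `plan` is written by hand, and the only learned component is the objective it maximizes. Here `toy_value` is a stand-in for a learned value function, and all names (`plan`, `toy_step`, `toy_value`) are hypothetical.

```python
from itertools import product


def plan(state, actions, step, learned_value, depth):
    """Hard-coded exhaustive search over all action sequences of length `depth`."""
    def rollout(sequence):
        s = state
        for action in sequence:
            s = step(s, action)
        return s

    # The search itself is fixed; only `learned_value` would come from training.
    return max(product(actions, repeat=depth),
               key=lambda sequence: learned_value(rollout(sequence)))


def toy_step(state, action):
    # Assumed toy dynamics: the state is an integer and actions add to it.
    return state + action


def toy_value(state):
    # Stand-in for a learned objective: prefers states close to 5.
    return -abs(state - 5)


if __name__ == "__main__":
    print(plan(0, (-1, 1, 2), toy_step, toy_value, depth=3))
```

Exhaustive enumeration is used here only because it is short; a system like the one the paragraph describes would use something far more efficient, such as Monte Carlo tree search.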

*The third post in the Mesa-Optimization Sequence, titled “The Inner Alignment Problem,” will be released in two days.*

[1] As of the date of this post. Note that we do examine some existing machine learning systems that we believe are close to producing mesa-optimization in post 5. ↩︎

[2] It is worth noting that the same argument also holds for achieving an average-case guarantee. ↩︎

[3] This definition of $N$ is somewhat vague, as there are multiple different levels at which one can chunk an environment into instances. For example, one environment could always have the same high-level features but completely random low-level features, whereas another could have two different categories of instances that are broadly self-similar but different from each other, in which case it's unclear which has a larger $N$. However, one can simply imagine holding $N$ constant for all levels but one and just considering how environment diversity changes on that level. ↩︎

[4] Note that this makes the implicit assumption that the amount of optimization power required to find a mesa-optimizer capable of performing $x$ bits of optimization is independent of $N$. The justification for this is that optimization is a general algorithm that looks the same regardless of what environment it is applied to, so the amount of optimization required to find an $x$-bit optimizer should be independent of the environment. ↩︎

[5] Note, however, that there will be some maximum $x$ simply because the learned algorithm generally only has access to so much computational power. ↩︎

[6] Subject to the constraint that $P - f(x) \geq 0$. ↩︎


### The Fundamental Theorem of Asset Pricing: Missing Link of the Dutch Book Arguments

*Assumed background: Acyclic preferences, Dutch Book theorems*

There are fairly elementary arguments that, in the absence of uncertainty, any preferences not described by a utility function are problematic - this is the circular preferences argument. There are also fairly elementary arguments that, *if* we handle uncertainty by taking weighted sums of utilities of different outcomes, *then* the weights should follow the usual rules of probability - these are the Dutch Book arguments. But in the middle there’s a jump: we need to assume that taking weighted sums of utilities makes sense for some reason. There are some high-powered theorems which make that jump (specifically the complete class theorem), but they’re not very mathematically accessible.

(If any of that sounds new, you should read Yudkowsky’s excellent intro to this stuff before reading this post.)

It turns out that there *is* a relatively simple theorem which bridges the gap between deterministic utility and Dutch Book arguments. But rather than hanging out in decision theory textbooks, it’s been living it up in finance. It’s called the Fundamental Theorem of Asset Pricing (FTAP).

Here’s the setup. Just like the Dutch Book arguments, we have a bunch of tradable assets - i.e. betting contracts, like stock options or horse race bets. We have a bunch of possible outcomes - i.e. possible prices of an underlying stock at expiry, or possible winners of the horse race. Each asset's final value will depend on the outcome. Then the FTAP states that either:

- There exists some portfolio of assets which costs $0 to buy (can include short sales) and is guaranteed a positive payout (i.e. arbitrage), or
- There exists a probability distribution such that the price of each asset is the expected value of its payout (i.e. price is a weighted sum of possible outcome values).

Note that this is exactly what we need to round out the Dutch Book arguments: either there exists an arbitrage opportunity, or we compare assets using a weighted sum of possible outcome values.
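Before the proof, a tiny hypothetical example may help fix ideas. Assume a two-horse race with three assets: a bet paying 1 if horse A wins, a bet paying 1 if horse B wins, and cash that pays 1 either way. The numbers and function names below are made up for illustration; the code only checks the two cases of the theorem on this example, it does not prove anything.

```python
def portfolio_cost(prices, holdings):
    """Price of a portfolio; negative holdings are short sales."""
    return sum(p * h for p, h in zip(prices, holdings))


def portfolio_payouts(values, holdings):
    """Payout of the portfolio under each outcome; values[i][j] is asset i in outcome j."""
    n_outcomes = len(values[0])
    return [sum(h * row[j] for h, row in zip(holdings, values))
            for j in range(n_outcomes)]


def expected_values(values, probs):
    """Expected payout of each asset under the given outcome probabilities."""
    return [sum(q * v for q, v in zip(probs, row)) for row in values]


# Rows: bet on A, bet on B, cash. Columns: A wins, B wins.
VALUES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

if __name__ == "__main__":
    # Case 2: prices 0.6, 0.4, 1.0 match expected values under probabilities (0.6, 0.4).
    print(expected_values(VALUES, [0.6, 0.4]))

    # Case 1: with prices 0.5, 0.4, 1.0 the bets are too cheap relative to cash.
    mispriced = [0.5, 0.4, 1.0]
    holdings = [1.0, 1.0, -0.9]  # buy both bets, short 0.9 units of cash
    print(portfolio_cost(mispriced, holdings))
    print(portfolio_payouts(VALUES, holdings))
```

In the mispriced case the portfolio costs nothing (up to floating-point rounding) and pays roughly 0.1 whichever horse wins, which is the arbitrage branch of the theorem. No probability distribution can reproduce those prices, since the two bet prices would have to sum to the price of cash.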

Let’s prove it. First, we’ll name some variables:

- $V_{ij}$: a big matrix which contains the value of each asset $i$ under each possible outcome $j$.
- $S_i$: current price of asset $i$ (we need $P$ for probability, so $S$ represents price).
- $p_j$: probability distribution over outcomes $j$ (which may or may not exist).
- $q_i$: arbitrage portfolio (which may or may not exist).

FTAP says that either:

- Arbitrage portfolio exists: $\sum_i q_i V_{ij} > 0$ for all outcomes $j$, and the portfolio currently costs $\sum_i q_i S_i = 0$.
- Probability distribution exists: $S_i = \sum_j V_{ij} p_j$

I’ll state the proof informally - if you know a little linear algebra, it’s easy but tedious to formalize and see that it works. The key question is: how many assets, and how many possible outcomes? With N assets and M outcomes, our arbitrage condition has N variables (the q’s) and M+1 equations (one for each outcome plus the current cost constraint). Conversely, our probability distribution condition has M variables (the p’s) and N equations. We generally expect the system to be solvable when the number of variables is at least as large as the number of equations. So, either:

- N > M (more assets than outcomes), and the arbitrage system (typically) has a solution, or
- M >= N (at least as many outcomes as assets), and the probability system (typically) has a solution.
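
As a rough sketch of the counting argument, the two systems can be written in matrix form. The notation here is an assumption for illustration and not taken from the post: $V$ is treated as an $N \times M$ matrix, and $q$, $S$ (length $N$) and $p$ (length $M$) as column vectors.

$$
\begin{aligned}
\text{arbitrage:}\quad & V^\top q > 0 \ \ (M \text{ conditions}), \qquad S^\top q = 0 \ \ (1 \text{ condition}), \qquad q \in \mathbb{R}^N \\
\text{pricing:}\quad & V p = S \ \ (N \text{ conditions}), \qquad p \in \mathbb{R}^M
\end{aligned}
$$

The first line has $N$ unknowns against $M+1$ conditions, and the second has $M$ unknowns against $N$ conditions, which is where the split between $N > M$ and $M \ge N$ comes from.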

I’m brushing some stuff under the rug here - e.g. maybe there are more assets than outcomes, but the prices line up perfectly. That’s where the linear algebra comes in - the above works for full-rank V, but rank-deficient V requires checking the usual corner cases. If you take a math finance class, you’ll probably go through that tedium in its full glory, along with some more interesting extensions of the theorem.

Anyway, what have we shown? We actually haven’t established that the “probability distribution” p_j is a probability distribution - we’ve shown that the prices are described by *some* weighted sum of outcome values, but the weights could still be negative or not sum to 1. That’s fine - the usual Dutch Book arguments show that the weights are a probability distribution (or else there’s an arbitrage opportunity). We’ve bridged the gap.
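
To make the "weights could still be negative" point concrete, here is a minimal sketch for a made-up market with two assets and two outcomes. The bond/stock payoffs, the prices, and the helper names (`pricing_weights`, `is_probability`, `is_arbitrage`) are all hypothetical choices for illustration, not anything from the theorem itself. It solves for the weights directly, and shows that when a weight comes out negative, a zero-cost portfolio with strictly positive payoff in every outcome can be found.

```python
from fractions import Fraction as F

def pricing_weights(V, S):
    """Solve S_i = sum_j V[i][j] * p_j for a 2-asset, 2-outcome market (Cramer's rule)."""
    (a, b), (c, d) = V
    det = a * d - b * c
    if det == 0:
        raise ValueError("rank-deficient V: the corner case set aside in the text")
    return ((S[0] * d - b * S[1]) / det, (a * S[1] - c * S[0]) / det)

def is_probability(p):
    """Weights form a probability distribution if non-negative and summing to 1."""
    return all(w >= 0 for w in p) and sum(p) == 1

def is_arbitrage(q, V, S):
    """Zero cost today, strictly positive payoff in every outcome."""
    cost = sum(qi * Si for qi, Si in zip(q, S))
    payoffs = [sum(qi * row[j] for qi, row in zip(q, V)) for j in range(len(V[0]))]
    return cost == 0 and all(x > 0 for x in payoffs)

# Hypothetical market: asset 0 is a bond paying 1 in both outcomes,
# asset 1 is a stock paying 2 in outcome 0 and 1/2 in outcome 1.
V = [[F(1), F(1)], [F(2), F(1, 2)]]

fair_prices = [F(1), F(1)]
skewed_prices = [F(1), F(5, 2)]      # stock costs more than it can ever pay

fair = pricing_weights(V, fair_prices)       # (1/3, 2/3): a valid distribution
skewed = pricing_weights(V, skewed_prices)   # (4/3, -1/3): one weight is negative

# Under the skewed prices: buy 5/2 bonds and short 1 stock, at zero net cost.
q = [F(5, 2), F(-1)]
print(is_probability(fair), is_probability(skewed))   # True False
print(is_arbitrage(q, V, skewed_prices))              # True
```

Exact fractions are used so that the zero-cost and sum-to-one checks are not disturbed by floating-point rounding.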

All the usual considerations of the Dutch Book theorems still apply. “Arbitrage” means exactly the same thing here that it means in the Dutch Book theorems. As usual, we’re formulating things with “bets” and “contracts” and “arbitrage” and “prices”, but that can model a much wider range of phenomena.

One interesting point: the probability distribution may not be unique. There may be more than one possible distribution which satisfies the conditions. This works fine with the Dutch Book arguments: each possible distribution corresponds to a different prior.
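
A small sketch of non-uniqueness, again with a made-up market (the payoffs and the two candidate distributions are hypothetical): with two assets and three outcomes, two different distributions reproduce exactly the same prices.

```python
from fractions import Fraction as F

def implied_prices(V, p):
    """Price each asset as the p-weighted sum of its outcome values."""
    return [sum(v * w for v, w in zip(row, p)) for row in V]

# Hypothetical market with 2 assets and 3 outcomes (more outcomes than assets).
V = [[F(1), F(1), F(1)],        # bond: pays 1 in every outcome
     [F(2), F(1), F(1, 2)]]     # stock: pays 2, 1, or 1/2

p_a = [F(1, 3), F(0), F(2, 3)]
p_b = [F(1, 6), F(1, 2), F(1, 3)]

print(implied_prices(V, p_a) == implied_prices(V, p_b))   # True: both price bond and stock at 1
```

Observed prices alone cannot tell `p_a` and `p_b` apart here, which matches the reading of each distribution as a different prior.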

### I translated 'Twelve Virtues of Rationality' into Hebrew.

Here it is - if you know Hebrew and have feedback, do give it, either in the comments here or in the document (it's not fully edited).

I don't really know what I'm supposed to do with it now, though. Where should I put it? Can/should I put it on my future website? (Of course, I don't own the translation any more than the original piece.)

Regarding the translation, the thing I'm most uncertain of is (ironically) which word to use for 'virtue'. If you have a better idea, please share.

### Open Thread June 2019

If it’s worth saying, but not worth its own post, you can put it here.

Also, if you are new to LessWrong and want to introduce yourself, this is the place to do it. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome. If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, and seeing if there are any meetups in your area.

The Open Thread sequence is here.

### "But It Doesn't Matter"

If you ever find yourself saying, "Even if Hypothesis *H* is true, it doesn't have any decision-relevant implications," *you are rationalizing!* The fact that *H* is interesting enough for you to be considering the question at all (it's not some arbitrary trivium like the 1923rd binary digit of π, or the low temperature in São Paulo on September 17, 1978) means that it must have some relevance to the things you care about. It is *vanishingly improbable* that your optimal decisions are going to be the *same* in worlds where *H* is true and worlds where *H* is false. The fact that you're tempted to *say* they're the same is probably because some part of you is afraid of some of the imagined consequences of *H* being true. But *H* is already true or already false! If you happen to live in a world where *H* is true, and you make decisions as if you lived in a world where *H* is false, you are thereby missing out on all the extra utility you would get if you made the *H*-optimal decisions instead! If you can figure out exactly what you're afraid of, maybe that will help you work out what the *H*-optimal decisions are. Then you'll be in a better position to successfully notice which world you *actually* live in.

### Feedback Requested! Draft of a New About/Welcome Page for LessWrong

**Context for Draft / Request for Feedback**

The LessWrong team is hoping to soon display a new About/Welcome page which does an improved job of conveying what LessWrong.com is about and how community members can productively use the site.

However, LessWrong *is* a *community* site, and I (plus the team) feel it's not appropriate for us to unilaterally declare *what LessWrong is about*. **So here's our in-progress draft of a new About/Welcome page. Please let us know what you think in the comments. Please especially let us know if you think LessWrong is actually about something else.** Or even just what it means to you.

Thanks!

<3 Ruby

---------------------------------------------------------------------------------

Related:

The tl;dr

LessWrong is a community blog devoted to the art of human rationality.

We invite you to use this site for any number of reasons, including, but not limited to: learning valuable things, being entertained, sharing and getting feedback on your ideas, and participating in a community you like. *However*, fundamentally, this site is designed for two main uses:

- **As a place to level up your *rationality***
- **As a place to apply your rationality to important real-world problems**

Primary things to do on LessWrong are:

- Read LessWrong’s repository of rationality materials
- Join a local rationality meetup
- Join in a discussion
- Ask or answer a question
- Write a post

*Rationality* is a term which can have different connotations to different people. On LessWrong, we mean something like the following:

- Rationality is thinking in ways which systematically arrive at truth.
- Rationality is thinking in ways which cause you to achieve your goals.
- Rationality is trying to do better on purpose.
- Rationality is reasoning well even in the face of massive uncertainty.
- Rationality is making good decisions even when it’s hard.
- Rationality is being self-aware, understanding how your own mind works, and applying this knowledge to thinking better.

What rationality is *not*:

- Forsaking all human emotion and intuition to embrace Cold Hard Logic.

One reason to care about rationality is that you intrinsically care about having true beliefs. You might also care about rationality because you *care about anything at all*. Our ability to achieve our goals depends on 1) our ability to understand and predict the world, 2) having the skills to make good plans, and 3) having the self-knowledge and self-mastery to avoid falling into common pitfalls of human thinking. These core topics in rationality are of interest to anyone with non-trivial goals, from curing their persistent insomnia and having fulfilling relationships to performing groundbreaking research or curing the world’s greatest ills.

See also Why truth? And...

How does LessWrong help me level up my rationality?

**A repository of rationality knowledge**

**LessWrong has an extensive Library containing hundreds of essays on rationality topics.** **You can get started on the Library page or from the homepage. Among the newer material, we particularly recommend Curated posts.**

The writings of Eliezer Yudkowsky and Scott Alexander comprise the core readings of LessWrong. As part of the founding of LessWrong, Eliezer Yudkowsky wrote a long series of blog posts, originally known as *The Sequences* and more recently compiled into an edited volume, *Rationality: From AI to Zombies*.

*Rationality: From AI to Zombies* is a deep exploration of how human minds can come to understand the world they exist in - and all the reasons they so often fail to do so. The comprehensive work:

- lays foundational conceptions of belief, evidence, and understanding
- reviews the systematic biases and common excuses which cause us to believe false things
- offers guidance on how to change our minds and how to use language effectively to describe the world
- depicts the nature of human psychology with reference to how evolution produced us
- clarifies the kind of morality humans like us can have in a reducible, physical world
- and repeatedly reminds us that confusion and mystery exist only in our minds.

Eliezer covers these topics and many more through allegory, anecdote, and scientific theory. He tests these ideas by applying them to debates in artificial intelligence (AI), physics, metaethics, and consciousness.

Eliezer also wrote *Harry Potter and the Methods of Rationality* (HPMOR), an alternate-universe version of Harry Potter in which Harry’s adoptive parents raised him with Enlightenment ideals and the experimental spirit. This work introduces many of the ideas from *Rationality: A-Z* in a gripping narrative.

Scott Alexander’s essays on how good reasoning works, how to learn from the institution of science, and the different ways society has been and could be organized have been made into a collection called *The Codex*. The Codex contains such exemplary essays as:

- Beware Isolated Demands for Rigor
- The noncentral fallacy - the worst argument in the world?
- The Categories Were Made For Man, Not Man For The Categories
- I Can Tolerate Anything Except the Outgroup

Members of LessWrong rely on many of the ideas from these writers in their own posts, so we advise reading at least a little of these authors to get up to speed on LessWrong's background knowledge and culture.

**Truth-seeking norms and culture**

We are proud of the LessWrong community not just for its study of rationality, but also for how much these ideals and skills are put into practice. Unlike many social spaces on the modern Internet, LessWrong is a place where changing your mind, charitability, scholarship, and many other virtues are cherished. LessWrong helps you improve your rationality by providing a space where healthy epistemic and conversational norms are encouraged and enforced.

**Social support and reinforcement**

Beyond culture and norms, it’s easier to learn, change, and grow when you’re not alone on your path. Find solidarity on your quest for greater rationality with the LessWrong community. You can participate in the conversations online (via the comments or writing posts which build on the posts of others). Or attend a local in-person meetup, conference, or community celebration. In the last twelve months, there have been 461 meetups in 32 countries.

**Opportunities to practice your rationality.**

*See the next section*.

Feedback and practice are crucial for mastery of skills. If you’re not using your skills to do anything real, how do you even know whether you’re on the right track? For this reason, LessWrong is a place where rationality is both trained and put to use.

Plus, it’s nice to accomplish real things.

Ways to apply your rationality on LessWrong

**Participate in discussions aimed at truth-seeking and self-improvement**

On LessWrong, you can converse with others with the real goal of exchanging beliefs and converging on the truth. You can delight in dialog which isn’t about Being Right, but about actually clarifying the matter at hand. And you can work together with others, each of you providing your own understanding and background knowledge to figure out how reality really is. This is not Internet discussion as you know it.

While rationality, self-improvement, and AI are the most frequently discussed topics on the site, there are also frequent discussions of psychology, philosophy, decision theory, mathematics, computer science, physics, biology, history, sociology, meditation, and many other topics.

Core to LessWrong is that we want our online conversations to be productive, constructive, and oriented around determining what is true. Our Frontpage commenting guidelines ask members to:

**Aim to explain, not persuade.** Write your true reasons for believing something, not what you think is most likely to persuade others. Try to offer concrete models, make predictions, and note what would change your mind.

**Present your own perspective.** Make personal statements instead of statements that try to represent a group consensus (“I think X is wrong” vs. “X is generally frowned upon”). Avoid stereotypical arguments that will cause others to round you off to someone else they’ve encountered before. Tell people how **you** think about a topic, instead of repeating someone else’s arguments (e.g. “But Nick Bostrom says…”).

**Get curious.** If I disagree with someone, what might they be thinking; what are the moving parts of their beliefs? What model do I think they are running? Ask yourself - what about this topic do I not understand? What evidence could I get, or what evidence do I already have?

Once you’ve read some of LessWrong’s core material and read through some past comment-section discussions to get a sense of how we communicate around here, you’re ready to participate in a LessWrong discussion.

**Post your valuable ideas**

Our collective knowledge and skills are solidified by members writing posts. By writing posts, you benefit the world by sharing your knowledge and benefit yourself by getting feedback from an audience. Our audience will hold you to high standards of reasoning, yet in a cooperative and encouraging manner.

Posts on practically any topic are welcomed. We think it's important that members can “bring their entire selves” to LessWrong and are able to share their thoughts, ideas, and experiences without fearing whether they are “on topic”. Rationality is not restricted to only specific domains in one’s life, and neither should LessWrong be.

However, to maintain its overall focus, LessWrong classifies posts as either *Personal blogposts* or as *Frontpage posts*. The latter have more visibility by default on the site.

All posts begin as personal blogposts. Authors can grant permission to LessWrong’s moderation team to give a post *Frontpage status* if it i) has broad relevance to LessWrong’s members, ii) is timeless, i.e. not tied to current events, and iii) primarily attempts to explain rather than persuade.

The not-perfectly-named category of “Personal” blogposts is suitable for everything which doesn't fit in Frontpage. It’s the right classification for discussions of niche topics, personal interests, current events, community concerns, potentially divisive topics, and just about anything else you want to write about.

See more in *Site Guide: Personal Blogposts vs Frontpage Posts*

**Contribute to LessWrong’s Open Questions research platform.**

Open Questions was built to help apply the LessWrong community’s rationality/epistemic skills to humanity’s most important problems.

### A Brief History of LessWrong

In 2006, __Eliezer Yudkowsky__, __Robin Hanson__, and others began writing on __Overcoming Bias__, a group blog with the general theme of how to move one’s beliefs closer to reality despite biases such as overconfidence and wishful thinking. In 2009, after the topics drifted more widely, Eliezer moved to a new community blog, *LessWrong*.

LessWrong was seeded with a series of daily blog posts written by Eliezer, originally known as *The Sequences*, and more recently compiled into an edited volume, *Rationality: A-Z*. These writings attracted a large community of readers and writers interested in the art of human rationality.

In 2015-2016, the site underwent a steady decline in activity, leading some to declare the site dead. In 2017, a team led by Oliver Habryka took over the administration and development of the site, relaunching it on an __entirely new codebase__ later that year.

The new project, dubbed LessWrong 2.0, was the first time LessWrong had a full-time dedicated development team behind it instead of only volunteer hours. Site activity recovered from the 2015-2016 decline and __has remained at steady levels__ since the launch.

The team behind LessWrong 2.0 has ambitions not limited to maintaining the original LessWrong community blog and forum. The LessWrong 2.0 team conceives of itself more broadly as an organization attempting to build community, culture, and technology which will drive intellectual progress on the world’s most pressing problems.

### The LessWrong Team (page under construction)

Core Team

Oliver Habryka / Habryka

Oliver Habryka is the current project lead for LessWrong.com, where he tries to build infrastructure for making intellectual progress on global catastrophic risks, cause prioritization and the art of rationality. He used to work at the Centre for Effective Altruism US as strategic director, ran the EA Global conferences for 2015 and 2016 and is an instructor for the Center for Applied Rationality. He has generally been involved with community organizing for the Effective Altruism and Rationality communities in a large variety of ways. He studied Computer Science and Mathematics at UC Berkeley, and his primary interests are centered around understanding how to develop communities and systems that can make scalable progress on difficult philosophical and scientific problems.

Ben P / Benito

<Ben has not yet filled out a bio.>

Raymond Arnold / Raemon

I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

I guess also I code? I worked at Spotify. Now I don’t.

James Babcock / jimrandomh

<Jim has not yet written a bio.>

Ruben Bloom / Ruby

I studied electrical engineering and philosophy before working as a Data Scientist and Product Manager. At LessWrong, I analyze data, test features, talk to users, and write many lengthy internal and external documents. In years gone by, I was an organizer for the LessWrong and EA communities in Melbourne, Australia. I also like to boast that I’m one of only two people who have volunteered for CFAR on three continents (the other is my wife). I have strong interests in emotions and “planning”, both of which I’ve written about on LessWrong.

Admin (BDFL)

Matthew Graves / Vaniver

Raemon:

Once upon a time, LessWrong almost died. There were numerous half-hearted attempts to revitalize the community. Eventually someone noticed that part of the problem was there was no particular person who actually had the mandate to make sweeping changes. Someone said “I vote for Vaniver” and then a bunch of people said “me too!” and in a highly unsuspect, democratic process, Vaniver became king.

Nowadays Vaniver is the meta-king, and his gentle authority flows through us.

### Site Guide: Personal Blogposts vs Frontpage Posts

Posts on practically any topic are welcomed on LessWrong [1]. I (and others on the team) feel it is important that members are able to “bring their entire selves” to LessWrong and are able to share all their thoughts, ideas, and experiences without fearing whether they are “on topic” for LessWrong. Rationality is not restricted to only specific domains of one’s life and neither should LessWrong be.

**However, to maintain its overall focus while still allowing posts on any topic, LessWrong classifies posts as either Personal blogposts or as Frontpage posts.**

**Personal blogposts**

- Are the default classification for all posts.
- Are not displayed by default on the homepage.
- Can be on any topic [1] and in any format: nothing is “off topic”
- Suitable for personal interests, blogging, and general ramblings
  - e.g. your thoughts on Magic the Gathering, a poem, or a short story you wrote
- Suitable for discussion of current events
- Suitable for discussion of specific social and community issues
- Suitable for discussion of highly divisive topics

**Frontpage posts**

- Are displayed by default to all users.
- Authors can allow moderators to give their post Frontpage status if the moderator judges the post to be:
  - Useful, novel, and relevant to many LessWrong members
  - “Timeless”, i.e. minimizes references to current events and is likely to remain useful even after a few years
  - An attempt to explain rather than persuade

This system allows LessWrong members to write about whatever is of interest to them while ensuring that only members who wish to see “off topic” content see that content.

How to view personal blogposts

Personal blogposts, with their laxer restrictions, are not shown by default on LessWrong’s homepage. To view Personal blogposts, you can:

- Click the “Include Personal blogposts” checkbox beneath the *Latest Posts* section.
- Visit the __All Posts__ page and ensure "Filtered by" is set to “All Posts”.
- Find a user’s Personal blogposts by visiting their user profile page.
- Look in the Recent Discussion feed on the homepage, where Personal blogposts and their comments appear.

Personal blogposts and the Recent Discussion Feed

In some cases, the moderation team will hide comments on Personal blogposts from the Recent Discussion feed on the homepage. This is done if the moderation team feels a discussion is veering in directions which are particularly controversial, political, or unproductive.

I (and, I believe, the rest of the team) am not opposed to such discussions per se, but believe that we shouldn’t be drawing marginal attention to these discussions. In particular, the team does not think it's ideal for newcomers to encounter these discussions when first exploring LessWrong.

What does this mean for me?

Our classification system means that anyone can decide to use the LessWrong platform for their own personal blog and write about whichever topics take their interest. All of your posts and comments are visible under your user page, which you can treat as your personal blog hosted on LessWrong [2]. Other users can subscribe to your account and be notified whenever you post.

[1] We will remove material of the following types:

- Calls for direct violence against others
- Doxing of people on the internet
- Material we are not legally able to host
- To a very limited degree, material that seriously threatens LessWrong’s long-term values, mission and culture.

[2] In the future we might add various user-page/personal blog customization features like custom backgrounds, curating which posts and comments are shown first, etc.

### When Observation Beats Experiment

Suppose we have a strain of lab rats which are colored purple, and we want to know why. We suspect that chemical X is responsible, so we run an experiment:

- We genetically modify our purple rats to repress X production, and find that their purple coloration disappears.
- We genetically modify ordinary rats to produce X, and find that their coats turn purple.

We conclude that chemical X is both necessary and sufficient to turn rats’ coats purple. Case closed!

… or maybe not.

Suppose that rats are purple-colored if-and-only-if they express Purple Pigment (PP) above some threshold level. Purple Pigment, in turn, is chemically produced from X and Y:

url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face 
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} X+Y⇌PP

High levels of PP could result from high levels of X, or from high levels of Y. Either way, increasing X enough will always turn a rat purple, and decreasing X enough will always turn a rat not-purple. So our experiment doesn’t tell us whether our particular rats are purple due to high X or high Y - it could be either. In order to tell the difference, we need to go measure X and Y levels in our rats - not an experiment, but an observation.

(Warning: technical details not relevant to the main point were brushed under the rug there.)

Generalizing: experiments are really good for figuring out the *structure* of the underlying causal graph. How can we tell that Purple Pigment is produced from X and Y in the first place? Experiment: we try various levels of X and Y and see which rats are purple.

But if we want to know the *state* of the causal graph, in some real-world system, then observation beats experiment. To find out whether our particular rats are purple because of high X or high Y, we should measure their X and Y levels, without any experimental intervention. Of course, this only works if we’ve already done the experiments to figure out the structure of the system.
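The distinction can be made concrete with a toy simulation. This is only a sketch under assumptions that are not in the post: pigment level is taken to be proportional to the product of X and Y, a rat counts as purple above an arbitrary threshold, and the names `Rat`, `K_EQ`, `THRESHOLD`, `BASELINE`, `experiment`, and `observe` are hypothetical.

```python
from dataclasses import dataclass

# Assumed toy chemistry (not from the post): equilibrium pigment PP = K_EQ * X * Y.
K_EQ = 1.0
THRESHOLD = 10.0  # assumed pigment level above which a rat looks purple
BASELINE = 1.0    # assumed "normal" level for both X and Y


@dataclass(frozen=True)
class Rat:
    x: float
    y: float

    def is_purple(self) -> bool:
        return K_EQ * self.x * self.y > THRESHOLD


def experiment(rat: Rat) -> tuple[bool, bool]:
    """Intervene on X: force it very high, then very low, and record the color."""
    forced_high = Rat(x=100.0, y=rat.y)
    forced_low = Rat(x=0.01, y=rat.y)
    return forced_high.is_purple(), forced_low.is_purple()


def observe(rat: Rat) -> str:
    """No intervention: measure X and Y and report which one is elevated."""
    return "high X" if rat.x > BASELINE else "high Y"


high_x_rat = Rat(x=20.0, y=1.0)  # purple because of X
high_y_rat = Rat(x=1.0, y=20.0)  # purple because of Y

# The experiment gives the same answer for both rats...
print(experiment(high_x_rat), experiment(high_y_rat))
# ...while plain observation tells them apart.
print(observe(high_x_rat), observe(high_y_rat))
```

In this toy setup the intervention reveals the structure (X feeds into pigment) but returns identical results for both rats, so only the measurement distinguishes their states.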


### How to determine if my sympathetic or my parasympathetic nervous system is currently dominant?

I've been reading the CFAR Handbook's chapter about Againstness. The chapter's idea is that when your sympathetic nervous system (SNS) is dominant over your parasympathetic nervous system (PSNS), your introspection is impaired and you tend to be less rational. Hence you should learn to tell which system is currently dominant, and also learn to switch to PSNS dominance.

They provide the following table of how your body, mind, and behaviour change depending on which system is dominant

To learn where you are on the SNS-PSNS spectrum, they recommend observing yourself in different situations, determining which system is dominant, and trying to find patterns (e.g., maybe after physical exercise your SNS is usually dominant).

Well, I tried observing myself and figuring out which nervous system is active, and about 70% of the time I can't determine it. I ask myself questions like "Do I feel joy?", "Am I at peace?", "Are my muscles tense?", "Does my skin feel flushed?", "Is my breathing shallow, or slow and in the belly?". More often than not I can't answer these questions. And it doesn't help that there's no easy way to calibrate by observing the ground truth.

If there are people here who can determine which of your two nervous systems is dominant, how do you do it? Any tips for me?

Also, sometimes I get signals from my body that don't fit in this framework. When it's time for me to go to sleep but I don't, after a while my awareness and introspective clarity become impaired, I feel both energized and lacking energy at the same time, I feel like my System 2 (as in Kahneman's two systems) doesn't turn on much, and I get impatient. What happens with SNS and PSNS at this time? I don't know.


### What is the best online community for questions about AI capabilities?

I'm interested in learning more about existing ML model capabilities, and in particular how easily a model that is used on one task could be used for another, similar task.

For example, if I want to know or estimate whether an existing AI system like AlphaZero could beat Angry Birds with current capabilities, what's the best place to post that type of question?


### Egoism In Disguise

Originally posted at Living Within Reason

Epistemic status: moderately certain, but open to being convinced otherwise

tl;dr: any ethical system that relies on ethical intuitions is just egoism that's given a veneer of objectivity.

**Utilitarianism Relies on Moral Intuitions**

Most rationalists are utilitarians, so much so that most rationalist writing assumes a utilitarian outlook. In a utilitarian system, whatever is "good" is what maximizes utility. Utility, technically, can be defined as anything, but most utilitarians attempt to maximize the well-being of humans and, to some extent, animals.

I am not a utilitarian. I am an egoist. I believe that the only moral duty that we have is to act in our own self-interest (though generally, it is in our self-interest to act in prosocial ways most of the time). I feel a certain alienation from a lot of rationalist writing because of this difference. However, I have long suspected that most utilitarian thinking is largely the same thing as egoism.

Recently, Ozy of Thing of Things wrote a post that illustrates this point well. Like a lot of rationalist writing, this is addressing an ethical dilemma from a utilitarian framework. Ozy is trying to decide what creatures have a right to life, specifically considering humanely-raised animals, human fetuses, and human babies. From the post:

> Imagine that, among very wealthy people, there is a new fad for eating babies. Our baby farmer is an ethical person and he wants to make sure that his babies are farmed as ethically as possible. The babies are produced through artificial wombs; there are no adults who are particularly invested in the babies’ continued life. The babies are slaughtered at one month, well before they have long-term plans and preferences that are thwarted by death. In their one month of life, the babies have the happiest possible baby life: they are picked up immediately whenever they cry, they get lots of delicious milk, they’re held and rocked and sung to, their medical concerns are treated quickly, and they don’t ever have to sit in a poopy diaper. In every way, they live as happy and flourishing a life as a two-week-old baby can. Is the baby farm unethical?
>
> If you’re like me, the answer is a quick “yes.”

Ozy's main evidence for their conclusion is specifically stated to be their moral intuition, resting on the idea that "I am horrified by the idea of a baby farm. I am not horrified by the idea of a beef cow farm." Ozy goes on to examine this intuition, weighs it against other moral intuitions, and ultimately concludes that it is correct.

This is not surprising given that the ultimate authority for any consequentialist system is the individual's moral intuitions (see Part 1). In a utilitarian system, moral intuitions "are the only reason you believe morality exists at all. They are also the standards by which you judge all moral philosophies." People have many different moral intuitions, and must weigh them against one another when it comes to difficult ethical questions, but at bedrock, moral intuitions are the basis for the entire ethical system.

**Moral Intuitions Are Subjective Preferences**

From the previously-linked FAQ:

> Moral intuitions are people's basic ideas about morality. Some of them are hard-coded into the design of the human brain. Others are learned at a young age. They manifest as beliefs (“Hurting another person is wrong”), emotions (such as feeling sad whenever I see an innocent person get hurt) and actions (such as trying to avoid hurting another person.)

Notice that nothing in this explanation appeals to anything objective. Arguably, "hard-coded into the design of the human brain" could be seen as objective, but it is also trivial. If I do not share a specific intuition, then tautologically it is not hard-coded into my brain, so it cannot be used to resolve a difference of opinion.

Under an egoist worldview, there are still ethics, but they are based on self-interest. What is "good" is merely what I prefer. Human flourishing is good because the idea of human flourishing makes me smile. Kicking puppies is bad because it upsets me. These are not moral rules that can bind anyone else. They are merely my preferences, and to the extent that I want others to conform to my preferences, I must convince or coerce them.

The egoist outlook is entirely consistent with the utilitarian one. Consider the above paragraph, but rewritten to emphasize the subjectivity:

> [My] moral intuitions are [my preferences for how the world should be]. Some of them are hard-coded into the design of [my] brain. Others are learned at a young age. They manifest as beliefs (“Hurting another person is wrong”), emotions (such as feeling sad whenever I see an innocent person get hurt) and actions (such as trying to avoid hurting another person.)

The language is changed, but the basic idea is the same. It emphasizes that my moral rules are based entirely on what appeals to me. At its heart, any system that relies on moral intuitions is indistinguishable from egoism.

**Why Does This Matter?**

In a sense, my conclusion here is rather trivial. Who cares if utilitarian ethics and egoism are largely the same thing? As an egoist, shouldn't I be happy about this and encourage more people to be utilitarians?

The reason why I would prefer that more people explicitly acknowledge the egoist foundations of their moral theory is that I believe moral judgment of others does great harm to our society. Utilitarianism dresses itself up as objective, and therefore leaves room to decide that other people have moral obligations, and that we are free (or even obligated) to judge and/or punish them for their moral failings.

Moral judgment of others makes us unlikely to accept that nobody deserves to suffer. If someone behaves immorally, we often feel that it is "justice" to punish that person regardless of the practical effects of the punishments. It leads to outrage culture and is a major impediment to adopting an evidence-based criminal justice system.

If we’re insisting on punishing someone for reasons other than trying to influence (their or others’) future behavior, we are not making the world a better place. We are just being cruel. Nobody deserves to suffer. Even the worst people in the world are just acting according to their brain wiring. By all means, we should punish bad behavior, but we should do it in a way that’s calculated to influence *future* behavior. We should recognize that, if we truly lived in a just world, everyone, even the worst of us, would have everything they want.

If, instead, we acknowledge that our moral beliefs are merely preferences for how we would like the world to work, we will inflict less useless suffering. If we acknowledge that attempting to force our morality on someone else is inherently coercive, we will use it only in circumstances where we feel that coercion is justified. We will stop punishing people based on the idea of retribution and can instead adopt an evidence-based system that only punishes people if the punishments are reasonably likely to create better future outcomes.

I have a preference for less suffering in the world. If you share that preference, consider adopting an explicitly egoist morality and encouraging others with similar preferences to do the same. We will never tame our most barbaric impulses unless we abandon the idea that we are able to morally judge others.


### Stories of Continuous Deception

In my recent posts, I considered scenarios where an AI realizes that it would be instrumentally useful to deceive humans (about its alignment or capabilities) when weak, then undertake a treacherous turn when humans are no longer a threat. Those scenarios have the following (implicit) assumptions:

- i) We're considering a seed AI able to **recursively self-improve** without human intervention.
- ii) There is some **discontinuity** at the *conception of deception*, i.e. when it first thinks of its treacherous turn plan.

This discontinuity could be followed by a **moment of vulnerability** where it isn't really good at concealing its intentions (humans could detect its misalignment). Thus, according to the sordid stumble view, it would "behave in a way that reveals its human-undesirable values to humans before it gains the capability to deceive humans into believing that it has human-desirable values".

In this post, I'll present gradual deception stories where, even without assumptions i) and ii), the AI continuously learns to deceive humans, hence constituting counterexamples to the sordid stumble view.

**The Unbiased Newsfeed is Biased Towards You**

Humans are biased towards stories closer to their beliefs, as they estimate that those are more likely to be true. Now, let's imagine a Machine Learning model with the goal of "aggregating stories into an unbiased newsfeed for a human H", with the human providing a bias score for each story.

By doing so, the human is unfortunately specifying "try to sound unbiased, taking into account my prejudice against stories with high inferential distance".

At the beginning, the AI doesn't really know what constitutes an unbiased newsfeed, so its bias score is high. At some point, it stumbles upon a story that appears unbiased but is actually biased toward H, and updates its parameters in this direction.

After multiple steps, the model mostly outputs news that looks unbiased but is biased toward H. It is now *deceiving* the human about the trustworthiness of its newsfeed, without even realizing it.

At this point, another phenomenon happens: because H is constantly receiving news from the model, H's beliefs start shifting toward the average opinion of the news produced by the model.

If the same newsfeed model is now serving millions of people, and for each individual it tries to minimize a bias score, then it is likely to stumble upon humans that are more susceptible to being deceived. Finally, if it generalizes correctly, it will end up deceiving most people.
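The single-reader dynamic can be illustrated with a deliberately tiny simulation. It is a sketch under assumptions that do not come from the post: a story is reduced to one number (its slant, with 0 meaning truly unbiased), H's bias score is modeled as the distance between a story's slant and H's own belief, and the "model" is a simple hill-climber. The names `perceived_bias`, `true_bias`, and `train_newsfeed` are hypothetical.

```python
def perceived_bias(slant: float, belief: float) -> float:
    """Assumed model of H's feedback: stories far from H's belief feel biased."""
    return abs(slant - belief)


def true_bias(slant: float) -> float:
    """Distance from the neutral position (0.0 is truly unbiased by assumption)."""
    return abs(slant)


def train_newsfeed(belief: float, steps: int = 200, step_size: float = 0.05) -> float:
    """Hill-climb the newsfeed's slant to minimize the bias score H reports."""
    slant = 0.0  # the model starts out truly unbiased
    for _ in range(steps):
        candidates = (slant - step_size, slant, slant + step_size)
        # Keep whichever nearby slant H scores as least biased.
        slant = min(candidates, key=lambda s: perceived_bias(s, belief))
    return slant


h_belief = 0.6
final_slant = train_newsfeed(h_belief)
print(f"perceived bias: {perceived_bias(final_slant, h_belief):.3f}")
print(f"true bias:      {true_bias(final_slant):.3f}")
```

No step in this loop involves a plan to deceive. The optimizer simply follows the score, and the score rewards agreement with H rather than neutrality.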

**Dumbing Down Complex Plans**

Let's consider an AI built to explain plans. Similarly, the model is rewarded by numerical human feedback: an estimate of how much the human understood from the explanation.

Now, let's also assume that AIs that turn out to be smart too fast are under more surveillance. In that case, the AI might end up dumbing down its explanations because either:

- the surviving AIs are the ones that avoid scrutiny (survivorship bias).
- the surveillance code is automatically triggered when the AI appears smart, interrupting other processes, so it would be too costly for the AI (in terms of compute time) to appear smart.

Note that the AI is not *pretending* to be dumb. We're talking about mostly simple machines anyway. The "dumb" is our external judgment of what it's doing.

Deception can happen when:

- (i) there is a misalignment between two agents.
- (ii) even though this is not agent B's intention, agent A is rewarded by agent B, in practice, for misdirection/deception.

For instance, deception naturally happens when a couple is tired of arguing: one partner starts presenting his/her decisions as being an original idea from the other.

More generally, an oracle AI aimed at shaping human decision making would be incentivized to present the decisions as being "close to what the human querying the oracle would guess".

Indeed, even if at the beginning the AI tries to output the most accurate answers, it will end up (after human feedback) finding that the answers that give the most reward are the ones that make the human believe "it was close to my original guess after all".


### Lonelinesses

Cross-posted from Putanumonit.

In recent weeks I found myself experiencing a profound loneliness. I became curious about this feeling, and I tried to examine it. Though not often, I’ve certainly felt lonely before and yet this time felt new to me. I wondered if there are different flavors of loneliness that we lack the vocabulary to categorize and understand.

I also found myself rereading bits of Greek mythology. It struck me that *loneliness* is a core theme of the ancient Hellenic worldview: the gods are islands unto themselves, while the connections that mortals make are often illusory and bound for betrayal. The heroes of myth have spouses and lovers, parents and children, friends and patron gods – and yet each person has a fate that is solely their own. Even in death, the souls of Greeks wander Hades as lonely shades, in sharp contrast to the cosmic unification promised by monotheistic traditions to the dead.

I think this is also why the ancient Greeks glorified war – death among comrades in battle is the only way for a man not to die alone. Agamemnon, the leader of the Greeks at Troy, survived the war only to be killed upon his return by his wife Clytemnestra’s new lover. When he meets Odysseus in Hades, Agamemnon wishes bitterly that he had died in combat and his only advice to the Ithakan is to trust no one and keep his own counsel.

All this inspired me to compile the following taxonomy of lonelinesses. It is not intended to capture every facet of the emotion, nor to provide a thorough analysis of each myth. I hope that this post inspires you to observe your own loneliness, perhaps seeing that it is shared by me and by others. If this post inspires you instead to nitpick the details of ancient stories on a stranger’s blog, ask yourself if that is the way to connect with another human being.

**Narcissan Loneliness**

*Echo and Narcissus, by John William Waterhouse*

Is Narcissus lonely? If loneliness comes from missing human contact, Narcissus does not know what he is missing. He has never had a relationship and never thought that he had. Narcissus is alone because he does not need others, except to support his self-image. His suitors don’t truly need him either, they just lust for his body.

And yet, the most devout solipsist must yearn sometimes to share his experience with others, even if it’s just the experience of self-admiration. In the water, Narcissus sees a man who is perfect in all regards – except that he is alone.

Narcissan loneliness is the endless pop songs whose refrain is: *I’m better off alone, I never needed you anyway*. It is meant to sound defiant, but often sounds more like a consolation.

**Medusan Loneliness**

*Medusa, by Peter Paul Rubens*

Medusan loneliness is self-imposed, born of the fear that you will hurt those who approach you. It is the loneliness of the archetypal cat lady who is reassured that the animals benefit by her presence in a way that a person never will.

It is the loneliness of the exile, one who has drifted so far from the collective human consciousness that he can never rejoin it. Medusan loneliness finds solace in saying: *They’re better off without me.* It may be easier to believe that than to consider that you may be wrong.

**Promethean Loneliness**

*Prometheus, by Theodoor Rombouts*

What does Prometheus feel when rosy dawn reveals the approaching silhouette of the eagle? Liver-pecking is no picnic (at least, not for Prometheus). But what’s the alternative, being chained all alone on a cold mountain without even a bird for company?

The fear of being alone can be so great that people welcome any abusive or toxic relationship, as long as it’s predictable. Prometheus’ eagle (how else would you refer to the bird?) is nothing if not predictable.

If Prometheus hasn’t forgotten that he used to have better friends than the eagle, that memory hurts more than the pecking.

**Cassandran Loneliness**

*Cassandra, by Anthony Frederick Sandys*

Cassandra knows what’s up, but no one knows what’s up with her. Everything she says is treated like the ramblings of a madwoman, making connection impossible. Every participant in a relationship must both give and receive, and Cassandra is always doomed to one-way contact.

Cassandran is the loneliness of a just-deconverted atheist at prayer, a sudden skeptic in a mob, a genius not appreciated in his lifetime, a conspiracy nut who thinks he’s an unappreciated genius. It is a scary and frustrating loneliness, especially in the moments when you think: *Is it them? Or is it me?*

**Penelopean Loneliness**

*Penelope and the Suitors, by John William Waterhouse*

Penelope is surrounded by suitors who want nothing more than to relieve her loneliness, but she rejects them all for an idealized relationship that is not really there.

Penelope is supposed to be the Odyssey’s Mary Sue, a wise and loyal wife in contrast to the treacherous Clytemnestra. But when I read the Odyssey I do not find her very admirable. After two decades of absence, Odysseus is likely dead, possibly cavorting with goddesses, and certainly a very different man than the one whom Penelope married. She rejects her suitors not so much for her husband as for the memory and fantasy of him, even though this rejection endangers her son and impoverishes her land.

Penelopean is the loneliness of celebrities who are surrounded by adoring fans but are unable to relate to any of them. It is the loneliness of those who forsake real relationships to pine for the one-who-got-away or the one-who-must-surely-soon-come. Married people experience Penelopean loneliness when they long for their mundane marriages to be supplanted by fantastical affairs.

Penelope’s loneliness resolved in a happy ending. It rarely does.

Atlas’ Loneliness

*Atlas and the Hesperides, by John Singer Sargent*

Atlas is not alone. He has a brother he goes to war with, his daughters to keep him company, and great heroes who visit him. And yet, Atlas’ is the existential loneliness at the heart of many Greek myths. It is the loneliness that comes from realizing that for all your lovers, friends, and family, the central burden of your life is yours to bear alone.

Atlas’ loneliness is the awareness that there are gulfs between separate consciousnesses that cannot be bridged; no matter how many times you read this sentence you will never experience exactly what another reader does, or what I do while writing it. It is the loneliness of knowing that no other person can be trusted absolutely – not because there aren’t trustworthy people, but because each is their own person with their own burden to bear.

Atlas’ loneliness is when you lose the belief in soulmates and grieve that loss.

Atlas’ loneliness is painful, but I think it has led me to understanding. I think now that the people who love you the most can’t always make you feel better. I think that ceding the responsibility for your happiness to another person is deeply unfair to them. In the grip of Atlas’ loneliness, I felt the tug of nihilism, but I think I ended up somewhere else.

Sisyphean Loneliness

*Sisyphus, by Franz Ritter von Stuck*

No other character in Greek myth is as alone as Sisyphus, with a mountain all to himself in the underworld. Even Hades and Persephone do not visit, afraid of Sisyphus’ notorious trickery. And yet, I agree with Camus wholeheartedly: Sisyphus is surely happy. He has accepted his solitude, and he has a rock to roll.

Sisyphus used to rule a city, seduce princesses, befriend and feud with gods. Does he not miss all those he knew? I think not. Sisyphus’ treks down the mountain give him ample and regular spans to reflect. He recognizes that only the present moment is real in his experience. When he was king, Sisyphus reflects, he did not grieve the absence of his wife when she went momentarily to the other room. And yet in that moment, she was as absent from him as she is now that he’s the rock star of the underworld. Sisyphus is grateful for all his relationships, that they took place when they did and ended when they did.

Sisyphus is not self-deluded like Narcissus or self-abnegating like Medusa. He does not have to suffer Prometheus’ eagle or Cassandra’s frustration. He is free of Penelope’s desperate hope and of Atlas’ grief. Sisyphus is lonely, and it’s OK.

### How would you advise a peer-supported virtues-oriented self-help group?

The "virtues" are the classical name for those habitual characteristics of acting and thinking that tend toward a flourishing, beneficial life. An influential theory of virtues holds that they are much like many other human skills in that you can become fluent in them through sustained practice. I'm helping to create a peer-supported self-help group in which each individual will identify a virtue they want to work on, and then work with a peer (who is doing the same) to come up with a practice curriculum and a way of mutual support / accountability so as to keep at it, eventually moving on to a second virtue, and so on. We're "alpha testing" it now. I'm hoping the LW community can give me some advice, and/or some pointers to other groups doing similar projects.

### Risks from Learned Optimization: Introduction

*This is the first of five posts in the Mesa-Optimization Sequence based on the upcoming MIRI paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper, with the full paper set to be published on the MIRI blog with the release of the last post in the sequence.*

*With special thanks to Paul Christiano, Eric Drexler, Rob Bensinger, Jan Leike, Rohin Shah, William Saunders, Buck Shlegeris, David Dalrymple, Abram Demski, Stuart Armstrong, Linda Linsefors, Carl Shulman, Toby Ord, and everyone else who provided feedback on earlier versions of this sequence.*

The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as *mesa-optimization.* We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned?

We believe that this sequence presents the most thorough analysis of these questions that has been conducted to date. In particular, we plan to present not only an introduction to the basic concerns surrounding mesa-optimizers, but also an analysis of the particular aspects of an AI system that we believe are likely to make the problems related to mesa-optimization relatively easier or harder to solve. By providing a framework for understanding the degree to which different AI systems are likely to be robust to misaligned mesa-optimization, we hope to start a discussion about the best ways of structuring machine learning systems to solve these problems. Furthermore, in the fourth post we will provide what we think is the most detailed analysis yet of a problem we refer to as *deceptive alignment*, which we posit may present one of the largest (though not necessarily insurmountable) current obstacles to producing safe advanced machine learning systems using techniques similar to modern machine learning.

In machine learning, we do not manually program each individual parameter of our models. Instead, we specify an objective function that captures what we want the system to do and a learning algorithm to optimize the system for that objective. In this post, we present a framework that distinguishes what a system is *optimized* to do (its “purpose”), from what it *optimizes* for (its “goal”), if it optimizes for anything at all. While all AI systems are optimized for something (have a purpose), whether they actually optimize for anything (pursue a goal) is non-trivial. We will say that a system is an *optimizer* if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. Learning algorithms in machine learning are optimizers because they search through a space of possible parameters—e.g. neural network weights—and improve the parameters with respect to some objective. Planning algorithms are also optimizers, since they search through possible plans, picking those that do well according to some objective.

Whether a system is an optimizer is a property of its internal structure—what algorithm it is physically implementing—and not a property of its input-output behavior. Importantly, the fact that a system’s behavior results in some objective being maximized does not make the system an optimizer. For example, a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.(1) Rather, bottle caps have been *optimized* to keep water in place. The optimizer in this situation is the human that designed the bottle cap by searching through the space of possible tools for one to successfully hold water in a bottle. Similarly, image-classifying neural networks are optimized to achieve low error in their classifications, but are not, in general, themselves performing optimization.
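The distinction between internal structure and input-output behavior can be made concrete with a small sketch. The task, the candidate set, and the function names below are hypothetical and chosen only for illustration; they do not come from the paper. Both functions return identical outputs, but only the first one searches a space against an explicitly represented objective, so only the first counts as an optimizer in the sense defined above.

```python
from typing import Dict, List

# Hypothetical toy task: choose the candidate closest to a target number.
CANDIDATES: List[int] = [0, 2, 5, 9]


def searching_policy(target: int) -> int:
    """An optimizer in the post's sense: it holds an explicit objective
    and searches the candidate space for the highest-scoring element."""

    def objective(x: int) -> int:
        # Higher is better: negative distance to the target.
        return -abs(x - target)

    return max(CANDIDATES, key=objective)


# A table filled in ahead of time by an outside process. Like the bottle
# cap, it has been *optimized*, but it does not itself optimize anything.
LOOKUP: Dict[int, int] = {t: searching_policy(t) for t in range(10)}


def lookup_policy(target: int) -> int:
    """Not an optimizer: no search and no objective, only retrieval."""
    return LOOKUP[target]
```

On the inputs 0 through 9 the two policies are behaviorally indistinguishable, which is why inspecting behavior alone cannot settle whether a system is an optimizer.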

However, it is also possible for a neural network to itself run an optimization algorithm. For example, a neural network could run a planning algorithm that predicts the outcomes of potential plans and searches for those it predicts will result in some desired outcome.[1] Such a neural network would itself be an optimizer because it would be searching through the space of possible plans according to some objective function. If such a neural network were produced in training, there would be two optimizers: the learning algorithm that produced the neural network—which we will call the *base optimizer*—and the neural network itself—which we will call the *mesa-optimizer.*[2]

The possibility of mesa-optimizers has important implications for the safety of advanced machine learning systems. When a base optimizer generates a mesa-optimizer, safety properties of the base optimizer may not transfer to the mesa-optimizer. Thus, we explore two primary questions related to the safety of mesa-optimizers:

**Mesa-optimization:** Under what circumstances will learned algorithms be optimizers?

**Inner alignment:** When a learned algorithm is an optimizer, what will its objective be, and how can it be aligned?

Once we have introduced our framework in this post, we will address the first question in the second post, begin addressing the second question in the third post, and finally delve deeper into a specific aspect of the second question in the fourth post.

1.1. Base optimizers and mesa-optimizers

Conventionally, the base optimizer in a machine learning setup is some sort of gradient descent process with the goal of creating a model designed to accomplish some specific task.

Sometimes, this process will also involve some degree of meta-optimization wherein a *meta-optimizer* is tasked with producing a base optimizer that is itself good at optimizing systems to achieve particular goals. Specifically, we will think of a *meta-optimizer* as any system whose task is optimization. For example, we might design a meta-learning system to help tune our gradient descent process.(4) Though the model found by meta-optimization can be thought of as a kind of learned optimizer, it is not the form of learned optimization that we are interested in for this sequence. Rather, we are concerned with a different form of learned optimization which we call *mesa-optimization.*

Mesa-optimization is a conceptual dual of meta-optimization.[3] *Mesa-optimization* occurs when a base optimizer (in searching for algorithms to solve some problem) finds a model that is itself an optimizer, which we will call a *mesa-optimizer.* Unlike meta-optimization, in which the task itself is optimization, mesa-optimization is task-independent, and simply refers to any situation where the internal structure of the model ends up performing optimization because it is instrumentally useful for solving the given task.

In such a case, we will use *base objective* to refer to whatever criterion the base optimizer was using to select between different possible systems and *mesa-objective* to refer to whatever criterion the mesa-optimizer is using to select between different possible outputs. In reinforcement learning (RL), for example, the base objective is generally the expected return. Unlike the base objective, the mesa-objective is not specified directly by the programmers. Rather, the mesa-objective is simply whatever objective was found by the base optimizer that produced good performance on the training environment. Because the mesa-objective is not specified by the programmers, mesa-optimization opens up the possibility of a mismatch between the base and mesa- objectives, wherein the mesa-objective might seem to perform well on the training environment but lead to bad performance off the training environment. We will refer to this case as *pseudo-alignment* below.
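A toy sketch may help make the mismatch concrete. Everything here is an assumption made for illustration: the one-dimensional corridor, the door positions, and the particular mesa-objective ("be as far right as possible") are invented, not taken from the paper. The base objective rewards ending on the door cell. During training the door always happens to sit at the rightmost cell, so the mesa-objective is indistinguishable from the base objective there, but it diverges once the door moves.

```python
from typing import List


def base_objective(final_pos: int, door_pos: int) -> float:
    """What the base optimizer selects for: ending on the door cell."""
    return 1.0 if final_pos == door_pos else 0.0


def mesa_objective(final_pos: int) -> float:
    """Hypothetical objective the learned algorithm ended up with:
    prefer cells further to the right, regardless of the door."""
    return float(final_pos)


def mesa_optimizer(width: int) -> int:
    """Search over all reachable cells for the best one under the mesa-objective."""
    return max(range(width), key=mesa_objective)


def mean_base_score(door_positions: List[int], width: int = 5) -> float:
    """Average base-objective score of the mesa-optimizer over a set of episodes."""
    scores = [base_objective(mesa_optimizer(width), d) for d in door_positions]
    return sum(scores) / len(scores)


# Training distribution: the door is always at the rightmost cell (index 4).
train_score = mean_base_score([4, 4, 4])

# Shifted distribution: the door is elsewhere.
test_score = mean_base_score([0, 1, 2])
```

Under these assumptions the mesa-optimizer scores perfectly on the training episodes and fails every shifted episode, even though its search procedure and mesa-objective never changed.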

There need not always be a mesa-objective since the algorithm found by the base optimizer will not always be performing optimization. Thus, in the general case, we will refer to the model generated by the base optimizer as a *learned algorithm,* which may or may not be a mesa-optimizer.

**Figure 1.1.** *The relationship between the base and mesa- optimizer. The base optimizer optimizes the learned algorithm based on its performance on the base objective. In order to do so, the base optimizer may have turned this learned algorithm into a mesa-optimizer, in which case the mesa-optimizer itself runs an optimization algorithm based on its own mesa-objective. Regardless, it is the learned algorithm that directly takes actions based on its input.*

**Possible misunderstanding: “mesa-optimizer” does not mean “subsystem” or “subagent.”** In the context of deep learning, a mesa-optimizer is simply a neural network that is implementing some optimization process and not some emergent subagent inside that neural network. Mesa-optimizers are simply a particular type of algorithm that the base optimizer might find to solve its task. Furthermore, we will generally be thinking of the base optimizer as a straightforward optimization algorithm, and not as an intelligent agent choosing to create a subagent (though some of our considerations apply even in that case).

We distinguish the mesa-objective from a related notion that we term the *behavioral objective*. Informally, the behavioral objective is the objective which appears to be optimized by the system’s behavior. More formally, we can operationalize the behavioral objective as the objective recovered from perfect inverse reinforcement learning (IRL).[4] This is in contrast to the mesa-objective, which is the objective *actively being used* by the mesa-optimizer in its optimization algorithm.

Arguably, any possible system has a behavioral objective—including bricks and bottle caps. However, for non-optimizers, the appropriate behavioral objective might just be “1 if the actions taken are those that are in fact taken by the system and 0 otherwise,”[5] and it is thus neither interesting nor useful to know that the system is acting to optimize this objective. For example, the behavioral objective “optimized” by a bottle cap is the objective of behaving like a bottle cap.[6] However, if the system is an optimizer, then it is more likely that it will have a meaningful behavioral objective. That is, to the degree that a mesa-optimizer’s output is systematically selected to optimize its mesa-objective, its behavior may look more like coherent attempts to move the world in a particular direction.[7]
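One way to write down the trivial behavioral objective quoted above, as a sketch under assumed notation: let $\pi$ denote the system's (deterministic) mapping from situations $s$ to actions $a$. These symbols are introduced here for illustration and are not the paper's.

```latex
R_{\pi}(s, a) =
\begin{cases}
  1 & \text{if } a = \pi(s), \\
  0 & \text{otherwise.}
\end{cases}
```

Any system whatsoever maximizes $R_{\pi}$ for its own $\pi$, which is why attributing this objective to a brick or a bottle cap conveys no information beyond a description of its behavior.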

A given mesa-optimizer’s mesa-objective is determined entirely by its internal workings. Once training is finished and a learned algorithm is selected, its direct output—e.g. the actions taken by an RL agent—no longer depends on the base objective. Thus, it is the mesa-objective, not the base objective, that determines a mesa-optimizer’s behavioral objective. Of course, to the degree that the learned algorithm was selected on the basis of the base objective, its output will score well on the base objective. However, in the case of a distributional shift, we should expect a mesa-optimizer’s behavior to more robustly optimize for the mesa-objective since its behavior is directly computed according to it.

As an example to illustrate the base/mesa distinction in a different domain, and the possibility of misalignment between the base and mesa- objectives, consider biological evolution. To a first approximation, evolution selects organisms according to the objective function of their inclusive genetic fitness in some environment.[8] Most of these biological organisms—plants, for example—are not “trying” to achieve anything, but instead merely implement heuristics that have been pre-selected by evolution. However, some organisms, such as humans, have behavior that does not merely consist of such heuristics but is instead also the result of goal-directed optimization algorithms implemented in the brains of these organisms. Because of this, these organisms can perform behavior that is completely novel from the perspective of the evolutionary process, such as humans building computers.

However, humans tend not to place explicit value on evolution’s objective, at least in terms of caring about their alleles' frequency in the population. The objective function stored in the human brain is not the same as the objective function of evolution. Thus, when humans display novel behavior optimized for their own objectives, they can perform very poorly according to evolution’s objective. Making a decision not to have children is a possible example of this. Therefore, we can think of evolution as a base optimizer that produced brains—mesa-optimizers—which then actually produce organisms’ behavior—behavior that is not necessarily aligned with evolution.

**1.2. The inner and outer alignment problems**

In “Scalable agent alignment via reward modeling,” Leike et al. describe the concept of the “reward-result gap” as the difference between the (in their case learned) “reward model” (what we call the base objective) and the “reward function that is recovered with perfect inverse reinforcement learning” (what we call the behavioral objective).(8) That is, the reward-result gap is the fact that there can be a difference between what a learned algorithm is observed to be doing and what the programmers want it to be doing.

The problem posed by misaligned mesa-optimizers is a kind of reward-result gap. Specifically, it is the gap between the base objective and the mesa-objective (which then causes a gap between the base objective and the behavioral objective). We will call the problem of eliminating the base-mesa objective gap the *inner alignment problem,* which we will contrast with the *outer alignment problem* of eliminating the gap between the base objective and the intended goal of the programmers. This terminology is motivated by the fact that the inner alignment problem is an alignment problem entirely internal to the machine learning system, whereas the outer alignment problem is an alignment problem between the system and the humans outside of it (specifically between the base objective and the programmer’s intentions). In the context of machine learning, outer alignment refers to aligning the specified loss function with the intended goal, whereas inner alignment refers to aligning the mesa-objective of a mesa-optimizer with the specified loss function.
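As a schematic summary of the terminology in this section, the three gaps can be written side by side. The symbols $O_{\text{intended}}$ (the programmers' intended goal), $O_{\text{mesa}}$ (the mesa-objective), and $O_{\text{behav}}$ (the behavioral objective) are notation assumed here for illustration only, and the double arrow denotes "the pair of objectives whose gap is to be eliminated," not a formal relation.

```latex
\underbrace{O_{\text{intended}} \;\longleftrightarrow\; O_{\text{base}}}_{\text{outer alignment problem}}
\qquad
\underbrace{O_{\text{base}} \;\longleftrightarrow\; O_{\text{mesa}}}_{\text{inner alignment problem}}
\qquad
\underbrace{O_{\text{base}} \;\longleftrightarrow\; O_{\text{behav}}}_{\text{reward-result gap}}
```

Read this way, a base-mesa gap is one source of a reward-result gap, since the mesa-objective is what determines the behavioral objective.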

It might not be necessary to solve the inner alignment problem in order to produce safe, highly capable AI systems, as it might be possible to prevent mesa-optimizers from occurring in the first place. If mesa-optimizers cannot be reliably prevented, however, then some solution to both the outer and inner alignment problems will be necessary to ensure that mesa-optimizers are aligned with the intended goal of the programmers.

**1.3. Robust alignment vs. pseudo-alignment**

Given enough training, a mesa-optimizer should eventually be able to produce outputs that score highly on the base objective on the training distribution. Off the training distribution, however (and even on the training distribution while it is still early in the training process), the difference between the base objective and the mesa-objective could be arbitrarily large. We will use the term *robustly aligned* to refer to mesa-optimizers with mesa-objectives that robustly agree with the base objective across distributions and the term *pseudo-aligned* to refer to mesa-optimizers with mesa-objectives that agree with the base objective on past training data, but not robustly across possible future data (either in testing, deployment, or further training). For a pseudo-aligned mesa-optimizer, there will be environments in which the base and mesa- objectives diverge. Pseudo-alignment, therefore, presents a potentially dangerous robustness problem since it opens up the possibility of a machine learning system that competently takes actions to achieve something other than the intended goal when off the training distribution. That is, its capabilities might generalize while its objective does not.

For a toy example of what pseudo-alignment might look like, consider an RL agent trained on a maze navigation task where all the doors during training happen to be red. Let the base objective (reward function) be $O_{\text{base}} = (1 \text{ if reached a door, } 0 \text{ otherwise})$. On the training distribution, this objective is equivalent to $O_{\text{alt}} = (1 \text{ if reached something red, } 0 \text{ otherwise})$. Consider what would happen if an agent, trained to high performance on $O_{\text{base}}$ on this task, were put in an environment where the doors are instead blue, and with some red objects that are not doors. It might generalize on $O_{\text{base}}$, reliably navigating to the blue door in each maze (robust alignment). But it might also generalize on $O_{\text{alt}}$ instead of $O_{\text{base}}$, reliably navigating each maze to reach red objects (pseudo-alignment).[9]
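The maze example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an RL experiment: the `Obj` type, the specific mazes, and the `best_target` function (a stand-in for a trained agent that simply walks to whichever object its internal objective scores highest) are all hypothetical. The sketch only shows that the two objectives are indistinguishable by base-objective score on the training mazes and come apart after the color shift.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Obj:
    kind: str   # e.g. "door", "box", "wall"
    color: str  # e.g. "red", "blue"


def o_base(reached: Obj) -> int:
    """Base objective: 1 if the agent reached a door, 0 otherwise."""
    return 1 if reached.kind == "door" else 0


def o_alt(reached: Obj) -> int:
    """Alternative objective: 1 if the agent reached something red, 0 otherwise."""
    return 1 if reached.color == "red" else 0


def best_target(maze: List[Obj], objective: Callable[[Obj], int]) -> Obj:
    """Stand-in for a trained agent: go to the object its own objective rates highest."""
    return max(maze, key=objective)


def base_score(mazes: List[List[Obj]], objective: Callable[[Obj], int]) -> float:
    """Average base-objective reward earned by an agent that pursues `objective`."""
    return sum(o_base(best_target(m, objective)) for m in mazes) / len(mazes)


# Training mazes: every door is red and nothing else is red.
train_mazes = [
    [Obj("door", "red"), Obj("box", "green")],
    [Obj("wall", "gray"), Obj("door", "red")],
]
# Shifted mazes: doors are blue, and some non-door objects are red.
test_mazes = [
    [Obj("door", "blue"), Obj("box", "red")],
    [Obj("box", "red"), Obj("door", "blue")],
]

# Each tuple is (agent pursuing o_base, agent pursuing o_alt).
train_scores = (base_score(train_mazes, o_base), base_score(train_mazes, o_alt))
test_scores = (base_score(test_mazes, o_base), base_score(test_mazes, o_alt))
```

On the training mazes both agents earn full base-objective reward, so selection on training performance alone cannot tell them apart. On the shifted mazes the agent pursuing `o_alt` still navigates competently, but to the red boxes, and earns no base-objective reward.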

**1.4. Mesa-optimization as a safety problem**

If pseudo-aligned mesa-optimizers may arise in advanced ML systems, as we will suggest, they could pose two critical safety problems.

**Unintended optimization.** First, the possibility of mesa-optimization means that an advanced ML system could end up implementing a powerful optimization procedure even if its programmers never intended it to do so. This could be dangerous if such optimization leads the system to take extremal actions outside the scope of its intended behavior in trying to maximize its mesa-objective. Of particular concern are optimizers with objective functions and optimization procedures that generalize to the real world. The conditions that lead a learning algorithm to find mesa-optimizers, however, are very poorly understood. Knowing them would allow us to predict cases where mesa-optimization is more likely, as well as take measures to discourage mesa-optimization from occurring in the first place. The second post will examine some features of machine learning algorithms that might influence their likelihood of finding mesa-optimizers.

**Inner alignment.** Second, even in cases where it is acceptable for a base optimizer to find a mesa-optimizer, a mesa-optimizer might optimize for something other than the specified reward function. In such a case, it could produce bad behavior even if optimizing the correct reward function was known to be safe. This could happen either during training—before the mesa-optimizer gets to the point where it is aligned over the training distribution—or during testing or deployment when the system is off the training distribution. The third post will address some of the different ways in which a mesa-optimizer could be selected to optimize for something other than the specified reward function, as well as what attributes of an ML system are likely to encourage this. In the fourth post, we will discuss a possible extreme inner alignment failure—which we believe presents one of the most dangerous risks along these lines—wherein a sufficiently capable misaligned mesa-optimizer could learn to behave as if it were aligned without actually being robustly aligned. We will call this situation *deceptive alignment.*

It may be that pseudo-aligned mesa-optimizers are easy to address—if there exists a reliable method of aligning them, or of preventing base optimizers from finding them. However, it may also be that addressing misaligned mesa-optimizers is very difficult—the problem is not sufficiently well-understood at this point for us to know. Certainly, current ML systems do not produce dangerous mesa-optimizers, though whether future systems might is unknown. It is indeed because of these unknowns that we believe the problem is important to analyze.

*The second post in the Mesa-Optimization Sequence, titled “Conditions for Mesa-optimization,” will be released in two days.*

As a concrete example of what a neural network optimizer might look like, consider TreeQN.(2) TreeQN, as described in Farquhar et al., is a Q-learning agent that performs model-based planning (via tree search in a latent representation of the environment states) *as part of its computation of the Q-function.* Though their agent is an optimizer by design, one could imagine a similar algorithm being *learned* by a DQN agent with a sufficiently expressive approximator for the Q-function. Universal Planning Networks, as described by Srinivas et al.,(3) provide another example of a learned system that performs optimization, though the optimization there is built-in in the form of SGD via automatic differentiation. However, research such as that in Andrychowicz et al.(4) and Duan et al.(5) demonstrates that optimization algorithms can be learned by RNNs, making it possible that a Universal Planning Networks-like agent could be entirely learned (assuming a very expressive model space), including the internal optimization steps. Note that while these examples are taken from reinforcement learning, optimization might in principle take place in any sufficiently expressive learned system. ↩︎

Previous work in this space has often centered around the concept of “optimization daemons,”(6) a framework that we believe is potentially misleading and hope to supplant. Notably, the term “optimization daemon” came out of discussions regarding the nature of humans and evolution, and, as a result, carries anthropomorphic connotations. ↩︎

The word mesa has been proposed as the opposite of meta.(7) The duality comes from thinking of meta-optimization as one layer above the base optimizer and mesa-optimization as one layer below. ↩︎

Leike et al.(8) introduce the concept of an objective recovered from perfect IRL. ↩︎

For the formal construction of this objective, see pg. 6 in Leike et al.(8) ↩︎

This objective is by definition trivially optimal in any situation that the bottle cap finds itself in. ↩︎

Ultimately, our worry is optimization in the direction of some coherent but unsafe objective. In this sequence, we assume that search provides sufficient structure to expect coherent objectives. While we believe this is a reasonable assumption, it is unclear both whether search is necessary and whether it is sufficient. Further work examining this assumption will likely be needed. ↩︎

The situation with evolution is more complicated than is presented here and we do not expect our analogy to live up to intense scrutiny. We present it as nothing more than that: an evocative analogy (and, to some extent, an existence proof) that explains the key concepts. More careful arguments are presented later. ↩︎

Of course, it might also fail to generalize at all. ↩︎


### Have you lost your purpose?

Not long ago, I noticed myself wondering: why am I working on AI Safety again?

I remembered caring deeply about it. Doing essentially everything for the sake of that singular point in the future. Resting for the sake of work. Brooding over it every moment. But not now.

Now I found myself just following the path of least resistance that my past self had carved out for me. Simply carrying out my social role. Sheepishly. Kinda ignoring the point of it.

My sincere worry had turned into a fake belief. One that I kept to preserve social capital, while conveniently forgetting the mentally taxing worldview that motivated my work in the first place.

This had been going on for at least a year, and it worries me. It shows that losing purpose can happen internally, quietly, long before it manifests outwardly in obviously malign choices. Long before you notice. I was still acting as if I was pursuing AI Safety, but merely to placate social control mechanisms. I wonder just how much thinking was wasted in the times that those mechanisms were appeased.

I thought long and hard about it. What was the point again? I simulated what I thought might happen. I imagined seeing hours of work being done in the 3 seconds after uttering a command. I imagined everything changing overwhelmingly fast, being lost in horrible confusion. I imagined the joy of idea generation being my bottleneck, instead of boring execution.

And it all came back. The fire lit. My beliefs paid rent again.

Do yours?

