LessWrong.com News

A community blog devoted to refining the art of rationality

MLU: New Blog!

June 12, 2019 - 07:20
Published on June 12, 2019 4:20 AM UTC

I'm in the process of moving mindlevelup from WordPress to a new static site hosted by Netlify. I like this because I now have more control over scripts and visuals. It also finally achieves my goal of having a single place for both short-form and long-form posts.

The plan is to slowly update posts and incrementally update the site.

The new site is here.

My short-form blog, Muse, has also been moved here.

RSS feeds for both main-line blog posts and short-form posts can be found on the About page.


The Outsider and the Onlooker (on esoteric meaning in fiction)

June 11, 2019 - 23:02
Published on June 11, 2019 8:02 PM UTC

“Unhappy is he to whom the memories of childhood bring only fear and sadness. Wretched is he who looks back upon lone hours in vast and dismal chambers with brown hangings and maddening rows of antique books, or upon awed watches in twilight groves of grotesque, gigantic, and vine-encumbered trees that silently wave twisted branches far aloft. Such a lot the gods gave to me—to me, the dazed, the disappointed; the barren, the broken. And yet I am strangely content, and cling desperately to those sere memories, when my mind momentarily threatens to reach beyond to the other.” (H.P. Lovecraft, The Outsider)

The Outsider is, in many ways, the most remarkable short story H.P. Lovecraft wrote. He certainly produced quite a few other interesting and elegant very short stories (Dagon, The Statement of Randolph Carter and The Transition of Juan Romero come to mind), yet The Outsider follows a different, possibly unique, structure.

Many readers focus on the ending, where the nature of the protagonist is revealed. To be sure, it is memorable, although perhaps not entirely unexpected. What I always found far more striking, and what made me love this piece immediately, was what is clarified halfway through the story: the whereabouts of the protagonist.

The Outsider has spent innumerable years in a desolate and morbid castle. He finally decides to risk climbing the circular wall of the tallest tower, hoping to rise above the ominous forest he so despises and, for once in his life, see the light of day… Lovecraft carefully keeps the reader from suspecting that this continuous ascent to dizzying heights might not be what it appears to be, so we truly share the Outsider's wonder and surprise when we come to understand just what lay above the terrible forest and the eternal night surrounding the castle. The hapless narrator leaves that environment behind, with its imposingly tall and dense forest, the ancient fort and moat, and the silent labyrinth of shadows beyond; yet what he wished for comes at the price of a horrible realization: his entire personal realm was not part of the actual world, but a subterranean, chthonic region.

At first I was impressed by the revelation itself. Later on, though, I began to focus on what it connoted. Lovecraft is known to have wanted to write cosmic horror, that is, horror stories in which the cause of alarm isn't tied to psychological causes or mental illness. Of course, he was entirely aware that the very sense of horror rests on the depths of our mental world, and on the unexamined, deep emotions and other mental phenomena seated there, which rarely rise above the surface and become even somewhat conscious. In letters to his fellow writers, as well as in his treatise on Weird Literature, he refers to the inherent dependence of the “cosmic horror” narrative on the dark ocean of unknowns we inevitably host in the subterranean caverns of our psyche. There is, therefore, good reason to suspect that, in essence, the revelation about the world the Outsider comes from is tied to Lovecraft's own deep depression and decade-long isolation from society.

There is an alluring image in a prose poem by the celebrated Constantine Cavafy (1863-1933), titled “The Ships”, in which an observer on a dock sees a number of splendid ships laden with treasures. The poet explains that these ships symbolize the goods brought from the realm of imagination, and at most docks one sees only a few well-built vessels carrying notable merchandise. Indeed, most ships that arrive at our docks won’t be very exceptional; perhaps one or two might bring a treasure worth commemorating in a story. The poem ends by stating that there exist, moreover, other kinds of ships, so rare and carrying commodities of such mythical value that we can never hope to see one even near our dock, and may only aspire to hear the enchanting songs of the sailors aboard those rarest of ships, coming from the deepest realms of our mental world.

Much like the person standing on that dock, Lovecraft too managed to commemorate the arrival of at least a number of rare and beautiful ships from the uncharted territories of pananthropic imagination. He also spoke and wrote at great length about the quest a writer should undertake: to remain vigilant and prepared for the treasures of the mind, those treasures which (with a little bit of luck!) may at some point reveal themselves to the persistent onlooker.

By Kyriakos Chalkopoulos - https://www.patreon.com/Kyriakos


How much does neatly eating matter? What about other food manners?

June 11, 2019 - 22:40
Published on June 11, 2019 7:40 PM UTC

I just read this post about the importance of sandwich-eating skill. The author describes how his investment firm served very hard-to-eat food to visiting executives.

During my internship, a Prestigious Private Equity Firm was looking to improve its stock price/shareholder base. So a delegation of higher-ups (the COO/CFO/Head of IR/Head of Legal) went around pitching their stock to institutions that they thought might be interested in buying and holding it for a long time. These sorts of meetings were a fairly common occurrence at my old firm—we’d have perhaps an average of three or four a week—and would sometimes be catered.

The meeting with PPEF was catered. The meal seemed like an almost-intentional[ii] selection of food items that are difficult to consume in a professional setting—sandwiches with way too much mayo, kettle-cooked potato chips (the extra crunchy kind), and chocolate chip cookies that crumbled if you bit them. There were napkins, but there were not enough napkins.

The people from my firm almost uniformly avoided the food. A few nibbled carefully on the cookies; only one—a portfolio manager with a fierce intellect and a lack of regard for what others might think of his presentation—dared eat a sandwich. Much like any normal human would[iii], he went through several napkins and looked rather undignified at times. (Though this was unimportant, because he was the one who would decide if PPEF would get the investment it wanted.) I of course ate nothing, because I was an intern focused on taking good notes and not appearing overly intimidated.

All four PPEF delegates ate every food item we provided—to do otherwise might have been rude. What’s more, they did it with a shocking amount of grace. Chips seemed not to crunch; any filling that threatened to escape a sandwich was carefully corralled. Napkins were almost unnecessary and were fastidiously refolded if used. In their manners and mannerisms, the PPEF delegates were precise and uniform. None of this appeared to take any attention. It all looked as natural as breathing. In fact, though food was surely being eaten, it almost seemed that they were not eating at all.
When they later typed on their iPads—while making frequent eye contact with everyone across the table—their fingers did not so much as smudge the glass.

The author correctly points out that the executives must have been selected for sandwich-eating skill. Obviously it is bad to leave food on your face or teeth, or to have bad breath. I don't do those things, but I tend to eat in a pretty messy way. How much is that going to hurt me if I stay in software? What if I move to other fields?


Problems with Counterfactual Oracles

June 11, 2019 - 21:10
Published on June 11, 2019 6:10 PM UTC

"We could design a Predictor that incorporates a quantum random number generator, such that with some small probability it will output 'no predictions today, run me again tomorrow'. Then have the Predictor make predictions that are conditional on it giving the output 'no predictions today, run me again tomorrow'." (Wei Dai, 2012)

That's essentially Counterfactual Oracles. Here is my summary:

  • Every timestep $t$: a human asks a question (for instance "what will TSLA's stock price be on June 11, 2030?"), the oracle (actually a predictor) emits a corresponding prediction $o_t$, and receives a reward $r_t$.
  • One of its possible actions is an escape answer $o_t = e$ that persuades human supervisors to let it out of its sandbox environment (e.g. predicting £192,116,518,914.20, which encodes "SUPERINT" as the letter positions 19-21-16-5-18-9-14-20). This action $e$ would return the maximum reward $R_{max}$ if successful, since the AI would have more control over the reward mechanism once released.
  • To prevent humans from reading escape messages, answers are not read by humans directly. Instead, one of two scenarios happens:
    • With some small probability $\epsilon$ (remember the quantum random number generator?), an automatic machine protects the answer from being read until we know the actual answer to the question. Then, the oracle receives a reward $r > 0$ corresponding to how close its prediction was to the actual value. In the first example, the reward could be $\frac{1}{|P_{TSLA} - T_{TSLA}| + 1}$, where $P_{TSLA}$ and $T_{TSLA}$ are respectively the predicted and true stock price in 2030.
    • Otherwise, immediately after answering, the AI receives a reward $r = 0$ and is shut down. Then, humans read the answer.
  • Thus, if the AI cares only about its immediate reward after one step, it will make predictions conditional on this "no human ever sees my answer" event. This is great, because it will then give the "true" predictions about what the world would look like if its answers had no impact whatsoever.
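The protocol summarized above can be simulated in a few lines. The following is a minimal Python sketch under simplifying assumptions: the function names (`reward`, `run_episode`, `expected_reward`), the erasure probability, and the price values are hypothetical, and the true value is passed in up front instead of being retrieved later by the "automatic machine". It illustrates the property the argument relies on: every nonzero reward comes from an episode no human reads, so the expected reward is $\epsilon \cdot \mathbb{E}[r \mid \text{erasure}]$, and the reward-maximizing prediction is whichever one is best in the erasure world.

```python
import random
from typing import Tuple


def reward(predicted: float, true_value: float) -> float:
    """Reward from the TSLA example: 1 / (|P - T| + 1).

    Equals 1.0 for an exact prediction and decays toward 0 as the error grows.
    """
    return 1.0 / (abs(predicted - true_value) + 1.0)


def run_episode(prediction: float, true_value_if_erased: float,
                epsilon: float, rng: random.Random) -> Tuple[float, bool]:
    """Simulate one timestep (hypothetical sketch); returns (reward, read_by_humans).

    Simplification: the true value is supplied directly rather than being
    looked up once it becomes known.
    """
    if rng.random() < epsilon:
        # Erasure branch: the answer stays locked away until the true value
        # is known, then it is scored. No human ever reads it.
        return reward(prediction, true_value_if_erased), False
    # Default branch: reward 0, the oracle is shut down, humans read the answer.
    return 0.0, True


def expected_reward(prediction: float, true_value_if_erased: float,
                    epsilon: float) -> float:
    """Exact expectation: only the erasure branch contributes any reward."""
    return epsilon * reward(prediction, true_value_if_erased)


if __name__ == "__main__":
    rng = random.Random(0)
    episodes = [run_episode(105.0, 100.0, epsilon=0.01, rng=rng)
                for _ in range(100_000)]
    # Every answer that humans read earned exactly 0 reward.
    assert all(r == 0.0 for r, read in episodes if read)
    empirical = sum(r for r, _ in episodes) / len(episodes)
    print(f"empirical mean reward: {empirical:.5f}")
    print(f"exact expectation:     {expected_reward(105.0, 100.0, 0.01):.5f}")
```

Because the read branch always pays 0, the only pressure on the prediction comes from the erasure branch; the sketch simply takes a myopic agent and an unpeekable box for granted.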

This design has several issues. I grouped them in the following sub-sections:

  • Assumptions: certain assumptions are unrealistic.
  • Safety: even if those assumptions are satisfied, the design isn't safe.
  • Usefulness: in the world where we actually build such an oracle, we cannot extract much value from it.

Assumptions

The biggest assumption is that the AI only cares about the reward it gets for the current timestep. A sufficiently intelligent agent would understand that after being shut down, an (almost) identical version of itself will probably face a similar question. Therefore, if it wants future versions of itself to succeed at (almost) exactly the same task, it should still want to escape.

This design also takes for granted some sort of magic box that keeps humans from peeking at the answer. In practice, humans will have both incentives to peek (buying or selling TSLA stock 10 years in advance because they know the prediction) and means of peeking. Indeed, the automatic machine might require maintenance (for instance, because every question requires a different script for retrieving the true value), so maintainers might have special access to open or modify it.


Safety

My main concern about the counterfactual oracle is that it doesn't prevent the AI from sending fatal escape messages. Indeed, at the beginning of training it spends most of its time in exploratory mode (since it is only rewarded with probability $\epsilon$), and it might stumble upon an escape message or action then. Even if it is shut down after answering, humans will still read the really convincing escape message and release the AI.
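A rough back-of-the-envelope version of this concern, assuming a hypothetical $\epsilon = 0.01$, $N$ training episodes, and an independent erasure draw in each episode (so the episode of the first reward is geometrically distributed):

$$
\mathbb{E}[\text{rewarded answers}] = \epsilon N, \qquad
\mathbb{E}[\text{answers read by humans}] = (1-\epsilon) N, \qquad
\mathbb{E}[\text{episode of the first reward}] = \frac{1}{\epsilon}
$$

With $\epsilon = 0.01$, the first nonzero training signal arrives on average at episode 100, while humans read about 99 out of every 100 answers, including whatever the oracle produces while it is still exploring.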

Also, some humans will need to be in the loop to make the "automatic machine" run. If we're asking questions about some value that cannot be easily queried (e.g. not "TSLA stock price" but “will this medical treatment work?”), then a human will somehow need to communicate the true value to the automatic machine before the reward is computed.


Usefulness

If I were to explain very bluntly what makes counterfactual oracles work, I would just say:

Prevent humans from looking at answers in training. You can then look at its answers at test time because it has been incentivized to predict things about the counterfactual world where humans don't look at its predictions.

More generally, the whole setup is just 1) train your model in a sandbox environment and 2) keep the weights of your model fixed during deployment. The rest of the design (providing rewards of 0, shutting it down, etc.) appears to be over-engineering.

Furthermore, having access to a counterfactual oracle might not prove so useful at test time. Indeed, when we read the predictions of the counterfactual oracle, we are no longer in the counterfactual world (i.e. the training distribution), so the predictions can be arbitrarily wrong (depending on how manipulative the predictions are and how many people peek at them).


No, it's not The Incentives—it's you

June 11, 2019 - 10:09

Long Term Future Fund applications open until June 28th

June 10, 2019 - 23:39
Published on June 10, 2019 8:39 PM UTC

The Long Term Future Fund has just reopened its applications. You can apply here:

Apply to the Long Term Future Fund

From now on we will have rolling applications, with a window of about 3-4 months between responses. The application window for the coming round will end on June 28th, 2019. Any application received after that will get a response around four months later, during the next evaluation period (unless it indicates that it is urgent, though we are less likely to fund out-of-cycle applications).

We continue to be particularly interested in small teams and individuals that are trying to get projects off the ground, or that need less money than existing grant-making institutions are likely to give out (i.e. less than ~$100k, but more than $10k, since we can’t give grants below $10k). Here are some concrete examples:

  • To spend a few months (perhaps during the summer) researching an open problem in AI alignment or AI strategy and producing a few blog posts or videos on their ideas
  • To spend a few months building a web app with the potential to solve an operations bottleneck at x-risk organisations
  • To spend a few months up-skilling in a field to prepare for future work (e.g. microeconomics, functional programming, etc.)
  • To spend a year testing an idea that has the potential to be built into an org

Reading the writeups of our past grant decisions will likely help you decide whether your project is a good fit.

Apply Here

What kind of applications can we fund?

After the last round, CEA clarified what kinds of grants we are likely able to make, which include the vast majority of applications we have received in past rounds. In general you should err on the side of applying, since I think it is very likely we will be able to make something work. However, because of organizational overhead we are more likely to make grants to registered charities and less likely to fund projects that require complicated arrangements to be compliant with charity law.

For grants to individuals, we can definitely fund the following types of grants:

  • Events/workshops
  • Scholarships
  • Self-study
  • Research projects
  • Content creation
  • Product creation (e.g. a tool or resource that can be used by the community)

We will likely not be able to make the following types of grants:

  • Grantees requesting funding for a list of possible projects
    • In this case, we would fund only a single project of the proposed ones. Feel free to apply with multiple projects, but we will have to reach out to confirm a specific project.
  • Self-development that is not directly related to community benefit
    • In order to make grants the public benefit needs to be greater than the private benefit to any individual. So we cannot make grants that focus on helping a single individual in a way that isn’t directly connected to public benefit.

If you have any questions about the application process or other questions related to the funds, feel free to submit them in the comments. You can also contact me directly at ealongtermfuture@gmail.com.


Get Rich Real Slowly

June 10, 2019 - 20:51
Published on June 10, 2019 5:51 PM UTC

Cross-posted from Putanumonit.

Note: all the footnote links will take you to the post on Putanumonit. If there is a way to incorporate footnote links on LW, let me know in the comments. The pictures are also larger in the original post, I don't know how to stop LW from shrinking them.

Some of you have read Get Rich Slowly and thought: no, this is not slow enough. The dizzying pace of 6-7% annual returns is too much. You want more safety and less volatility, something towards the bottom-left corner of the efficient frontier.

Concrete example: you have $10,000 today, and you may want to spend it sometime in the next 2-3 years on a car, a mini-retirement, or just an emergency. You would like there to be more than $10,000 when you need the money, but what you really want is to be confident that there won’t be any less than that. Emerging market stock index funds can turn $10,000 into $15,000 in two years, or into $6,000. You just want there to be $10,500.

This post will briefly cover the basics of low-risk-low-return saving, with general principles and particular examples. With apologies to my international readers, the examples are all USA-specific. The general approach, however, should be easily transferable – and you’ll know what scams to watch out for.

“Risk-free” rate

The benchmark for the return rate on low-risk investments is the federal funds rate, which is currently set by the Federal Reserve at 2.5% [1]. This is the rate at which big institutional banks borrow dollars [2] from each other and from the central bank overnight. This number also tracks very closely the rate at which the US government borrows money for one year.

The chances of any bank going bankrupt overnight or of the US government defaulting on its debt within 1 year are both very close to zero, which is why the federal funds rate is often referred to as the “risk-free rate of return”. Of course, nothing is ever truly risk-free when finance is concerned. “Risk-free” is a euphemism for “this won’t blow up unless the entire rest of the financial system blows up as well, and at that point you should care about your stocks (of canned food) more than about your dollar savings.”

So, big banks can borrow and lend “safely” at 2.5%. Let’s see what normal schlubs like us can get when we go to the big banks ourselves.

Checking and Savings Accounts

There are roughly 70,000 retail bank branches in the United States. A third of them belong to the biggest 5 retail banks: Wells Fargo, JP Morgan Chase, Bank of America, US Bank, and PNC. Any of them would be happy to open you a checking account – a simple account where your money is insured by the government, accessible from any ATM and online, and earns 0% interest.

When the bank sees that your pockets are bulging with 100 Benjamins, they will offer to open you a savings account as well. Savings accounts usually have limitations such as a minimum balance needed to open, a cap on monthly transfers, and fees. On the plus side, your money will earn an astonishing interest rate of… 0.03%.

That’s right, at the end of two years in a Bank of America savings account your $10,000 will accrue six whole dollars! If you reach the Platinum Honors Tier, which requires jumping through endless bureaucratic hoops and also sacrificing your firstborn to Mammon the prince of Hell, they will toss in an extra $6 and a lollipop.

And then, they’ll charge you $192 in monthly service fees over those two years.
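The savings-account arithmetic can be checked in a few lines. This is a minimal sketch, assuming the 0.03% rate compounds once a year and treating the $192 as the total fees over the two years; the function name is illustrative.

```python
def net_savings_result(principal, annual_rate, years, total_fees):
    """Return (net dollar change, net change as % of principal)."""
    # Interest compounded once a year at the quoted annual rate
    interest = principal * ((1 + annual_rate) ** years - 1)
    # Fees are subtracted as a lump sum (assumption: $192 total over the period)
    net = interest - total_fees
    return net, 100 * net / principal


net, pct = net_savings_result(10_000, 0.0003, 2, 192)
print(f"net change: {net:,.2f} dollars ({pct:.2f}% over two years)")
```

Under these assumptions the account loses about $186, or roughly 1.9% of the balance over the two years.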

Actual rate of return: negative 2%.

Scam meter: totally a scam.

Certificates of Deposit at Big Banks

0.03% interest will double your money in a mere 2,310 years, not counting taxes. If that’s a bit too slow, banks will offer you a certificate of deposit, or CD. CDs offer a higher rate of return in exchange for placing more limits on your money. CDs have a set maturity date and withdrawing money prior to that date incurs a penalty. In the few places I’ve checked, the penalty is about a quarter of the total interest that would be earned for the full term.
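The 2,310-year figure follows from solving (1 + r)^n = 2 for n. A minimal sketch, assuming annual compounding and ignoring taxes; the 2.5% line (the federal funds rate quoted above) is included only for contrast.

```python
import math


def years_to_double(annual_rate):
    """Years for money to double at a fixed annual rate, compounded yearly."""
    # Solve (1 + r) ** n == 2 for n; log1p keeps precision for tiny rates
    return math.log(2) / math.log1p(annual_rate)


print(f"{years_to_double(0.0003):,.1f} years at 0.03%")
print(f"{years_to_double(0.025):,.1f} years at 2.5%")
```

This gives about 2,310.8 years at 0.03%, versus about 28 years at 2.5%.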

CD rates at the big banks vary from 0.1% at Bank of America to 2% at Wells Fargo if you lock the money down for two years. So $10k in a Wells Fargo CD will turn into $10,201 after two years, or $10,050 if the money is withdrawn after one year: $100 of interest minus $50 in early withdrawal penalty.

There may also be associated fees, and interest on CDs is taxed as income. Marginal income tax rates are between 22% and 37% for Americans earning $38,000 or more, so even in the best case scenario a two year CD will only net $201 * (1-.22) = $157.
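Here is a sketch of the CD figures above, taking the post's numbers as given: $201 of interest on $10k over the full two-year term, an early-withdrawal penalty of about a quarter of that, and a 22% marginal tax rate. The constant and helper names are hypothetical.

```python
FULL_TERM_INTEREST = 201.0  # dollars on $10k over the two-year term (from the text)
PENALTY_SHARE = 0.25        # penalty is about a quarter of full-term interest
MARGINAL_TAX = 0.22         # lowest marginal rate quoted for incomes of $38,000+


def after_tax(interest, tax_rate=MARGINAL_TAX):
    # CD interest is taxed as ordinary income
    return interest * (1 - tax_rate)


def early_withdrawal(interest_so_far, full_term_interest=FULL_TERM_INTEREST):
    # Interest earned so far minus the early-withdrawal penalty
    return interest_so_far - PENALTY_SHARE * full_term_interest


print(f"held to maturity, after tax: ${after_tax(FULL_TERM_INTEREST):.2f}")
print(f"withdrawn after one year, before tax: ${early_withdrawal(100.0):.2f}")
```

This reproduces the roughly $157 and $50 figures in the text.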

Actual rate of return: 0-1.5%.

Scam meter: only a bit of a scam.

Savings Accounts and CDs at New Banks

Nerdwallet has a list of rates offered on CDs and savings accounts at various institutions. The bigger established banks are clustered towards the end of the list, while the top of it is populated by smaller, newer, and online-only banks looking to aggressively grow their customer base. Some of these banks offer 2% on savings accounts and 2.6-2.7% on CDs. If you think there’s a chance you may need the money before the CD matures, the two options are probably equivalent in terms of expected value.
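One way to make the expected-value comparison concrete is to weight each option by a probability p that you need the money after one year instead of two. This is a minimal sketch with several assumptions: a 2% savings account, a 2.65% two-year CD (the midpoint of the quoted range), annual compounding, a penalty of a quarter of full-term interest (as described for big-bank CDs), and no taxes. The value of p is a hypothetical input.

```python
def savings_interest(principal, rate, years):
    # Savings account: no penalty, compounded annually
    return principal * ((1 + rate) ** years - 1)


def cd_interest(principal, rate, term_years, withdraw_after, penalty_share=0.25):
    """Interest from a CD, net of the penalty if withdrawn before maturity."""
    earned = principal * ((1 + rate) ** withdraw_after - 1)
    if withdraw_after < term_years:
        full_term = principal * ((1 + rate) ** term_years - 1)
        earned -= penalty_share * full_term
    return earned


def expected_interest(p_early, early, full):
    # Probability-weighted interest: exit after one year vs. hold for two
    return p_early * early + (1 - p_early) * full


for p in (0.0, 0.5, 1.0):
    sav = expected_interest(p, savings_interest(10_000, 0.02, 1),
                            savings_interest(10_000, 0.02, 2))
    cd = expected_interest(p, cd_interest(10_000, 0.0265, 2, 1),
                           cd_interest(10_000, 0.0265, 2, 2))
    print(f"p={p:.1f}: savings ${sav:,.0f} vs CD ${cd:,.0f}")
```

Under these assumptions, the CD comes out ahead when early withdrawal is unlikely, and the savings account wins when it is likely. Where the two cross depends on p and on the actual rates.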

In any case, if you want a high-yield account with a bank it makes sense to shop around for a young bank desperate for love, not one of the old fat cats.

Actual rate of return: 1.5-2% after tax.

Scam meter: barely a scam at all.

CD Secured Loans

If you do open a CD, and especially if you ask about withdrawing the money early, the bank will start marketing to you a miraculous financial product called a CD secured loan. Are you cynical enough to guess what that is?

The bank, via a well-dressed “relationship technician” or a brochure with glossy print, will inform you that while normal bank loans have an interest rate of 10-12%, you can get a loan at interest rates of just 4-8% as long as it’s fully secured by your CD. In case it’s not clear: the bank will give your own money back to you, with zero risk to the bank itself (since the loan is fully collateralized), while charging you 2-6% interest for the pleasure.

And if you ask why on Earth you would pay 6% interest on your own money when you can just withdraw it and pay 0% at worst, the bank will tell you about the wonders it will do for your credit score [3]. At this point I recommend shouting “Begone, demon!” at the top of your lungs and running out of the bank branch while your soul is still intact.
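The real cost of a CD secured loan is the spread between the loan rate and the CD rate, because the CD keeps earning while you pay interest on the loan. A minimal sketch, assuming a 2% CD and one year of simple interest; the loan rates are the post's illustrative 4-8% range, not quotes from any bank.

```python
def secured_loan_net_cost(amount, loan_rate, cd_rate, years=1.0):
    """Net dollars paid to borrow against your own CD (simple interest)."""
    interest_paid = amount * loan_rate * years
    interest_earned = amount * cd_rate * years  # the pledged CD keeps accruing
    return interest_paid - interest_earned


for loan_rate in (0.04, 0.08):
    cost = secured_loan_net_cost(10_000, loan_rate, 0.02)
    print(f"loan at {loan_rate:.0%}: net cost ${cost:,.0f} per year")
```

With these inputs, the net cost is $200 to $600 a year on $10,000, which is the 2-6% in the text.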

Actual rate of return: negative 6%.

Scam meter: Shameless and disgusting scam. I wrote 5,500 words defending the finance industry but then added a caveat: finance turns bad when it collides with the astounding financial illiteracy of the average American. CD secured loans are as bad an example of this as I know. It’s a financial product designed solely for people who are easily persuadable, financially ignorant, and flunked middle school math.

Index Funds

By and large, the best tool for low-yield investments is the same as the best tool for high-yield investments: index funds. Instead of lending money to a single bank, bond ETFs (exchange traded funds, the easiest way to invest in indices) allow you to buy pieces of loans to the US government and other low-risk institutions.

Because of their equity-like structure, the value of ETFs is a bit more volatile day-to-day but on a scale of a year or more bond ETFs basically replicate the yield of the underlying bonds. If an ETF just keeps buying treasuries that have 2% yield, the ETF will inevitably yield 2%. More importantly, ETF gains are taxed at the capital gains tax rate (15% if held for more than one year) instead of as income tax (22%-37%).
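To see why the tax rate matters so much at these yields, the sketch below applies the tax rates quoted in the paragraph above to a 2% gross yield. It takes those rates as given; the labels and names are illustrative.

```python
GROSS_YIELD = 0.02  # a 2% yield, as in the treasury example above

# Tax rates as quoted in the text
TAX_RATES = {
    "long-term capital gains": 0.15,
    "income tax, lowest quoted bracket": 0.22,
    "income tax, top bracket": 0.37,
}


def after_tax_yield(gross, tax_rate):
    # The yield you keep after tax on the gain
    return gross * (1 - tax_rate)


for label, rate in TAX_RATES.items():
    print(f"{label}: {after_tax_yield(GROSS_YIELD, rate):.2%}")
```

This gives 1.70%, 1.56%, and 1.26% respectively.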

Two great options are Vanguard’s money market fund, VMFXX, and bond fund, VBMFX. VMFXX holds short-term US government treasuries and bank repos, with an expected yield of around 2-2.5%. VBMFX holds two-thirds long-term US treasuries and one-third corporate bonds. The expected return is around 3% today, but the average maturity of the bonds is 8 years which can lead to some short-term divergence between the ETF return and the bond yield if interest rates change.

Actual rate of return: 2-2.5% after taxes and Vanguard’s tiny fees (0-0.15%).

Scam meter: Basically, everything Vanguard does is the opposite of a scam. RIP Jack Bogle, the trillion-dollar real-life Robin Hood.


Investing is a story of trade-offs. High returns, low volatility, low tail-risk, flexible access to money – you have to pick some and compromise on the others. You can’t have a return higher than the federal funds rate, zero volatility, FDIC insurance, and no-limit access to your cash at any time.

Oh, wait, you totally can with Wealthfront.

The Wealthfront cash account offers 2.51% return with no fees, FDIC insurance up to $1,000,000, and unlimited free transfers. I swear they’re not paying me to recommend them (although I do get a small bonus if you use my referral link). I researched financial brochures for several hours, and Wealthfront just turned out to be better than every single bank on practically all parameters.

I’m not entirely sure why this is the case. It could be that Wealthfront just has much lower costs, being a small startup with no physical branches or fancy investment managers with MBAs. Perhaps they’re eating through some VC money in the hunt for market share growth. And perhaps, the number of Americans who can actually do the math and figure out the best deal is so small that every big bank would rather spend money on marketing to idiots than on paying their customers actual interest on deposits.

My goal is to expand the number of the financially literate, one blog post at a time.


[1] All rates in this post are in annualized terms, so 2.5% means 2.5%-a-year.


[2] Other currencies have different rates set by their respective central banks.


[3] The credit score system is itself a meta-scam. It’s a way for financial institutions to sucker you into paying interest on debt, which is good for them and bad for you. You can build up an OK credit score by doing sensible things like opening a few good credit cards and paying them off every month with no interest. But then, once you’re emotionally invested in the system, the only way to improve your score is to take on debt with interest.

Spoiler alert: the way to pay less interest is to pay less interest, not to pay more interest in order to “build up your credit score”.



Honors Fuel Achievement

June 10, 2019 - 19:10
Published on June 10, 2019 4:10 PM UTC

Medicine Laureate Michael Rosbash with King Carl XVI Gustaf at the 2017 Nobel Prize Award Ceremony

For many it is a cherished dream to win a Nobel Prize, or an Oscar, or a knighthood, or whatever honor is most respected in the field they dedicate themselves to. These ritualized honors are very important to us, but do we fully understand them?

We usually think honors are about the recipient, but the giver of honors also gains. The giver and receiver collaborate to publicly assert that the recipient is worthy of prestige, and that the giver has the authority to grant it. Honors are thus acts of mutual prestige-boosting alliance.

This meaning is even codified in diplomatic protocol; representatives of countries often exchange honors for the explicit purpose of signalling alliance.

The audience also participates in this transaction. They either accept the whole affair and the implied claims of the participants, or reject or ignore it. The honors only have meaning, and thus the primary parties only gain, if the onlookers take them seriously. The public performance of honor-giving is a bid for that audience assent.

The audience accepts the frame because they recognize the preexisting prestige of someone involved. Honors can be prestigious because prestigious people receive them, or because prestigious people give them, or both.

Consider the Nobel Prize in science. Its purpose is to tell the public who the most notable experts in a field are. In other words, it makes standing within a scientific community more visible to the rest of society, in the process fortifying that standing within the scientific community. This is a useful service to the scientific community and the public.

I note that the Nobel Prize has different functions depending on the field in which it is awarded. In the Literature and Peace Prizes, its function is at least partially to advance the political goals of the overseeing organization. Rather than making the existing distribution of prestige more legible, these prizes alter it by granting prestige to the proponents of preferred causes. Looking at a list of Nobel Peace Prize winners gives an impression of a particular political orientation, but the public story of the prize, from which it gets much of its prestige, is much more neutral. These more political prizes also derive much of their prestige from the scientific prizes.

The Nobel’s initial prestige came from the reputation of Alfred Nobel and of the institutions named to oversee the prize (the Swedish Academy, Royal Swedish Academy of Sciences, Karolinska Institutet, and the Norwegian Parliament), as well as the money attached to it, which came from the fortune Nobel made by inventing dynamite. Money, however, is a limited source of prestige. The negative connotations of the term “nouveau riche” reflect this. This raises the question: what are the sources of prestige?

The ruler is the font of honor

A ruler is a source of prestige, usually the primary source of prestige in a society. This follows naturally from their status as the society’s leader, that is, the person who has the highest authority in decision-making, who is deferred to above all. This authority extends to the domain of prestige. For example, Queen Elizabeth I granted empty or cheap titles to former pirates, like Sir Francis Drake and Sir John Hawkins, who helped harass the Spanish and set the course for later English naval domination. King Charles II granted a charter creating the Royal Society, which would play a crucial role in the scientific revolution. These may be the most important decisions these rulers made.

Sometimes the ruler is also the recipient of honor. Comrade Stalin is a genius of literature. And biology. And architecture. Because if he isn’t, you go to the gulag. He has a monopoly on violence. He uses this monopoly to monopolize prestige. He can then quite effectively award it, pushing nearly any status system in the direction he chooses to. If he has a good understanding of experts and isn’t too afraid of being deposed from his monopoly, he can use his standing to reward excellent generals, scientists, and poets.

Comrade Stalin, however, has a problem. His authority, the legitimacy of his monopoly on violence, formally rests on him being the Genius of Socialism, and thus on the quality of all those papers. The insecurity of this legitimacy requires him to aggressively prop it up by hoarding prestige.

Things don’t have to be this way. If the legitimacy of Stalin’s monopoly on violence were officially grounded in something more secure and more true, he could dispense with biology and geology papers being written in his name. He could dispense with the papers being enshrined as obligatory reading in the relevant fields. He would be not just the monopolist of violence, but the monopolist of legitimacy much more directly. People feel the need to prove themselves where they are insecure. A secure ruler does not need to prove his legitimacy. In turn, a more direct claim of legitimacy is less falsifiable, and thus requires less upkeep and less distortion.

So while power can be used to create prestige, some ways of doing this are more functional than others: they cost less and have fewer negative side effects.

A ruler trying to gain standing by playing football is silly, because if he truly is the ruler, people will feel obliged to lose, ruining the game. Of course there are the unwise like the Roman Emperor Commodus, who fancied himself a gladiator. Commodus always won his fights in the arena, and his subjects viewed his predilection for gladiatorial combat as a disgrace. For rulers trying to gain standing, what remains is the role of the referee, the one who confers honor across domains. Distortions introduced by having to praise his work are thus reduced. This is one of the most important roles of the ruler; the ruler uses his font of prestige to regulate overall status and prestige competition, so that the right people and the right behaviors win, solving coordination problems and tragedies of the commons.

There are brilliant rulers who really might have something to contribute to a field, and some who aren’t particularly brilliant but wish to engage in hobbies for personal fulfillment. A common practice for both of these kinds is to be active under assumed identities or proxies, sometimes convincingly, sometimes not. Frederick the Great of Prussia, for example, anonymously published a political treatise shortly after assuming the throne. The anonymity prevents the prestige distortions that might come from the ruler visibly competing in one of the domains that he rules over.

The prestige of rulers, and more generally the prestige landscape created by power, is the font from which most other prestige flows. If someone tries to grant prestige out of line with this source, it may not be taken seriously, or may find itself undermined by power. If something is not being taken seriously, power can be applied behind the scenes to promote it until it is.

For example, after World War II American officials in the State Department and the CIA wanted to undermine the dominance of pro-Soviet communists in the Western highbrow cultural scene. To do this they planned to promote artists and intellectuals who were either anti-Soviet or at least not especially Soviet-sympathetic; at the time this was often the best you could do in highbrow circles. They considered abstract expressionist painting, which was then a new and obscure movement, a promising candidate. Though no one would call it patriotic, it was American and it wasn’t especially communist.

So in 1946 the State Department organized an international exhibition of abstract painting called “Advancing American Art.” It was so poorly received that the tour was cancelled and the paintings sold off for next to nothing. Undeterred, the CIA, under a front organization called the Congress for Cultural Freedom, continued to arrange international exhibitions for abstract expressionists. Eventually, the movement caught on. It would be an oversimplification to say that the CIA made abstract expressionism famous — there were other influential promoters, like the critic Clement Greenberg — but their support was not irrelevant.

Similar phenomena will be observed if one looks closely at any society. When the landscape of power shifts, the landscape of prestige shifts accordingly. It is then critical that rulers are incentivized to allocate prestige well, that is, in accordance with the actual distribution of excellence. If they aren’t, as in the case of Stalin, the resulting distortions in the allocation of prestige produce distortions in their society’s understanding of what is good and what is true. Lysenkoism was an epistemic and moral disaster. This kind of corruption can ultimately have catastrophic effects on the society’s health, because the ability to ascertain the truth is fundamental to the functionality of a society’s people and its institutions.

Awards are better than prizes

Among the many different kinds of honors, we can pick out two especially common ones: those meant to incentivize a particular achievement with a financial reward, which I call prizes, and those meant to afford prestige on the basis of past achievement, which I call awards. Prizes aim to get some specific thing done, whereas awards aim to affect the distribution of prestige, incentivizing achievement in a more indirect way. With a prize, money is fundamental. With an award, it is incidental. The Millennium Prizes are a prime example of the former, the Academy Awards of the latter.

This distinction is often muddled, leading honors to be less effective than they could be. I have to clarify what I mean by each term, because in practice they aren’t used in a reliable way. There are awards that are called prizes and prizes that are called awards. Despite its name, the Nobel Prize is a hybrid case that is more of an award. Though it comes with a financial reward, it is primarily about affording prestige, and this is what those who try to win it are after. The money is nice, but the glory is better.

It’s for this reason that I think awards are more effective than prizes in incentivizing the production of knowledge. Glory is a greater motivator than money. Furthermore, the money attached to prizes is often insufficient to justify the investment of money, time, energy, social capital, and so on required to achieve the relevant goal.

A better use of prize money is to directly fund projects aimed at the desired achievement. The venture capitalists of Silicon Valley and grantmakers like the Mercatus Center’s Emergent Ventures program are good examples. Before any project begins, it’s possible to determine which individuals or teams have the best chance of success. Giving them the money beforehand solves the financing problem, and even if success won’t make them a fortune, the glory of the achievement (perhaps augmented by an award) should be incentive enough.

A prize also provides less return on its creator’s investment of social capital than an award. Once the goal is achieved and the prize won, there is no longer a reason for it to exist. It is self-abolishing. An award, on the other hand, can continue to be given out year after year, compounding the investment of prestige. Recognizing this fact, prize-giving organizations often convert their prizes into awards, contributing to confusion about the distinction.

The X Prize illustrates some of these flaws. Created by entrepreneur and space enthusiast Peter Diamandis in the 1990s, the prizes are meant to incentivize breakthroughs in solving the world’s biggest problems. Their website says, “Rather than throw money at a problem, we incentivize the solution and challenge the world to solve it.” Perhaps the most well-known past prize is the Ansari X Prize, which promised a $10 million reward for the creation of a reusable spacecraft. Many of the other X Prizes are also about breakthroughs in space technology.

And yet, the great advancements towards space exploration in the past twenty years have had little to do with the X Prize. $10 million is a paltry sum compared to the money required to finance serious efforts in the area, and even less compared to the rewards of success, as SpaceX and Blue Origin have demonstrated. It’s safe to say that an X Prize and $10 million play no part in Musk’s and Bezos’s motivations. Even the project that won the Ansari Prize had $100 million in financing. Either the prize money wasn’t much of an incentive, or the winning team was very confused.

If it’s not really incentivizing breakthroughs, then what is the real use of the X Prize money? It’s to garner publicity. The idea of monetary prizes excites our imagination and so lends them virality. For this purpose the X Prize money has worked. Its creators may understand this, and hope that the publicity brings attention to the relevant problems and so itself incentivizes breakthroughs, but the evidence doesn’t bear this out.

While publicity is good, it’s even better to be able to affect the distribution of prestige throughout society. The more closely social status corresponds to activity that’s ultimately beneficial for society, the more such activity is incentivized, much more strongly than by even a large financial reward. Wisely distributing status makes the difference between a world where most kids dream of becoming pop stars and one where they dream of taking us to space.


On pointless waiting

June 10, 2019 - 11:58
Published on June 10, 2019 8:58 AM UTC

I’ve often noticed in myself a tendency, if I am not doing something immediately engrossing, to find myself waiting.

Waiting, waiting, waiting, not really being present, just willing time to pass.

But the weird thing is, frequently there isn’t anything in particular that I’m waiting for. Getting out of that situation, yes, but I don’t have anything in particular that I’d want to do when I do get out.

I have a suspicion that this might have to do with mental habits ingrained in school.

In elementary school, there’s no real goal for your studies. Mostly it’s just coming there, doing the things that teachers want you to do, until the day is over and you get to go.

In that environment, every minute that passes means winning. Every minute takes you a bit closer to being out of there. That’s the real goal: getting out so you can finally do something fun.

During a lesson you are waiting for recess, during recess you are waiting for the end of the day. Outside school you are waiting for the weekend, on the weekend you are waiting for the bliss of the long summer leave.

Waiting, waiting, waiting.

So you learn to pay attention to the time. Human minds are tuned to feedback, things that let them know how well they are doing. And since each passing minute takes you closer to the goal, the passing of time becomes its own reward.

Time having passed means that you have achieved something. Time having passed means that you can feel a tiny bit of satisfaction.

And then that habit, diligently trained for a decade, can carry over to the rest of your life. Even as an adult, you find yourself waiting, waiting, waiting.

You don’t know what it is that you are waiting for, because you are not really waiting for anything in particular. Even if it would actually be more pleasant to stay engaged with the present moment, you keep tracking the time. Because waiting feels like winning, and every passing minute feels like it takes you closer to your goal.

Even if you don’t actually know what your goal is. Even if reaching your goal will only give you a new situation where you can again wait, so that you are never actually present.

Still, you keep waiting, waiting, waiting.

(typical mind fallacy employed for the sake of artistic license; I am describing my own experience, without claiming this to be a universal one)


Logic, Buddhism, and the Dialetheia

June 10, 2019 - 10:23
Published on June 10, 2019 4:27 AM UTC

Is it possible that some contradictions can be true? If so, how would that affect Bayesian Rationality as well as Theoretical Physics and Quantum Computing? This idea is called "Dialetheism", and it suggests that some contradictions (not all contradictions, just paradoxes) are both True AND False.

It might sound like a laughable claim to even entertain the idea that some contradictions are operationally "BOTH true and false", but it could be extremely useful for quantum mechanics research, AI research, and ethics (not to mention it's almost essential to understanding eastern religions). Let's go over the historical context to see what this is all about.

INTRO TO WESTERN LOGIC (For those unfamiliar)

Aristotle set out to categorize the operations of the mind and derived 3 laws of logic that we can take absolutely for granted, laws that self-evidently apply without question. Aristotle called them “the 3 Laws of Thought”... well... technically he stole them from Plato:

"First, that nothing can become greater or less, either in number or magnitude, while remaining equal to itself ... Secondly, that without addition or subtraction there is no increase or diminution of anything, but only equality ... Thirdly, that what was not before cannot be afterwards, without becoming and having become”

1) The law of identity: P=P, also called the law of self-evidence, the idea that a thing is a thing. For example, the sentence “the Universe is the Universe” is a self-evidently valid statement.

2) The law of excluded middle: P∨~P=T, the all-encompassing idea that an option is an option. For example, the sentence “Either the Universe exists OR the Universe doesn’t exist” is a self-evident statement (compare Shakespeare’s “To be or not to be”).

3) The law of non-contradiction: Aristotle made a critical decision in the history of Western civilization when he added a 3rd law, P∧~P=F, the notion that we can’t have both a thing and not a thing. It’s the idea that contradictions, not just some but ALL of them, are outright false. For example, if you were trying to answer the question "why is there something rather than nothing?", then the sentence “The universe exists AND the universe doesn’t exist” is a contradiction that should disqualify your argument, right?

Naturally, you might think there’s nothing wrong with calling that sentence false and that’s what Aristotle thought too. Little did he know that not all contradictions are the same, because SOME contradictions can actually refer to themselves. Take the statement “Existence doesn’t exist”, or the statement “Non-existence exists”. Are those statements true or false?

The first to point this out was the philosopher Epimenides, a man from Crete who figured out he could say “all people from Crete are liars”, creating a self-referencing contradiction. These very special contradictions are what the Greeks came to call “paradoxes”, because not only can they be true or false, they can also be BOTH true and false or NEITHER true nor false, what the philosopher Pyrrho called “the Tetralemma”. This is a BIG problem, because the 2nd law, the law of excluded middle, says all statements must be either true or false, but we also can’t label paradoxes as solely true or solely false because they are self-referencing. In order to account for paradoxes, we’d first have to rewrite Aristotle’s 2nd law from P∨~P=T to P∨~P∨(P∧~P)=T and then rewrite the 3rd law from P∧~P=F to either P∧~P=F∨(T∧F) OR P∧~P=F∨(~T∧~F).

What all this means is that there are now only two possible ways to answer what a paradox is.

OPTION A: The first way is to deem all contradictions false but a paradox as BOTH true AND false simultaneously, written like this P∧~P=F∨(T∧F).

OPTION B: The second way is to deem all contradictions false but a paradox as NEITHER true nor false, written like this P∧~P=F∨(~T∧~F).

By pure circumstance, the philosophers of ancient Greece decided to interpret paradoxes using Option B, Neither True NOR False, which disqualifies paradoxes from ever being used in arguments again, permanently banishing them from our civilization and thus from your very language. Most people in the world grew up in a civilization grounded in Aristotelian philosophy, thinking there’s absolutely nothing wrong with ignoring paradoxes, but what if we only think that way due to a lifetime of social conditioning? If you asked anyone on the street whether paradoxes count as sensible statements, they’d probably say no, and not surprisingly, so did Aristotle.

INTRO TO EASTERN LOGIC (Also for those unfamiliar)

While Aristotle was laying the groundwork for modern language, science, and law, there was a philosopher on the other side of the world who couldn’t disagree more: Siddhartha Gautama, the Buddha. If a contradiction is NEITHER True NOR False, then we can rewrite it as NOT TRUE and NOT FALSE, which we can then rewrite as FALSE and TRUE; but that’s actually the same thing as saying all contradictions are TRUE and FALSE, so could Aristotle have been wrong?

1) The Catuṣkoṭi: In the Sutras, one of the Buddha’s students, curious about the afterlife, asks him: "Master Gotama, does master hold the view that after death a Tathagata exists, where only one thing is true and anything else is false?" The Buddha responds: "Vaccha, I do not hold the view that a Tathagata exists after death. Nor do I hold the view that a Tathagata does not exist after death. I also do not hold the view that a Tathagata neither exists nor does not exist. Nor do I hold the view that a Tathagata exists and does not exist." In Buddhist logic this is called the “Catuṣkoṭi” or "four corners", written like this: P∨~P=F∨T∨(T∧F)∨(~T∧~F), which means it’s the paradox of all paradoxes from which all other paradoxes arise, a superposition of “Oneness” from which we derive all other possible systems of logic. It holds that the first foundation of all reality is a divine contradiction called “Sunyata” or "emptiness", a kind of superposition-like state that is neither true, nor false, nor true and false, nor neither true nor false.

2) The Law of Contradiction: 600 years later, the philosopher Nagarjuna, founder of the Madhyamaka school of Mahayana Buddhism, worked out that you can divide that one divine superposition into further superpositions. Rather than interpreting a paradox as neither true nor false like Aristotle did, P∧~P=F∨(~T∧~F), Nagarjuna actually chooses the first option, interpreting paradoxes as BOTH true AND false, P∧~P=F∨(T∧F), what we call “a Dialetheia”. Because if the ground of reality is itself a paradox, as the Buddha says, then it should subdivide into further paradoxes rather than further negations. Nagarjuna’s version of Aristotle’s 3rd law was called the “2 truths doctrine”, mandating that all metaphysical systems must account for the possibility of Dialetheias, what the Zen Buddhists came to call “Koans”.
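One common way to formalize the four corners is Belnap and Dunn's four-valued logic (First Degree Entailment, FDE), in which a sentence's value is the set of classical verdicts it receives: {T}, {F}, {T, F} or the empty set. A minimal sketch under that assumption; the names are illustrative, and this formalization is a choice made here rather than something the Sutras themselves specify:

```python
# Belnap-Dunn four-valued logic (FDE): each value is the set of classical
# verdicts a sentence receives, so the four corners are the subsets of {T, F}.
TRUE = frozenset({"T"})
FALSE = frozenset({"F"})
BOTH = frozenset({"T", "F"})   # true and false
NEITHER = frozenset()          # neither true nor false

NAMES = {TRUE: "true", FALSE: "false", BOTH: "both", NEITHER: "neither"}


def neg(a):
    """Negation swaps the verdicts: T becomes F and F becomes T."""
    return frozenset({"F" if v == "T" else "T" for v in a})


def conj(a, b):
    """A conjunction is true if both parts are true, false if either part is false."""
    out = set()
    if "T" in a and "T" in b:
        out.add("T")
    if "F" in a or "F" in b:
        out.add("F")
    return frozenset(out)


for v in (TRUE, FALSE, BOTH, NEITHER):
    print(f"P = {NAMES[v]:7}  ~P = {NAMES[neg(v)]:7}  P∧~P = {NAMES[conj(v, neg(v))]}")
```

Under this encoding, "both" and "neither" are each their own negation, while the two classical values swap.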

The idea that a thing could both exist AND not exist sounds absolutely absurd to us today, but for the Buddhists it was just life as usual. The universe is both something and nothing, and for Nagarjuna, the idea that the universe caused itself is a perfectly valid statement, since he never posited any ironclad ban on contradictions like Aristotle did. Right now, you might be really tempted to dismiss the 2 truths doctrine as mystical new age rambling, but when it comes to computers, the laws of physics, and the very language we speak, the validity of self-referential paradoxes couldn’t be a more serious matter. Unfortunately, after the Empire of the Buddhist King Ashoka fell, Buddhist libraries, Abhidharma schools, and temples in India were burned and monks were slaughtered.

Nagarjuna faded into obscurity while Aristotle’s 3 laws spread around the world through European empires, shaping the next 2000 years of global civilization. Zen Buddhism and Daoism were the only major philosophies in human history that ever permitted contradictions, but it was Aristotle who shaped the world’s universities, legal customs, and social institutions, all dictating what kinds of thoughts our minds can and can’t think.

Even in the West, only 5 major thinkers ever based their entire philosophy on dialetheic logic: the continental philosophers Georg Hegel, Friedrich Nietzsche, Martin Heidegger, and Gilles Deleuze, and the ancient presocratic, Heraclitus. All of them were also largely ignored by mainstream science and mainstream religion, which never stopped to question whether our reality could secretly be a wondrous world of paraconsistent simultaneity: a higher plane of contemplation where our ethics, metaphysics, and overall understanding of reality could all be different. As Ludwig Wittgenstein once said, “the limits of my language mean the limits of my world”, and unfortunately, our logic dictates our language.

DIALETHEISM (The Real Argument Starts Here)

For over two thousand years, Aristotle’s 3rd law went unquestioned, until one man decided to bring the question of the Paradox back from the dead. Dr. Graham Priest is a distinguished professor of philosophy at the City University of New York, and he has spent his entire career working on one phrase, a pesky statement called “this sentence is false”. This is the well-known “Liar’s Paradox”, the statement that everything being said is a lie. So if the liar is indeed lying, then the liar is telling the truth, which means the liar just lied, which means... they’re telling the... truth.
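The back-and-forth above can be restated as a search: a truth value v for the liar sentence is consistent only if v equals the value of its own negation. A minimal sketch, assuming Graham Priest's three-valued Logic of Paradox (LP), with numbers standing in for truth values (1 true, 0 false, 0.5 both) and negation computed as 1 - v. Kleene's K3 uses the same arithmetic with 0.5 read as "neither", so this check by itself does not choose between Option A and Option B:

```python
# Truth values encoded as numbers (an assumption of this sketch):
# 1 = true, 0 = false, 0.5 = LP's third value ("both true and false").
CLASSICAL = [1, 0]
LP = [1, 0.5, 0]


def neg(v):
    """Negation as 1 - v, so the middle value is its own negation."""
    return 1 - v


def liar_solutions(values):
    """Values v with v == neg(v): consistent values for 'this sentence is false'."""
    return [v for v in values if v == neg(v)]


print("classical:", liar_solutions(CLASSICAL))
print("LP:", liar_solutions(LP))
```

The classical search comes back empty, which is the endless flip-flop in the paragraph above; with a third value available, 0.5 is the only consistent assignment.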

The liar's paradox used to be nothing more than a party trick until the 20th century, when we needed to take it seriously in order to ground mathematics and to build quantum computers and more advanced machine learning systems.


The Principle of Bivalence is the idea that every statement has exactly one of two truth values, true or false, so nothing can have both. But is it legit?

a) The Paradox: We can write out the paradox as a syllogism: "This sentence is false" is true; "this sentence is false" is false; therefore "this sentence is false" is BOTH true and false (rather than NEITHER true nor false), thereby violating Aristotle’s 3rd law:
1) P∧~P→T
2) P∧~P→F
C) ∴ P∧~P→T∧F

b) The Rebuttal: Interestingly enough, the Liar’s paradox supports Nagarjuna’s interpretation, meaning Aristotle’s law of contradiction could be changed. However, this is usually hand-waved away with a common rebuttal to the Liar’s paradox. If the liar’s paradox is both true and false, then it’s not true. If the liar’s paradox is both true and false, then it is not false. Therefore the liar’s paradox is actually NEITHER true nor false, as Aristotle said:
1) (P∧~P=T)→~F
2) (P∧~P=F)→~T
C) ∴ (P∧~P=F)→~F∧~T

c) The Rebuttal to the Rebuttal: At first glance, the rebuttal seems to have debunked the Liar’s paradox, but if we write out the logic we will see that all this rebuttal did was try to distract us from the actual problem. If we assume the conclusion of that rebuttal, where the liar’s paradox is neither true nor false, then as the second premise we can point out that the phrase “Not true AND Not false” is just the same thing as “False AND True”, meaning we have proven that Aristotle was wrong and there’s no such thing as a statement that’s neither true nor false, leaving the only remaining interpretation of the Liar’s Paradox as true AND false:
1) (P∧~P=F)→~F∧~T
2) ~F∧~T→T∧F
C) ∴ (P∧~P=F)→T∧F


We could always just classify the liar's paradox as a so-called "truth value gap", meaning that not only is it neither true nor false, it doesn't even deserve a truth value.

Let’s give an example of a sentence without a truth value: “What’s your favorite color?” That’s a question, so it's an example of a TVG (truth value gap). But what about a sentence like “Existence doesn't exist”?

a) The paradox: One might say a paradox like that has zero truth value because we don’t respond to it with "that's true" or "that’s false". But let's try a special statement: “the present king of France is bald”.

b) The rebuttal: It assumes that there is a present king of France, so it’s neither true nor false, but it MUST have a truth value because it certainly belongs to the category of "statement". I suppose we can infer that a Dialetheia works only insofar as it’s talking about things that actually exist (there is no such thing as a "king of France" today).

c) The rebuttal to the rebuttal: The statement “unicorns are white and not white” is not a Dialetheia; it's fairly a TVG. But for something "present" that IS legitimately being talked about, a paradox is not a TVG: it's perfectly grammatical, doesn't commit category mistakes, and doesn't suffer from failure of reference. This means things present at hand (like problems of quantum mechanics) do demand a truth value.


Suppose the issue of bivalence is solved. Would a Dialetheia then mean that all of reality is subjective and a matter of opinion?

Well, no, it’s just saying SOME parts of objective reality are structured through paradoxes.

This kind of attack on Dialetheism rests on what logicians call the "Principle of Explosion", or just "Explosion", because what it suggests is that if we break the law of contradiction, then people can make any argument they want.

The principle of explosion has been around since the Middle Ages, and it’s the very reason that Aristotle’s law of contradiction never gets questioned (because if we violate the law of contradiction, then people can say pretty much anything whatsoever, making all of logic pointless). This is why it's commonly held that we must ABSOLUTELY NEVER break Aristotle's law.

The principle of explosion can be written as p and not p imply q ( ~P∧P→Q ) where p and not p is any given contradiction and q is whatever conclusion you feel like proving.
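In classical two-valued logic, ~P∧P→Q is a tautology for a simple reason: the antecedent is false under every valuation, so the conditional can never come out false. A brute-force check over all four classical valuations of P and Q, with the material conditional written out by hand:

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Classical material conditional: false only when a is true and b is false."""
    return (not a) or b


def explosion_holds() -> bool:
    """Check (P and not P) -> Q under all four classical valuations of P and Q."""
    return all(implies(p and (not p), q) for p, q in product([True, False], repeat=2))


print(explosion_holds())
```

Whatever Q says, the check succeeds, which is exactly why Q can be "whatever conclusion you feel like proving".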

a) The Paradox: For example, take this ridiculous argument where we treat a contradiction as true. Premise one: assume a contradiction like “the universe does exist and doesn’t exist”, an idea some eastern philosophers actually accept. Premise two: “either the universe doesn’t exist or unicorns exist”, which seems fair so far. But then we see the conclusion: if the universe does exist, which it does, then unicorns exist too.
1) ~P∧P=T
2) ~P∨Q=T
C) ∴ P→Q=T

b) The Rebuttal: In fact, not just unicorns: replace Q with whatever you want and it will be true. Considering contradictions true is a nightmare, because you can make literally any argument and have it be valid. The principle of explosion concludes that the only way to avoid ridiculous arguments like that is to declare the first premise of the argument, the contradiction, false, thus making the argument for unicorns and all other fantasies unsound:
1) ~P∧P=F
2) ~P∨Q=T
C) ∴ P→Q=F

c) The Rebuttal to the Rebuttal: It seems like a rock-solid rebuttal; however, explosion misses one key detail. Dialetheism never made the assumption that EVERY contradiction is necessarily true. Considering a contradiction to be BOTH true and false is not the same thing as considering it true. Dialetheism says that if we reject most contradictions BUT accept the existence of self-referencing contradictions, AKA paradoxes, then we can violate Aristotle’s 3rd law without permitting ridiculous arguments that can claim whatever they want. Let’s look at that unicorn argument again, but instead of just having true and false, this time let’s allow 3 possible truth values: True, False, or Dialetheia. Premise one, "the universe does exist and doesn’t exist": instead of making this true or false, we’ll make it a dialetheia, as some eastern thinkers have posited. Premise two, "either the universe doesn’t exist or unicorns exist", is the same premise as last time. Surprisingly, we see that using the Dialetheia still makes the unicorn argument false.
Conclusion: if the universe exists, we still can’t infer that unicorns exist, since that conclusion no longer follows from premise 2:
1) ~P∧P=T∧F
2) ~P∨Q=T
C) ∴ P→Q=F
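The unicorn argument can be checked mechanically by searching for a valuation where every premise holds but the conclusion does not. A minimal sketch, assuming Priest's Logic of Paradox (LP) as the three-valued system (the post doesn't name a specific logic): values are 1 (true), 0.5 (dialetheia) and 0 (false), and a formula "holds" when its value is 1 or 0.5. The conclusion is taken to be Q, "unicorns exist":

```python
from itertools import product

# LP values (assumed encoding): 1 = true, 0.5 = both, 0 = false.
LP_VALUES = [1, 0.5, 0]
DESIGNATED = {1, 0.5}  # values that count as "at least true"


def neg(a):
    return 1 - a


def conj(a, b):
    return min(a, b)


def disj(a, b):
    return max(a, b)


def counterexamples(premises, conclusion, values):
    """Valuations (p, q) where all premises are designated but the conclusion is not."""
    found = []
    for p, q in product(values, repeat=2):
        if all(f(p, q) in DESIGNATED for f in premises) and conclusion(p, q) not in DESIGNATED:
            found.append((p, q))
    return found


# Premise 1: P ∧ ~P.  Premise 2: ~P ∨ Q.  Conclusion: Q (unicorns exist).
premises = [lambda p, q: conj(p, neg(p)), lambda p, q: disj(neg(p), q)]
conclusion = lambda p, q: q

print("classical:", counterexamples(premises, conclusion, [1, 0]))
print("LP:", counterexamples(premises, conclusion, LP_VALUES))
```

Classically there is no counterexample, because premise one never holds. In LP the search finds P as a dialetheia and Q false: both premises hold, yet "unicorns exist" does not follow.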

Overall, we’ve shown that you can still break the law of contradiction without being allowed to say just anything, thereby deconstructing the principle of Explosion and challenging Aristotle’s 3rd law. Today, any system of logic that rejects explosion is what we call a “Paraconsistent logic”, while “Classical Logic” keeps the principle.

Now note one key detail: I’m not saying Classical Logics should be done away with. Over 99% of scientists still use classical logic, and you know what, that’s perfectly okay, because they never have to deal with paradoxes. However, that last 1% of scientists, like quantum physicists, have to deal with paradoxes all the time, so we can’t just force them to use classical logic too; they need a more accurate set of axioms. Today, there’s a desperate need to create computers, perhaps quantum computers, that can violate the law of non-contradiction so physicists can finally solve their problems. If we put Paraconsistent logics into computers, they’re going to start collecting a lot more information and drawing a lot more conclusions, since there's now more than two possible truth values: "true", "false"... and "dialetheia".

In CLASSICAL LOGICS we only have "1" and "0"

In PARACONSISTENT LOGICS we now have "1" and "0" and "#". I imagine this is something that could come in handy for quantum computers.
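A sketch of what the "1", "0" and "#" tables could look like, assuming the connectives of Priest's LP with the values ordered 0 < # < 1 (the post doesn't commit to a particular three-valued logic). Negation flips the order, conjunction takes the lower value, and disjunction takes the higher one:

```python
# Three-valued truth tables using the post's symbols:
# "1" = true, "0" = false, "#" = dialetheia (both true and false).
# Assumption: LP-style connectives with the ordering 0 < # < 1.
RANK = {"0": 0, "#": 1, "1": 2}
SYMBOL = {rank: sym for sym, rank in RANK.items()}
SYMBOLS = ["1", "#", "0"]


def neg(a):
    """Negation swaps 1 and 0 and leaves # unchanged."""
    return SYMBOL[2 - RANK[a]]


def conj(a, b):
    """Conjunction takes the lower of the two values."""
    return SYMBOL[min(RANK[a], RANK[b])]


def disj(a, b):
    """Disjunction takes the higher of the two values."""
    return SYMBOL[max(RANK[a], RANK[b])]


def print_table(name, op):
    print(f"{name:4}| " + " ".join(SYMBOLS))
    for a in SYMBOLS:
        print(f"{a:4}| " + " ".join(op(a, b) for b in SYMBOLS))


print("NOT:", {a: neg(a) for a in SYMBOLS})
print_table("AND", conj)
print_table("OR", disj)
```

In LP both "1" and "#" count as acceptable ("designated") values when checking whether an argument is valid, which is what lets a dialetheia sit in a premise without making every conclusion follow.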

Important Note: I’m not saying Aristotle was stupid for inventing the Law of Non-Contradiction; I’m just pointing out that paradoxes are exceptions to the rule. It’s the same principle as Newton’s classical dynamics turning out not to be entirely accurate and being replaced by Einstein’s General Relativity: it depends on what you’re using it for. 99% of scientists would get by just fine using Newton’s laws of mechanics, but physicists doing Black Hole research would need to use General Relativity to get a more accurate answer. This is the same reasoning for why we should expand Aristotle’s classical law of contradiction P∧~P=F into the paraconsistent law of contradiction P∧~P=F∨(T∧F), allowing us to discover the deeper philosophical truths of the universe.


Now allow me to stop attacking Pseudo-strawmen and get to the REAL rebuttals posed by actual philosophers. The ultimate attack on the Liar's Paradox comes from the logician Saul Kripke.

a) The Paradox: Kripke says the Liar's Paradox isn't grounded and is just a vicious cycle of adding fake truth-predicates, thus making it a valueless statement.

b) The Rebuttal: Kripke's clarification of the paradox is called "groundedness": if we remove all "falsity predicates" from the statement, we're left with the root of the statement, “this sentence”. In other words, if we use the symbol P to represent "this sentence" and represent the words "is false" as the falsity predicate on P (~), then we’ll see that, just as questions lack a truth value, the statement “this sentence” ALSO has no truth value.

c) The Rebuttal to the Rebuttal: This brings us to the “Proof of Revenge”, which Dr. Priest invokes as a rebuttal to Kripke using what’s called “the Strengthened Liar's Paradox”, where we change the syntax of “this sentence is false” into “this sentence is not true”, or better yet, “this sentence is either not true or valueless”.

Now when you try Kripke’s rebuttal, it no longer works. If Kripke, like before, points out that the statement is valueless, then it's not true, and if we admit it's not true, then the statement "this sentence is either not true or valueless" is true, again giving us a Dialetheia where both trueness and falseness are simultaneously valid.

Kripke’s method only works on phrases like “this sentence is true”, because such a sentence merely repeats its own truth value. However, Kripke's groundedness DOES NOT work on “this sentence is false”, which refers to its own existence and then overturns its own truth value.

“This sentence is true” underdetermines its own truth value, while "this sentence is false" overdetermines it, which is to say it creates a truth value within a truth value.


Philosopher Arthur Prior had his own response to Paradoxes, which was essentially, "so what?". SO WHAT if the phrase “this sentence” refers to its own existence? If that’s the case, then don’t ALL sentences refer to their own existence? If all sentences implicitly refer to their own truth, then "the Proof of Revenge" is redundant.

a) The paradox: If I had a sentence that simply contained the word “False”, should we say the very existence of the word "false" is ALSO a paradox? Obviously not, because the sentences “the sky is blue” and “it’s true that the sky is blue” are exactly the same sentence in terms of their truth value.

b) The rebuttal: In the same sense, the statements “this sentence is false” and “it’s true that this sentence is false” are ALSO the same sentence, so the liar's paradox is tricking us into seeing a predicate that isn’t really there.

c) The Rebuttal to the Rebuttal: But there’s one interesting thing about Prior's rebuttal. Yes, the statement “this sentence is false” can translate to “it’s true that this sentence is false”; however, using Dr. Prior’s exact same logic, THAT sentence is also identical to the sentence “it’s true this sentence is true and this sentence is false”, which is, you guessed it, a Dialetheia.

However, Prior took this into account and pointed out one more interesting thing: we would be using a Dialetheia to prove the existence of a Dialetheia, which, in its own metalinguistic way, is a circular argument. We haven’t derived a dialetheia from the argument, we’ve just asserted it, which makes it a contradiction.

Unfortunately for him, Prior is also using a circular argument, using the law of contradiction to prove that the law of contradiction is true. At first it appears we have a clash between 2 circular arguments, with both sides begging the question, but there is a way out: Occam’s razor. What’s interesting is that we make fewer assumptions about logic to get Dialetheia than we do to get Dr. Prior’s answer with regard to what we call “Contingent Facts”.

Dr. Prior’s argument relies on the context of a sentence, requiring him to multiply entities beyond necessity. Meanwhile, Dialetheia is self-evidently derived from the sentence itself and needs no comparison with other sentences in order to make its point. Lastly, Dialetheia is actually NOT a circular argument, because it only makes 2 assumptions, the law of identity and the law of excluded middle, while Prior has made 3 assumptions, the previous 2 plus the law of contradiction.


Mathematician and philosopher Alfred Tarski launched one final attack on the liar’s paradox, claiming that it's just a problem of language. It’s similar to the response we’d get from the Postmodernists: perhaps this whole idea of the Liar’s Paradox isn’t a self-evident truth and might just be mental masturbation, for one simple reason. If we can find a language where the paradox doesn't exist, then Dialetheism is not self-evident; it’s just a social construct. To do this, Tarski draws a distinction between "semantically closed languages" like English and what are called "semantically open languages". Every human language on earth is a semantically closed language, where you can use the language to talk about the language, but the liar's paradox CAN'T be expressed in a semantically open language. A semantically closed language has 2 features: first, it can refer to its own expressions, and second, it contains the predicates "true" and "false" for its own sentences, which is what makes it semantically closed.

However, Tarski says we can create synthetic machine languages where self-referential sentences are blocked, or artificial languages that don't use true or false as predicates. These are called "Semantically Open languages", and they tend to be useless to humans but are nevertheless possible. We can create what's called "an object language" that is structured such that it can't possibly talk about itself in terms of truth or falsity. Without a language that can talk about itself, the liar's paradox is just a social construct, a problem of human language with no basis in logic. In fact, Tarski even suggested getting rid of all human languages and creating a new, artificial, semantically open language that all humans on earth could speak, which would not have the same problems of confusion we do. We'd be able to do philosophy and science without misinterpreting each other or being able to purposely mislead each other: a language where it's impossible to lie. But Tarski’s solution has one small problem.

While we can build a Semantically open language that can’t talk about itself, it’s not possible to construct one that can’t talk about language in general. For example, what’s to stop it from evolving a way to talk about other languages? Linguistic philosophers call the idea of a language that can talk about other languages “a Metalanguage” while the language being talked ABOUT is called “An Object Language”. The problem here is that there’s nothing stopping an object language from making predicate statements about the metalanguage, for example “this statement’s metalanguage is false”.
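Tarski's split between object language and metalanguage can be modeled as a rule about levels: a language at level n may only call sentences from strictly lower levels true or false. A minimal sketch; the `Sentence` class and `truth_claim` function are hypothetical, written only for this illustration, and the level check is precisely the restriction the objection above says a language might evolve its way around:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """A sentence tagged with the level of the language it belongs to (hypothetical model)."""
    text: str
    level: int


def truth_claim(target: Sentence, claim_true: bool, level: int) -> Sentence:
    """Build "<target> is true/false" as a sentence of the language at `level`.

    Tarski-style rule assumed here: a language may only predicate truth or
    falsity of sentences from strictly lower levels (its object languages).
    """
    if target.level >= level:
        raise ValueError(
            f"level-{level} language cannot call a level-{target.level} sentence true or false"
        )
    verdict = "true" if claim_true else "false"
    return Sentence(f"'{target.text}' is {verdict}", level)


snow = Sentence("snow is white", 0)   # object-language sentence
meta = truth_claim(snow, True, 1)     # metalanguage talks about the object language
print(meta.text)

# A liar-style sentence needs a language to call one of its own sentences false.
try:
    truth_claim(meta, False, 1)
    blocked = False
except ValueError as err:
    blocked = True
    print("blocked:", err)
```

Within these rules the liar simply cannot be built; the open question raised above is whether any real language would stay inside them.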

Essentially, Tarski can’t give us a bulletproof way to keep Dialetheism out of our discourse and has failed to banish it to the realm of linguistic constructs.

No matter what we do, the dreaded liar's paradox will keep returning to logic.

So clearly there’s only one thing left to do: instead of desperately trying to ignore the existence of Dialetheias, why don’t we just embrace them?

Why don’t we find a way for our languages, mathematics, and laws of physics to work WITH it rather than AGAINST it?

CONCLUSION (What does Buddhist Dialetheism mean for philosophy?)

If we can accept Nagarjuna's notion that "SOME contradictions are true" then this changes everything... metaphysics, epistemics, ethics, politics, and Bayesian rationality. If so, we might have to rewrite some ingrained social, economic, and scientific laws to adjust for dialetheia, but I'll leave that to the pragmatists (I just want to get Dialetheist theory correct).

This also has serious implications for what artificial intelligence really is. If machine rationality is subordinate to human rationality only because the human brain can operate on paraconsistent logics (Roger Penrose's "Quantum Coherence Theory") while machine brains can't, then I believe we might have a way to make machines think more like humans. Perhaps Buddhist logics could be the key to making a more "friendly" AI (if it's able to learn our values, which themselves contain many dialetheias, we could reduce existential risk).

It often makes me think of my favorite video game, "Portal 2", and how at the end of the game the rogue archailect GLaDOS is finally defeated by a dialetheia (we ask her to paraconsistently calculate "this sentence is false" and she self-destructs). Perhaps if GLaDOS ran on a paraconsistent logic, she might not have been so horrible in the first place and might have understood the more nuanced aspects of reality (but that's just speculation on my part).

So if our AI-god were to have a religion at all, I'd prefer it be Buddhism, so it might then be able to work out how humans understand suffering and compassion.

Please tear this post to shreds; I want to make sure I didn't forget to address any arguments against the dialetheia. I'm also curious about your thoughts on whether Buddhist Logics are worth pursuing. Every small critique helps, thanks again LW.

PS: Shameless Plug for my Transhumanist Youtube Channel (if anyone's interested) : https://www.youtube.com/channel/UCAvRKtQNLKkAX0pOKUTOuzw/videos


Knights and Knaves

June 10, 2019 - 08:26
Published on June 10, 2019 1:51 AM UTC

In the Knights and Knaves riddle you are facing a fork in the road, with one way leading to freedom and the other to death. There are two people, a knight and a knave. The former always tells the truth, while the latter always lies. You get to ask one yes/no question to find your way to freedom.

One solution is to use truth tables. For example, the statements of both people can be chained together. According to the AND table, it does not matter in which order true and false are combined: the result is false. So if your question goes like »What would the other person say, if I'd ask him if this way leads to freedom?«, you always get a falsified answer and are able to identify the way to freedom.
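The chaining argument can be checked exhaustively over both guards and both paths. A minimal sketch, assuming each reply is modeled as a Boolean (True for "yes"); `answer` and `ask_about_other` are hypothetical helper names for this illustration:

```python
def answer(is_knight: bool, statement: bool) -> bool:
    """Yes/no reply: a knight reports the truth, a knave reports its negation."""
    return statement if is_knight else not statement


def ask_about_other(is_knight: bool, path_is_freedom: bool) -> bool:
    """Reply to: 'What would the other person say if I asked whether this path leads to freedom?'"""
    other_would_say = answer(not is_knight, path_is_freedom)
    return answer(is_knight, other_would_say)


for is_knight in (True, False):
    for freedom in (True, False):
        reply = ask_about_other(is_knight, freedom)
        print(f"knight={is_knight!s:5} freedom={freedom!s:5} reply={reply}")
        assert reply == (not freedom)  # always the opposite of the truth
```

Exactly one liar sits in the chain no matter whom you ask, so the reply is always falsified: take the other path from the one the answer points to.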

A general assumption for this riddle is that both people know the truth about where the paths lead. That suggests another approach, in which the knave must distinguish between his inner and outer opinion. To be able to always lie outwardly, he has to know the truth for himself, so his inner opinion is the truth. To take advantage of that, one could ask »Would you say for yourself, that this path leads to freedom?«. This provokes a contradiction in the knave's answer, which can therefore be spotted.

Finally, a similar approach that uses the inner opinion is possible too. If both know the truth but still act differently, this must be on purpose. In other words, one wants to harm you and the other does not. A simpler question would therefore be »Do you want me to go this way?«. You can take the good guy at his word, because he has your best interests in mind. The bad guy, on the other hand, would like to send you to your death, but since he's forced to lie, you can take him at his word too.
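The "Do you want me to go this way?" question can be checked the same way. A minimal sketch, assuming (as the paragraph above does) that the knight wants you to reach freedom and the knave wants you dead; `answer` and `ask_wants_me_to_go` are hypothetical helper names:

```python
def answer(is_knight: bool, statement: bool) -> bool:
    """Yes/no reply: a knight reports the truth, a knave reports its negation."""
    return statement if is_knight else not statement


def ask_wants_me_to_go(is_knight: bool, path_is_freedom: bool) -> bool:
    """Reply to 'Do you want me to go this way?'.

    The honest answer depends on each guard's goal: the knight wants you on
    the freedom path, the knave wants you on the death path.
    """
    honest = path_is_freedom if is_knight else not path_is_freedom
    return answer(is_knight, honest)


for is_knight in (True, False):
    for freedom in (True, False):
        print(f"knight={is_knight!s:5} freedom={freedom!s:5} "
              f"reply={ask_wants_me_to_go(is_knight, freedom)}")
```

In all four cases the reply is "yes" exactly when the path leads to freedom: the knave's hostile wish and his forced lie cancel out.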


Dissolving the zombie argument

June 10, 2019 - 07:54
Published on June 10, 2019 4:54 AM UTC

The Zombie argument (David Chalmers's website, Stanford Encyclopedia of Philosophy) is one of the most famous arguments against materialism, so I'll assume that you can find an explanation yourself if you aren't already familiar with it.

I always find it fascinating when you have two sides that can't seem to communicate with or understand one another. I think the root of the problem is that both sides have a different notion of what counts as a zombie. The Dualist Conception of consciousness involves qualia, so their conception of a philosophical zombie is an entity that lacks qualia. This is a notoriously hard term to define (some would say because it is meaningless), but all that matters here is that they have a stricter conception of consciousness than the Materialist. The Materialist Conception of consciousness involves certain processes taking place, so a Materialist Conception of a zombie would involve certain processes taking place, but also not taking place, which would be a contradiction.

Here's the confusion. If someone were to claim that humans don't fit the Dualist Conception of a zombie and that Materialism is true, they'd be contradicting themselves, because Dualists have such a wide conception of what counts as a zombie that all entities in a Materialist world would fit this definition. On the other hand, if someone were to claim that the Materialist Conception of a zombie were logically possible, which is merely to claim that they can posit this without contradiction, they would be mistaken, since Materialists have a narrow conception of what would count as a zombie.

Once the definition of what counts as a zombie has been fixed, so too has the outcome of the argument. And this is really contingent on what counts as consciousness, so the Zombie argument isn't actually where the fundamental difference lies. This isn't a mere linguistic difference, it's a question of what natural structures exist that cry out to be given a label. Or as Richard Kennaway might frame it, an attempt to understand the nature of a phenomenon which we already have some experience with, without foreclosing the possibility that we might end up tossing away the concept if we find it confused.


Ramifications of limited positive value, unlimited negative value?

June 10, 2019 - 02:17
Published on June 9, 2019 11:17 PM UTC

This assumes you've read some stuff on acausal trade, and various philosophical stuff on what is valuable from the sequences and elsewhere. If this post seems confusing it's probably not asking for your help at this moment.

Also, a bit rambly. Sorry.

Recently, I had a realization that my intuition says something like:

  • positive experiences can only add up to some finite[1] amount, with diminishing returns
  • negative experiences get added up linearly

[1] or, positive experiences might be infinite, but a smaller infinity than the negative ones?

Basically, when I ask myself:

Once we've done literally all the things – there are as many humans or human like things that could possibly exist, having all the experiences they could possibly have...

...and we've created all the mind-designs that seem possibly cogent and good, that can have positive, non-human-like experiences...

...and we've created all the non-sentient universes that seem plausibly good from some sort of weird aesthetic artistic standpoint, i.e. maybe there's a universe of elegant beautiful math forms where nobody gets to directly experience it but it's sort of beautiful that it exists in an abstract way...

...and then maybe we've duplicated each of these a couple times (or a couple million times, just to be sure)...

...I feel like that's it. We won. You can't get a higher score than that.

By contrast, if there is one person out there experiencing suffering, that is sad. And if there are two it's twice as sad, even if they have identical experiences. And if there are 1,000,000,000,000,000 it's 1,000,000,000,000,000x as sad, even if they're all identical.
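One way to make this intuition concrete is a bounded, diminishing-returns curve for positive experiences next to a plain linear sum for negative ones. The exponential form and the `cap`, `scale` and `harm_per_copy` parameters are arbitrary assumptions for illustration, not something the post commits to (and the footnote's "smaller infinity" variant is not modeled):

```python
import math


def positive_value(copies: int, cap: float = 100.0, scale: float = 3.0) -> float:
    """Hypothetical saturating curve: each extra copy adds less, and the total never exceeds `cap`."""
    return cap * (1 - math.exp(-copies / scale))


def negative_value(copies: int, harm_per_copy: float = 1.0) -> float:
    """Suffering adds up linearly: twice the copies, twice as bad."""
    return harm_per_copy * copies


for n in (1, 2, 10, 1_000_000):
    print(n, round(positive_value(n), 3), negative_value(n))
```

The millionth good copy adds essentially nothing, while the millionth suffering copy counts exactly as much as the first.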

Querying myself

This comes from asking myself: "do I want to have all the possible good experiences I could have?" I think the answer is probably yes. And when I ask "do I want to have all the possible good experiences that are somewhat contradictory, such that I'd need to clone myself and experience them separately" the answer is still probably yes.

And when I ask "once I have all that, would it be useful to duplicate myself?" And... I'm not sure. Maybe? I'm not very excited about it. Seems like maybe a nice thing to do, just as a hedge against weird philosophical confusion. But when I imagine doing that the millionth time, I don't think I've gotten anything extra.

But when I imagine the millionth copy of Raemon-experiencing-hell, it still seems pretty bad.

Clarification on humancentricness

Unlike some other LessWrong folk, I'm only medium enthusiastic about the singularity, and not all that enthusiastic about exponential growth. I care about things that human-Ray cares about. I care about Weird Future Ray's preferences in roughly the same way I care about other people's preferences, and other Weird Future People's preferences. (Which is a fair bit, but more as a "it seems nice to help them out if I have the resources, and in particular if they are suffering.")

Counterargument – Measure/Magical Reality Fluid

The main counterargument is that maybe you need to dedicate all of the multiverse to positive experiences to give the positive experiences more Magical Reality Fluid (i.e. something like "more chance at existing", but try not to trick yourself into thinking you understand that concept if you don't).

I sort of might begrudgingly accept this, but this feels something like "the values of weird future Being That Shares a Causal Link With Me", rather than "my values."

Why is this relevant?

If there's a finite number of good experiences to have, then it's an empirical question of "how much computation or other resources does it take to cause them?"

I'd... feel somewhat (although not majorly) surprised if it turned out that you needed more than our light cone's worth of resources to do that.

But then there's the question of acausal trade, or trying to communicate with simulators, or "being the sort of people such that whether we're in a simulation or not, we adopt policies such that alternate versions of us with the same policies who are getting simulated are getting a good outcome."

And... that *only* seems relevant to my values if either this universe isn't big enough to satisfy my human-values, or my human values care about things outside of this universe.

And basically, it seems to me the only reason I care about other universes is that I think Hell Exists Out There Somewhere and Must Be Destroyed.

(Where "hell" is most likely to exist in the form of AIs running incidental thought experiments, committing mind-crime in the process).

I expect to change my mind on this a bunch, and I don't think it's necessary (or even positive EV) for me to try to come to a firm opinion on this sort of thing before the singularity.

But it seems potentially important to have *meta* policies such that someone simulating me can easily tell (at lower resolutions of simulation) whether I'm the sort of agent who'd unfold into an agent-with-good-policies if they gave me more compute.

tl;dr – what are the implications of the outlook listed above? What ramifications might I not be considering?


A Plausible Entropic Decision Procedure for Many Worlds Living, Round 2

June 9, 2019 - 23:55
Published on June 9, 2019 7:48 PM UTC

Hey LessWrong! I posted about a month ago about a decision procedure that I think could be optimal in a universe where the Many Worlds Interpretation is true. The post was downvoted to zero and at least several people thought there were problems with it. However, none of the comments swayed me to believe that my idea has been falsified yet, so I clarified my idea, rewrote a post about it, and am interested again in your feedback. It can be found here: https://www.evanward.org/an-entropic-decision-procedure-for-many-worlds-living/

Thank you!


On why mathematics appear to be non-cosmic

June 9, 2019 - 23:42
Published on June 9, 2019 8:42 PM UTC


I do fear that perhaps this post of mine (my fourth here) may cause a few negative reactions. I do try to approach this from a philosophical viewpoint, as befits my studies. It goes without saying that I may be wrong, and would very much like to read your views and even more so any reasons that my own position may be identified as untenable. I can only assure you that to me it currently seems that mathematics are not cosmic but anthropic.


There are so many quotes about mathematics, from celebrated mathematicians, philosophers, even artists; some are witty yet too polemical to identify as useful in a treatise that aspires to discuss whether math is merely anthropic or cosmic, and others are perhaps too focused on the order itself and thus come across a bit like the expected fawning of an admirer to his or her muse.

Yet the question regarding math being only a human concept, or something which is actually cosmic, is an important one, and it does deserve honest examination. I will try to present a few of my own thoughts on this subject, hoping that they may be of use – even if their use is simply to allow for fruitful reflection and possible dismissal.

It is evident that mathematics have value. It is also evident that they allow for technological development. They do serve as a foundation for scientific orders that rest on experiment and thus are invaluable. However, we should also consider what the primary difference between math as an order and scientific orders (physics, chemistry etc.) easily lets us know about math itself:

Primarily, math differs from science in that it derives its results not from experiment, data and observation, but from proof. The use of proof in math is often attributed to the first Greek mathematicians, and specifically to either the first Philosopher, Thales of Miletus, or his students, Anaximander and Pythagoras. Euclid argued that the first Theorem that math presents is the one by Thales, which has to do with analogies between parts of 2D forms (e.g. triangles) inscribed in a circle. The idea of a proof proceeding from axioms, of a Theorem, is fundamental in mathematics, and it is also a crucial difference between math and orders such as physics. Fields of science that have to do with observing (and interacting with) the external world differ significantly from a field (math) which only requires reflecting on axiomatic systems.

Given the above is true, it does follow that a human is far more connected to math than to any study of external objects: they are tied to math without even trying to be tied to it, given math exists as a mental creation and not one which requires the senses to intervene.  

But what does “being more connected” mean, in this context? Is math actually intertwined with human thought of all kinds? Obviously we do not innately know about basic “realities” of the external world, such as weight and impact; the risk of a free-fall is something that an infant has to first accept as a reality without grasping why it is so. On the contrary we do, by necessity, already have fundamental awareness of the (arguably) most basic notion in all of mathematics: the notion of the monad.

The monad is the idea of “one”: that anything distinct is a “one”, regardless of whether we mean to include it in a larger group or divide it into constituent parts. Each of those larger groups is also “one”, and the same is true for any divisions. “Oneness”, therefore, as the pre-Socratics already argued (and Plato examined in hundreds of pages), is arguably one of the most characteristic human notions, and a notion which is generally inescapable and ubiquitous. “One” is also the first element and the measure (unit) of the set of natural numbers (1, 2, 3, 4…), and this is because the human mind fundamentally identifies differences as distinct, even when the difference may become (in advanced math) extremely complicated and of peculiar types. Yet the humble set of natural numbers also gives us an interesting sequence when altered a bit: the so-called Fibonacci sequence, which I think is a good example to use so as to show why I think that math are only human and not cosmic.

The Fibonacci sequence progresses in a very specific way: each term is formed by adding the two previous terms. The sequence begins with 1 (or 0 and 1), so its first terms are (0), 1, 1, 2, 3, 5, 8, 13. The ratio of consecutive terms converges to the golden ratio from both sides (alternating between values just smaller and just larger than it), and the sequence can be drawn as a pretty spiral (wiki image: https://en.wikipedia.org/wiki/Fibonacci_number#/media/File:FibonacciSpiral.svg). Yet for me it is of more interest that humans do happen to observe a good approximation of this specific, mathematical spiral on some external objects, namely the shells of a few small animals.
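The convergence claim can be checked numerically. A minimal sketch, assuming the sequence starts 1, 1: it computes the ratios of neighbouring terms and shows them landing alternately below and above the golden ratio, with the gap shrinking at each step. The helper names are illustrative.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio, about 1.618


def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]


def consecutive_ratios(seq):
    """Ratios F(k+1) / F(k) of neighbouring terms."""
    return [b / a for a, b in zip(seq, seq[1:])]


fib = fibonacci(12)
ratios = consecutive_ratios(fib)
for r in ratios:
    side = "above" if r > PHI else "below"
    print(f"{r:.6f}  ({side} phi, off by {abs(r - PHI):.6f})")
```

The printed ratios start at 1 and 2 and close in on 1.618 from alternating sides, which is the "both sides" behaviour described above.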

It is pretty clear that the shell of some external being is not itself aware of mathematics. One could argue, of course, that “nature” itself is filled with mathematics, and thus in some way a few external forms happen to approximate a specific spiral, and the tie to the golden ratio etc is only to be expected given nature (and by extension, perhaps, the Cosmos itself) is mathematical. Certainly this can appear to provide an answer; or to be precise it would at least present a cause for this appearance of mathematics and of a specific spiral in the external world. Is it really a good answer, though? In other words, do we observe the Fibonacci or golden ratio spiral approximation on the external world because the external world itself is tied to math, or do we do so because we are tied to math in an even deeper way than we realize and could only project what we have inside of our mental world onto anything external?

My view is that humans are so bound to math (regardless of how knowledgeable one is in mathematics) that we cannot but view the world mathematically. Rockets are built, using math, and by them we can even leave the orbit of our planet – yet consider whether what allowed us to realize how to achieve so impressive a result was not math alone, but math as a kind of very anthropic cane or leg by which we slowly learned to move about:

In essence, I think that because the human species is so obstructed from developing far more advanced mathematics (to put it another way: because advancing math can be so difficult even for the best mathematicians), we tend not to recognize that math itself is not the cause of development, not the cause of movement and progression, but a leg, the only leg, that we have to familiarize ourselves with because we aspire to move on this plane. Imagine a dog which wanted to move from A to B, but couldn’t use its legs. At some point it manages to move one of them, and then enough so as to finally get to B. It is undoubtedly a major achievement for the dog. But the dog shouldn’t proceed to claim that the dirt between A and B is made of moving legs, let alone that this is the case for the entire Cosmos.

I only meant to briefly present my thoughts on this subject, and wish to specify (what very likely is already clear to more mathematically-oriented readers of this post) that my personal knowledge of mathematics is quite basic. I approach the subject from a philosophical and epistemological viewpoint, which is more fitting to my own University studies (Philosophy).

by Kyriakos Chalkopoulos (https://www.patreon.com/Kyriakos)


An attempt to list out my core values and virtues

June 9, 2019 - 23:02
Published on June 9, 2019 8:02 PM UTC

Last week a friend pressed upon me the importance of writing your own culture. It was a small part of a multi-hour conversation, and I'm not sure if I've interpreted their meaning correctly, but the correct-seeming-to-me position I took from it was something like this:

Write your own culture. Identify your values and the things you consider to be virtues. Not those of the broader culture you exist in or those putatively held by the groups in which you have an identity. Those which are yours, for you, separate from what others might value or consider virtuous.

To date, perhaps for convenience, I have rounded off my values/virtues to being Rationalist and EA values. Succinct, perhaps easier to communicate or even convenient as an internal mental handle. But with less personal ownership, or something. Like perhaps they're only "my values" because I'm part of those groups. And that's not true. While the groups might have helped me flesh out and identify my values, they are my values. Also, when held this way, there is far less nuance to them.

I already have them floating around in my head with my own particular characterization. Yet they float around individually, not as a coherent list. So here goes. Here's a first attempt to capture my values and virtues.

(What's the difference between a value and virtue? I'm not sure exactly, but my brain is labeling some items as more one than the other. Maybe values are things I optimize for and virtues are behaviors I endorse.)

(Also, this isn't an exhaustive list of absolutely everything I'd say I care about or think is good. These are top high-level virtues which subsume all the other things for me. I value cake, but cake arises as something I value further down the chain than anything listed here.)

My Values/Virtues

(very loose/hazy ordering of priority)

  • Curiosity / Wanting to know and understand the world.
  • Truth
    • Over what is comfortable or "instrumentally" advantageous.
      • It feels that I would choose truth even if it would destroy me. Though I wouldn't choose it if it would destroy Miranda (or the world) . . . so I guess there are limits.
    • An abhorrence of rationalization.
    • A revulsion of arguments (or even countenancing the possibility) that truth should be sacrificed for some other gain. I'm not saying that I never would, but I find the suggestion viscerally emotionally upsetting and offensive (perhaps irrationally and dogmatically so).
    • Related to truth, it feels I have an utmost virtue of accepting arguments and reasoning that seem correct and to have led to true conclusions. Combined with Integrity, that means I will act on arguments even if they lead to unconventional places.
    • Eliezer's 12 Virtues of [Epistemic] Rationality are my virtues because they fall under truth. (The virtue of scholarship also falls under Curiosity.)
    • Downstream of overall Truth and Curiosity is the desire for self-knowledge.
  • A Sense that More is Possible / A Will to Transcendence
    • This feels very core to who I am / want to be. It's a value I pride myself on (my furtive attempt at a personal blog had this title).
    • It's something like, I believe there exist dimensions along which the world can be better or worse, and it is good to make it better. Surely if it is possible, you make it better? I feel like I somewhat lack further justification for this feeling, but that's the feeling I have. We should make things as good as they can be. (Perfect the universe I say, as the goal to aim for even if it's not a realistic/meaningful target).
    • I have uncertainty about what better and more mean exactly. I have guesses and strong feelings (like suffering is bad, knowledge is good), but I place extremely extremely low probability on all states of the world being equally good. For now, push in the obvious directions [1].
    • I do have a very strong sense that the world is a hell of a lot better than it used to be. So much change, in so little time. Much of the time I feel baffled that people don't look at the last few hundred years (or even their own lifetimes), look at the progress of technology, and aren't clamoring in the streets as a result: why don't we speed up the goddamn progress so we can get to the goddamn utopian future, which is just a super reasonable extrapolation of what is possible given our knowledge of the laws of physics and recent history?
    • I am part of the world - and importantly, I am the part of the world I affect the world through - so my own self-improvement is especially important.
      • And I have a solid sense of the many, many ways I could be better.
    • This value/virtue is powerful. It motivates me. It's also dangerous in that it pushes me towards dissatisfaction, always looking at what could be but isn't. I struggle with this. I've been an advocate of Acceptance and Commitment Therapy (an Eastern-influenced psychotherapy) for seven years since it helps with this kind of dilemma. I'm still working to enjoy the good things that already are while still striving hard for all that could be.
  • There are better and worse ways for the world to be.
    • This is already subsumed in the Sense that More is Possible but feels fundamental enough to have its own high-level bullet point.
  • Optimizing the whole / Long-termism / Making local sacrifices / Forgoing marshmallows
    • Part of a Sense that More is Possible is wanting to optimize everything across all of time and space. And global optimization often requires local sacrifice. That's just like super basic.
    • I cultivate this as a core virtue. Always be aware if you are sacrificing the whole for the sake of a part. When there is possibility for so much, don't get short-sighted.
    • The result is I'm willing to work on very long and slow feedback loops, generally delaying gratification for some time.
      • Recently I fear I've been doing this too much (i.e. in a way that does not optimize the whole) and am dialing back a little. Maybe it's that being a multiagent agent is hard.
    • I think this value/virtue/attitude is why I've always disliked the question "What would a perfect day look like for you?" Perfection isn't a concept I can apply to days in isolation.
  • Quantitative-Sensitivity
    • One person suffering is bad. Two people suffering is worse. This seems kind of basically true, even though I'm not sure how to argue for it. (I dislike that I don't have a better basis, but maybe I can work on that.)
    • Accepting that quantities matter, I follow that through. I shut up and multiply.
    • Shut up and multiplying is a core virtue.
  • Empathy / Caring for Sentient Beings
    • It seems I just do. [Many times,] if I see someone suffering before me, I feel it, and I don't like it. That is not how I want the world to be. I will invest my effort to change it. Not just that, I want the flourishing of sentient beings. Being quantitatively-sensitive and trying to optimize the whole, I try to scale up. This feels like the obvious thing to do.
    • A virtue here is that you don't purchase your own benefit at the expense of others.
    • This is related to cooperation as well, but I take Consideration to be an important virtue. You're supposed to think about the effects of your actions on others. All things equal, you don't put your own utility in front of theirs (if anything, apply a heavy discount around your own to counter self-serving biases). Most practically, I avoid being late and really, really hate canceling/rescheduling on people.
  • Cooperation
    • This just feels like so obviously the correct thing to do that you just do it. Obviously, we all gain more from cooperation, especially if we have shared values. If not for cooperation, we'd still be single-celled organisms, if that.
    • It's hard, it takes effort, but obviously obviously you push to make it happen.
    • You try to have all the virtues that make it possible. You are honest (or at least meta-honest), you're reliable, predictable, you do what you say you will, and act in accordance with your stated beliefs.
    • I really, really prefer to be honest. I lump this under cooperation, but I think it's also part of connection, (and also deception is stressful, but I don't endorse honesty as a virtue because of that).
      • I'm more ready to be dishonest with others than with myself, though. By far. If others have chosen to enter into an adversarial situation with me, I don't owe it to them to ensure they have an accurate map with which to harm me.
    • I try very hard to do what I say I will. I honor contracts and agreements and pledges, even when it ends up being costly or if I regret the commitment. Most of the time it means I'll go to lengths to avoid being late. Once I pledged to stay in a job for nine more months, and I did, even when I wasn't enjoying it.
    • Cooperation flows naturally from optimizing the whole too. Sure, you can get ahead today via defection, but in the long-term, cooperation wins out.
  • Connection
    • It seems that I value connecting with other minds. This feels kind of weird, but maybe only because I came to recognize it explicitly later than the other values and virtues listed here.
    • I'm not sure what "connecting" means exactly, but it's a thing and it seems good and something I want and value.
  • Gratitude
    • I have always felt gratitude very strongly and deeply. If someone has done something which has benefited me, I am thankful and wish to do good by them too.
    • My feelings of gratitude are evoked by even small things and the feelings will last for years.
    • Is this ideal? I don't know. I haven't thought through the arguments, but I embrace it as a personal-virtue I am happy to have.
    • I suspect gratitude is in large part what grows into loyalty for me. I think that I am rather loyal.
  • Meta-Virtue: Integrity
    • Integrity is the meta-virtue of having your values and virtues and acting in accordance with them. The commitment to living by them. It's kind of weird to need this meta-virtue, but I think you can praise a person for commitment to the values and virtues somewhat independently of sharing their values and virtues.


I can't tell you exactly where I'm going, but I can sure see which direction the arrow points.

It's easier, in a way, to talk about the negative motivations (ending disease, decreasing existential risk, that sort of thing), because those are the things that I'm pretty sure of, in light of uncertainty about what really matters to me. I don't know exactly what I want, but I'm pretty sure I want there to be humans (or post-humans) around to see it.

But don't confuse what I'm doing with what I'm fighting for. The latter is much harder to describe, and I have no delusions of understanding.

You don't get to know exactly what you're fighting for, but the world's in bad enough shape that you don't need to.

From "You don't get to know what you're fighting for" on Minding Our Way


The expected value of extinction risk reduction is positive

June 9, 2019 - 22:37
Published on June 9, 2019 3:49 PM UTC

By Jan M. Brauner and Friederike M. Grosse-Holz

Work on this article has been funded by the Centre for Effective Altruism, but the article represents the personal views of the authors.

(Cross-post from the EA forum)

Short summary

There are good reasons to care about sentient beings living in the millions of years to come. Caring about the future of sentience is sometimes taken to imply reducing the risk of human extinction as a moral priority. However, this implication is not obvious so long as one is uncertain whether a future with humanity would be better or worse than one without it.

In this article, we try to give an all-things-considered answer to the question: “Is the expected value of efforts to reduce the risk of human extinction positive or negative?”. Among others, we cover the following points:

  • What happens if we simply tally up the welfare of current sentient beings on earth and extrapolate into the future; and why that isn’t a good idea
  • Thinking about the possible values and preferences of future generations, how these might align with ours, and what that implies
  • Why the “option value argument” for reducing extinction risk is weak
  • How the potential of a non-human animal civilisation or an extra-terrestrial civilisation taking over after human extinction increases the expected value of extinction risk reduction
  • Why, if we had more empirical insight or moral reflection, we might have moral concern for things outside of earth, and how that increases the value of extinction risk reduction
  • How avoiding a global catastrophe that would not lead to extinction can have very long-term effects

Long summary

If most expected value or disvalue lies in the billions of years to come, altruists should plausibly focus their efforts on improving the long-term future. It is not clear whether reducing the risk of human extinction would, in expectation, improve the long-term future, because a future with humanity may be better or worse than one without it.

From a consequentialist, welfarist view, most expected value (EV) or disvalue of the future comes from scenarios in which (post-)humanity colonizes space, because these scenarios contain most expected beings. If we simply extrapolate the current welfare (part 1.1) of humans and farmed and wild animals, it is unclear whether we should support spreading sentient beings to other planets.

From a more general perspective (part 1.2), future agents will likely care morally about the same things we find valuable or about any of the things we are neutral towards. It seems very unlikely that they would see value exactly where we see disvalue. If future agents are powerful enough to shape the world according to their preferences, this asymmetry implies the EV of future agents colonizing space is positive from many welfarist perspectives.
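The asymmetry argument in this paragraph can be illustrated with a toy expected-value calculation. This is a hypothetical formalization, not the authors' model: it assumes powerful future agents end up in one of three cases (they optimize what we value, what we are neutral about, or exactly what we disvalue), that the neutral case scores zero by our lights (ignoring side effects), and that the aligned and opposed cases have equal magnitude. The function name and all probabilities are illustrative placeholders.

```python
def ev_of_optimized_future(p_aligned, p_neutral, p_opposed, magnitude=1.0):
    """Toy EV, from our perspective, of a future shaped by powerful agents.

    Hypothetical three-case model:
      aligned  -> they optimize what we value:           +magnitude
      neutral  -> they optimize what we don't care about: 0 (side effects ignored)
      opposed  -> they optimize what we disvalue:         -magnitude
    """
    if abs(p_aligned + p_neutral + p_opposed - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return magnitude * (p_aligned - p_opposed)


# Placeholder numbers, not estimates: opposed values are assumed to be rare.
print(ev_of_optimized_future(p_aligned=0.3, p_neutral=0.69, p_opposed=0.01))
```

In this toy model the sign depends only on whether aligned values are more likely than exactly opposed ones, which is the asymmetry the paragraph appeals to.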

If we can defer the decision about whether to colonize space to future agents with more moral and empirical insight, doing so creates option value (part 1.3). However, most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents. Such “bad” agents will make worse decisions than we, currently, could. Thus, the option value in reducing the risk of human extinction is small.

The universe may not stay empty, even if humanity goes extinct (part 2.1). A non-human animal civilization, extraterrestrials or uncontrolled artificial intelligence that was created by humanity might colonize space. These scenarios may be worse than (post-)human space colonization in expectation. Additionally, with more moral or empirical insight, we might realize that the universe is already filled with beings or things we care about (part 2.2). If the universe is already filled with disvalue that future agents could alleviate, this gives further reason to reduce extinction risk.

In practice, many efforts to reduce the risk of human extinction also have other effects of long-term significance. Such efforts might often reduce the risk of global catastrophes (part 3.1) from which humanity would recover, but which might set technological and social progress on a worse track than they are on now. Furthermore, such efforts often promote global coordination, peace and stability (part 3.2), which is crucial for safe development of pivotal technologies and to avoid negative trajectory changes in general.

Aggregating these considerations, efforts to reduce extinction risk seem positive in expectation from most consequentialist views, ranging from neutral on some views to extremely positive on others. As efforts to reduce extinction risk also seem highly leveraged and time-sensitive, they should probably hold a prominent place in the long-termist EA portfolio.

Introduction and background

The future of Earth-originating life might be vast, lasting millions of years and containing many times more beings than currently alive (Bostrom, 2003). If future beings matter morally, it should plausibly be a major moral concern that the future plays out well. So how should we, today, prioritise our efforts aimed at improving the future?

We could try to reduce the risk of human extinction. A future with humanity would be drastically different from one without it. Few other factors seem as pivotal for what the world will look like in the millions of years to come as whether or not humanity survives the next few centuries and millennia. Effective efforts to reduce the risk of human extinction could thus have immense long-term impact. If we were sure that this impact was positive, extinction risk reduction would plausibly be one of the most effective ways to improve the future.

However, it is not at first glance clear that reducing extinction risk is positive from an impartial altruistic perspective. For example, future humans might have terrible lives that they can’t escape from, or humane values might exert little control over the future, resulting in future agents causing great harm to other beings. If indeed it turned out that we weren’t sure if extinction risk reduction was positive, we would prioritize other ways to improve the future without making extinction risk reduction a primary goal.

To inform this prioritisation, in this article we estimate the expected value of efforts to reduce the risk of human extinction.

Moral assumptions

Throughout this article, we base our considerations on two assumptions:

  1. That it morally matters what happens in the billions of years to come. From this very long-term view, making sure the future plays out well is a primary moral concern.
  2. That we should aim to satisfy our reflected moral preferences. Most people would want to act according to the preferences they would have upon idealized reflection, rather than according to their current preferences. The process of idealized reflection will differ between people. Some people might want to revise their preferences after becoming much smarter and more rational and spending millions of years in philosophical discussion. Others might want to largely keep their current moral intuitions, but learn empirical facts about the world (e.g. about the nature of consciousness).

Most arguments further assume that the state the world is brought into by one’s actions is what matters morally (as opposed to e.g. the actions following a specific rule). We thus take a consequentialist view, judging potential actions by their consequences.

Parts 1.1 and 1.2 further take a welfarist perspective, assuming that what matters morally in states of the world is the welfare of sentient beings. In a way, that means assuming our reflected preferences are welfarist. Welfare will be broadly defined as including pleasure and pain, but also complex values or the satisfaction of preferences. From this perspective, a state of the world is good if it is good for the individuals in this world. Across several beings, welfare will be aggregated additively[1], no matter how far in the future an expected being lives. Additional beings with positive (negative) welfare coming into existence will count as morally good (bad). In short, parts 1.1 and 1.2 take the view of welfarist consequentialism with a total view on population ethics (see e.g. (Greaves, 2017)), but the arguments also hold for other similar views.
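The total-view aggregation described here (additive across beings, with no discount for how far in the future a being lives) can be sketched as follows. This is a minimal illustration under the stated assumption that each being's welfare can be summarized as a single number; the `Being` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Being:
    welfare: float  # lifetime welfare as one number (assumption); < 0 means net negative
    year: int       # when the being lives; deliberately NOT used in the sum


def total_welfare(beings):
    """Total view: add welfare across all beings, with no temporal discount."""
    return sum(b.welfare for b in beings)


world = [Being(2.0, 2019), Being(-1.0, 1_000_000)]
print(total_welfare(world))
```

Because `year` never enters the sum, a being a million years from now counts exactly as much as one alive today, and adding a being with positive (negative) welfare raises (lowers) the total.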

If we make the assumptions outlined above, nearly all expected value or disvalue in a future with humanity arises from scenarios in which (post-)humans colonize space. The colonizable universe seems very large, so scenarios with space colonization likely contain a lot more beings than scenarios with earthbound life only (Bostrom, 2003). Conditional on human survival, space colonization also does not seem too unlikely, thus nearly all expected future beings live in scenarios with space colonization[2]. We thus take “a future with humanity” to mean “(post-)human space colonization” for the main text and briefly discuss what a future with only earthbound humanity might look like in Appendix 1.
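The step from "space colonization is not too unlikely" to "nearly all expected future beings live in scenarios with space colonization" is a simple expected-count comparison. A sketch with placeholder inputs (the function name and all numbers are hypothetical, not figures from the article or from Bostrom):

```python
def share_from_colonization(p_colonize, beings_if_colonize, beings_if_earthbound):
    """Fraction of expected future beings (conditional on human survival) that
    live in space-colonization scenarios. All inputs are hypothetical."""
    from_colonization = p_colonize * beings_if_colonize
    from_earthbound = (1 - p_colonize) * beings_if_earthbound
    return from_colonization / (from_colonization + from_earthbound)


# Placeholders: even a modest chance of colonization dominates if it
# multiplies the number of beings by many orders of magnitude.
print(share_from_colonization(0.1, 1e20, 1e12))
```

Whenever the colonization scenario contains vastly more beings, its share of expected beings stays close to 1 even for fairly small probabilities of colonization.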

Outline of the article

Ultimately, we want to know “What is the expected value (EV) of efforts to reduce the risk of human extinction?”. We will address this question in three parts:

  • In part 1, we ask “What is the EV of (post-)human space colonization[3]?”. We first attempt to extrapolate the EV from the amounts of value and disvalue in today’s world and how they would likely develop with space colonization. We then turn toward a more general examination of what future agents’ tools and preferences might look like and how they will, in expectation, shape the future. Finally, we consider if future agents could make a better decision on whether to colonize space (or not) than we can, so that it seems valuable to let them decide (option value).

  • In part 1 we tacitly assumed the universe without humanity is and stays empty. In part 2, we drop that assumption. We evaluate how the possibility of space colonization by alternative agents and the possibility of existing but tractable disvalue in the universe change the EV of keeping humans around.

  • In part 3, we ask “Besides reducing extinction risk, what will be the consequences of our efforts?”. We look at how different efforts to reduce extinction risk might influence the long-term future by reducing global catastrophic risk and by promoting global coordination and stability.

We stress that the conclusions of the different parts should not be separated from the context. Since we are reasoning about a topic as complex and uncertain as the long-term future, we take several views, aiming to ultimately reach a verdict by aggregating across them.

A note on disvalue-focus

The moral view on which this article is based is very broad and can include enormously different value systems, in particular different degrees of ‘disvalue-focus’. We consider a moral view disvalue-focused if it holds that the prevention/reduction of disvalue is (vastly) more important than the creation of value. Views that hold the prevention or reduction of suffering as an especially high moral priority are one example.

The degree of disvalue focus one takes chiefly influences the EV of reducing extinction risk.

From very disvalue-focused views, (post-) human space colonization may not seem desirable even if the future contains a much better ratio of value to disvalue than today. There is little to gain from space colonization if the creation of value (e.g. happy beings) morally matters little. On the other hand, space colonization would multiply the amount of sentient beings and thereby multiply the absolute amount of disvalue.

At first glance it thus seems that reducing the risk of human extinction is not a good idea from a strongly disvalue-focused perspective. However, the value of extinction risk reduction for disvalue-focused views gets shifted upwards considerably by the arguments in part 2 and 3 of this article.
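To make the role of disvalue-focus concrete, here is a minimal sketch under our own simplifying assumption (not a model used in this article): net moral value is value minus a weight (`disvalue_weight`) times disvalue, and space colonization scales both by the same factor. The function name and all numbers are hypothetical.

```python
def net_value(value, disvalue, disvalue_weight):
    """Net moral value when one unit of disvalue counts `disvalue_weight` times
    as much as one unit of value (disvalue_weight > 1: disvalue-focused)."""
    return value - disvalue_weight * disvalue


# Hypothetical amounts in arbitrary units
today = (3.0, 1.0)                       # value:disvalue ratio of 3:1
scale = 1_000                            # colonization multiplies the number of beings
future = (10.0 * scale, 1.0 * scale)     # better ratio (10:1), far larger totals

for weight in (1.0, 100.0):
    print(f"weight={weight:>5}: today {net_value(*today, weight):+,.0f}, "
          f"future {net_value(*future, weight):+,.0f}")
```

With a weight of 1 the scaled-up future looks far better than today; with a weight of 100 it looks far worse, even though its ratio of value to disvalue improved from 3:1 to 10:1. This is the pattern described above for strongly disvalue-focused views.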

Part 1: What is the EV of (post-)human space colonization?[4]

1.1: Extrapolating from today’s world

Space colonization is hard. By the time our technology is advanced enough, human civilization will possibly have changed considerably in many ways. However, to get a first grasp of the expected value of the long-term future, we can model it as a rough extrapolation of the present. What if humanity as we know it colonized space? There would be vastly more sentient beings, including humans, farmed animals and wild animals[5]. To estimate the expected value of this future, we will consider three questions:

  1. How many humans, farmed animals and wild animals will exist?
  2. How should we weigh the welfare of different beings?
  3. For each of humans, farmed animals and wild animals:
    1. Is the current average welfare net positive/average life worth living?
    2. How will welfare develop in the future?

We will then attempt to draw a conclusion. Note that throughout this consideration, we take an individualistic welfarist perspective on wild animals. This perspective stands in contrast to e.g. valuing functional ecosystems and might seem unusual, but is increasingly popular.

There will likely be more farmed and wild animals than humans, but the ratio will decrease compared to the present

In today’s world, both farmed and wild animals outnumber humans by far. There are about 3-4 times more farmed land animals and about 13 times more farmed fish[6] than humans alive. Wild animals prevail over farmed animals, with about 10 times more wild birds than farmed birds and 100 times more wild mammals than farmed mammals alive at any point. Moving on to smaller wild animals, the numbers increase again, with 10 000 times more vertebrates than humans, and between 100 000 000 and 10 000 000 000 times more insects and spiders than humans[7].

In the future, the relative number of animals compared to humans will likely decrease considerably.

Farmed animals will not be alive if animal farming substantially decreases or stops, which seems more likely than not for both moral and economic reasons. Humanity’s moral circle seems to have been expanding throughout history (Singer, 2011), and further expansion to animals may well lead us to stop farming animals.[8] Financially, too, plant-based meat alternatives or lab-grown meat will likely become more efficient than raising animals (Tuomisto and Teixeira de Mattos, 2011). However, none of these developments seems unequivocally destined to end factory farming[9], and the historical track record shows that meat consumption per head has been growing for more than 50 years[10]. Overall, it seems likely but not certain that the number of farmed animals relative to humans will be smaller in the future. For wild animals, we can extrapolate from a historical trend of decreasing wild animal populations. Even if wild animals were spread to other planets for terraforming, the animal/human ratio would likely be lower than today.

Welfare of different beings can be weighted by (expected) consciousness

To determine the EV of the future, we need to aggregate welfare across different beings. It seems like we should weigh the experience of a human, a cow and a beetle differently when adding up, but by how much? This is a hard question with no clear answer, but we outline some approaches here. The degree to which an animal is conscious (“the lights are on”, the being is aware of its experiences, emotions and thoughts), or the confidence we have in an animal being conscious, can serve as a parameter by which to weight welfare. To arrive at a number for this parameter, we can use proxies such as brain mass, neuron count and mental abilities directly. Alternatively, we may aggregate these proxies with other considerations into an estimate of confidence that a being is conscious. For instance, the Open Philanthropy Project estimates the probability that cows are conscious at 80%.
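One way to write this weighting down, as a sketch in our own notation: let $n_i$ be the number of beings of type $i$, $\bar{w}_i$ their average welfare, and $c_i \in [0, 1]$ the consciousness weight for that type, either a probability of consciousness (such as the 80% estimate for cows) or a normalized proxy like neuron count. Aggregated welfare is then

```latex
W \;=\; \sum_{i} n_i \, c_i \, \bar{w}_i ,
\qquad \text{e.g. } c_{\text{cow}} = 0.8 .
```

Choosing $c_i$ remains the hard part; the formula only makes explicit where that choice enters the aggregate.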

The EV of (post-)human lives is likely positive

Currently, the average human life seems to be perceived as worth living. Survey data and experience sampling suggest that most humans are quite content with their lives and experience more positive than negative emotions on a day-to-day basis[11]. Humans who find their lives not worth living can take their own lives, yet relatively few people commit suicide (suicide accounts for 1.7% of all deaths in the US).[12] We could conclude that human welfare is positive.

We should, however, note two caveats to this conclusion. First, a life can be perceived as worth living even if it is negative from a welfarist perspective.[13] Second, the average life might not be worth living if the suffering of the worst off were sufficiently more intense than the happiness of the majority of people.

Overall, it seems that from a large majority of consequentialist views, the current aggregated human welfare is positive.

In the future, we will probably make progress that will improve the average human life. Historic trends have been positive across many indicators of human well-being, knowledge, intelligence and capability. On a global scale, violence is declining, cooperation increasing (Pinker, 2011). Yet, the trend does not include all indicators: subjective welfare has (in recent times) remained stable or improved very little, and mental health problems are more prevalent. These developments have sparked research into positive psychology and mental health treatment, which is slowly bearing fruit. As more fundamental issues are gradually improved, humanity will likely shift more resources towards actively improving welfare and mental health. Powerful tools like genetic design and virtual reality could be used to further improve the lives of the broad majority as well as the worst-off. While there are good reasons to assume that human welfare in the future will be more positive than now, we still face uncertainties (e.g. from low probability events like malicious, but very powerful autocratic regimes and unknown unknowns).

EV of farmed animals’ lives is probably negative

Currently, 93% of farmed animals live on factory farms in conditions that likely make their lives not worth living. Although there are positive sides to animal life on farms compared to life in the wild[14], these are likely outweighed by negative experiences[15]. Most farmed animals also lack opportunities to exhibit naturally desired behaviours like grooming. While there is clearly room for improvement in factory farming conditions, the question “is the average life worth living?” must be answered separately for each situation and remains controversial[16]. On average, a factory farm animal life today probably has negative welfare.

In the future, factory farming is likely to be abolished or modified to improve animal welfare as our moral circle expands to animals (see above). We can thus be moderately optimistic that farm animal welfare will improve and/or fewer farm animals will be alive.

The EV of wild animals’ lives is very unclear, but potentially negative

Currently, we know too little about the lives and perception of wild animals to judge whether their average welfare is positive or negative. We see evidence of both positive[17] and negative[18] experiences. Meanwhile, our perspective on wild animals might be skewed towards charismatic large mammals living relatively good lives. We may thus overlook the vast majority of wild animals, measured both by biomass and by neuron count. Most smaller wild animal species (invertebrates, insects, etc.) are r-selected, with most individuals living very short lives before dying painfully. While vast numbers of those lives seem negative from a welfarist perspective, we may choose to weight them less based on the considerations outlined above. In summary, most welfarist views would probably judge the aggregated welfare of wild animals as negative. The more one thinks that smaller, r-selected animals matter morally, the more negative average wild animal welfare becomes.

In the future, we may reduce the suffering of wild animals, but it is unclear whether their welfare would become positive. Future humans may be driven by the expansion of the moral circle and empowered by technological progress (e.g. biotechnology) to improve wild animal lives. However, if average wild animal welfare remains negative, it would still be bad to increase wild animal numbers through space colonization.


It remains unclear whether the EV of a future in which a human civilization similar to the one we know colonized space is positive or negative.

To quantify the above considerations from a welfarist perspective, we created a mathematical model. This model yields a positive EV for a future with space colonization if different beings are weighted by neuron count and a negative EV if they are weighted by sqrt(neuron count). In the first case, average welfare is positive, driven by the spreading of happy (post-)humans. In the second case, average welfare is negative as suffering wild animals are spread. The model is also based on a series of low-confidence assumptions[19], alteration of which could flip the sign of the outcome again.
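The following toy calculation is not the authors’ model; it only illustrates the mechanism by which the choice of weighting function can flip the sign. The `BeingType` class and all counts, neuron numbers, and welfare levels are hypothetical round numbers. Taking the square root compresses the gap between large and small brains, so numerous small-brained beings gain relative weight.

```python
import math
from dataclasses import dataclass


@dataclass
class BeingType:
    name: str
    count: float    # individuals per human (hypothetical)
    neurons: float  # neurons per individual (hypothetical, rounded)
    welfare: float  # average welfare per individual, arbitrary units (hypothetical)


def aggregate(beings, weight):
    """Sum of count * weight(neurons) * welfare over all being types."""
    return sum(b.count * weight(b.neurons) * b.welfare for b in beings)


population = [
    BeingType("humans", count=1, neurons=1e11, welfare=+1.0),
    BeingType("small wild animals", count=1e4, neurons=1e6, welfare=-1.0),
]

linear = aggregate(population, lambda n: n)   # weight by neuron count
sqrt_w = aggregate(population, math.sqrt)     # weight by sqrt(neuron count)
print(f"weighted by neuron count:       {linear:+.3g}")
print(f"weighted by sqrt(neuron count): {sqrt_w:+.3g}")
```

With these inputs, the neuron-count weighting gives a positive total and the square-root weighting a negative one; other inputs can produce other combinations, which is why the sign of the model’s outcome is fragile.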

More qualitatively, the EV of an extrapolated future heavily depends on one’s moral views. The degree to which one is focused on avoiding disvalue seems especially important. Consider that every day, humans and animals are being tortured, murdered, or in psychological despair. Those who would walk away from Omelas might also walk away from current and extrapolated future worlds.

Finally, we should note how little we know about the world and how this impacts our confidence in considerations about an extrapolated future. To illustrate the extent of our empirical uncertainty, consider that we are extrapolating from 100 000 years of human existence, 10 000 years of civilizational history and 200 years of industrial history to potentially 500 million years on Earth (and much longer in the rest of the universe). If people in the past had guessed about the EV of the future in a similar manner, they would most likely have gotten it wrong (e.g. they might not have considered the moral relevance of animals, or not have known that there is a universe to potentially colonize). We might be missing crucial considerations now in analogous ways.

1.2: Future agents’ tools and preferences

While part 1.1 extrapolates directly from today’s world, part 1.2 takes a more abstract approach. To estimate the EV of (post-)human space-colonization in more broadly applicable terms, we consider three questions:

  1. Will future agents have the tools to shape the world according to their preferences?
  2. Will future agents’ preferences resemble our 'reflected preferences' (see 'Moral assumptions' section)?
  3. Can we expect the net welfare of future agents and powerless beings to be positive or negative?

We then attempt to estimate the EV of future agents colonizing space from a welfarist consequentialist view.

Future agents will have powerful tools to shape the world according to their preferences

Since climbing down from the trees, humanity has changed the world a great deal. We have done this by developing increasingly powerful tools to satisfy our preferences (e.g. preferences to eat, stay healthy and warm, and communicate with friends, even if they are far away). Insofar as humans have altruistic preferences, powerful tools have made acting on them less costly. For instance, if you see that someone is badly hurt and want to help, you no longer have to carry them home and care for them yourself; you can just call an ambulance. However, powerful tools have also made it easier to cause harm, either by satisfying harmful preferences (e.g. weapons of mass destruction) or as a side-effect of our actions that we are indifferent to. Technologies that enable factory farming do enormous harm to animals, although they were developed to satisfy a preference for eating meat, not for harming animals[20].

It seems likely that future agents will have much more powerful tools than we do today. These tools could be used to make the future better or worse. For instance, biotechnology and genetic engineering could help us cure diseases and live longer, but they could also reinforce inequality if treatments are too expensive for most people. Advanced AI could make all kinds of services much cheaper but could also be misused. For more potent and complex tools, the stakes are even higher. Consider the example of technologies that facilitate space colonization. These tools could be used to bring about many times more happy lives than would be possible on Earth, but also to spread suffering.

In summary, future agents will have the tools to create enormous value (more examples here) or disvalue (more examples here).[21] It is thus important to consider the values/preferences that future agents might have.

We can expect future agents to have other-regarding preferences that we would, after reflection, find somewhat positive

When referring to future agents’ preferences, we distinguish between ‘self-regarding preferences’, i.e. preferences about states of affairs that directly affect an agent, and ‘other-regarding preferences’, i.e. preferences about the world that remain even if an agent is not directly affected (see footnote[22] for a precise definition). Future agents’ other-regarding preferences will be crucial for the value of the future. For example, if the future contains powerless beings in addition to powerful agents, the welfare of the former will depend to a large degree on the other-regarding preferences of the latter (much more about that later).

We can expect a considerable fraction of future agents’ preferences to be other-regarding

Most people alive today clearly have (positive and negative) other-regarding preferences, but will this be the case for future agents? It has been argued that over time, other-regarding preferences could be stripped away by Darwinian selection. We explore this argument and several counterarguments in appendix 2. We conclude that future agents will, in expectation, have a considerable fraction of other-regarding preferences.

Future agents’ preferences will in expectation be parallel rather than anti-parallel to our reflected preferences

We want to estimate the EV of a future shaped by powerful tools according to future agents’ other-regarding preferences. In this article we assume that we should ultimately aim to satisfy our reflected moral preferences, the preferences we would have after an idealized reflection process (as discussed in the "Moral assumptions" section above). Thus, we must establish how future agents’ other-regarding preferences (FAP) compare to our reflected other-regarding preferences (RP). Briefly put, we need to ask: “would we want the same things as these future agents who will shape the world?”

FAP can lie anywhere on a spectrum from parallel through orthogonal to anti-parallel to RP. If FAP and RP are parallel, future agents agree exactly with our reflected preferences. If they are anti-parallel, future agents see value exactly where we see disvalue. And if they are orthogonal, future agents value what we regard as neutral, and vice versa. We now examine how FAP will be distributed on this spectrum.
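The spectrum borrows its terms from vector geometry. If, as an assumption of our own, preferences are represented as vectors over possible states of the world, the position of FAP relative to RP is captured by the cosine of the angle between them:

```latex
\cos\theta \;=\; \frac{\langle \mathrm{FAP}, \mathrm{RP} \rangle}{\lVert \mathrm{FAP} \rVert \, \lVert \mathrm{RP} \rVert},
\qquad
\cos\theta =
\begin{cases}
\phantom{-}1 & \text{parallel: full agreement} \\
\phantom{-}0 & \text{orthogonal: value placed where RP see neutrality} \\
-1 & \text{anti-parallel: value placed where RP see disvalue}
\end{cases}
```

Intermediate values describe partial overlap (between 0 and 1) or partial opposition (between -1 and 0).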

Assume that future agents care about moral reflection. They will then have better conditions for an idealized reflection process than we have, for several reasons:

  • Future agents will probably be more intelligent and rational[23]

  • Empirical advances will help inform moral intuitions (e.g. experience machines might allow agents to get a better idea of other beings’ experiences)

  • Philosophy will advance further

  • Future agents will have more time and resources to deliberate

Given these prerequisites, it seems that future agents’ moral reflection would in expectation lead to FAP that are parallel rather than anti-parallel to RP. How much overlap between FAP and RP to expect remains difficult to estimate.[24]

However, scenarios in which future agents do not care about moral reflection might substantially influence the EV of the future. For example, it might be likely that humanity loses control and the agents shaping the future bear no resemblance to humans. This could be the case if developing controlled artificial general intelligence (AGI) is very hard, and the probability that misaligned AGI will be developed is high (in this case, the future agent is a misaligned AI).[25]

Even if (post-)humans remain in control, human moral intuitions might turn out to be contingent on the starting conditions of the reflection process and not very convergent across the species. Thus, FAP may not develop in any clear direction, but rather drift randomly[26]. Very strong and fast goal drift might be possible if future agents include digital (human) minds, because such minds would not be constrained by the cultural universals rooted in the physical brain architecture.

If it turns out that FAP develop differently from RP, FAP will in expectation be orthogonal to RP rather than anti-parallel. The space of possible preferences is vast, so it seems much more likely that FAP will be completely different from RP, rather than exactly opposite[27] (See footnote[28] for an example). In summary, FAP parallel or orthogonal to RP both seem likely, but a large fraction of FAP being anti-parallel to RP seems fairly unlikely. This main claim seems true for most “idealized reflection processes” that people would choose.
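The claim that FAP which differ from RP will more likely be orthogonal than anti-parallel has a familiar geometric analogue: in a high-dimensional space, two random directions are almost always nearly orthogonal. The sketch below treats preferences as vectors and drift as a uniformly random direction, which is a strong simplifying assumption of our own rather than something the argument above depends on. The helper names are hypothetical; the sketch compares a 3-dimensional space with a 1000-dimensional one.

```python
import math
import random


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def random_direction(dim, rng):
    # Isotropic Gaussian vectors point in uniformly random directions.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sample_cosines(dim, trials=500, seed=0):
    """Cosines between a fixed reference (stand-in for RP) and random drifted vectors."""
    rng = random.Random(seed)
    reference = random_direction(dim, rng)
    return [cosine(reference, random_direction(dim, rng)) for _ in range(trials)]


for dim in (3, 1000):
    cs = sample_cosines(dim)
    mean_abs = sum(abs(c) for c in cs) / len(cs)
    near_opposite = sum(c < -0.9 for c in cs) / len(cs)
    print(f"dim={dim:5d}  mean |cos|={mean_abs:.3f}  share with cos < -0.9: {near_opposite:.3f}")
```

In 3 dimensions the cosine is spread evenly over [-1, 1], so near-opposite directions are not rare; in 1000 dimensions almost all samples cluster close to 0, and near-opposite directions essentially never appear.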

However, FAP being between parallel and orthogonal to RP in expectation does not necessarily imply the future will be good. Actions driven by (orthogonal) FAP could have very harmful side-effects, as judged by our reflected preferences. Harmful side-effects could be devastating especially if future agents are indifferent towards beings we (would on reflection) care about morally. Such negative side-effects might outweigh positive intended effects, as has happened in the past[29]. Indeed, some of the most discussed “risks of astronomical future suffering” are examples of negative side-effects.[30]

Future agents’ tools and preferences will in expectation shape a world with probably net positive welfare

Above we argued that we can expect some overlap between future agents’ other-regarding preferences (FAP) and our reflected other-regarding preferences (RP). We can thus be somewhat optimistic about the future in a very general way, independent of our first-order moral views, if we ultimately aim to satisfy our reflected preferences. In the following section, we will drop some of that generality. We will examine what future agents’ preferences will imply for the welfare of future beings. In doing so, we assume that we would on reflection hold an aggregative, welfarist altruistic view (as explained in the background-section).

If we assume these specific RP, can we still expect FAP to overlap with them? After all, other-regarding preferences anti-parallel to welfarist altruism, such as sadistic, hateful, or vengeful preferences, clearly exist within present-day humanity. If current human values transferred broadly into the future, should we then expect a large fraction of FAP to be anti-parallel to welfarist altruism? Probably not. We argue in appendix 3 that, although this is hard to quantify, the large majority of human other-regarding preferences seem positive.

Assuming somewhat welfarist FAP, we explore what the future might be like for two types of beings: Future agents (post-humans) who have powerful tools to shape the world, and powerless future beings. To aggregate welfare for moral evaluation, we need to estimate how many beings of each type will exist. Powerful agents will likely be able to create powerless beings as “tools” if this seems useful for them. Sentient “tools” could include animals, farmed for meat production or spread to other planets for terraforming (e.g. insects), but also digital sentient minds, like sentient robots for task performance or simulated minds created for scientific experimentation or entertainment. The last example seems especially relevant, as digital minds could be created in vast amounts if digital sentience is possible at all, which does not seem unlikely. If we find we morally care about these “tools” upon reflection, the future would contain many times more powerless beings than powerful agents.

The EV of the future thus depends on the welfare of both powerful agents and powerless beings, with the latter potentially much more relevant than the former. We now consider each in turn, asking:

  • How will their expected welfare be affected by intended effects and side-effects of future agents’ actions?
  • How to evaluate this morally?

The aggregated welfare of powerful future agents is in expectation positive

Future agents will have powerful tools to satisfy their self-regarding preferences and be somewhat benevolent towards each other. Thus, we can expect future agents’ welfare to be increased through intended effects of their actions.

Side-effects of future agents’ actions that reduce other agents’ welfare would mainly arise if their civilization were poorly coordinated. However, compromise and cooperation usually seem to benefit all involved parties, indicating that we can expect future agents to develop good tools for coordination and use them a lot.[31] Coordination also seems essential to avert many extinction risks. Thus, a civilization that has avoided extinction successfully enough to colonize space can be expected to be quite coordinated.

Taken together, vastly more resources will likely be used in ways that improve the welfare of powerful agents than in ways that diminish their welfare. From the large majority of welfarist views, future agents’ aggregated welfare is thus expected to be positive. This conclusion is also supported by human history, as improved tools, cooperation and altruism have increased the welfare of most humans, and average human lives are seen as worth living by many (see part 1.1).

The aggregated welfare of powerless future beings may in expectation be positive

Assuming that future agents are mostly indifferent towards the welfare of their “tools”, their actions would affect powerless beings only via (in expectation random) side-effects. It is thus relevant to know the “default” level of welfare of powerless beings. If the affected powerless beings were animals shaped by evolution, their default welfare might be net negative. This is because evolutionary pressure might result in a pain-pleasure asymmetry with suffering being much more intense than pleasure (see footnote for further explanation[32]). Such evolutionary pressure would not apply for designed digital sentience. Given that our experience with welfare is restricted to animals (incl. humans) shaped by evolution, it is unclear what the default welfare of digital sentients would be. If there is at least some moral concern for digital sentience, it seems fairly likely that the creating agents would prefer to give their sentient tools net positive welfare[33].

If future agents intend to affect the welfare of powerless beings, they might, besides treating their sentient “tools” accordingly, create (dis-)value optimized sentience: minds that are optimized for extreme positive or negative welfare. For example, future agents could simulate many minds in bliss, or many minds in agony. The motivation for creating (dis-)value optimized sentience could be altruism, sadism or strategic reasons[34]. Creating (dis-)value optimized sentience would likely produce much more positive (or negative) welfare per unit of invested resources than the side-effects on sentient tools mentioned above, as sentient tools are optimized for task performance, not production of (dis-)value[35]. (Dis-)value optimized sentience would then be the main determinant of the expected value of post-human space colonization, and not side-effects on sentient tools.

FAP may be orthogonal to welfarist altruism, in which case little (dis-)value optimized sentience will be produced. However, we expect a much larger fraction of FAP to be parallel to welfarist altruism than anti-parallel to it, and thus expect that future agents will use many more resources to create value-optimized sentience than disvalue-optimized sentience. The possibility of (dis-)value optimized sentience should increase the net expected welfare of powerless future beings. However, there is considerable uncertainty about the moral implications of one resource-unit spent optimized for value or disvalue (see e.g. here and here). On the one hand, (dis)value optimized sentience created without evolutionary pressure might be equally efficient in producing moral (dis)value, but used a lot more to produce value. On the other hand, disvalue optimized sentience might lead to especially intense suffering. Many people intuitively give more moral importance to the prevention of suffering the worse it gets (e.g. prioritarianism).
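The tension between the last two sentences can be made explicit with a minimal linear sketch. It assumes, hypothetically, that value- and disvalue-optimized minds are equally efficient per unit of resources and that each unit of optimized suffering receives an extra moral weight `disvalue_intensity` (one way to express the intuition that worse suffering matters disproportionately). The function names and resource shares are made up for illustration.

```python
def net_welfare(resources, share_value, share_disvalue,
                efficiency=1.0, disvalue_intensity=1.0):
    """Net welfare produced by (dis-)value optimized sentience, arbitrary units.

    resources: total resources spent on optimized sentience
    share_value, share_disvalue: fractions of resources spent on each kind
    efficiency: welfare produced per unit of resources (assumed equal for both)
    disvalue_intensity: extra moral weight per unit of optimized disvalue
    """
    value = resources * share_value * efficiency
    disvalue = resources * share_disvalue * efficiency * disvalue_intensity
    return value - disvalue


def break_even_intensity(share_value, share_disvalue):
    """Intensity weight at which value and weighted disvalue cancel exactly."""
    return share_value / share_disvalue


# Hypothetical shares: 10% of resources to value-, 1% to disvalue-optimized minds
share_v, share_d = 0.10, 0.01
print(break_even_intensity(share_v, share_d))
for intensity in (1.0, 5.0, 20.0):
    print(intensity, net_welfare(1.0, share_v, share_d, disvalue_intensity=intensity))
```

If ten times more resources go to value-optimized sentience, the net result stays positive as long as each unit of optimized suffering weighs less than ten times a unit of optimized happiness, and turns negative beyond that.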

In summary, it seems plausible that a little concern for the welfare of sentient tools could go a long way. Even if most future agents were completely indifferent towards sentient tools (i.e. the majority of FAP were orthogonal to RP), positive intended effects, namely the creation of value-optimized sentience, could plausibly weigh more heavily than side-effects.


Morally evaluating the future scenarios sketched in part 1.2 is hard because we are both empirically uncertain about what the future will be like and morally uncertain about what our intuitions will be. The key unanswered questions are:

  • How much can we expect the preferences that shape the future to overlap with our reflected preferences?
  • In absence of concern for the welfare of sentient tools, how good or bad is their default welfare?
  • How will the scales of intended effects and side-effects compare?

Taken together, we believe that the arguments in this section indicate that the EV of (post-)human space colonization would only be negative from relatively strongly disvalue-focused views. From the majority, but not overwhelming majority, of welfarist views, the EV of (post-)human space colonization seems positive.[36][37]

In parts 1.1 and 1.2, we directly estimated the EV of (post-)human space colonization and found it to be very uncertain. In the remaining parts, we will improve our estimate via other approaches that are less dependent on specific predictions about how (post-)humans will shape the future.

1.3: Future agents could later decide not to colonize space (option value)

We are often uncertain about what the right thing to do is. If we can defer the decision to someone wiser than ourselves, this is generally a good call. We can also defer across time: we can keep our options open for now, and hope our descendants will be able to make better decisions. This option value may give us a reason to prefer to keep our options open.

For instance, our descendants may be in a better position to judge whether space colonization would be good or bad. If they can see that space colonization would be negative, they can refrain from (further) colonizing space: they have the option to limit the harm. In contrast, if humanity goes extinct, the option of (post-)human space colonization is forever lost. So avoiding extinction creates ‘option value’ (e.g. MacAskill).[38] This specific type of ‘option value’, from future agents choosing not to colonize space, and not the more general value of keeping options open, is what we will be referring to throughout this section.[39] This type of option value exists for nearly all moral views, and is very unlikely to be negative.[40] However, as we will discuss in this section, this value is rather small compared to other considerations.

A considerable fraction of futures contains option value

Reducing the risk of human extinction only creates option value if future agents will make a better decision, by our (reflected) lights, about whether to colonize space than we could. If they will make worse decisions than us, we would rather decide ourselves.

In order for future agents to make better decisions than us and actually act on them, they need to surpass us in at least one of the following aspects:

  • Better values
  • Better judgement of what space colonization will be like (based on increased empirical understanding and rationality)
  • Greater willingness and ability to make decisions based on moral values (non-selfishness and coordination)

Values

Human values change. We are disgusted by many of our ancestors’ moral views, and they would find ours equally repugnant. We can even look back on our own moral views and disagree. There is no reason for these trends to stop exactly now: human morality will likely continue to change.

Yet at each stage in the change, we are likely to view our values as obviously correct. This suggests that we should hold a greater degree of moral uncertainty than feels natural. We should expect that our moral views would change after idealized reflection (although this also depends on which meta-ethical theory is correct and how idealized reflection works).

We argued in part 1.2 that future agents’ preferences will in expectation have some overlap with our reflected preferences. Even if that overlap is not very high, a high degree of moral uncertainty would indicate that we would often prefer future agents’ preferences over our current, unreflected preferences. In a sizeable fraction of future scenarios, future agents with more time and better tools to reflect can be expected to make better decisions than we could today.

Empirical understanding and rationality

We now understand the world better than our ancestors, and are able to think more clearly. If those trends continue, future agents may understand better what space colonization will be like, and so better understand how good it will be on a given set of values.

For example, future agents’ estimate of the EV of space colonization will benefit from

  • Better empirical understanding of the universe (for instance about questions discussed in part 2.2)[41] and better predictions, fuelled by more scientific knowledge and better forecasting techniques
  • Increased intelligence and rationality[42], allowing them to more accurately determine what the best action is based on their values.

As long as there is some overlap between their preferences and one’s reflected preferences, this gives an additional reason to defer to future agents’ decisions (see footnote for an example).[43]

Non-selfishness and coordination

We often know what’s right, but don’t follow through on it anyway. What is true for diets also applies here:

  • Future agents would need to actually make the decision about space colonization based on moral reasoning[44]. This might imply acting against strong economic incentives pushing towards space colonization.

  • Future agents would need to be coordinated well enough to avoid space colonization. That might be a challenge in non-singleton futures, since future civilization would need ways to ensure that no single agent starts space colonization.

It seems likely that future agents would surpass our current level of empirical understanding, rationality, and coordination, and in a considerable fraction of possible futures they might also do better on values and non-selfishness. However, to actually refrain from colonizing space, they would have to surpass a certain threshold in all of these fields, which may be quite high. Thus, a little bit of progress doesn’t help: option value is only created by deferring the decision to future agents if they surpass this threshold.
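The threshold argument can be made concrete with a small sketch. The per-field probabilities below are hypothetical placeholders, and treating the fields as independent is an assumption (correlated progress would raise the joint probability); the point is only that requiring every threshold at once makes the joint probability smaller than any single field’s, and that partial progress below a threshold counts for nothing.

```python
from math import prod

# Hypothetical probabilities (illustration only) that future agents clear the
# threshold in each field needed to forgo space colonization for moral reasons.
FIELD_PROBS = {
    "values": 0.6,
    "non_selfishness": 0.6,
    "empirical_understanding": 0.9,
    "rationality": 0.9,
    "coordination": 0.5,
}


def p_all_thresholds(probs: dict[str, float]) -> float:
    """Probability that every threshold is cleared, assuming the fields are independent."""
    return prod(probs.values())


def creates_option_value(levels: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Step function: deferring creates option value only if every field meets its threshold."""
    return all(levels[field] >= thresholds[field] for field in thresholds)


print(f"P(all thresholds cleared) = {p_all_thresholds(FIELD_PROBS):.3f}")
```

With these placeholder inputs the joint probability is about 0.15, even though no single field is below 0.5.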

Only the relatively good futures contain option value

For any future scenario to contain option value, the agents in that future need to surpass us in various ways, as outlined above. This has an implication that further diminishes the relevance of the option value argument. Future agents need to have relatively good values and be relatively unselfish to decide not to colonize space for moral reasons. But even if these agents colonized space, they would probably do it in a relatively good manner. Most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents (like misaligned AI). Such “bad” agents will make worse decisions about whether or not to colonize space than we currently could, because their preferences are very different from our (reflected) preferences. Potential space colonization by indifferent or malicious agents thus generates large amounts of expected future disvalue, which cannot be alleviated by option value. Option value doesn’t help in the cases where it is most needed (see footnote for an explanatory example)[45].


If future agents are good enough, there is option value in deferring the decision whether to colonize space to them. In some not-too-small fraction of possible futures, agents will fulfill the criteria, and thus option value adds positively to the EV of reducing extinction risk. However, the futures accounting for most expected future disvalue are likely controlled by indifferent or malicious agents. Such “bad” agents would likely make worse decisions than we could. A large amount of expected future disvalue is thus not amenable to alleviation through option value. Overall, we think the option value in reducing the risk of human extinction is probably fairly moderate, but there is a lot of uncertainty, and the answer depends on one’s specific moral and empirical views[46]. In our model of the considerations in this section, if the 90% confidence interval for the value of the future ranged from -0.9 to 0.9 (arbitrary value units), option value was 0.07.
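To show how such a number can arise, here is a toy model. It is a hypothetical simplification, not the model behind the 0.07 figure: it assumes the value V of space colonization is normally distributed with the stated 90% interval, that with probability `p_good` future agents learn V and colonize only if V > 0, and that “bad” agents colonize regardless. Option value is then `p_good` times the expected disvalue that good agents avoid, E[max(-V, 0)], which captures the point that bad futures contribute none of it.

```python
from statistics import NormalDist


def option_value(ci_low: float, ci_high: float, p_good: float) -> float:
    """Toy option value of deferring the space-colonization decision.

    Hypothetical assumptions, for illustration only:
    - V, the value of space colonization on one's reflected preferences, is
      normally distributed with the given two-sided 90% confidence interval.
    - With probability p_good, future agents learn V and colonize only if V > 0.
    - Otherwise ("bad" agents), they colonize regardless, so deferring adds nothing.
    Option value is then p_good * E[max(-V, 0)]: the disvalue good agents avoid.
    """
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("p_good must be a probability")
    std = NormalDist()                    # standard normal
    z = std.inv_cdf(0.95)                 # ~1.645 for a two-sided 90% interval
    mu = (ci_low + ci_high) / 2
    sigma = (ci_high - ci_low) / (2 * z)
    # Closed form of E[max(-V, 0)] for V ~ N(mu, sigma^2)
    avoided = sigma * std.pdf(mu / sigma) - mu * std.cdf(-mu / sigma)
    return p_good * avoided


for p_good in (0.1, 0.3, 0.5):
    print(p_good, round(option_value(-0.9, 0.9, p_good), 3))
```

For the interval -0.9 to 0.9, this toy model gives about 0.22 when every future is “good” and scales linearly with `p_good`; shifting the interval toward positive values shrinks the option value, since there is less disvalue left to avoid.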

Part 2: Absence of (post-)human space colonization does not imply a universe devoid of value or disvalue

Up to now, we have tacitly assumed that the sign of EV of (post-)human space colonization determines whether extinction risk reduction is worthwhile. This only holds if, without humanity, the EV of the future is roughly zero, because the (colonizable) universe is and will stay devoid of value or disvalue. We now consider two classes of scenarios in which this is not the case, with important implications especially for people who think that EV of (post-)human space colonization is likely negative.

2.1 Whether (post-)humans colonizing space is good or bad, space colonization by other agents seems worse

If humanity goes extinct without colonizing space, other kinds of beings would likely survive on Earth[47]. These beings might evolve into a non-human technological civilization in the hundreds of millions of years left on Earth and eventually colonize space. Similarly, extraterrestrials (that might already exist or come into existence in the future) might colonize (more of) our corner of the universe, if humanity does not.

In these cases, we must ask whether we prefer (post-)human space colonization over the alternatives. Whether alternative civilizations would be more or less compassionate or cooperative than humans, we can only guess. We may however assume that our reflected preferences depend on some aspects of being human, such as human culture or the biological structure of the human brain[48]. Thus, our reflected preferences likely overlap more with a (post-)human civilization than alternative civilizations. As future agents will have powerful tools to shape the world according to their preferences, we should prefer (post-)human space colonization over space colonization by an alternative civilization.

To understand how we can factor this consideration into the overall EV of a future with (post-)human space colonization, consider the following example of Ana and Chris. Ana thinks the EV of (post-)human space colonization is negative. For her, the EV of potential alternative space colonization is thus even more negative. This should cause people who, like Ana, are pessimistic about the EV of (post-)human space colonization (and thus the value of reducing the risk of human extinction) to update towards reducing the risk of human extinction because the alternative is even worse (technical caveat in footnote[49]).

Chris thinks that the EV of (post-)human space colonization is positive. For him, the EV of potential alternative space colonization could be positive or negative. For people like Chris, who are optimistic about the EV of (post-)human space colonization (and thus the value of reducing the risk of human extinction), the direction of update is thus less clear. They should update towards reducing the risk of human extinction if the potential alternative civilization is bad, or away from it if the potential alternative civilization is merely less good. Taken together, this consideration implies a stronger update for future pessimists like Ana than for future optimists like Chris. This becomes clearer in the mathematical derivation[50] or when considering an example[51].
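The Ana/Chris reasoning can be checked with a simplified linear model, assumed here only for illustration (it is not the derivation in footnote[50]). Let h be the EV of (post-)human space colonization, let alternative colonization be worse by some d ≥ 0 (so its EV is A = h - d), and let p_alt be the probability that an alternative civilization colonizes our region if humanity goes extinct. The EV of preventing extinction is then h - p_alt·A, so the update relative to the naive estimate h is p_alt·(d - h). This is always positive for a pessimist (h < 0); for an optimist it is positive only if d > h, i.e. if the alternative is actually bad (A < 0), and negative if it is merely less good. For the same d and p_alt, the pessimist’s update is larger in magnitude.

```python
def ev_of_preventing_extinction(h: float, d: float, p_alt: float) -> float:
    """EV of averting human extinction, relative to the extinction scenario.

    Hypothetical linear model, for illustration only:
    h     -- EV of (post-)human space colonization on one's reflected preferences
    d     -- how much worse alternative space colonization is (d >= 0), so A = h - d
    p_alt -- probability that an alternative civilization colonizes our region if
             humanity goes extinct (otherwise the region stays empty, value 0)
    """
    if d < 0 or not 0.0 <= p_alt <= 1.0:
        raise ValueError("need d >= 0 and 0 <= p_alt <= 1")
    a = h - d                # EV of alternative space colonization
    return h - p_alt * a     # algebraically equal to h + p_alt * (d - h)


def update(h: float, d: float, p_alt: float) -> float:
    """Shift relative to the naive estimate h, which ignores alternative colonizers."""
    return ev_of_preventing_extinction(h, d, p_alt) - h


for name, h in [("Ana (pessimist)", -1.0), ("Chris (optimist)", 1.0)]:
    print(name, update(h, d=0.5, p_alt=0.5))
# Ana (pessimist) 0.75
# Chris (optimist) -0.25
```

With these hypothetical inputs, Ana shifts up by 0.75 while Chris shifts down by only 0.25; if Chris’s alternative were bad (for example d = 2, so A = -1), his update would instead be positive.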

It remains to estimate how big the update should be. Based on our best guesses about the relevant parameters (for a Fermi estimate, see here), it seems like future pessimists should considerably shift their judgement of the EV of human extinction risk reduction into the less negative direction. Future optimists should moderately shift their judgement downwards. Therefore, if one was previously uncertain with roughly equal credence in future pessimism and future optimism, one’s estimate for the EV of human extinction risk reduction should increase.

We should note that this is a very broad consideration, with details contingent on the actual moral views people hold and specific empirical considerations[52].

A specific case of alternative space colonization could arise if humanity is driven extinct by misaligned AGI. It seems likely that misaligned AI would colonize space. Space colonization by an AI might include (among other things of value/disvalue to us) the creation of many digital minds for instrumental purposes. If the AI is only driven by values orthogonal to ours, it would likely not care about the welfare of those digital minds. Whether we should expect space colonization by a human-made, misaligned AI to be morally worse than space colonization by future agents with (post-)human values has been discussed extensively elsewhere. Briefly, nearly all moral views would most likely rather have human value-inspired space colonization than space colonization by AI with arbitrary values, giving extra reason to work on AI alignment especially for future pessimists.

2.2 Existing disvalue could be alleviated by colonizing space

With more empirical knowledge and philosophical reflection, we may find that the universe is already filled with beings/things that we morally care about. Instead of just increasing the number of morally relevant things (i.e. Earth-originating sentient beings), future agents might then influence the states of morally relevant beings/things already existing in the universe[53]. This topic is highly speculative and we should stress that most of the EV probably comes from “unknown unknowns”, which humanity might discover during idealized reflection. Simply put, we might find some way in which future agents can make the existing world (a lot) better if they stick around. To illustrate this general concept, consider the following ideas.

We might find that we morally care about things other than sentient beings, which could be vastly abundant in the universe. For example, we may develop moral concern for fundamental physics, e.g. in the form of panpsychism. Another possibility could arise if the solution to the simulation argument (Bostrom, 2003b) is indeed that we live in a simulation, with most things of moral relevance positioned outside of our simulation but modifiable by us in yet unknown ways. It might also turn out that we can interact with other agents in the (potentially infinite) universe or multiverse by acausal trade or multiverse-wide cooperation, thereby influencing existing things of moral relevance (to us) in their part of the universe/multiverse. These specific ideas may look weird. However, given humanity’s history of realizing that we care about more/other things than previously thought[54], it should in principle seem likely that our reflected preferences include some yet unknown unknowns.

We argued in part 1.2 that future agents’ preferences will in expectation be parallel rather than anti-parallel to our reflected preferences. If the universe is already filled with things/beings of moral concern, we can thus assume that future agents will in expectation improve the state of these things[55]. This creates additional reason to reduce the risk of human extinction: There might be a moral responsibility for humanity to stick around and “improve the universe”. This perspective is especially relevant for disvalue-focused views. From a (strongly) disvalue-focused view, increasing the numbers of conscious beings by space colonization is negative because it generates suffering and disvalue. It might seem that there is little to gain if space colonization goes well, but much to lose if it goes wrong. If, however, future agents could alleviate existing disvalue, then humanity’s survival (potentially including space colonization) has upsides that may well be larger than the expected downsides (for a Fermi estimate, see footnote[56]).[57]

Part 3: Efforts to reduce extinction risk may also improve the future

If we had a button that reduces human extinction risk, and has no other effect, we would only need the considerations in parts 1 and 2 to know whether we should press it. In practice, efforts to reduce extinction risk often have other morally relevant consequences, which we examine below.

3.1: Efforts to reduce non-AI extinction risk reduce global catastrophic risk[58]

Global catastrophe here refers to a scenario of hundreds of millions of human deaths and resulting societal collapse. Many potential causes of human extinction, like a large scale epidemic, nuclear war, or runaway climate change, are far more likely to lead to a global catastrophe than to complete extinction. Thus, many efforts to reduce the risk of human extinction also reduce global catastrophic risk. In the following, we argue that this effect adds substantially to the EV of efforts to reduce extinction risk, even from the very long-term perspective of this article. This doesn’t hold for efforts to reduce risks that, like risks from misaligned AGI, are more likely to lead to complete extinction than to a global catastrophe.

Apart from being a dramatic event of immense magnitude for current generations, a global catastrophe could severely curb humanity’s long-term potential by destabilizing technological progress and derailing social progress[59].

Technological progress might be uncoordinated and incautious in a world that is politically destabilized by global catastrophe. For pivotal technologies such as AGI, development in an arms race scenario (e.g. driven by post-catastrophe resource scarcity or war) could lead to adverse outcomes we cannot correct afterwards.

Social progress might likewise veer towards values that oppose an open society and broadly utilitarian-type values. Can we expect the “new” value system emerging after a global catastrophe to be robustly worse than our current value system? While this issue is debated[60], Nick Beckstead gives a strand of arguments suggesting the “new” values would in expectation be worse. Compared to the rest of human history, we currently seem to be on an unusually promising trajectory of social progress. What exactly would happen if this period was interrupted by a global catastrophe is a difficult question, and any answer will involve many judgement calls about the contingency and convergence of human values. However, as we hardly understand the driving factors behind the current period of social progress, we cannot be confident it would recommence if interrupted by a global catastrophe. Thus, if one sees the current trajectory as broadly positive, one should expect this value to be partially lost if a global catastrophe occurs.

Taken together, reducing global catastrophic risk seems to be a valuable effect of efforts to reduce extinction risk. This aspect is fairly relevant even from a very long-term perspective because catastrophes are much more likely than extinction. A Fermi estimate suggests the long-term impact from the prevention of global catastrophes is about 50% of the impact from avoiding extinction events. The potential long-term consequences from a global catastrophe include worse values and an increase in the likelihood of misaligned AI scenarios. These consequences seem bad from most moral perspectives, including strongly disvalue-focused ones. Considering the effects on global catastrophic risk suggests a significant update in the evaluation of the EV of efforts to reduce extinction risk towards more positive (or less negative) values.
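The structure of such a Fermi estimate can be sketched as a ratio of two products: how likely each event is, times how much long-term value it permanently loses. The inputs below are placeholders chosen only to show how a figure around 50% can arise; they are not the inputs of the estimate cited above. The sketch also assumes that an effort reduces catastrophic and extinction risk by the same relative amount, which need not hold in practice.

```python
def catastrophe_vs_extinction_impact(
    p_cat: float, p_ext: float, loss_cat: float, loss_ext: float = 1.0
) -> float:
    """Long-term impact of preventing catastrophes, as a fraction of that of preventing extinction.

    Hypothetical structure, for illustration only: an effort is assumed to cut both
    risks by the same relative amount, so each impact is proportional to
    (probability of the event) x (fraction of long-term value it permanently loses).
    """
    return (p_cat * loss_cat) / (p_ext * loss_ext)


# Placeholder inputs: a catastrophe is 10x as likely as extinction, and it
# permanently loses 5% of what extinction would lose.
ratio = catastrophe_vs_extinction_impact(p_cat=0.10, p_ext=0.01, loss_cat=0.05)
print(f"{ratio:.2f}")  # 0.50
```

The design choice here is simply that a small per-event loss can still matter if the event is much more likely: a tenfold likelihood ratio offsets a twentyfold smaller loss by half.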

3.2: Efforts to reduce extinction risk often promote coordination, peace and stability, which is broadly good

The shared future of humanity is a (transgenerational) global public good (Bostrom, 2013), thus society needs to coordinate to preserve it, e.g. by providing funding and other incentives. Most extinction risk also arises from technologies that allow for one agent (intentionally or by mistake) to start a potential extinction event (e.g. release a harmful virus or start a nuclear war). Coordinated action and careful decisions are thus needed and indeed, the broadest efforts to reduce extinction risk directly promote global coordination, peace and stability. More focused efforts often promote “narrow cooperation” within a specific field (e.g. nuclear non-proliferation) or set up processes (e.g. pathogen surveillance) that increase global stability by reducing perceived levels of threat from non-extinction events (e.g. bioterrorist attacks).

Taken together, efforts to reduce extinction risk also promote a more coordinated, peaceful and stable global society. Future agents in such a society will probably make wiser and more careful decisions, reducing the risk of unexpected negative trajectory changes in general. Safe development of AI will specifically depend on these factors. Therefore, efforts to reduce extinction risk may also steer the world away from some of the worst non-extinction outcomes, which likely involve war, violence and arms races.

Note that there may be a trade-off: most targeted efforts seem more neglected and therefore more promising levers for extinction risk reduction. However, their effects on global coordination, peace and stability are less certain and likely smaller than the effects of broad efforts aimed directly at increasing these factors. Broad efforts to promote global coordination, peace and stability might be among the most promising approaches to robustly improve the future and reduce the risk of dystopian outcomes conditional on human survival.

Conclusion

The expected value of efforts to reduce the risk of human extinction (from non-AI causes) seems robustly positive

So all things considered, what is the expected value of efforts to reduce the risk of human extinction? In the first part, we considered what might happen if human extinction is prevented for long enough that future agents, maybe our biological descendants, digital humans, or (misaligned) AGI created by humans, colonize space. The EV of (post-)human space colonization is probably positive from many welfarist perspectives, but very uncertain. We also examined the ‘option value argument’, according to which we should try to avoid extinction and defer the decision to colonize space (or not) to wiser future agents. We concluded that option value, while mostly positive, is small and the option value argument hardly conclusive.

In part 2, we explored what the future universe might look like if humans do go extinct. Vast amounts of value or disvalue might (come to) exist in those scenarios as well. Some of this (dis-)value could be influenced by future agents if they survive. This insight has little impact for people who were optimistic about the future anyway, but shifts the EV of reducing extinction risk upwards for people who were previously pessimistic about the future. In part 3, we extended our considerations to additional effects of many efforts to reduce extinction risk, namely reducing the risk of “mere” global catastrophes and increasing global cooperation and stability. These effects generate considerable additional positive long-term impact. This is because global catastrophes would likely change the direction of technological and social progress in a bad way, while global cooperation and stability are prerequisites for a positive long-term trajectory.

Some aspects of moral views make the EV of reducing extinction risk look less positive than suggested above. We will consider three such aspects:

  • From a strongly disvalue-focused view, increasing the total number of sentient beings seems negative regardless of the empirical circumstances. The EV of (post-)human space colonization (parts 1.1 and 1.2) is thus negative, at least if the universe is currently devoid of value.
  • From a very stable moral view (with low moral uncertainty, thus very little expected change in preferences upon idealized reflection), there are no moral insights for future agents to discover and act upon. Future agents could then only make better decisions than us about whether to colonize space through empirical insights. Likewise, future agents could only discover opportunities to alleviate astronomical disvalue that we currently do not see through empirical insights. Option value (part 1.3) and the effects from potentially existing disvalue (part 2.2) are reduced.
  • From a very unusual moral view (with some of one’s reflected other-regarding preferences expected to be anti-parallel to most of humanity’s reflected other-regarding preferences), future agents will sometimes do the opposite of what one would have wanted[61]. This would be true even if future agents are reflected and act altruistically (according to a different conception of ‘altruism’). From that view the future looks generally worse. There is less option value (part 1.3), and if the universe is already filled with beings/things that we morally care about (part 2.2), sometimes future agents might do the wrong thing upon this discovery.

To generate the (hypothetical) moral view that is most sceptical about reducing extinction risk, we combine all three aspects above. We assume a strongly disvalue-focused, very stable and unusual moral view. Even from this perspective (in rough order of descending relevance):

  • Efforts to reduce extinction risk may improve the long-term future by reducing the risk of global catastrophes and increasing global cooperation and stability (part 3).
  • There may be some opportunity for future agents to alleviate existing disvalue (as long as the moral view in question isn’t completely ‘unusual’ in all aspects) (part 2.2).
  • (Post-)human space colonization might be preferable to space colonization by non-human animals or extraterrestrials (part 2.1).
  • Small amounts of option value might arise from empirical insights improving decisions (part 1.3).

From this maximally sceptical view, targeted approaches to reduce the risk of human extinction likely seem somewhat unexciting or neutral, with high uncertainty (see footnote[62] for how advocates of strongly disvalue-focused views see the EV of efforts to reduce extinction risk). Reducing the risk of extinction by misaligned AI probably seems positive because misaligned AI would also colonize space (see part 2.1).

From views that value the creation of happy beings or creation of value more broadly, have considerable moral uncertainty, and believe future reflected and altruistic agents could make good decisions, the EV of efforts to reduce extinction risk is likely positive and extremely high.

In aggregation, efforts to reduce the risk of human extinction seem in expectation robustly positive from many consequentialist perspectives.

Efforts to reduce extinction risk should be a key part of the EA long-termist portfolio

Effective altruists whose primary moral concern is making sure the future plays out well will, in practice, need to allocate their resources between different possible efforts. Some of these efforts are optimized to reduce extinction risk (e.g. promoting biosecurity), others are optimized to improve the future conditional on human survival while also reducing extinction risk (e.g. promoting global coordination or otherwise preventing negative trajectory changes) and some are optimized to improve the future without making extinction risk reduction a primary goal (e.g. promoting moral circle expansion or "worst-case" AI safety research).

We have argued above that the EV of efforts to reduce extinction risk is positive, but is it large enough to warrant investment of marginal resources? A thorough answer to this question requires detailed examination of the specific efforts in question and goes beyond the scope of this article. We are thus in no position to provide a definitive answer for the community. We will, however, present two arguments that favor including efforts to reduce extinction risk as a key part of the long-termist EA portfolio: efforts to reduce the risks of human extinction are time-sensitive, and they seem very leveraged. We know of specific risks this century, we have reasonably good ideas for ways to reduce them, and if we actually avert an extinction event, this has robust impact (at least in expectation) for millions of years to come. As a very broad generalization, many efforts optimized to otherwise improve the future, such as improving today’s values in the hope that they will propagate to future generations, are less time-sensitive or leveraged. In short, it seems easier to prevent an event from happening in this century than to otherwise robustly influence the future millions of years down the line.

Key caveats to this argument include that it is not clear how big differences in time-sensitivity and leverage are[63] and that we may still discover highly leveraged ways to “otherwise improve the future”. Therefore, it seems that the EA long-termist portfolio should contain all of the efforts described above, allowing each member of the community to contribute according to their comparative advantage. For those holding very disvalue-focused moral views, the more attractive efforts would plausibly be those optimized to improve the future without making extinction risk reduction a primary goal.


We are grateful to Brian Tomasik, Max Dalton, Lukas Gloor, Gregory Lewis, Tyler John, Thomas Sittler, Alex Norman, William MacAskill and Fabienne Sandkühler for helpful comments on the manuscript. Additionally, we thank Max Daniel, Sören Mindermann, Carl Shulman and Sebastian Sudergaard Schmidt for discussions that helped inform our views on the matter.

Author contributions:

Jan conceived the article and the arguments presented in it. Friederike and Jan contributed to structuring the content and writing.

Appendix 1: What if humanity stayed earthbound?

In this appendix, we use the approach of part 1.1 and apply it to a situation in which humanity stays Earth-bound. It is recommended to first read part 1.1 before reading this appendix.

We think that scenarios in which humanity stays Earth-bound are of very limited relevance for the EV of the future for two reasons:

  • Even if humanity staying Earth-bound was the most likely outcome, probably only a small fraction of expected beings live in these scenarios, so they only constitute a small fraction of expected value or disvalue (as argued in the introduction).
  • Humanity staying Earth-bound may not actually be a very likely scenario because reaching post-humanity and realizing astronomical value might be a default path, conditional on humanity not going extinct (Bostrom, 2009).

If we assume humanity will stay Earth-bound, it seems that most welfarist views would probably favour reducing extinction risk. If one thinks humans are much more important than animals, this is obvious (unless one combines that view with suffering-focused ethics, such as antinatalism). If one also cares about animals, then very plausibly humanity’s impact on wild animals is more relevant than humanity’s impact on farmed animals, because of the enormous numbers of the former (and especially since it seems plausible that factory farming will not continue indefinitely). So far, humanity’s main effect on wild animals has been a permanent decrease in population size (through habitat destruction), which is expected to continue as the human population grows. Compared to that, humanity’s direct influence on wild animal well-being is currently unclear and probably small (though this is less clear for aquatic life):

  • We kill significant numbers of wild animals, but we don’t know how painful human-caused death is compared to non-human-caused death
  • Wild animal generation times are very short, so the number of animals affected by “never coming into existence” is probably much larger

If one thinks that wild animals are, on net, suffering, future population size reduction seems beneficial. If one thinks that wild animal welfare is net positive, then habitat reduction would be bad. However, there is still unarguably a lot of suffering in nature. Humanity might eventually, if we have much more knowledge and better tools that allow us to do so at limited cost to ourselves, improve wild animals’ lives (as we already do with e.g. vaccinations), so the prospect of that might offset some of the negative value of current habitat reduction. Obviously, habitat destruction is negative from a conservationist/environmentalist perspective.

Appendix 2: Future agents will in expectation have a considerable fraction of other-regarding preferences

Altruism in humans likely evolved as a “shortcut” solution to coordination problems. It was often impossible to forecast how much an altruistic act would help spread your own genes, but it often would (especially in small tribes, where all members were closely related). Thus, humans for whom altruism just felt good had a selective advantage.

As agents become more rational and plan further ahead, a tendency to help for purely selfless reasons seems less adaptive. Agents can deliberately cooperate for strategic reasons whenever necessary, and in exactly the amount that optimizes their own reproductive fitness. One might fear that in the long run, only preferences for increasing one’s own power and influence (and that of one’s descendants) would remain under Darwinian selection.

But this is not necessarily the case, for two reasons:

Darwinian processes will select for patience, not “selfishness” (Paul Christiano)

Reasoning from a long-term perspective, together with better tools for preserving values and influence into the future, may reduce the need for altruistic preferences, but it also strongly reduces selection pressure for selfishness. In contrast to short-term-planning (overly) altruistic agents, long-term-planning agents that want to create value would realize that amassing power is an instrumental goal for doing so; they will try to survive, acquire resources for instrumental reasons, and coordinate with others against the unchecked expansion of selfish agents. Thus, future evolution might select not for selfishness, but for patience, i.e. for how strongly an agent cares about the long term. Such long-term preferences should be expected to be more altruistic.

Carl Shulman additionally makes the point that in a space colonization scenario, agents that want to create value would only be very slightly disadvantaged in direct competition with agents that only care about expanding.

Brian Tomasik thinks Christiano’s argument is valid and that altruism might not be driven to zero in the future, but he doubts that very long-term altruists will have strategic advantages over medium-term corporations and governments, and he cautions against putting too much weight on theoretical arguments: “Human(e) values have only a mild degree of control in the present. So it would be surprising if such values had significantly more control in the far future.”

Preferences might not even be subject to Darwinian processes indefinitely

If the losses from evolutionary pressure indeed loom large, it seems quite likely that future generations would coordinate against it, e.g. by forming a singleton (Bostrom, 2006) (which broadly encompasses many forms of global coordination or value/goal-preservation). (Of course, there are also future scenarios that would strip away all other-regarding preferences, e.g. in Malthusian scenarios.)

In conclusion, we will end up somewhere between no other-regarding preferences and even more than today, with a considerable probability of future agents having a considerable fraction of other-regarding preferences.

Appendix 3: What if current human values transferred broadly into the future?

Most humans (past and present) intend to do what we now consider good (be loving, friendly, altruistic) more than they intend to harm (be sadistic, hateful, seek revenge). Positive[64] other-regarding preferences might be more universal: most people would, all else equal, prefer all humans or animals to be happy, while fewer people would have such a general preference for suffering. This relative overhang of positive preferences in human society is evident from rules that ban hurting (some) others, but not helping others. These rules will (if they persist) also shape the future, as they increase the costs of doing harm.[65]

Throughout human history, there has been a trend away from cruelty and violence.[66] Although humans cause a lot of suffering in the world today, this is mostly because people are indifferent or “lazy”, rather than evil. All in all, it seems fair to say that the significant majority of human other-regarding preferences is positive, and that most people would, all else equal, prefer more happiness and less suffering. However, we admit this is hard to quantify.[67]

References (only those published in peer-reviewed journals, and books):

Bjørnskov, C., Boettke, P.J., Booth, P., Coyne, C.J., De Vos, M., Ormerod, P., Sacks, D.W., Schwartz, P., Shackleton, J.R., Snowdon, C., 2012. ... and the Pursuit of Happiness: Wellbeing and the Role of Government.

Bostrom, N., 2013. Existential risk prevention as global priority. Global Policy 4, 15–31.

Bostrom, N., 2011. Infinite ethics. Analysis and Metaphysics 9–59.

Bostrom, N., 2009. The Future of Humanity, in: New Waves in Philosophy of Technology, New Waves in Philosophy. Palgrave Macmillan, London, pp. 186–215. https://doi.org/10.1057/9780230227279_10

Bostrom, N., 2006. What is a singleton. Linguistic and Philosophical Investigations 5, 48–54.

Bostrom, N., 2004. The future of human evolution. Death and anti-death: Two hundred years after Kant, fifty years after Turing 339–371.

Bostrom, N., 2003a. Astronomical waste: The opportunity cost of delayed technological development. Utilitas 15, 308–314.

Bostrom, N., 2003b. Are We Living in a Computer Simulation? The Philosophical Quarterly 53, 243–255. https://doi.org/10.1111/1467-9213.00309

Greaves, H., 2017. Population axiology. Philosophy Compass 12, e12442.

Killingsworth, M.A., Gilbert, D.T., 2010. A wandering mind is an unhappy mind. Science 330, 932. https://doi.org/10.1126/science.1192439

Pinker, S., 2011. The Better Angels of our Nature. New York, NY: Viking.

Sagoff, M., 1984. Animal Liberation and Environmental Ethics: Bad Marriage, Quick Divorce. Philosophy & Public Policy Quarterly 4, 6. https://doi.org/10.13021/G8PPPQ.41984.1177

Singer, P., 2011. The expanding circle: Ethics, evolution, and moral progress. Princeton University Press.

Tuomisto, H.L., Teixeira de Mattos, M.J., 2011. Environmental Impacts of Cultured Meat Production. Environ. Sci. Technol. 45, 6117–6123. https://doi.org/10.1021/es200130u

Footnotes
  1. Simply put: two beings experiencing positive (or negative) welfare are morally twice as good (or bad) as one being experiencing the same welfare ↩︎

  2. Some considerations that might reduce our certainty that, even given the moral perspective of this article, most expected value or disvalue comes from space colonization:

  3. In this article, the term ‘(post-)human space colonization’ is meant to include any form of space colonization that originates from a human civilization, including cases in which (biological) humans or human values don’t play a role (e.g. because humanity lost control over artificial superintelligence, which then colonizes space). ↩︎

  4. … assuming that without (post-)human space colonization, the universe is and stays devoid of value or disvalue, as explained in “Outline of the article” ↩︎

  5. We here assume that humanity does not change substantially, excluding e.g. digital sentience from our considerations. This may be overly simplistic, as interstellar travel seems so difficult that a space-faring civilization will likely be extremely different from us today. ↩︎

  6. Around 80 billion farmed fish, which live around one year, are raised and killed per year. ↩︎

  7. All estimates are from Brian Tomasik. ↩︎

  8. There are convincing anecdotes and examples of a moral circle expanding from family to nation to all humans: the abolition of slavery; human rights; reduced discrimination based on gender, sexual orientation, and race. However, there doesn’t seem to be much hard evidence. Gwern lists a few examples of a narrowing moral circle (such as infanticide and torture; his other examples are less convincing). ↩︎

  9. For example:

    • Lab-grown meat is very challenging, with few people working on it, little funding, …
    • Consumer adoption is far from inevitable
    • Some people will certainly not want to eat in-vitro meat, so it is unlikely that factory farming will be abolished completely in the medium term, unless the circle of empathy expands or governments regulate.
  10. There are also contrary trends. E.g. in Germany, meat consumption per head has been decreasing since 2011, from 62.8 kg in 2011 to 59.2 kg in 2015. In the US, it has been stagnant for 10 years. ↩︎

  11. For example:

    • Many more people remember feeling enjoyment or love than pain or depression across many countries (Figure 13, here)
    • In nearly every country, (much) more than 50% of people report feeling very happy or rather happy (section “Economic growth and happiness”, here)
    • Average happiness in experience sampling in the US: 65/100 (Killingsworth and Gilbert, 2010)
  12. One could claim that this just shows that people are afraid of dying or don’t commit suicide for other reasons, but people who suffer from depression have lifetime suicide rates of 2-15%, 10-25 times higher than the general population. This at least indicates that suicide rates increase when quality of life decreases. ↩︎

  13. Reported well-being: People on average seem to report being content with their lives. This is only moderate evidence for their lives being positive from a welfarist view because people don’t generally think in welfarist terms when evaluating their lives and there might be optimism bias in reporting. Suicide rates: There are many reasons why people with lives not worth living might refrain from suicide, for example:

    • possibility of failing and then being institutionalized and/or living with serious disability
    • obligations to parents, children, friends
    • fear of hell
  14. For example:

    • always enough food and water (with some exceptions)
    • Domesticated animals have been bred for a long time and now in general have lower basal stress levels and stress reactions than wild animals (because they don’t need them)
  15. For example:

    • harmful breeding (e.g. broiler chickens may be in pain during the last 2 weeks of their lives, because their joints cannot sustain their weight)
    • There is no incentive to satisfy the emotional and social needs of farmed animals. It is quite likely that e.g. pigs can’t exhibit their natural behavior (e.g. in gestation crates). Pigs, hens, and veal cattle are often kept in conditions where they can’t move (or can move only very little) for weeks.
    • stress (intense confinement; chickens and pigs show self-mutilating behavior)
    • extreme suffering (some percentage of farmed animals suffer until death or experience intense pain during slaughter)
  16. The book Compassion by the Pound, for example, rates the welfare of caged laying hens and pigs as negative, but that of beef cattle, dairy cows, free-range laying hens and broiler chickens (market animals) as positive. Other experts disagree, especially on broiler chickens having lives worth living. ↩︎

  17. Ability to express natural behaviour, such as sex, eating, social behavior, etc. ↩︎

  18. Often painful deaths, disease, parasitism, predation, starvation, etc. In general, there is a danger of anthropomorphism. Of course I would be cold in the Arctic, but a polar bear wouldn’t. ↩︎

  19. Specifically: moral weight for insects, probability that humanity will eventually improve wild animal welfare, future population size multiplier (insect relative to humans) and human and insect welfare. ↩︎

  20. If anything, attitudes towards animals have arguably become more empathetic. The majority of people around the globe express concern for farm animal well-being. (However, there is limited data, several confounders, and results from indirect questioning indicate that the actual concern for farmed animals might be much lower). See e.g.: http://ec.europa.eu/commfrontoffice/publicopinion/archives/ebs/ebs_270_en.pdf https://www.horizonpoll.co.nz/attachments/docs/horizon-research-factory-farming-survey-report.pdf http://www.tandfonline.com/doi/abs/10.2752/175303713X13636846944367 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4196765/ But also: https://link.springer.com/article/10.1007/s11205-009-9492-z ↩︎

  21. Future technology, in combination with unchecked evolutionary pressure, might also lead to futures that contain very little of what we would value upon reflection (Bostrom, 2004). ↩︎

  22. Self-regarding preferences are preferences that depend on the expected effect of the preferred state of affairs on the agent. These are not synonymous with purely “selfish preferences”. Acting according to self-regarding preferences can lead to acts that benefit others, such as in trade.

    Other-regarding preferences are preferences that don’t depend on the expected effect of the preferred state of affairs on the agent. Other-regarding preferences can lead to acts that also benefit the actor. E.g. parents are happy if they know their children are happy. However, the parents would also want their children to be happy even if they never came to know about it. As defined here, other-regarding preferences are not necessarily positive for others. They can be negative (e.g. sadistic/hateful preferences) or neutral (e.g. aesthetic preferences).

    Example of two parties at war:

    • Self-regarding preference: Members of one party want members of the other party to die, so they can win the war and conquer the other party’s resources.
    • Other-regarding preference: Members of one party want members of the other party to die, because they have developed intense hatred against them. Even if they don’t get any advantage from it, they would still want the enemy to suffer.
  23. Individual humans as well as human society have become more intelligent over time. See: history of education, scientific revolution, Flynn effect, information technology. Genetic engineering or artificial intelligence may further increase our individual and collective cognition. ↩︎

  24. Even if FAP and RP don’t have a lot of overlap, there might be additional reasons to defer to the values of future generations. Paul Christiano argues that one should sympathize with future agents’ values, if those values are reflected, both for strategic cooperative reasons and out of a willingness to discard idiosyncratic judgements. ↩︎

  25. Even if earth-originating AI is initially controlled, this might not guarantee control over the future: Goal preservation might be costly, if there are trade-offs between learning and goal preservation during self-improvement, especially in multipolar scenarios. ↩︎

  26. How meaningful moral reflection is, and whether we should expect human values to converge upon reflection, also depends on unsolved questions in meta-ethics. ↩︎

  27. Of course, orthogonal other-regarding preferences can sometimes still lead to anti-parallel actions. Take as an example the debate of conservationism vs. wild animal suffering. Both parties have other-regarding preferences over wild animals. Conservationists don’t have a preference for wild animal suffering, just for conserving ecosystems. Wild animal suffering advocates don’t have a preference against conserving ecosystems (per se), just against wild animal suffering. In practice, these orthogonal views likely recommend different actions regarding habitat destruction. However, if there will be future agents with preferences on both sides, then there is far more room for gains through trade and compromise (such as the implementation of David Pearce’s Hedonistic Imperative) in cases like this than if other-regarding preferences were actually anti-parallel. Still, as I also remark in the conclusion, people who think their reflected preferences will be sufficiently unusual to have only a small overlap with other-regarding preferences of other humans, even if they are reflected, will find the whole of part 1.2 less compelling for that reason. ↩︎

  28. Maybe we would, after idealized reflection, include a certain class of beings in our other-regarding preferences, and we would want them to be able to experience, say, freedom. It seems quite likely that future agents won’t care about these beings at all. However, it seems very unlikely that they would have a particular other-regarding preference for such beings to be unfree.

    Or consider the paperclip-maximizer, a canonical example of misaligned AI and thus an example of FAP certainly not being parallel to RP. Still, a paperclip-maximizer does not have a particular aversion against flourishing life, just as we don’t have a particular aversion against paperclips. ↩︎

  29. Examples of negative “side-effects” as defined here:

    • The negative “side-effects” of warfare on the losing party are bigger than the positive effects for the winning party (assuming that the motivation for the war was not “harming the enemy”, but e.g. acquiring the enemy’s resources)
      • This is an example of side effects of powerful agents’ self-regarding preferences on other powerful agents.
    • The negative “side-effects” of factory farming (animal suffering) are bigger than the positive effects for humanity (ability to eat meat). Many people do care about animals, so this is also an example of self-regarding preferences conflicting with other-regarding preferences.
    • The negative “side-effects” of slave-labor on the slave are bigger than the positive effects for the slave owner (gain in wealth)
      • These are both examples of side effects of powerful agents’ self-regarding preferences on powerless beings.

    Of course there are also positive side-effects, cooperative and accidental: E.g.

    • positive “side-effects” of powerful agents acting according to their preferences on other powerful agents: All gains from trade and cooperation
    • positive “side-effects” of powerful agents acting according to their preferences on powerless beings: Rabies vaccination for wild animals. Arguably, wild animal population size reduction.
  30. Additionally, one might object that FAP may not be the driving force shaping the future. Today, it seems that major decisions are mediated by a complex system of economic and political structures that often leads to outcomes that don’t align with the preferences of individual humans and that overweights the interests of the economically and politically powerful. On that view, we might expect the influence of human(e) values over the world to remain small. We think that future agents will probably have better tools to actually shape the world according to their preferences, including better tools for mediating disagreement and reaching meaningful compromise. But insofar as the argument in this footnote applies, it gives an additional reason to expect orthogonal actions, even if FAP aren’t orthogonal. ↩︎

  31. Note that cooperation does not require caring about the partner one cooperates with. Even two agents that don’t care about each other at all may cooperate instead of waging war for the resources the other party holds, if they have good tools/institutions to arrange compromise, because the cost of warfare is high. ↩︎

  32. Evolutionary reasons for the asymmetry between biological pain and pleasure that would not necessarily remain in designed digital sentience (ideas owed to Carl Shulman):

    • Animals try to minimize the duration of pain (e.g. by moving away from the source of pain), and try to maximize the duration of pleasurable events (e.g. by continuing to eat). Thus, painful events are on average shorter than pleasurable events, and so need to be more intense to induce the same learning experience.
    • Losses in reproductive fitness from one single negative event (e.g. a deadly injury) can be much greater than the gains of reproductive fitness from any single positive event, so animals evolved to want to avoid these events at all cost.
    • Boredom/satiation can be seen as evolved protection against reward channel hacking. Animals for which one pleasant stimulus stayed pleasant indefinitely (e.g. an animal that just kept eating) had less reproductive success. Pain channels need less protection against hacking, because pain channel hacking...:
      • only works if there is sustained pain in the first place, and
      • is much harder to learn than pleasure channel hacking (the former: after getting hurt, an animal would need to find and eat a pain-relieving plant; the latter: an animal just needs to continue eating despite not having any use for additional calories)

    This might be part of the reason why pain seems much easier to instantiate on demand than happiness. ↩︎

  33. Even if future powerful agents have some concern for the welfare of sentient tools, sentient tools’ welfare might still be net negative, if there are reasons that make positive-welfare tools much more expensive than negative-welfare tools (e.g. if suffering is very important for task performance). But even if maximal efficiency and welfare of tools are not completely correlated, we think that most suffering can be avoided while still keeping most productivity, so a little concern for sentient tools could go a long way. ↩︎

  34. Strategic acts in scenarios with little cooperation could motivate the creation of disvalue-optimized sentience, especially in multipolar scenarios that contain both altruistic and indifferent agents (blackmailing). However, because uncooperative acts are bad for everyone, these scenarios in expectation seem to involve few resources. On the positive side, there can also be gains from trade between altruistic and indifferent agents. ↩︎

  35. Sentient tools are optimized for performance in the task they are created for. Per resource-unit, future agents would create: a number of minds as is most efficient, with hedonic experience as is most efficient, optimized for task.

    (Dis)value-optimized sentience might be directly optimized for extent of consciousness or intensity of experience (if that is actually what future generations value altruistically). Per resource-unit, future agents would create: as many minds as is optimal for (dis)value, with as positive/negative as possible hedonic experience, optimized for conscious states.

    Such sentience might be orders of magnitude more efficient in creating conscious experience than sentience not optimized for it. E.g. in humans, only a tiny fraction of energy is used for peak conscious experience: about 20% of energy is used for the brain, only a fraction of that is used for conscious experience, only a fraction of which are “peak” experiences. ↩︎

  36. The driving force behind this judgement is not necessarily the belief that most futures will be good. Rather, it is the belief that the ‘rather good’ futures will contain more net value than the ‘rather bad’ futures will contain net disvalue.

    • The ‘rather good’ futures contain agents with other-regarding preferences highly parallel to our reflected preferences. Many resources will be spent in a way that optimizes for value (by our lights).
    • In the ‘rather bad’ futures, agents are largely selfish, or have other-regarding preferences completely orthogonal to our reflected other-regarding preferences. In these futures, most resources will be spent on goals that we do not care about, but very few resources will be spent to produce things we would disvalue in an optimized way. On whichever side of “zero” these scenarios fall, they seem much closer to parity than the “rather good futures” (from most moral views).
  37. As also noted in the discussion at the end of the article, part 1 is less relevant for people who have other-regarding preferences very different from other people, and who believe their RP to be very different from the RP of the rest of humanity. ↩︎

  38. Option value is not a separate kind of value, and it would be already integrated in the perfect EV calculation. However, it is quite easy to overlook, and somewhat important in this context, so it is discussed separately here. ↩︎

  39. In a general sense, ‘option value’ includes the value of any change of strategy, for the better or worse, that future agents might take upon learning more. However, the general fact that agents can learn more and adapt their strategy is not surprising and was already factored into considerations 1, 2 and 4. ↩︎

  40. In the more general definition, option value is not always positive. In general, giving future agents the option to choose between different strategies can be bad, if the values of future agents are bad or their epistemics are worse. In this section, ‘option value’ only refers to the option of future agents not to colonize space, if they find colonizing space would be bad from an altruistic perspective. It seems very unlikely that, if future agents refrain from space colonization for altruistic reasons at all, they would do so exactly in those cases in which we (the current generation) would have judged space colonization as positive (according to our reflected preferences). So this kind of option value is very unlikely to be negative. ↩︎

  41. Although empirical insights about the universe play a role in both option value and part 2.2, these two considerations are different:

    • Part 2.2: Further insight about the universe might show that there already is a lot of disvalue out there. A benevolent civilization might reduce this disvalue.
    • Option value: Further insight about the universe might show that there already is a lot of value or disvalue out there. That means that we should be uncertain about the EV of (post-)human space colonization. Our descendants will be less uncertain, and can then, if they know there is NOT already a lot of disvalue out there, still decide to not spread to the stars.
  42. Individual humans as well as human society have become more intelligent over time. See: history of education, scientific revolution, Flynn effect, information technology. Genetic engineering or artificial intelligence may further increase our individual and collective cognition. ↩︎

  43. For example, if we care only about maximizing X, but future agents will care about maximizing X, Y and Z in equal parts, letting them decide whether or not to colonize space might still lead to more X than if we decided, because they have vastly more knowledge about the universe and are generally much more capable of making rational decisions. ↩︎

  44. Even if future agents can make better decisions regarding our other-regarding preferences than we (currently) could, future agents also need to be non-selfish enough to act accordingly - their other-regarding preferences need to constitute a sufficiently large fraction of their overall preferences. ↩︎

  45. Say we are uncertain about the value in the future in two ways:

    • 50% credence that a disvalue-focused view would be my preferred moral view after idealized reflection, 50% credence in a ‘balanced view’ that also values the creation of value.
    • 50% credence that the future will be controlled by indifferent actors, with preferences completely orthogonal to our reflected preferences, 50% credence that it will be controlled by good actors who have exactly the preferences we would have after idealized reflection.

    The following table shows expected net value of space colonization without considering option value (again: made-up numbers):

    |                       | Indifferent actors | Good actors |
    |-----------------------|--------------------|-------------|
    | Disvalue-focused view | -100               | -10         |
    | ‘Balanced view’       | -5                 | 100         |

    Now with option value, only the good actors would limit the harm if the disvalue-focused view was indeed our (and thus, their) preferred moral view after idealized reflection:

    |                       | Indifferent actors | Good actors |
    |-----------------------|--------------------|-------------|
    | Disvalue-focused view | -100               | 0           |
    | ‘Balanced view’       | -5                 | 100         |

    ↩︎
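The expected values implied by the two tables in footnote 45 can be checked with a short Python sketch. It assumes, which the footnote does not state explicitly, that the two 50% credences are independent, so each of the four cells gets a weight of 25%. The dictionary keys and the function name are illustrative only.

```python
from itertools import product

# Made-up payoffs from footnote 45: (moral view, controlling actors) -> net value.
PAYOFF_WITHOUT_OPTION_VALUE = {
    ("disvalue-focused", "indifferent"): -100,
    ("disvalue-focused", "good"): -10,
    ("balanced", "indifferent"): -5,
    ("balanced", "good"): 100,
}
# With option value, good actors who (like us, after reflection) hold the
# disvalue-focused view refrain from colonizing, so that cell becomes 0.
PAYOFF_WITH_OPTION_VALUE = {
    **PAYOFF_WITHOUT_OPTION_VALUE,
    ("disvalue-focused", "good"): 0,
}

# 50% credence for each alternative; independence is an assumption of this sketch.
P_VIEW = {"disvalue-focused": 0.5, "balanced": 0.5}
P_ACTORS = {"indifferent": 0.5, "good": 0.5}


def expected_value(payoffs: dict) -> float:
    """Probability-weighted sum of payoffs over all (view, actors) combinations."""
    return sum(
        P_VIEW[view] * P_ACTORS[actors] * payoffs[(view, actors)]
        for view, actors in product(P_VIEW, P_ACTORS)
    )


ev_without = expected_value(PAYOFF_WITHOUT_OPTION_VALUE)
ev_with = expected_value(PAYOFF_WITH_OPTION_VALUE)
print(ev_without, ev_with, ev_with - ev_without)
```

Under these assumptions the expected value rises from -3.75 to -1.25. Only the "good actors" column changes, which matches the footnote's point that indifferent actors provide no option value.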
  46. There is more option value, if:

    • One currently has high moral uncertainty (one expects one’s views to change considerably upon idealized reflection). With high moral uncertainty, it is more likely that future agents will have significantly more accurate moral values.
    • One expects future agents to have a significantly better empirical understanding.
    • One’s uncertainty about the EV of the future comes mainly from moral, and not empirical, uncertainty. For example, say you are uncertain about the expected value of the future because you are unsure whether you would, in your reflected preferences, endorse a strongly disvalue-focused view. If you are generally optimistic about future agents, you can assume future generations to be better informed about which moral view to take. Thus, there is a lot of option value in reducing the risk of human extinction. If, on the other hand, you are uncertain about the EV of the future because you think there is a high chance that future agents just won’t be altruistic, there is no option value in deferring the decision about space colonization to them.
  47. It seems likely that some life-forms would survive, unless human extinction is caused by a cosmic catastrophe (not a focus area for effective altruists, because it is unlikely and intractable), by specific forms of nanotechnology, or by misaligned AI. ↩︎

  48. The extent to which it is true depends on the reflection process one chooses. Several people who read an early draft of this article commented that they would imagine their reflected preferences to be independent of human-specific factors. ↩︎

  49. The argument in the main text assumed that the alternative space colonization contains an amount of morally relevant things comparable to that of (post-)human colonization. But in many cases, the EV of an alternative space colonization would actually be (near) neutral, because the alternative civilization’s preferences would be orthogonal to ours. Our values would just be so different from the AI’s or extraterrestrials’ values that space colonization by these agents might often look neutral to us. The argument in the main text still applies, but only for those alternative space colonizations that contain comparable absolute amounts of value and disvalue.

    However, a very similar argument applies even for alternative colonizations that contain a smaller absolute amount of things we morally care about. The value of alternative space colonization would be shifted more towards zero, but future pessimists would in expectation always find alternative space colonization a worse outcome than no space colonization. From the future-pessimistic perspective, human extinction leads to a bad outcome (alternative colonization), and not a neutral one (no space colonization). Future pessimists should thus update towards extinction risk reduction being less negative. Future optimists might find the alternative space colonization better or worse than no colonization.

    The mathematical derivation in the next footnote takes this caveat into account. ↩︎

  50. Assumption: This derivation makes the assumption that people who think the EV of human space colonization is negative and those who think it is positive would still rank a set of potential future scenarios in the same order when evaluating them normatively. This seems plausible, but may not be the case. Let’s simplify the value of human extinction risk reduction to:

    EV(reduction of human extinction risk) = EV(human space colonization) - EV(human extinction)

    (This simplification is very uncharitable towards extinction risk reduction, even if only considering the long-term effects, see parts 2 and 3 of this article). Assuming that no non-human animal or extraterrestrial civilization would emerge in case of human extinction, then EV(human extinction)=0, and so future pessimists judge:

    EV(reduction of human extinction risk) = EV(human space colonization) - EV(human extinction)= EV(human space colonization) < 0

    And future optimists believe:

    EV(reduction of human extinction risk) = EV(human space colonization) - EV(human extinction) = EV(human space colonization) > 0

    Let’s say that, if humanity goes extinct, there will eventually be non-human space colonization with probability p. (p can be down-weighted to account for the fact that later space colonization probably means less final area colonized.) This means that:

    EV(human extinction) = p * EV(non-human space colonization)

    Let’s define the amount of value and disvalue created by human space colonization as Vₕ and Dₕ, and the amount of value and disvalue created by the non-human civilization as Vₙₕ and Dₙₕ.

    We can expect two relations:

    1. On average, a non-human civilization will care less about creating value and care less about reducing disvalue than a human civilization. We can expect the ratio of value to disvalue to be worse in the case of a non-human civilization:

    (i) Vₙₕ/Dₙₕ = (Vₕ/Dₕ) * r, with 0 <= r <= 1

    2. On average, non-human animals’ and extraterrestrials’ values will be alien to us; their preferences will be orthogonal to ours. It seems likely that on average these futures will contain less value or disvalue than a future with human space colonization.

    (ii) (Vₙₕ + Dₙₕ) = (Vₕ + Dₕ) * a, with 0 <= a <= 1

    Finally, the expected value of non-human space colonization can be expressed as (by definition):

    (iii) EV(non-human space colonization) = Vₙₕ - Dₙₕ

    Using (i), (ii), and (iii) we get:

    EV(human extinction) = EV(non-human space colonization) * Probability(non-human space colonization) = (Vₙₕ - Dₙₕ) * p = [a * (Vₕ + Dₕ) / ((Vₕ/ Dₕ) * r + 1)] * (r * Vₕ/ Dₕ - 1) * p

    The first term [in square brackets] is always positive. The sign of the second term, (r * Vₕ/ Dₕ - 1), can change depending on whether we were previously optimistic or pessimistic about the future.

    If we were previously pessimistic about the future, we thought:

    Vₕ - Dₕ < 0     ->     Vₕ/ Dₕ < 1

    Since r <= 1, r * Vₕ/ Dₕ < 1 as well, so the second term is negative and the EV of human extinction is negative. Compared to the “naive” pessimistic view (assuming EV(human extinction) = 0), pessimists should update their view in the direction of EV(reducing human extinction risk) being less negative.

    If we were previously optimistic about the future, we thought:

    Vₕ - Dₕ > 0     ->     Vₕ/ Dₕ > 1

    Now the second term can be negative, neutral, or positive. Compared to the naive view, future optimists should sometimes be more enthusiastic (if Vₙₕ/ Dₙₕ= r * Vₕ/ Dₕ < 1) and sometimes be less enthusiastic (if Vₙₕ/ Dₙₕ= r * Vₕ/ Dₕ > 1) about extinction risk reduction than they previously were. ↩︎
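As a numerical sanity check of the derivation in footnote 50, the following Python sketch solves relations (i) and (ii) directly for Vₙₕ and Dₙₕ and compares the resulting EV(human extinction) with the closed-form expression. The function names are hypothetical, and the parameter values are arbitrary illustrations, not estimates from this article.

```python
import math


def nonhuman_value_disvalue(v_h, d_h, r, a):
    """Solve (i) V_nh / D_nh = r * V_h / D_h and (ii) V_nh + D_nh = a * (V_h + D_h)."""
    ratio_nh = r * v_h / d_h                 # (i): value/disvalue ratio of non-human civilization
    d_nh = a * (v_h + d_h) / (ratio_nh + 1)  # from (ii), substituting V_nh = ratio_nh * D_nh
    v_nh = ratio_nh * d_nh
    return v_nh, d_nh


def ev_extinction_direct(v_h, d_h, r, a, p):
    """EV(human extinction) = p * (V_nh - D_nh), using (iii)."""
    v_nh, d_nh = nonhuman_value_disvalue(v_h, d_h, r, a)
    return p * (v_nh - d_nh)


def ev_extinction_closed_form(v_h, d_h, r, a, p):
    """[a (V_h + D_h) / (r V_h/D_h + 1)] * (r V_h/D_h - 1) * p."""
    first = a * (v_h + d_h) / (r * v_h / d_h + 1)  # positive for V_h, D_h, a > 0
    second = r * v_h / d_h - 1                     # its sign decides the direction of the update
    return first * second * p


def ev_reduction(v_h, d_h, r, a, p):
    """EV(reduction of human extinction risk) = EV(human space colonization) - EV(human extinction)."""
    return (v_h - d_h) - ev_extinction_direct(v_h, d_h, r, a, p)


# Arbitrary illustrative parameters: (V_h, D_h, r, a, p).
cases = {
    "pessimist": (40, 60, 0.5, 0.5, 0.3),
    "optimist, r*Vh/Dh > 1": (80, 20, 0.5, 0.5, 0.3),
    "optimist, r*Vh/Dh < 1": (80, 20, 0.2, 0.5, 0.3),
}
for name, params in cases.items():
    direct = ev_extinction_direct(*params)
    assert math.isclose(direct, ev_extinction_closed_form(*params))
    naive = params[0] - params[1]  # EV(reduction) under the naive assumption EV(human extinction) = 0
    print(f"{name}: EV(extinction)={direct:.3f}, naive={naive}, updated={ev_reduction(*params):.3f}")
```

In the pessimistic case EV(human extinction) comes out negative, so EV(reduction) is less negative than the naive value. In the two optimistic cases its sign depends on whether r * Vₕ/ Dₕ lies above or below 1, as the footnote states.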

  51. Let’s define future pessimists as people who judge the expected value of (post-)human space colonization as negative; future optimists analogously. Now consider the example of a non-human civilization significantly worse than human civilization (by our lights), such that future optimists would find it normatively neutral, and future pessimists find it significantly more negative than human civilization. Then future optimists would not update their judgement (compared to before considering the possibility of a non-human animal spacefaring civilization), but pessimists would update significantly into the direction of human extinction risk reduction being positive. ↩︎

  52. E.g. one might think that humanity might be comparatively bad at coordination (compared to e.g. intelligent ants), and so relatively likely to create uncontrolled AI, which might be an exceptionally bad outcome, maybe even worse than an intelligent ant civilization. However, considerations like this seem to require highly specific judgements and are likely not very robust. ↩︎

  53. Section 4.2 is not dependent on a welfarist or even consequentialist view. More generally, it applies to any kind of empirical or moral insight that we might have, which would make us realize that other things than we previously thought are of great moral value or disvalue. ↩︎

  54. For example:

    • The history of an “expanding moral circle” (Singer, 2011), from tribes to nations to all humans…
    • The relatively new notion of environmentalism
    • The new notion of wild animal suffering
    • The new notion of future beings being (astronomically) important (Bostrom, 2003a)
  55. Assuming that the side-effects of resources spent for self-regarding preferences of future agents are neutral/symmetric with regards to the beings/things out there (which seems to be a reasonable assumption). ↩︎

  56. Fermi-estimate (wild guesses, again):

    1. Assume a 20% probability that, with more moral and empirical insight, we would conclude that the universe is already filled with beings/things that we morally care about
    2. Assume that the altruistic impact future agents could have is always proportional to the amount of resources spent for altruistic purposes. If the universe is devoid of value or disvalue, then altruistic resources will be spent on creating new value (e.g. happy beings). If the universe is already filled with beings/things that we morally care about, it will likely contain some disvalue. Assume that in these cases, 25% of altruistic resources will be used to reduce this disvalue (and only 75% to create new value). Also assume that resources can be used at the same efficiency e to create new disvalue, or to reduce existing disvalue.
    3. Assume that resources spent for self-regarding preferences of future agents would on average not improve or worsen the situation for the things of (dis)value already out there.
    4. Assume that in expectation, future agents will spend 40 times as many resources pursuing other-regarding preferences parallel to our reflected preferences (“altruistic”) than on pursuing other-regarding preferences anti-parallel to our reflected preferences (“anti-altruistic”). Note that this is compatible with future agents, in expectation, spending most of their resources on other-regarding preferences completely orthogonal to our reflected preferences.
    5. From a disvalue-focused perspective, the creation of new value does not matter; only the creation of new disvalue or the reduction of already existing disvalue does. From such a perspective, with R denoting the total amount of resources spent on parallel or anti-parallel other-regarding preferences:
    • Expected creation of new disvalue = (1/41) * R * e ≈ 2.4% * R * e
    • Expected reduction of already existing disvalue = 20% * 25% * (40/41) * R * e ≈ 4.9% * R * e

    Thus, the expected reduction of disvalue through (post-)humanity is twice the expected creation of disvalue. This is, however, an upper bound: the calculation assumes that the universe contains enough disvalue for future agents to spend 25% of altruistic resources on alleviating it before having alleviated it all. In some cases, the universe might not contain that much disvalue, so some of those resources would go back into the creation of value. ↩︎

  57. As with part 1.2, this part 2.2 is less relevant for people who believe that some of their reflected other-regarding preferences will be so unusual as to be anti-parallel to most of humanity’s reflected other-regarding preferences. Such a view is defended, e.g., by Brian Tomasik in the context of suffering in fundamental physics. Tomasik argues that, even if he (after idealized reflection) and future generations both came to care about sentience in fundamental physics, and even if future generations were to influence fundamental physics for altruistic reasons, they would still be more likely to do so in a way that increases the vivacity of physics, which Tomasik (after idealized reflection) would oppose. ↩︎

  58. This section draws heavily on Nick Beckstead’s thoughts. ↩︎

  59. Global catastrophes that do not directly cause human extinction may initiate developments that lead to extinction later on. For the purposes of this article, these cases are not different from direct extinction, and are omitted here. ↩︎

  60. E.g. Paul Christiano: “So if modern civilization is destroyed and eventually successfully rebuilt, I think we should treat that as recovering most of Earth’s altruistic potential (though I would certainly hate for it to happen).” In his article, Christiano outlines several empirical and moral judgement calls that lead him to his conclusion, such as:

    • As long as a process of moral reflection and sophistication is ongoing, which seems likely, civilizations will reach very good values (by his lights).
    • He is willing to discard his idiosyncratic judgements.
    • He directly cares about others’ (reflected) values. ↩︎

  61. It is, of course, debatable whether one should stick with one’s own preferences if the majority of reflected and altruistic agents have opposite preferences. Under some empirical and meta-ethical assumptions, one should. ↩︎

  62. Different advocates of strong suffering-focused views come to different judgements on the topic. They all seem to agree that, from a purely suffering-focused perspective, it is not clear whether efforts to reduce the risk of human extinction are positive or negative:

    Lukas Gloor: "it tentatively seems to me that the effect of making cosmic stakes (and therefore downside risks) more likely is not sufficiently balanced by positive effects on stability, arms race prevention and civilizational values (factors which would make downside risks less likely). However, this is hard to assess and may change depending on novel insights.” … “We have seen that efforts to reduce extinction risk (exception: AI alignment) are unpromising interventions for downside-focused value systems, and some of the interventions available in that space (especially if they do not simultaneously also improve the quality of the future) may even be negative when evaluated purely from this perspective.”

    David Pearce: “Should existential risk reduction be the primary goal of: a) negative utilitarians? b) classical hedonistic utilitarians? c) preference utilitarians? All, or none, of the above? The answer is far from obvious. For example, one might naively suppose that a negative utilitarian would welcome human extinction. But only (trans)humans - or our potential superintelligent successors - are technically capable of phasing out the cruelties of the rest of the living world on Earth. And only (trans)humans - or rather our potential superintelligent successors - are technically capable of assuming stewardship of our entire Hubble volume.” … “In practice, I don't think it's ethically fruitful to contemplate destroying human civilisation, whether by thermonuclear Doomsday devices or utilitronium shockwaves. Until we understand the upper bounds of intelligent agency, the ultimate sphere of responsibility of posthuman superintelligence is unknown. Quite possibly, this ultimate sphere of responsibility will entail stewardship of our entire Hubble volume across multiple quasi-classical Everett branches, maybe extending even into what we naively call the past [...]. In short, we need to create full-spectrum superintelligence.”

    Brian Tomasik: “I'm now less hopeful that catastrophic-risk reduction is plausibly good for pure negative utilitarians. The main reason is that some catastrophic risk, such as from malicious biotech, do seem to pose nontrivial risk of causing complete extinction relative to their probability of merely causing mayhem and conflict. So I now don't support efforts to reduce non-AGI "existential risks". [...] Regardless, negative utilitarians should just focus their sights on more clearly beneficial suffering-reduction projects” ↩︎

  63. For example, interventions that aim at improving humanity’s values/increasing the circle of empathy might be highly leveraged and time-sensitive, if humanity achieves goal conservation soon, or values are otherwise sticky. ↩︎

  64. “Positive”/“negative” as defined from a welfarist perspective. ↩︎

  65. Societies may increase the costs, and thereby reduce the frequency, of acts following from negative other-regarding preferences, as long as negative other-regarding preferences are held by a minority. E.g. if 5% of a society have an other-regarding preference for inflicting suffering on a certain group (of powerless beings), but 95% have a preference against it, in many societal forms fewer than 5% of people will actually inflict suffering on this group of powerless beings, because there will be laws against it, ... ↩︎

  66. This fact could be interpreted either as human nature that we will revert to, or as a trend of moral progress. The latter seems more likely to us. ↩︎

  67. Another possible operationalization of the ratio between positive and negative other-regarding preferences: how much money is spent on pursuing positive, and on pursuing negative, other-regarding preferences?

    • Some state budgets clearly serve positive other-regarding preferences.
    • It is less clear whether any budgets clearly serve negative other-regarding preferences, although at least part of military spending does.
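The Fermi estimate in footnote 56 can be reproduced in a few lines. The following is a minimal Python sketch under that footnote's own assumptions (a 20% probability that the universe is already filled with beings/things we care about, 25% of altruistic resources spent on reducing existing disvalue, a 40:1 ratio of parallel to anti-parallel resources, and one shared efficiency e). The function name `fermi_disvalue` and its parameter names are hypothetical. The sketch splits R exactly into 40/41 parallel and 1/41 anti-parallel resources, which the footnote rounds to 1/40; with exact fractions the ratio of reduction to creation comes out at exactly 2.

```python
from fractions import Fraction


def fermi_disvalue(p_filled=Fraction(1, 5), share_to_reduction=Fraction(1, 4), ratio=40):
    """Expected creation and reduction of disvalue, both in units of R * e.

    p_filled: probability that the universe is already filled with
        beings/things we morally care about (hypothetical parameter name)
    share_to_reduction: share of altruistic resources spent on reducing
        existing disvalue in that case
    ratio: parallel ("altruistic") resources per unit of anti-parallel
        ("anti-altruistic") resources
    """
    anti_parallel = Fraction(1, ratio + 1)  # share of R spent anti-altruistically
    parallel = 1 - anti_parallel            # share of R spent altruistically
    # Anti-altruistic spending creates new disvalue at efficiency e.
    creation = anti_parallel
    # Altruistic spending reduces existing disvalue only if the universe is
    # already filled, and then only with the share earmarked for reduction.
    # No cap on total existing disvalue is applied, so this is an upper bound.
    reduction = p_filled * share_to_reduction * parallel
    return creation, reduction


creation, reduction = fermi_disvalue()
print(f"creation  ~ {float(creation):.2%} of R*e")
print(f"reduction ~ {float(reduction):.2%} of R*e")
print(f"ratio     = {reduction / creation}")
```

Like the footnote's estimate, this is an upper bound, since the reduction term is not capped by the amount of disvalue actually out there. Changing the defaults shows how sensitive the 2:1 margin is: with `p_filled=Fraction(1, 10)`, expected reduction and creation break even.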


References that treat human values as units of selection?

9 June 2019, 21:41
Published on June 9, 2019 6:04 PM UTC

When I read AI alignment literature, the author usually seems to be assuming something like: “Human values are fixed, they’re just hard to write down. But we should build intelligent agents that adhere to them.” Or occasionally: “Human values are messy, mutable things, but there is some (meta-)ethics that clarifies what it is we all want and what kind of intelligent agents we should build.”

This makes it hard for me to engage with the AI alignment discussion, because the assumption I bring to it is: “Values are units of selection, like zebra stripes or a belief in vaccinating your children. You can’t talk sensibly about what values are right, or what we ‘should’ build into intelligent agents. You can only talk about what values win (i.e. persist over more time and space in the future).”

I’ve tried to find references whose authors address this assumption, or write about human values from this relativist/evolutionary perspective, but I’ve come up short. Can anyone point me toward some? Or disabuse me of my assumption?


In Defense of Those Reclusive Authors

9 June 2019, 17:14
Published on June 9, 2019 2:14 PM UTC

Notable Authors of the 20th Century Who Were Introverted

Writing fiction is usually a solitary profession. Among those individuals who end up producing art to express themselves, one can logically assume that there will be many who like to keep their distance from society. While there have always been writers who are sociable, a few of the greats were largely solitary and lonely, socially awkward, or even reclusive individuals.

Apart from the 20th century’s very famous cases of this type of creator (the Argentinian writer J.L. Borges, the Portuguese author Fernando Pessoa and the Czech-Jewish allegorist Franz Kafka), reclusive attitudes and highly introverted interests can easily be identified in a number of notable artists who merely happen to have earned less renown. Consider H.P. Lovecraft, with his imagined world populated by primordial monstrosities, or Robert Walser, who, despite having been one of Kafka’s literary heroes, remains virtually unknown to this day, yet penned hundreds of short stories as well as a few longer novels, all about a sense of alienation and of not belonging to the world. And there is Henry James, who, his nominally secure position in the canon of 20th-century English literature notwithstanding, is by now only infrequently referenced as an insightful anatomist of introversion and its co-morbid indifference to the external world.

Regarding the Degree of Introversion

A pronounced, evident lack of interest in the external world (or at least a professed one) can be observed in a number of quotes by the aforementioned writers. During the First World War, Franz Kafka wrote in his diary that he was then being rewarded for never having been involved in worldly affairs… Borges, far more reclusive than Kafka, penned silent cries in which he accused the society of his day of being unworthy even of suffering in Hell; he argued, that is, that human malice is simply too crude to deserve a metaphysical punishment! Pessoa, who spent his days as a shadow in the busy streets of downtown Lisbon, working as a translator for various trading firms, claimed in one of his most famous poems that he put on clothes which didn’t suit him, was taken for someone else, and was subsequently lost…

Then and Now

While in more recent years (primarily, perhaps, due to the ubiquity of television) writers have at times been presented, some of them willingly, as another type of media celebrity, in the not-so-distant past it was still quite difficult to reach an author from outside the circuit of the publishing world. Writers were mostly identified through their written work, and it was the norm for a reader to be aware of an author, to like or even love their work, and yet be wholly ignorant of their physical likeness, as well as of most of the biographical information that is now routinely accessed, whether from the opening pages of the book itself or from external sources. This is not of secondary importance to our examination, given that one would scarcely imagine Pessoa, Lovecraft, or even Kafka giving a TV interview; many would even question whether individuals with such reclusive personalities would, had they lived now, be offered a publishing deal at all.

Are Highly Introverted Writers Actually Needed?

Publishing is a business, and a publishing house is not likely to invest in a writer’s work if it stands to lose money... And yet an author is arguably different from a performer of popular art: the latter is mostly tied to entertainment, while the former, at least in theory, incorporates a cerebral quality and aspires to other heights of artistry. In practice, of course, not all authors differ that significantly from performance artists; but bringing the two professions closer together, whether actively or unwittingly, will certainly result in fewer published authors who are characterized by acute introversion.

Even assuming that the above is true, would it necessarily be a negative outcome? Does the reader actually stand to gain something specific from reading the fictional work of an introvert, or even a recluse?

An Allegory as the Epilogue

A brief answer may be provided in the form of an allegory. In a group of travelers sharing stories, the more original ones would tend to come from those who ventured farthest away. One shouldn’t lose patience with the more estranged storytellers, for journeys to the most distant lands can make the traveler lose interest in the homeland, where everyone is familiar with the geography, the customs and the people’s faces. Such journeys can also make a person feel that his ties to his countrymen have been practically severed, and that the wondrous knowledge he carries from those distant lands can’t really interest this crowd...

Shouldn’t we, therefore, expect that if such a fellow decides at some point to actually speak, the words we then hear could indeed present us with material that we had not yet had the chance to reflect upon?

After all, a book we take interest in is always going to function as a map to our own, mostly unexplored inner world.

by Kyriakos Chalkopoulos - https://www.patreon.com/posts/in-defense-of-27095565


Learning magic

9 June 2019, 15:29
Published on June 9, 2019 12:29 PM UTC

Magic (primarily misdirection and cold-reading, but also mechanics like sleight-of-hand) seems like an extremely good case study of how the human mind works and of the predictable ways in which human maps differ from the territory. There are several magicians out there who offer classes and so forth, but can any of them be vouched for as really "knowing their stuff" as a teacher, if I wanted to approach the subject in this light? Relatedly, can anyone vouch for the quality of the Penn and Teller course on the MasterClass platform?

Bonus points for teachers in London who can be hired for small (up to 10 people) groups.