Four Ways An Impact Measure Could Help Alignment

Published on August 8, 2019 12:10 AM UTC

Impact penalties are designed to help prevent an artificial intelligence from taking actions which are catastrophic.

Despite the apparent simplicity of this approach, there are in fact several distinct frameworks under which an impact measure could prove helpful. In this post, I seek to clarify the different ways that an impact measure could ultimately help align an artificial intelligence or otherwise benefit the long-term future.

It is my personal opinion that some critiques of impact are grounded in an intuition that it doesn't help us achieve X, where X is something that the speaker thought impact was supposed to help us with, or is something that would be good to have in general. The obvious reply to these critiques is then to say that it was never intended to do X, and that impact penalties aren't meant to be a complete solution to alignment.

My hope is that in distinguishing the ways that impact penalties can help alignment, I will shed light on why some people are more pessimistic or optimistic than others. I am not necessarily endorsing the study of impact measurements as an especially tractable or important research area, but I do think it's useful to gather some of the strongest arguments for it.

Roughly speaking, I think that an impact measure could potentially help humanity in at least one of four main scenarios.

1. Designing a utility function that roughly optimizes for what humans reflectively value, but with the recognition that mistakes are possible, such that regularizing against extreme maxima seems like a good idea (i.e., impact as a regularizer).

2. Constructing an environment for testing AIs that we want to be extra careful about, due to uncertainty regarding their ability to do something extremely dangerous (i.e., impact as a safety protocol).

3. Creating early-stage task AIs that have a limited function but are not intended to do any large-scale world optimization (i.e., impact as an influence-limiter).

4. Less directly, impact measures could still help humanity with alignment because researching them could allow us to make meaningful progress on deconfusion (i.e., impact as deconfusion).

Impact as a regularizer

In machine learning, a regularizer is a term added to the loss function or training process that reduces the capacity of a model in the hope that it will generalize better.

One common instance of a regularizer is a scaled L2 norm penalty of the model parameters that we add to our loss function. A popular interpretation of this type of regularization is that it represents a prior over what we think the model parameters should be. For example, in Ridge Regression, this interpretation can be made formal by invoking a Gaussian prior on the parameters.
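To make this concrete, here is a small, self-contained sketch (my own illustration, not from the original post) of ridge regression as an ordinary squared-error loss plus a scaled L2 penalty on the parameters:

    import numpy as np

    def ridge_loss(w, X, y, lam):
        """Mean squared error plus a scaled L2 penalty on the parameters w.

        The lam * ||w||^2 term is the regularizer: it pulls the fit toward
        small weights, which corresponds to a Gaussian prior on w.
        """
        residuals = X @ w - y
        return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

    def ridge_fit(X, y, lam):
        """Closed-form minimizer of ridge_loss: (X^T X + lam*n*I)^{-1} X^T y."""
        n, d = X.shape
        return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

    # Toy data: few noisy samples, so the unregularized fit is easily swayed.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
    y = X @ w_true + rng.normal(scale=0.5, size=20)

    for lam in [0.0, 0.1, 10.0]:
        w_hat = ridge_fit(X, y, lam)
        print(lam, np.round(w_hat, 2))

With few, noisy samples, increasing lam pulls the fitted weights toward zero, which is exactly the "don't act boldly on limited evidence" intuition described next.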

The idea is that, in the absence of vast evidence, we shouldn't allow the model to use its limited information to make decisions that we, the researchers, understand would be rash and unjustified given the evidence.

One framing of impact measures is that we can apply the same rationale to artificial intelligence. If we consider some scheme where an AI has been given the task of undertaking ambitious value learning, we should make it so that, whatever the AI initially believes the true utility function U to be, it is extra cautious not to optimize the world too heavily unless it has gathered a very large amount of evidence that U really is the right utility function.

One way that this could be realized is by some form of impact penalty which eventually gets phased out as the AI gathers more evidence. This isn't currently the way that I have seen impact measurement framed. However, to me it is still quite intuitive.
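The post doesn't commit to any particular form for such a phase-out, but one way to picture it is an objective whose impact-penalty weight scales with the agent's remaining uncertainty over candidate utility functions. A toy sketch under that assumption (the entropy-based schedule and all names here are mine, purely illustrative):

    import numpy as np

    def penalty_weight(posterior, base_weight=10.0):
        """Scale the impact penalty by the agent's remaining uncertainty.

        `posterior` is a distribution over candidate utility functions. The
        weight is proportional to its normalized entropy, so the penalty only
        fades toward zero as evidence singles out one utility function.
        """
        p = np.asarray(posterior, dtype=float)
        p = p / p.sum()
        entropy = -np.sum(p * np.log(p + 1e-12))
        return base_weight * entropy / np.log(len(p))

    def regularized_value(expected_utility, impact, posterior):
        """Expected utility under the current best-guess U, minus a scaled impact term."""
        return expected_utility - penalty_weight(posterior) * impact

    # Early on (uniform posterior) a high-impact plan scores badly; after lots
    # of evidence (peaked posterior) the same plan is barely penalized.
    print(regularized_value(5.0, impact=2.0, posterior=[0.25, 0.25, 0.25, 0.25]))
    print(regularized_value(5.0, impact=2.0, posterior=[0.97, 0.01, 0.01, 0.01]))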

Consider a toy scenario where we have solved ambitious value learning and decide to design an AI to optimize human values in the long term. In this scenario, when the AI is first turned on, it is given the task of learning what humans want. In the beginning, in addition to its task of learning human values, it also tries helping us in low impact ways, perhaps by cleaning our laundry and doing the dishes. Over time, as it gathers enough evidence to fully understand human culture and philosophy, it will have the confidence to do things which are much more impactful, like becoming the CEO of some corporation.

I think that it's important to note that this is not what I currently think will happen in the real world. However, I think it's useful to imagine these types of scenarios because they offer concrete starting points for what a good regularization strategy might look like. In practice, I am not too optimistic about ambitious value learning, but more narrow forms of value learning could still benefit from impact measurements. As we are still somewhat far from any form of advanced artificial intelligence, uncertainty about which methods will work makes this analysis difficult.

Impact as a safety protocol

When I think about advanced artificial intelligence, my mind tends to forward chain from current AI developments, and imagines them being scaled up dramatically. In these types of scenarios, I'm most worried about something like mesa optimization, where in the process of making a model which performs some useful task, we end up searching over a very large space of optimizers that ultimately end up optimizing for some other task which we never intended for.

To oversimplify things a bit, there are a few ways that we could ameliorate the issue of misaligned mesa optimization. One way is to find a method for robustly aligning arbitrary mesa objectives with base objectives. I am a bit pessimistic about this strategy working without some radical insights, because it currently seems really hard; achieving it would require a huge chunk of alignment to be solved.

Alternatively, we could whitelist our search space such that only certain safe optimizers could be discovered. This is a task where I can see impact measurements being helpful.

When we do some type of search over models, we could construct an explicit optimizer that forms the core of each model. The actual parameters that we perform gradient descent over would need to be limited enough such that we could still transparently see what type of "utility function" is being inner optimized, but not so limited that the model search itself would be useless.

If we could constrain and control this space of optimizers enough, then we should be able to explicitly add safety precautions to these mesa objectives. The exact way that this could be performed is a bit difficult for me to imagine. Still, I think that as long as we are able to perform some type of explicit constraint on what type of optimization is allowed, then it should be possible to penalize mesa optimizers in a way that could potentially avoid catastrophe.

During the process of training, the model will start unaligned and gradually shift towards performing better on the base objective. At any point during the training, we wouldn't want the model to try to do anything that might be extremely impactful, both because it will initially be unaligned, and because we are uncertain about the safety of the trained model itself. An impact penalty could thus help us to create a safe testing environment.
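I don't know what shape such a precaution would actually take; purely as an illustration of "penalize impact during the training process", here is the generic form of a training objective with an added impact term. The impact proxy below is a made-up stand-in for whatever measure one actually trusts:

    import numpy as np

    def toy_impact(features_before, features_after):
        """Toy impact proxy: how much the candidate's rollout changed a vector
        of auxiliary 'what can be achieved from here' estimates."""
        diff = np.asarray(features_after) - np.asarray(features_before)
        return float(np.mean(np.abs(diff)))

    def penalized_loss(base_loss, features_before, features_after, beta=1.0):
        """Base training objective plus a scaled impact penalty; beta trades
        off task performance against caution during the search."""
        return base_loss + beta * toy_impact(features_before, features_after)

    # Two candidates with equal task loss; the one whose rollout shifted the
    # auxiliary estimates more is disfavored during the model search.
    print(penalized_loss(0.40, [1.0, 2.0, 3.0], [1.0, 2.1, 3.0]))   # low impact
    print(penalized_loss(0.40, [1.0, 2.0, 3.0], [5.0, 0.0, 9.0]))   # high impact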

The intention here is not that we would add some type of impact penalty to the AIs that are eventually deployed. It is simply that, as we perform the testing, there will be some limitation on how much power we are giving the mesa optimizers. Having a penalty for mesa optimization can then be viewed as a short-term safety patch in order to minimize the chances that an AI does something extremely bad that we didn't expect.

It is perhaps hard at first to see how an AI could be dangerous during the training process. But I believe there is good reason to think that as our experiments get larger, they will require artificial agents to understand more about the real world while they are training, which incurs significant risk. There are also specific, predictable ways in which a model being trained could turn dangerous, such as in the case of deceptive alignment. It is conceivable that having some way to reduce impact for optimizers in these cases will be helpful.

Impact as an influence-limiter

Even if we didn't end up putting an impact penalty directly into some type of ambitiously aligned AGI, or use it as a safety protocol during testing, there are still a few disjunctive scenarios in which impact measures could help construct limited AIs. A few examples would be if we were constructing Oracle AIs and Task AGIs.

Impact measurements could help Oracles by cleanly providing a separation between "just giving us true, important information" and "heavily optimizing the world in the process." This is, as I understand it, one of the main issues with Oracle alignment at the moment, which means that intuitively an impact measurement could be quite helpful in that regard.

One rationale for constructing a task AGI is that it allows humanity to perform some type of important action which buys us more time to solve the more ambitious varieties of alignment. I am personally less optimistic about this particular solution to alignment, as in my view it would require a very advanced form of coordination of artificial intelligence. In general I incline towards the view that competitive AIs will take the form of more service-specific machine learning models, which might imply that even if we succeeded at creating some low-impact AGI that achieved a specific purpose, it wouldn't be competitive with the other AIs which themselves have no impact penalty at all.

Still, there is broad agreement that if we have a good theory about what is happening within an AI, then we are more likely to succeed at aligning it. Creating agentic AIs seems like a good way to have that form of understanding. If this is the route that humanity ends up taking, then impact measurements could provide immense value.

This justification for impact measures is perhaps the most salient in the debate over impact measurements. It seems to be behind the critique that impact measurements need to be useful rather than just safe and value-neutral. At the same time, I know from personal experience that there is at least one person currently thinking about ways we can leverage current impact penalties to be useful in this scenario. Since I don't have a good model for how this can be done, I will refrain from specific rebuttals of this idea.

Impact as deconfusion

The concept of impact appears to neighbor other relevant alignment concepts, like mild optimization, corrigibility, safe shutdowns, and task AGIs. I suspect that even if impact measures are never actually used in practice, there is still some potential that drawing clear boundaries between these concepts will help clarify approaches for designing powerful artificial intelligence.

This is essentially my model for why some AI alignment researchers believe that deconfusion is helpful. Developing a rich vocabulary for describing concepts is a key feature of how science advances. Particularly clean and insightful definitions help clarify ambiguity, allowing researchers to say things like "That technique sounds like it is a combination of X and Y without having the side effect of Z."

A good counterargument is that there isn't any particular reason to believe that this concept deserves priority for deconfusion. It would be bordering on a motte and bailey to claim that some particular research will lead to deconfusion and then, when pressed, appeal to research in general. I am not trying to do that here. Instead, I think that impact measurements are potentially good because they focus attention on a subproblem of alignment, in particular catastrophe avoidance. And I also think there has empirically been demonstrable progress in a way that provides evidence that this approach is a good idea.

Consider David Manheim and Scott Garrabrant's Categorizing Variants of Goodhart's Law. For those unaware, Goodhart's law is roughly summed up in the saying "Whenever a measure becomes a target, it ceases to be a good measure." This paper tries to catalog all of the different cases in which this phenomenon could arise. Crucially, it isn't necessary for the paper to actually present a solution to Goodhart's law in order to illuminate how we could avoid the issue. By distinguishing the ways in which the law holds, we can focus on addressing those specific sub-issues rather than blindly coming up with one giant patch for the entire problem.

Similarly, the idea of impact measurement is a confusing concept. There's one interpretation in which an "impact" is some type of distance between two representations of the world. In this interpretation, saying that something had a large impact is another way of saying that the world changed a lot as a result. In newer interpretations of impact, we like to say that an impact is really about a difference in what we are able to achieve.

The distinction between "differences in world models" and "differences in what we are able to do" is subtle, and enlightening (at least to me). It gives me new terminology with which to talk about the impact of artificial intelligence. For example, in Nick Bostrom's founding paper on existential risk studies, his definition of existential risk included events which could

permanently and drastically curtail [humanity's] potential.

One interpretation of the above definition is that Bostrom was referring to potential in the sense of the second definition of impact rather than the first.

A highly unrealistic way that this distinction could help us is if we had some future terminology which allowed us to unambiguously ask AI researchers to "see how much impact this new action will have on the world." AI researchers could then boot up an Oracle AI and ask the question in a crisply formalized framework.

More realistically, I could imagine that the field may eventually stumble on useful cognitive strategies for framing the alignment problem such that impact measurement becomes a convenient, precise concept to work with. As AI gets more powerful, the way that we understand alignment will become nearer to us, forcing us to quickly adapt our language and strategies to the specific evidence we are given.

Within a particular subdomain, I think an AI researcher could ask questions about what they are trying to accomplish, and talk about it using the vocabulary of well understood topics, which could eventually include impact measurements. The idea of impact measurement is simple enough that it will (probably) get independently invented a few times as we get closer to powerful AI. Having thoroughly examined the concept ahead of time rather than afterwards offers future researchers a standard toolbox of precise, deconfused language.

I do not think the terminology surrounding impact measurements will ever quite reach the ranks of terms like "regularizer" or "loss function", but I do have an inclination to think that simple and common-sense concepts should be rigorously defined as the field advances. Since we have intense uncertainty about the type of AIs that will end up being powerful, or about the approaches that will be useful, it is possibly most helpful at this point in time to develop tools which can reliably be handed off to future researchers, rather than putting too much faith into one particular method of alignment.




In defense of Oracle ("Tool") AI

Published on August 7, 2019 2:21 PM UTC

Low confidence; offering this up for discussion

An Oracle AI is an AI that only answers questions, and doesn't take any other actions. The opposite of an Oracle AI is an Agent AI, which might also send emails, control actuators, etc.

I'm especially excited about the possibility of non-self-improving oracle AIs, dubbed Tool AI in a 2012 article by Holden Karnofsky.

I've seen two arguments against this "Tool AI":

  • First, as in Eliezer's 2012 response to Holden, we don't know how to safely make and operate an oracle AGI (just like every other type of AGI). Fair enough! I never said this is an easy solution to all our problems! (But see my separate post for why I'm thinking about this.)
  • Second, as in Gwern's 2016 essay, there's a coordination problem. Even if we could build a safe oracle AGI, the argument goes, there will still be an economic incentive to build an agent AGI, because you can do more and better and faster by empowering the AGI to take actions. Thus, agreeing to never ever build agent AGIs is a very hard coordination problem for society. I don't find the coordination argument compelling—in fact, I think it's backwards—and I wrote this post to explain why.
Five reasons I don't believe the coordination / competitiveness argument against oracles

1. If the oracle isn't smart or powerful enough for our needs, we can solve that by bootstrapping. Even if the oracle is not inherently self-modifying, we can ask it for advice and do human-in-the-loop modifications to make more powerful successor oracles. By the same token, we can ask an oracle AGI for advice about how to design a safe agent AGI.

2. Avoiding coordination problems is a pipe dream; we need to solve the coordination problem at some point, and that point might as well be at the oracle stage. As far as I can tell, we will never get to a stage where we know how to build safe AGIs and where there is no possibility of making more-powerful-and-less-safe AGIs. If we have a goal in the world that we really really want to happen, a low-impact agent is going to be less effective than a not-impact-restrained agent; an act-based agent is going to be less effective than a goal-seeking agent;[1] and so on and so forth. It seems likely that, no matter how powerful a safe AGI we can make, there will always be an incentive for people to try experimenting with even more powerful unsafe alternative designs.

Therefore, at some point in AI development, we have to blow the whistle, declare that technical solutions aren't enough, and we need to start relying 100% on actually solving the coordination problem. When is that point? Hopefully far enough along that we realize the benefits of AGI for humanity—automating the development of new technology to help solve problems, dramatically improving our ability to think clearly and foresightedly about our decisions, and so on. Oracles can do all that! So why not just stop when we get to AGI oracles?

Indeed, once I started thinking along those lines, I began to see the coordination argument going in the other direction! I say restricting ourselves to oracle AI makes coordination easier, not harder! Why is that? Two more reasons:

3. We want a high technological barrier between us and the most dangerous systems: These days, I don't think anyone takes seriously the idea of building an all-powerful benevolent dictator AGI implementing CEV. At least as far as I can tell from the public discourse, there seems to be a growing consensus that humans should always and forever be in the loop of AGIs. (That certainly sounds like a good idea to me!) Thus, the biggest coordination problem we face is: "Don't ever make a human-out-of-the-loop free-roaming AGI world-optimizer." This is made easier by having a high technological barrier between the safe AGIs that we are building and using, and the free-roaming AGI world-optimizers that we are forbidding. If we make an agent AGI—whether corrigible, aligned, norm-following, low-impact, or whatever—I just don't see any technological barrier there. It seems like it would be trivial for a rogue employee to tweak such an AGI to stop asking permission, deactivate the self-restraint code, and go tile the universe with hedonium at all costs (or whatever that rogue employee happens to value). By contrast, if we stop when we get to oracle AI, it seems like there would be a higher technological barrier to turning it into a free-roaming AGI world-optimizer—probably not that high a barrier, but higher than the alternatives. (The height of this technological barrier, and indeed whether there's a barrier at all, is hard to say.... It probably depends on how exactly the oracles are constructed and access-controlled.)

4. We want a bright-line, verifiable rule between us and the most dangerous systems: Even more importantly, take the rule:

"AGIs are not allowed to do anything except output pixels onto a screen."

This is a nice, simple, bright-line rule, which moreover has at least a chance of being verifiable by external auditors. By contrast, if we try to draw a line through the universe of agent AGIs, defining how low-impact is low-impact enough, how act-based is act-based enough, and so on, it seems to me like it would inevitably be a complicated, blurry, and unenforceable line. This would make a very hard coordination problem very much harder still.

[Clarifications on this rule: (A) I'm not saying this rule would be easy to enforce (globally and forever), only that it would be less hard than alternatives; (B) I'm not saying that, if we enforce this rule, we are free and clear of all possible existential risks, but rather that this would be a very helpful ingredient along with other control and governance measures; (C) Again, I'm presupposing here that we succeed in making superintelligent AI oracles that always give honest and non-manipulative answers; (D) I'm not saying we should outlaw all AI agents, just that we should outlaw world-modeling AGI agents. Narrow-AI robots and automated systems are fine. (I'm not sure exactly how that line would be drawn.)]

Finally, one more thing:

5. Maybe superintelligent oracle AGI is "a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted...I don’t think there is a strong case for thinking much further ahead than that." (copying from this Paul Christiano post). I hate this argument. It's a cop-out. It's an excuse to recklessly plow forward with no plan and everything at stake. But I have to admit, it seems to have a kernel of truth...

  1. See Paul's research agenda FAQ section 0.1 for things that act-based agents are unlikely to be able to do. ↩︎




Self-Supervised Learning and AGI Safety

Published on August 7, 2019 2:21 PM UTC

Abstract: We should seriously consider the possibility that we'll build AGIs by self-supervised learning. If so, the AGI safety issues seem to be in many respects different (and I think more promising) than in the usual reinforcement learning paradigm. In particular, I'll propose that if we follow certain constraints, these systems can be turned into safe, unambitious AGI oracles. I'll end with lots of open questions.

Epistemic status: Treat as brainstorming. This post supersedes my previous post on this general topic, The Self-Unaware AI Oracle, for reasons mentioned below.

What is self-supervised learning, and why might it lead to AGI?

Self-supervised learning consists of taking data, masking off part of it, and training an ML system to use the unmasked data to predict the masked data. To make the predictions better and better, the system needs to develop an increasingly deep and comprehensive semantic understanding of the world. For example, say there's a movie with a rock falling towards the ground. If you want to correctly predict what image will be in the frame a few seconds ahead, you need to predict that the rock will stop when it gets to the ground, not continue falling. Or if there's a movie with a person saying "I'm going to sit down now", you need to predict that they might well sit in a chair, and probably won't start dancing.

Thus, predicting something has a close relationship with understanding it. Imagine that you're in a lecture class on a topic where you're already an expert. As the professor is talking, you feel you can often finish their sentences. By contrast, when you're lost in a sea of jargon you don't understand, you can only finish their sentences with a much wider probability distribution ("they're probably going to say some jargon word now").
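For concreteness, here is the data-preparation step in miniature: hide a random subset of a sequence and score the model only on reconstructing the hidden part. This is a generic sketch of the masking idea (my own toy example), not any particular system's pipeline:

    import numpy as np

    MASK_ID = -1  # sentinel for "this position was hidden"

    def mask_sequence(tokens, mask_fraction=0.15, seed=0):
        """Return (inputs, targets) for one self-supervised training example.

        A random subset of positions is replaced by MASK_ID in `inputs`;
        `targets` records the original values only at those positions, so the
        model is scored purely on reconstructing what was hidden.
        """
        rng = np.random.default_rng(seed)
        tokens = np.asarray(tokens)
        hide = rng.random(tokens.shape) < mask_fraction
        inputs = np.where(hide, MASK_ID, tokens)
        targets = np.where(hide, tokens, MASK_ID)
        return inputs, targets

    tokens = np.arange(20)          # stand-in for token ids or pixel values
    inputs, targets = mask_sequence(tokens)
    print(inputs)
    print(targets)
    # A training step would then be: predictions = model(inputs), with the
    # loss computed only at positions where targets != MASK_ID.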

The term "self-supervised learning" (replacing the previous and more general term "unsupervised learning") seems to come from Yann LeCun, chief AI scientist at Facebook and co-inventor of CNNs. As he wrote here:

I now call it "self-supervised learning", because "unsupervised" is both a loaded and confusing term.

In self-supervised learning, the system learns to predict part of its input from other parts of it input. In other words a portion of the input is used as a supervisory signal to a predictor fed with the remaining portion of the input.

Self-supervised learning uses way more supervisory signals than supervised learning, and enormously more than reinforcement learning. That's why calling it "unsupervised" is totally misleading. That's also why more knowledge about the structure of the world can be learned through self-supervised learning than from the other two paradigms: the data is unlimited, and amount of feedback provided by each example is huge.

Self-supervised learning has been enormously successful in natural language processing...So far, similar approaches haven't worked quite as well for images or videos because of the difficulty of representing distributions over high-dimensional continuous spaces.

Doing this properly and reliably is the greatest challenge in ML and AI of the next few years in my opinion.

As mentioned in this quote, we have a couple good examples of self-supervised learning. One is humans. Here's Yann LeCun again:

Most of human and animal learning is [self]-supervised learning. If intelligence was a cake, [self]-supervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake.

Indeed, the brain is constantly making probabilistic predictions about everything it is going to see, hear, and feel, and updating its internal models when those predictions are wrong. Have you ever taken a big drink from a glass of orange juice that you had thought was a glass of water? Or walked into a room and immediately noticed that the table is in the wrong place? It's because you were predicting those sensations in detail before they happened. This is a core part of what the human brain does, and what it means to understand things.[1] Self-supervised learning is not the only ingredient in human intelligence—obviously we also have internal and external rewards, environmental interaction, and so on—but I'm inclined to agree with Yann LeCun that self-supervised learning is the cake.[2]

Beyond humans, language models like GPT-2 are a nice example of how far self-supervised learning can go, even when restricted to today's technology and an impoverished text-only datastream. See the SSC post GPT-2 as step towards general intelligence.

Thus, this post is nominally about the scenario where we make superintelligent AGI by 100% self-supervised learning (all cake, no cherries, no icing, in Yann LeCun's analogy). I consider this a live possibility but far from certain. I'll leave to future work the related scenario of "mostly self-supervised learning plus some reinforcement learning or other techniques". (See Open Questions below.)

What does self-supervised learning look like under the hood? GPT-2 and similar language models are based on a Transformer neural net, trained by stochastic gradient descent as usual. The brain, to my limited understanding, does self-supervised learning (mainly using the neocortex, hippocampus, and thalamus) via a dozen or so interconnected processes, some of which are massively parallelized (the same computational process running in thousands of locations at once).[3] These processes are things vaguely like: "If a spatiotemporal pattern recurs several times in a certain context, assign that pattern a UUID, and pass that UUID upwards to the next layer in the hierarchy", and "Pull two items out of memory, look for a transformation between them, and catalog it if it is found".[4]

If I had to guess, I would expect that the ML community will eventually find neural net data structures and architectures that can do these types of brain-like processes—albeit probably not in exactly the same way that the brain does them—and we'll get AGI shortly thereafter. I think it's somewhat less likely, but not impossible, that we'll get AGI by just, say, scaling up the Transformer to some ridiculous size. Either way, I won't speculate on how long it might take to get to AGI, if this is the path. But my vague impression is that self-supervised learning is a hot research area undergoing rapid progress.

Impact for safety: General observations

1. Output channel is not part of the training

In the reinforcement learning (RL) path to AGI, we wind up creating a system consisting of a world-model attached to an output channel that reaches out into the world and does useful things, and we optimize this whole system. By contrast, in the Self-Supervised Learning path to AGI, we wind up by default with more-or-less just a bare world model. At least, that's the way I'm thinking about it. I mean, sure, it has an output channel in the sense that it can output predictions for masked bits of a file, but that's incidental, not really part of the AGI's primary intended function, like answering questions or doing things. We obviously need some sort of more useful output, but whatever that output is, it's not involved in the training. This opens up some options in the design space that haven't been well explored.

My last post, The Self-Unaware AI Oracle, is a rather extreme example of that, where we would try to build a world model that not only has no knowledge that it is attached to an output channel, but doesn't even know it exists as an information-processing system! My (admittedly vague) proposal for that was to isolate the world-model from any reflective information about how the world-model is being built and analyzed. This is an extreme example, and after further thought I now think it's entirely unnecessary, but I still think it's a nice example of new things we might do in this different design space.[5]

2. No need for a task-specific reward signal

Let's say we want to build a question-answering system by RL. Well, we need to train it by asking questions, and rewarding it for a good answer. This is tricky, to put it mildly. The questions we really care about are ones where we don't necessarily know the answer. How do we get lots of good data? What if we make mistakes in the training set? What if the system learns to optimize a proxy to the reward function, rather than the reward function itself? What if it learns to manipulate the reward function, e.g. by self-fulfilling prophecies? I don't want to say these are impossible tasks, and I'm glad lots of people are working on them. But the self-supervised learning approach seems to largely skirt all these issues.

For self-supervised learning, the paradigm is a bit different. There is, of course, a reward signal guiding the machine learning, namely "predict the masked bits in the input data, using the unmasked bits". But we treat that as purely a method to construct a world-model. Then we ask a separate question: "Now that we have a predictive world-model, how do we use it to make the world a better place"? This step is mainly about building an interface, not building an intelligence. Now, I don't think this task is by any means easy (see "Open questions" below), but I do suspect that it's easier than the various RL challenges above.

3. The world-model acquires some safety-enhancing knowledge automatically
  • Natural-language bridge into the world-model: When we build a world model by self-supervised learning of human-created content, the world model should wind up with an understanding of the real world, an understanding of human language and concepts, and (importantly) an understanding of how these two map onto each other. This should be very helpful for using the world model to answer our questions, and might also offer some (limited) degree of interpretability of the unimaginably complicated guts of the world-model.

  • Understanding of typical human norms, behaviors, and values: When we build a world model by self-supervised learning of human-created content, the world model should wind up with an excellent predictive understanding of typical human behavior. So let's say we can ask it a counterfactual question like If there were no AGIs in the world, what's the likeliest way that a human might find a cure for Alzheimer's? (Put aside for now the issue of how we ask the question—see Open Questions below.) The answer to that question would be a kind of overlap between real-world compatibility (the cure actually works) and human compatibility (it should be the kind of "cures for Alzheimer's" that humans want and would plausibly be looking for). That's what we want! By contrast, a superintelligent biologist with an alien motivation system would be likelier to "cure Alzheimer's" by some method that is impractical, or that exploits loopholes in our definition of "cure", etc. (So this incidentally also helps with Goodhart's law.)

Constraints under which a self-supervised-learning world-model can be built into a safe AGI system

A system that can make great predictions of masked bits in a data file is not really useful, and not really something I would call an AGI. We need to build something around the world model, so that it can answer questions or take actions.

Once we start moving in this direction, I think it's very easy to wander into dangerous territory. But, as a starting point, I offer this set of constraints. I believe that if we build an AGI using 100% self-supervised learning, and we follow all these constraints, then the system will be safe—more specifically, it will not wield its world model to do foresighted planning towards real-world consequences. Here's my list of constraints:

  • The system is used as an oracle, i.e. it answers questions but does not take any other actions. (See my separate post: In defense of Oracle ("Tool") AI.)
  • We only ask non-self-referential counterfactual questions along the lines of: If there were no AGIs in the world, what's a likely way that someone might design a better solar cell? This avoids the various problems of self-fulfilling prophecies, such as those discussed by Stuart Armstrong in this and follow-up posts, as well as steering towards solutions compatible with human norms and values as discussed above.
  • We fix the training data (what data to look at and what bits to predict, in what order) before we ever start training the system, then we "lock down" the world-model after the training (as opposed to letting it evolve further during question-and-answer operation), and moreover do not allow the oracle to do anything except output answers (no follow-up questions, no asking for more data about some topic, no running simulations in COMSOL, etc.).
  • For asking questions and getting answers, we don't try to build a new interface into the world model, but rather build some wrapper around its existing predict-masked-bits interface. (It's not obvious that we can build such a wrapper that works well, but I'm hopeful, and see Open Questions below.)

I'm pretty sure some of these constraints are unnecessarily strict; and conversely, I may be forgetting or leaving off some important requirement.
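To make the last constraint slightly more concrete: I picture the wrapper as nothing more than prompt construction around the model's existing predict-the-continuation interface. The sketch below is a cartoon under that assumption; frozen_world_model and predict_continuation are placeholders I made up, not a real API:

    COUNTERFACTUAL_PREFIX = (
        "If there were no AGIs in the world, a likely way that someone might "
    )

    def ask_oracle(frozen_world_model, question_stub, max_tokens=200):
        """Phrase a non-self-referential counterfactual question as a
        fill-in-the-blank continuation and hand it to the frozen model.

        The model is assumed to expose only its training-time interface
        (predict the most likely continuation of a text prefix). The wrapper
        does nothing else: no follow-up queries, no model updates.
        """
        prompt = COUNTERFACTUAL_PREFIX + question_stub + " is: "
        return frozen_world_model.predict_continuation(prompt, max_tokens=max_tokens)

    # Hypothetical usage:
    # answer = ask_oracle(model, "design a better solar cell")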

Why won't it try to get more predictable data?

Why do I think that such an AGI would be safe? (Again, I mean more narrowly that it won't wield its world model to do foresighted planning towards real-world consequences—which is the worst, but not only, thing that can go wrong.) Here's one possible failure mode that I was thinking about. I'm sure some readers are thinking it too:

Oh, it optimizes its ability to predict masked bits from a file, eh? Well it's going to manipulate us, or threaten us, or hack itself, to get easy-to-predict files of all 0's!!

I say that this will not happen with any reasonable self-supervised learning algorithm that we'd be likely to build, if we follow the constraints above.

Gradient descent is a particularly simple example to think about here. In gradient descent, when the world-model makes a bad prediction, the model is edited such that the updated model would (in retrospect) have been likelier to get the right answer for that particular bit. Since the optimization pressure is retrospective ("backwards-facing" in Stuart Armstrong's terminology), we are not pushing the system to do things like "seeking easier-to-predict files", which help with forward-looking optimization but are irrelevant to retrospective optimization. Indeed, even if the system randomly stumbled upon that kind of forward-looking strategy, further gradient descent steps would be just as likely to randomly discard it again! (Remember, I stipulated above that we will fix in advance what the training data will be, and in what order it will be presented.)
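To spell out the backwards-facing point: a single gradient step is computed from the parameters and an already-fixed (input, target) pair, and nothing else, so there is no term through which "make future data easier to predict" could enter. A minimal sketch of one such step:

    import numpy as np

    def sgd_step(w, x, target, lr=0.1):
        """One gradient step on squared prediction error for a fixed example.

        The update depends only on how wrong the model already was on this
        (x, target) pair; the identity or difficulty of future training data
        never appears anywhere in the computation.
        """
        error = w @ x - target
        grad = 2 * error * x        # d/dw of (w.x - target)^2
        return w - lr * grad

    w = np.zeros(3)
    w = sgd_step(w, x=np.array([1.0, 0.5, -1.0]), target=2.0)
    print(w)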

(Neuromorphic world-model-building algorithms do not work by gradient descent, and indeed we don't really know exactly how they would work, but from what we do know, I think they also would be unlikely to do forward-looking optimization.)

The algorithm will, of course, search ruthlessly for patterns that shed light on the masked bits. If it takes longer to read 1s from RAM than 0s (to take a silly example), and if the pattern-finder has access to timing information, then of course it will find the pattern and start making better predictions on that basis. This is not a malign failure mode, and if it happens, we can fix these side channels using cybersecurity best practices.

Open questions (partial list!)
  • Input and output: How exactly do we give the system input and output? Predicting masked bits has obvious shortcomings, like that it naturally tries to predict what a flawed human might say on some topic, rather than what the truth is. This problem may be solvable by doing things like flagging some inputs as highly reliable, trusting the system to automatically learn the association of that flag with truth and insight, and then priming the system with that flag when we ask it questions. Or would it be better to build a new, separate interface into the guts of the world model? If so, how?
  • Sculpting the world model: There may be too much information in the world for our best algorithms to learn it all. In humans, we use attention; thus an entomologist and botanist can look at the same picture of a garden but enrich their world-models in very different directions. (When I look at a picture of a garden, I zone out and learn nothing whatsoever.) Can we flag some masked bits as really important to "think about" longer and harder before making a prediction? Or what?
  • Training data: Exactly what data should we feed it? Is there dangerous data that we need to censor? Is there unhelpful data that makes it dumber? Is text enough to get AGI, or does there also need to be audio and video? (People can be perfectly intelligent without sight or hearing...) Books, articles, and YouTube are obvious sources of training data; are there less obvious but important data sources we should also be thinking about? How does the choice of training data impact safety and interpretability?
  • Adding in supervised or reinforcement learning: I've been assuming that we use 100% self-supervised learning to build the AGI. Can we sprinkle in some supervised or reinforcement learning, e.g. as a fine-tuning step, as a way to build an interface into the world model, or as a kind of supervisor to the main training? If so, how, and how would that impact safety and capabilities?
  • Capabilities: Would a system subject to the constraints listed above nevertheless be powerful enough to do the things we want AGIs to do, and if not, can we relax some of those constraints? Can a world model get bigger and richer forever, or will it grind to a halt after deeply understanding the first 300,000 journal articles it reads? More generally, how do these things scale?
  • Agency: Can we safely give it a bit of agency in requesting more information, or asking follow-up questions, or interfacing with other software or databases? Or more boldly, can we safely give it a lot of agency, e.g. a 2-way internet connection, and if so, how?
  • Experiments: Can we shed light on any of these questions by playing with GPT-2 or other language models? Is there some other concrete example we can find or build? Like, is there any benefit to doing self-supervised learning on a corpus of text and movies talking about the Game Of Life universe?
  • Safety: I mentioned a couple of possible failure modes above, and said that I didn't think they were concerning. But this obviously needs much more careful thought and analysis before we can declare victory. What other failure modes should we be thinking about?
  • Timelines and strategy: If we get AGI by 100% self-supervised learning, how does that impact takeoff speed, timelines, likelihood of unipolar vs multipolar scenarios, CAIS-like suites of narrow systems versus monolithic all-purpose systems, etc.?
Conclusion

I hope I've made the point that self-supervised learning scenarios are plausible, and bring forth a somewhat different and neglected set of issues and approaches in AGI safety. I hope I'm not the only one working on it! There's tons of work to do, and god knows I can't do it myself! (I have a full-time job...) :-)

  1. See Andy Clark's Surfing Uncertainty (or SSC summary), or Jeff Hawkins' On Intelligence, for more on this. ↩︎

  2. Here are some things guiding my intuition on the primacy of self-supervised learning in humans: (1) In most cultures and most of human history, children have been largely ignored by adults, and learn culture largely by watching adults. They'll literally just sit for an hour straight, watching adults work and interact. This seems to be an instinct that does not properly develop in modern industrialized societies! (See Anthropology of Childhood by Lancy.) (2) Babies' understanding of the world is clearly developing long before they have enough motor control to learn things by interacting (see discussion in Object permanence on wikipedia, though I suppose one could also give the credit to brain development in general rather than self-supervised learning in particular). (3) By the same token, some kids (like me!) start talking unusually late, but then almost immediately have age-appropriate language skills, e.g. speaking full sentences. (See the book Einstein Syndrome, or summary on wikipedia.) I take this as evidence that we primarily learn to speak by self-supervised learning, not trial-and-error. (4) If you read math textbooks, and try to guess how the proofs are going to go before actually reading them, that seems like a pretty good way to learn the content. ↩︎

  3. See The Brain as a Universal Learning Machine and my post on Jeff Hawkins. ↩︎

  4. The first of these vaguely paraphrases Jeff Hawkins, the second Doug Hofstadter. ↩︎

  5. I haven't completely lost hope in self-unaware designs, but I am now thinking it's likely that cutting off reflective information might make it harder to build a good world-modeler—for example it seems useful for a system to know how it came to believe something. More importantly, I now think self-supervised learning is capable of building a safe AGI oracle even if the system is not self-unaware, as discussed below. ↩︎



Discuss

Project Proposal: Considerations for trading off capabilities and safety impacts of AI research

August 7, 2019 - 01:22
Published on August 6, 2019 10:22 PM UTC

There seems to be some amount of consensus that people working on AI safety (at least within fairly mainstream ML / AI paradigms) shouldn't worry much about the effects of their projects on AI capabilities. New researchers might even try to push capabilities research forward, to build career capital. The best argument for this, IMO, is that someone else is probably going to do it if you don't, likely within the next 6 months (given the current pace of research).

I mostly agree with this view, but I do still think a bit about the effects of my research on capabilities, and think others should as well. Being concerned about advancing capabilities has, in the past, moved me away from pursuing ambitious capabilities projects which might have been very good for my career if they paid off; but I always saw someone else do the thing I was considering soon afterwards anyway...

But as far as I know, nobody has tried to evaluate this question thoroughly and systematically. This is concerning, because it seems like current attitudes could plausibly be a result of motivated reasoning (i.e. "I want to keep doing my research, and probably would do so even if I saw a compelling case against it") and groupthink ("nobody else is worrying about this"). I'm not sure it's really tractable, but I think it could be worth ~1-4 people spending a bit of time (possibly up to ~6-24 months, if it ends up looking tractable after some initial thought/investigation) on trying to do a fairly comprehensive treatment of this question.

The main deliverables could be practical guidelines for AI safety researchers, e.g.:

  • Figuring out when it makes sense to be concerned about advancing AI capabilities via one's research.
  • How to decide how significant those concerns are, and whether they should preclude working on that line of research or project, or change its publication model.

The project could intersect with current "dual-use" considerations (e.g. RE GPT-2).

(Also worth mentioning): I know MIRI now has secret research, and I think they have a reasonable case for that, since they aren't in the mainstream paradigms. I do think it would be good for them to have a "hit publication" within the ML community, and it might be worth pushing some out-of-the-box ideas which might advance capabilities. The reason is that MIRI has very little credibility, or even name recognition, in the ML community ATM, and I think it would be a big deal in terms of "perception of AI safety concerns within the ML community" if that changed. And I think the ML community's perceptions are important, because its attitude seems of critical importance for getting good X-risk reduction policies in place (IIRC, I talked to someone at MIRI who disagreed with that perspective).

The idea to write this post came out of discussion with Joe Collman.




Discuss

Subagents, neural Turing machines, thought selection, and blindspots

August 7, 2019 - 00:15
Published on August 6, 2019 9:15 PM UTC

In my summary of Consciousness and the Brain (Dehaene, 2014), I briefly mentioned that one of the functions of consciousness is to carry out artificial serial operations; or in other words, implement a production system (equivalent to a Turing machine) in the brain.

While I did not go into very much detail about this model in the post, I’ve used it in later articles. For instance, in Building up to an Internal Family Systems model, I used a toy model where different subagents cast votes to modify the contents of consciousness. One may conceptualize this as equivalent to the production system model, where different subagents implement different production rules which compete to modify the contents of consciousness.

In this post, I will flesh out the model a bit more and apply it to a few other examples, such as emotion suppression, internal conflict, and blind spots.

Evidence accumulation

Dehaene has outlined his model in a pair of papers (Zylberberg, Dehaene, Roelfsema, & Sigman, 2011; Dehaene & Sigman, 2012), though he is not the first one to propose this kind of a model. Daniel Dennett’s Consciousness Explained (1991) also discusses consciousness as implementing a virtual Turing machine; both cite as examples earlier computational models of the mind, such as Soar and ACT, which work on the same principles.

An important building block in Dehaene's model is based on what we know about evidence accumulation and decision-making in the brain, so let's start by taking a look at that.

Sequential sampling models (SSMs) are a family of models from mathematical psychology that have been developed since the 1960s (Forstmann, Ratcliff, & Wagenmakers, 2016). A particularly common SSM is the diffusion decision model (DDM) of decision-making, in which a decision-maker is assumed to noisily accumulate evidence towards a particular choice. Once the evidence in favor of a particular choice meets a decision threshold, that choice is taken.

A DDM is a simple model with just four parameters: starting point bias (a person may start biased towards one particular alternative), a decision threshold, a drift rate, and non-decision time (when measuring e.g. reaction times, a constant delay introduced by factors such as perceptual processing which take time but are not involved in the decision process itself).

These parameters can be measured from behavioral experiments, and the model manages to fit a wide variety of behavioral experiments and intuitive phenomena well (Forstmann et al., 2016; Ratcliff, Smith, Brown, & McKoon, 2016; Roberts & Hutcherson, 2019). For example, stronger evidence in favor of a particular decision is reflected in a faster drift rate towards the decision threshold, causing faster decisions. On the other hand, making mistakes or being falsely told that one’s performance on a trial is below that of most other participants prompts caution, increasing people’s decision thresholds and slowing down response times (Roberts & Hutcherson, 2019).
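As an illustration of how few moving parts the model has, here is a minimal simulation sketch of a single DDM trial (the parameter values are invented for illustration, not taken from the cited papers):

```python
# Minimal sketch of one diffusion decision model trial. The four parameters
# from the text appear explicitly; all numerical values are made up.
import numpy as np

def ddm_trial(drift, threshold, start_bias=0.0, non_decision=0.3,
              dt=0.001, noise_sd=1.0, rng=np.random.default_rng()):
    """Accumulate noisy evidence until it hits +threshold or -threshold.
    Returns (choice, reaction_time_in_seconds)."""
    x = start_bias * threshold          # starting point bias
    t = 0.0
    while abs(x) < threshold:
        x += drift * dt + noise_sd * np.sqrt(dt) * rng.normal()
        t += dt
    choice = 1 if x > 0 else 0
    return choice, t + non_decision     # add non-decision time (perception, motor)

# Stronger evidence = faster drift = quicker, more accurate decisions:
trials = [ddm_trial(drift=1.5, threshold=1.0) for _ in range(500)]
choices, rts = zip(*trials)
print(np.mean(choices), np.mean(rts))
```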

While these models have been studied the most in the context of binary decisions, one can easily extend them to a choice between n alternatives by assuming the existence of multiple accumulators, each accumulating evidence towards its own choice, possibly inhibiting the others in the process. Neuroscience studies have identified structures which seem to correspond to various parts of SSMs. For example, in random dot motion tasks, where participants have to indicate the direction that dots on a screen are moving in,

the firing rates of direction selective neurons in the visual cortex (area MT/V5) exhibit a roughly linear increase (or decrease) as a function of the strength of motion in their preferred (or anti-preferred) direction. The average firing rate from a pool of neurons sharing similar direction preferences provides a time varying signal that can be compared to an average of another, opposing pool. This difference can be positive or negative, reflecting the momentary evidence in favor of one direction and against the other. (Shadlen & Shohamy, 2016)

Shadlen & Shohamy (2016) note that experiments on more “real-world” decisions, such as decisions on which stock to pick or which snack to choose, also seem to be compatible with an SSM framework. However, this raises a few questions. For instance, it makes intuitive sense why people would take more time on a random motion task when they lose confidence: watching the movements for a longer time accumulates more evidence for the right answer, until the decision threshold is met. But what is the additional evidence that is being accumulated in the case of making a decision based on subjective value?

The authors make an analogy to a symbol task which has been studied in rhesus monkeys. The monkeys need to decide between two choices, one of which is correct. For this task, they are shown a series of symbols, each of which predicts one of the choices as being correct with some probability. Through experience, the monkeys come to learn the weight of evidence carried by each symbol. In effect, they are accumulating evidence not by motion discrimination but memory retrieval: retrieving some pre-learned association between a symbol and its assigned weight. This “leads to an incremental change in the firing rate of LIP neurons that represent the cumulative [log likelihood ratio] in favor of the target in its response field”.
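A toy version of that weight-of-evidence accumulation might look like the following sketch (the symbols and their log likelihood ratios are invented for illustration; this is not code from the cited study):

```python
# Minimal sketch of weight-of-evidence accumulation: each symbol retrieved
# from memory carries a learned log likelihood ratio, and each retrieval
# nudges a single accumulator toward one of two decision bounds.
symbol_llr = {"square": 0.9, "circle": -0.7, "star": 0.3, "cross": -0.2}

def decide(symbols, bound=2.0):
    total = 0.0
    for s in symbols:
        total += symbol_llr[s]      # one retrieval = one increment of evidence
        if total >= bound:
            return "target A", total
        if total <= -bound:
            return "target B", total
    return "undecided", total

print(decide(["square", "star", "square"]))   # ('target A', ~2.1)
```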

The proposal is that humans make choices based on subjective value using a similar process: by perceiving a possible option and then retrieving memories which carry information about the value of that option. For instance, when deciding between an apple and a chocolate bar, someone might recall how apples and chocolate bars have tasted in the past, how they felt after eating them, what kinds of associations they have about the healthiness of apples vs. chocolate, any other emotional associations they might have (such as fond memories of their grandmother’s apple pie) and so on.

Shadlen & Shohamy further hypothesize that the reason why the decision process seems to take time is that different pieces of relevant information are found in physically disparate memory networks and neuronal sites. Access from the memory networks to the evidence accumulator neurons is physically bottlenecked by a limited number of “pipes”. Thus, a number of different memory networks need to take turns in accessing the pipe, causing a serial delay in the evidence accumulation process.


The biological Turing machine

In Consciousness and the Brain, Dehaene considers the example of doing arithmetic. Someone calculating something like 12 * 13 in their head might first multiply 10 by 12, keep the result in memory, multiply 3 by 12, and then add the results together. Thus, if a circuit in the brain has learned to do multiplication, consciousness can be used to route its results to a temporary memory storage, with those results then being routed from the storage to a circuit that does addition.

Production systems in AI are composed of if-then rules (production rules) which modify the contents of memory: one might work by detecting the presence of an item like “10 * 12” and rewriting it as “120”. On a conceptual level, the brain is proposed to do something similar: various contents of consciousness activate neurons storing something like production rules, which compete to fire. The first one to fire gets to apply its production, changing the contents of consciousness.
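To make the production-rule picture concrete, here is a minimal toy production system for the 12 * 13 example (my own gloss, not code from Dehaene's papers): working memory is a small list of items, and each if-then rule fires when it recognizes an item it can rewrite.

```python
# Minimal toy production system: if-then rules rewrite items in working memory.
import re

def fire_one(memory):
    """Let the first matching production rule rewrite memory; False if none match."""
    for item in list(memory):
        m = re.fullmatch(r"(\d+)\*(\d+)", item)
        if m:                                              # "multiplication circuit"
            memory.remove(item)
            memory.append(str(int(m[1]) * int(m[2])))
            return True
    if len(memory) == 2 and all(x.isdigit() for x in memory):
        total = str(int(memory[0]) + int(memory[1]))       # "addition circuit"
        memory.clear()
        memory.append(total)
        return True
    return False

memory = ["10*12", "3*12"]      # 12 * 13, already decomposed into two products
while fire_one(memory):
    pass
print(memory)                   # ['156']
```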

If I understand Dehaene’s model correctly, he proposes to apply the neural mechanisms discussed in the previous sections - such as neuron groups which accumulate evidence towards some kind of decision - at a slightly lower level. In the behavioral experiments, there are mechanisms which accumulate evidence towards which particular physical actions to take, but a person might still be distracted by unrelated thoughts while performing that task. Dehaene’s papers look at the kinds of mechanisms choosing what thoughts to think. That is, there are accumulator neurons which take “actions” to modify the contents of consciousness and working memory.

We can think of this as a two-stage process:

  1. A process involving subconscious “decisions” about what thoughts to think, and what kind of content to maintain in consciousness. Evidence indicating which kind of conscious content is most suited for the situation is based in part on hardwired priorities, and in part on stored associations about the kinds of thoughts that previously produced beneficial results.
  2. A higher-level process involving decisions about what physical actions to take. While the inputs to this process do not necessarily need to go through consciousness, consciously perceived evidence has a much higher weight. Thus, the lower-level process has significant influence on which evidence gets to the accumulators on this level.

To be clear, this does not necessarily correspond to two clearly distinct levels: Zylberberg, Dehaene, Roelfsema, & Sigman (2011) do not talk about there being any levels, and they suggest that “triggering motor actions” is one of the possible decisions involved. But their paper seems to mostly be focused on actions - or, in their language, production rules - which manipulate the contents of consciousness.

There seems to me to be a conceptual difference between the kinds of actions that change the contents of consciousness, and the kinds of actions which accumulate evidence over many items in consciousness (such as successive memories of snacks). Zylberberg et al. talk about a “winner-take-all race” to trigger a production rule, which to me implies that the evidence accumulated in favor of each production rule is cleared each time the contents of consciousness are changed. This is seemingly incompatible with accumulating evidence over many consciousness-moments, so postulating a two-level distinction between accumulators seems like a straightforward way of resolving the issue.

(As an aside, I am, as Dehaene is, treating consciousness and working memory as basically synonymous for the purposes of this discussion. This is not strictly correct; e.g. there may be items in working memory which are not currently conscious. However, since it’s generally thought that items in working memory need to be actively rehearsed through consciousness in order to be maintained, I think that this equivocation is okay for these purposes.)

Here’s a conceptual overview of the stages in the “biological Turing machine’s” operation (as Zylberberg et al. note, a production firing “is essentially equivalent to the action performed by a Turing machine in a single step”):

1. The production selection stage

At the beginning of a cognitive cycle, a person’s working memory contains a number of different items, some internally generated (e.g. memories, thoughts) and some external (e.g. the sight or sound of something in the environment). Each item in memory may activate (contribute evidence to) neurons which accumulate weight towards triggering a particular kind of production rule. When some accumulator neurons reach their decision threshold, they apply their associated production rule.

In the above image, the blue circles at the bottom represent active items in working memory. Two items are activating the same group of accumulator neurons (shown red) and one is activating an unrelated one (shown brown).

2. Production rule ignition

Once a group of accumulator neurons reach their decision threshold and fire a production rule, the model suggests that there are a number of things that the rule can do. In the above image, an active rule is modifying the contents of working memory: taking one of the blue circles, deleting it, and creating a new blue circle nearby. Hypothetically, this might be something like taking the mental objects holding “120” and “36”, adding them together, and storing the output of “156” in memory.

Obviously, since we are talking about brains, expressions like "writing into memory" or "deleting from memory" need to be understood in somewhat different terms than in computers; something being “deleted from working memory” mostly just means that a neuronal group which was storing the item in its firing pattern stops doing so.

The authors suggest that among other things, production rules can:

  • trigger motor actions (e.g. saying or doing something)
  • change the contents of working memory to trigger a new processing step (e.g. saving the intermediate stage of an arithmetic operation, together with the intention to proceed with the next step)
  • activate and broadcast information that is in a “latent” state (e.g. retrieving a memory and sending it to consciousness)
  • activate peripheral processors capable of performing specific functions (e.g. changing the focus of attention)
3. New production selection

After the winning production rule has been applied, the production selection phase begins anew. At this stage or a future one, some kind of credit assignment process likely modifies the decision weights involved in choosing production rules: if a particular rule was activated and seemed to produce positive consequences, then the connections which caused the circumstances at the time to be counted as evidence for that rule are strengthened.
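Putting the cycle together, here is a highly simplified sketch of the production selection race (my own toy gloss on the model above; the rules, weights, and thresholds are invented, and credit assignment is omitted): items in working memory feed evidence to rule-specific accumulators, and the first accumulator to reach threshold ignites and gets to act.

```python
# Highly simplified sketch of production selection as a race between
# rule-specific evidence accumulators. All rules, weights, and thresholds
# are invented for illustration.
import numpy as np

def select_production(memory_items, weights, threshold=5.0, noise_sd=0.5,
                      rng=np.random.default_rng(0)):
    """weights[rule][item] = how strongly that item counts as evidence for that rule."""
    totals = {rule: 0.0 for rule in weights}
    while True:                                   # one "cognitive cycle"
        for rule, w in weights.items():
            evidence = sum(w.get(item, 0.0) for item in memory_items)
            totals[rule] += evidence + noise_sd * rng.normal()
            if totals[rule] >= threshold:
                return rule                       # winner-take-all: this rule fires

weights = {
    "retrieve-memory": {"question": 1.0},
    "speak-answer":    {"question": 0.2, "retrieved-fact": 1.5},
}
print(select_production(["question"], weights))                     # usually "retrieve-memory"
print(select_production(["question", "retrieved-fact"], weights))   # usually "speak-answer"
```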

Practical relevance

Okay, so why do we care? What is the practical relevance of this model?

First, this helps make some of my previous posts more concrete. In Building up to an Internal Family Systems model, I proposed some sort of a process where different subagents were competing to change the contents of consciousness. For instance, “manager” subagents might be trying to manipulate the contents of consciousness so as to avoid unpleasant thoughts and to keep the person out of dangerous circumstances.

People who do IFS, or other kinds of “parts work”, will notice that different subagents are associated with different kinds of bodily sensations and flavors of consciousness. A priori, there shouldn’t be any particular reason for this… except, perhaps, if the strength of such sensations correlated with the activation of a particular subagent, with those sensations then being internally used for credit assignment to identify and reward subagents which had been active in a given cognitive cycle. (This is mostly pure speculation, but supported by some observations to which I hope to return in a future post.)

In my original post, I mostly talked about exiles - neural patterns blocked from consciousness by other subagents - as being subagents related to a painful memory. But while it is not emphasized as much, IFS holds that other subagents can in principle be exiled too. For example, a subagent which tends to react with anger may frequently lead to harmful consequences, and then be blocked by other subagents. This can easily be modeled using the neural Turing machine framework: over time, the system learns to favor decisions which modify the contents of consciousness so as to prevent the activation of the production rules that give power to the angry subagent. As this helps avoid harmful consequences, it begins to happen more and more often.

Hazard has a nice recent post about this kind of thing happening with emotions in general:

So young me is upset that the grub master for our camping trip forgot half the food on the menu, and all we have for breakfast is milk. I couldn't "fix it" given that we were in the woods, so my next option was "stop feeling upset about it." So I reached around in the dark of my mind, and Oops, the "healthily process feelings" lever is right next to the "stop listening to my emotions" lever.

The end result? "Wow, I decided to stop feeling upset, and then I stopped feeling upset. I'm so fucking good at emotional regulation!!!!!"

My model now is that I substituted "is there a monologue of upsetness in my conscious mental loop?" for "am I feeling upset?". So from my perspective, it just felt like I was very in control of my feelings. Whenever I wanted to stop feeling something, I could. When I thought of ignoring/repressing emotions, I imagined trying to cover up something that was there, maybe with a story. Or I thought if you poked around ignored emotions there would be a response of anger or annoyance. I at least expected that if I was ignoring my emotions, that if I got very calm and then asked myself, "Is there anything that you're feeling?" I would get an answer.

Again, the assumption was, "If it's in my mind, I should be able to notice if I look." This ignored what was actually happening, which was that I was cutting the phone lines so my emotions couldn't talk to me in the first place.

Feeling upset feels bad, ceasing to feel upset feels good. Brain notices that there is some operation which causes the feeling of upset to disappear from consciousness: carrying out this operation also produces a feeling of satisfaction in the form of “yay, I’m good at emotional regulation!”. As a result of being rewarded, it eventually becomes so automatic as to block even hints of undesired emotions, making the block in question impossible to notice.

Another observation is that in IFS as well as in Internal Double Crux, an important mental move seems to be “giving subagents a chance to finish talking”. For instance, subagent A might hold a consideration pointing in a particular direction, while subagent B holds a consideration in the opposite direction. When A starts presenting its points, B interrupts with its own point; in response, A interrupts with its point. It seems to be possible to commit to not taking a decision before having heard both subagents, and having done that, ask them to take turns presenting their points and not interrupt each other. What exactly is going on here?

Suppose that a person is contemplating the decision, “should I trust my friend to have my back in a particular risky venture”. Subagent A holds the consideration “allies are important, and we don’t have any, we should really trust our friend so that we would have more allies”. Subagent B holds the consideration “being betrayed would be really bad, and our friend seems untrustworthy, it’s important that we don’t sign up for this”. Subagent A considers it really important to go on this venture together; subagent B considers it really important not to.

Recall that human decision-making happens by accumulating evidence towards different choices until a decision threshold is met. If A were allowed to present its evidence in favor of signing up for the venture, that might sway the decision over the threshold before B was able to present the evidence against. Thus, there is a mechanism which allows B to “interrupt” A in order to present its own evidence. Unfortunately, now it is A that risks B’s evidence meeting a decision threshold prematurely unless B is prevented from presenting it, so A must interrupt in turn.

Subjectively, this is experienced as intense internal conflict, with two extreme considerations pushing in opposite directions, allowing no decision to be made - unless there is a plausible commitment to not making a decision until both have been heard out. (To me, this feels like my attention being caught in a tug-of-war between one set of considerations versus another. Roberts & Hutcherson (2019) note that "[a] large body of work suggests that negative information draws focus through rapid detection [64–68] and attentional capture [69–71]. [...] Several studies now show that attending to a choice alternative or attribute increases its weighting in the evidence accumulation process [72–75]. To the extent that negative affect draws attention to a choice-relevant attribute or object, it should thus increase the weight it receives.")
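Here is a toy sketch of that dynamic (invented numbers, purely illustrative): a single evidence accumulator fed by two subagents. If A gets to present all of its considerations first, the threshold is crossed before B is heard; a commitment to hear both sides out behaves, in this crude model, like raising the decision threshold so that neither side's evidence alone can settle the matter.

```python
# Toy sketch: two subagents feed opposing evidence into one accumulator.
# If A presents everything uninterrupted, the bound is crossed before B is
# heard. A commitment to hear both out is modeled, crudely, as a much
# higher decision threshold.
def deliberate(evidence_stream, threshold=3.0):
    total, heard = 0.0, []
    for source, value in evidence_stream:
        total += value
        heard.append(source)
        if abs(total) >= threshold:
            return "decided early", total, heard
    return "heard everything", total, heard

a_evidence = [("A", +1.2)] * 3          # "allies are important, trust the friend"
b_evidence = [("B", -1.2)] * 3          # "betrayal would be really bad"

print(deliberate(a_evidence + b_evidence))      # A alone pushes past the threshold
alternating = [x for pair in zip(a_evidence, b_evidence) for x in pair]
print(deliberate(alternating, threshold=10.0))  # both sides heard, no premature decision
```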

There’s one more important consideration. Eliezer has written about cached thoughts - beliefs which we once acquired and then never re-evaluated, just acting on them from then onwards. But this model suggests that things may be worse: it’s not just that we are running on cached thoughts. Instead, even the pre-conscious mechanisms deciding which thoughts are worth re-evaluating are running on cached values.

Sometimes external evidence may be sufficient to force an update, but there can also be self-fulfilling blind spots. For instance, you may note that negative emotions never even surface into your consciousness. This observation then triggers a sense of satisfaction about being good at emotional regulation, so that thoughts about alternative - and less pleasant - hypotheses are never selected for consideration. In fact, evidence to the contrary may feel actively unpleasant to consider, triggering subagents which use feelings such as annoyance - or if annoyance would be too suspicious, just plain indifference - to push that evidence out of consciousness, before it can contribute to a decision.

And the older those flawed assumptions are, the more time there is for additional structures to build on top of them.

References

Dehaene, S. (2014). Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts. New York, New York: Viking.

Dehaene, S., & Sigman, M. (2012). From a single decision to a multi-step algorithm. Current Opinion in Neurobiology, 22(6), 937–945.

Dennett, D. C. (1991). Consciousness Explained (1st edition). Boston: Little Brown & Co.

Forstmann, B. U., Ratcliff, R., & Wagenmakers, E.-J. (2016). Sequential Sampling Models in Cognitive Neuroscience: Advantages, Applications, and Extensions. Annual Review of Psychology, 67, 641–666.

Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion Decision Model: Current Issues and History. Trends in Cognitive Sciences, 20(4), 260–281.

Roberts, I. D., & Hutcherson, C. A. (2019). Affect and Decision Making: Insights and Predictions from Computational Models. Trends in Cognitive Sciences, 23(7), 602–614.

Shadlen, M. N., & Shohamy, D. (2016). Decision Making and Sequential Sampling from Memory. Neuron, 90(5), 927–939.

Zylberberg, A., Dehaene, S., Roelfsema, P. R., & Sigman, M. (2011). The human Turing machine: a neural framework for mental programs. Trends in Cognitive Sciences, 15(7), 293–300.



Discuss

Percent reduction of gun-related deaths by color of gun.

August 6, 2019 - 23:28
Published on August 6, 2019 8:28 PM UTC

A thought experiment: Assume all guns were pink by law tomorrow.

Would that have an impact on the number of gun-related deaths? What percentage?

Would it have an impact on the number of mass shootings?

This is not an actual policy suggestion, so the feasibility of such a policy is irrelevant. It is more a question of the psychological impact of the color on the behavior of gun users (otherwise the usage of the color pink stays the same). Choose other colors if that's relevant for your answer.

Disclaimer: This is not my idea and you can quickly find out who asked it on twitter, but there are no answers on twitter yet, so better google somewhere else (also to avoid politics the mind-killer). I will reveal the source of the question later in the comments.



Discuss

Weak foundation of determinism analysis

August 6, 2019 - 22:54
Published on August 6, 2019 7:03 PM UTC

Determinism is the belief that every action in time is born from the previous one. [1] In deterministic terms you cannot have an event E if you didn't have an event D before it, which in turn is the result of a cause C, and so on, without any alphabetical bound. This philosophical conception is one of the most discussed, directly or indirectly, because it generates very important ontological implications, first of all the existence of free will, which, we could claim, is one of the topics most courted by philosophers of all time. Something that has always struck me about the endless debates on the subject is a simple logical flaw that people seem to commit when they put forward their arguments, both for and against determinism. This (ir)rational weakness lies in the core concept of determinism itself and, for simplicity, I will call it the "What if?" problem. It goes like this:

  • " If determinism is true, then should criminals be persecuted for their crimes ? "
  • " If determinism is true, then should science cease to exists ? "
  • " Why, if determinism is true, do we look carefully before crossing the road ? "
  • " If determinism is true why should I do anything ? "

I think you get the point. Now, questions like these can be found in countless publications, online blog discussions, and talks with friends on a drug-induced Friday night, and they generally give way to endless flows of verbosity, some of which may also contain unusual viewpoints that shed new light on your beliefs. So far so good; too bad all these intellectual disputes are inconsistent with the very premise of determinism. Think of the first question: the only correct answer is:

  • "If determinism is true, criminals will continue to commit crimes and we will condemn them for this, we could not change our way of acting because otherwise it would not be determinism."

The answer to the second question is a reflection of this, and so is the answer to the third. In fact, this answer is a blueprint for responding to every question of this kind, which, note well, make up a good 95% of the total discussion on the subject. The same errors are carried by induction into reasoning about any variant or by-product of the main theme.
For example, I remember a post on reddit in which a user wondered whether, assuming the truth of eternalism (block universe determinism [2]), it would be more ethically appropriate for human beings to stop having children, because, you know, life is unfair and in this way those poor beings would suffer forever. To this my answer was the following:

  • "If eternalism is true nothig is created. Everything already exist. You fail to see it from a 4D perspective. Everything is already arranged, begginning to end, every state of matter, every permutation of it all. There is no movement, no action, no intention. "

You can observe a certain isomorphism between this answer and the one given above. If you want to explore these concepts (eternalism and its philosophical, physical and ethical implications) in a baroque, literary form, I strongly recommend the monumental novel Jerusalem by Alan Moore [3].

To this day, science has not succeeded in proving the existence of a single truly random natural source, and even if quantum physics seems to throw a spanner in the works of determinism, my personal belief is in line with that of Nobel laureate Gerard 't Hooft [4], who hypothesizes that there may be a mechanistic structure at the base of everything, of which quantum physics is nothing more than an emergent property that we do not yet fully understand [5][6]. However, the thesis of this post is not to affirm the existence of determinism. The thesis, to make it short, is that determinism (in all its spectrum of forms) is a sort of ontological cul-de-sac. I don't think we will ever be able to design experiments, physical or psychological, that can prove or disprove it, let alone arrive at a solution through the tools of logical investigation. The existence of determinism could very well be an undecidable problem.

Psychological studies have already been conducted showing that people who believe in determinism are less productive, flirt more easily with depression, and have a shakier morality than others, but these studies (and consequently their results) fall into the same categorical errors analyzed above. If everything is carved in the marble of time, we cannot change things in any way, and people who believe in stochastic salvation do so because they cannot do otherwise, and are (generally) more serene because they cannot do otherwise. In essence, in my opinion, it is impossible to determine whether a system is deterministic or not from within the system itself. Determinism, as a concept, can develop the same sometimes annoying and sometimes fascinating self-referentiality as the halting problem.







Discuss

New paper: Corrigibility with Utility Preservation

August 6, 2019 - 22:05
Published on August 6, 2019 7:04 PM UTC

I am pleased to announce the availability of a long-format paper with new results on AGI safety: Corrigibility with Utility Preservation.

You can get the paper at https://arxiv.org/abs/1908.01695 , and in the related software repository at https://github.com/kholtman/agisim .

Abstract

Corrigibility is a safety property for artificially intelligent agents. A corrigible agent will not resist attempts by authorized parties to alter the goals and constraints that were encoded in the agent when it was first started. This paper shows how to construct a safety layer that adds corrigibility to arbitrarily advanced utility-maximizing agents, including possible future agents with Artificial General Intelligence (AGI). The layer counteracts the emergent incentive of advanced agents to resist such alteration.

A detailed model for agents which can reason about preserving their utility function is developed, and used to prove that the corrigibility layer works as intended in a large set of non-hostile universes. The corrigible agents have an emergent incentive to protect key elements of their corrigibility layer. However, hostile universes may contain forces strong enough to break safety features. Some open problems related to graceful degradation when an agent is successfully attacked are identified.

The results in this paper were obtained by concurrently developing an AGI agent simulator, an agent model, and proofs. The simulator is available under an open source license. The paper contains simulation results which illustrate the safety related properties of corrigible AGI agents in detail.

This post can be used for comments and questions.

The paper contains several results and observations that do not rely on the heavy use of math, but other key results and discussions are quite mathematical. Feel free to post questions and comments even if you have not read all the mathematical parts.

As this is my first post on LessWrong, and my first paper on AGI safety, I feel I should say something to introduce myself. I have a Ph.D. in software design, but my professional life so far has been very diverse and multidisciplinary. Among other things I have been an experimental physicist, a standards developer and negotiator, an Internet privacy advocate, a wireless networking expert, and a systems architect in an industrial research lab. So I bring a wide range of tools and methodological traditions to the field. What made me interested in the field of AGI safety in particular is that it seems to have open problems where real progress can be made using mathematical techniques that I happen to like. I am currently on a sabbatical: basically this means that I decided to quit my day job, and to use my savings to work for a while on some interesting problems that are different from the interesting problems I worked on earlier.



Discuss

Trauma, Meditation, and a Cool Scar

August 6, 2019 - 19:17
Published on August 6, 2019 4:17 PM UTC

[Trigger Warning: I’ll be discussing a physical injury, recovery, and panic attacks in detail. The first three pictures linked are gory. Again, they are linked, not directly shown]

Trauma

One year ago today, I was in an accident with an industrial drone. It was spinning too fast while arming (like how helicopters spin up before they take off), but nothing we tried would fix it. Eventually, I changed the PWM value back to the default, and it spun up even faster. Fast enough to take off right into me.

It tore up my arm. It tore up my face. After screaming, it didn’t hurt that bad, so I thought I overreacted. I told everyone “I think I’m okay”. They didn’t believe me, and I was rushed to the hospital. The pain was horrible, but the nausea was worse. I had made everyone apple pie that day, but I didn’t get to keep my piece.

The doctor thought I needed facial reconstruction surgery, so they put me in an ambulance and shipped me to another hospital. They stitched me up and said no facial surgery was needed, but that the lens and iris of my left eye were destroyed. A couple of days later, my eyeball bruised. After a week of checkups and eye drops four times a day, they put me under for surgery.

I woke up in so much pain, so confused. They told me to keep my head down. I asked Why am I in so much pain? repeatedly. They put me in a wheelchair to take me outside, and told me to keep my head down. But all I could do was feel terrified because I was in pain and no one was doing anything about it. I’m told to keep my head down as they put me in my dad’s car, so I kept my head down and hurt.

For a week, I had to keep my head down. When I ate, my head was down. When I talked to someone, my head was down. When I slept, my head was down.[1]

I couldn’t play piano like I used to because of my arm. I couldn’t read like I used to because of my eye. I couldn’t even think like I used to because my working memory was shot. I felt so powerless and isolated.

How am I supposed to program or learn new things when I could barely keep 3 things on my mind, when I could barely read off a screen for 2 minutes before having to take a break? How am I supposed to connect with someone when I could barely look them in the eye, when I couldn’t even give them my full attention?

On top of that, I was on eye drops to soothe my eye from all the other eye drops I was taking. I was on laxatives to relieve constipation from all the pain medicine I was taking. Even though I was on a tablet and two drops for eye pressure, I still got glaucoma headaches. So another surgery, and more checkups. And of course, there were the panic attacks.

Any unexpected loud noise would fill me with distress; it felt like I was being attacked, like it was happening again. A couple of months later, I was playing piano more like I used to. A picture frame on top of the piano fell, freaked me out, and I cried because I thought I was over this. It was frustrating how scared I was, how easily I could feel overwhelmed.

I’ve never been angrier in my life.

As a kid, I used to think “what doesn’t kill you makes you stronger”, that if I went through horrible events, I would come out cooler, more mature. That I would be like Sasuke from Naruto whose whole family died, but he came out so cool, and edgy, and he got the girl! But really, horrible events mess you up, and I wouldn’t wish that on anyone. There’s not a guarantee that things will be better, not even that things will be as good as they were before.

But... things did get better.

Meditation

I read Hazard’s post and took up meditating with The Mind Illuminated. I took Elo up on talking about meditating, told him about my panic attacks, and we fixed them! By “fixed” I mean they still happened, but affected me drastically less and less. And then they started happening less and less. Now, I really don’t mind them more than an itch.

I was told:

  1. Break down previous panic attacks into a sequence of events/sensations such as physical sensations (jaws clenching, shoulders tensed, heart racing, breathing change), and mental sensations (specific thoughts, movements of attention, loss of awareness).
  2. Be aware of the sensations you experience during the actual panic attack. From Elo, “The piece of knowledge to maintain is that you are not these reactions, you have them but they do not have you. You get to watch them happen.”

For me, I could see “jerking back, elevated heart rate, cortisol/adrenaline feeling, teary eyed because of how I reacted, eyes focus, shoulder tension, toes clenching”, but later, in the moment of actually having a panic attack, it was [noise]->[involuntary yelp]->[chest tightness with stress]->[eyes widen]->[thinking that I’m fine].

I would like to clarify that “chest tightness with stress” is a mental object in word form, but I felt it as a physical sensation like a bad warmth spreading through my body starting from my chest. But even that description fails to convey the reality of the sensation! What’s important is that I described it to myself in hard-to-convey physical sensations. The same is true for the other links in the chain.

Doing this, I realized “Pain is inevitable; suffering is optional” with the next few panic attacks. They happened. They sucked...but then they were over. Through meditating I was building this skill even more, this skill of non-reacting, of accepting the reality of sensations exactly as they were, of not fighting it, of not getting trapped in a series of thoughts, of not holding on to impulses. I used to think “Man, I’m so hungry”. Now it’s, “Oh the sensation of hunger is there. Oh, now it’s gone. What time is it? 11:00? I’ll work another hour and then eat”. All that miserable anger that would keep me up at nights, I've now let it all go.

I wish I would’ve had a consistent meditation practice before the accident. I predict that I would’ve suffered much less. If you are going through a difficult life trauma now, I highly recommend getting professional help, and you’re welcome to PM me about it as well.

A Cool Scar

I can read and think like I used to (which were two of the most debilitating effects). My left eye rarely hurts anymore, though I still can’t see out of it2. I’m not nauseous nor do I have glaucoma headaches, though I am still on one eye drop indefinitely. I have most of the strength and flexibility back in my left arm, though it will act up if I hit it just right. I am technically bi-chromatic now because my iris was destroyed! Though, that also means my left eye is a giant pupil, and I need shades to go outside when it’s sunny.

Just like in Valentine’s Grieving Well, I was able to see what was important in my life. I quit my job and started leveraging academia this Spring, I found a girl who kisses my scars, and I’ve grown a lot closer to my family3.

Although I’m not as edgy as Sasuke (probably for the best), the scar does make me a little bit cooler, and, well, I did get the girl.


1. I had an air bubble in my eye and had to keep my head down so that the bubble would do something to my retina (keep pressure to it?). Pro tip: put pillows between the bed and your chest when you sleep so you don't suffocate.

2. I can see a little actually. White is perfect vision, black is blind.

Do you notice the blind spot (black circle) in my right eye (on the left)? Notice how that's most of my left eye?

3. My brother and I have such a good relationship that he made me this:

which is ripped from webcomicname

*Special thanks to Elo for reviewing the draft of this post




Discuss

Why is the nitrogen cycle so under-emphasized compared to climate change?

August 6, 2019 - 12:25
Published on August 6, 2019 9:25 AM UTC

In its list of environmental issues where planetary boundaries are exceeded, the Stockholm Resilience Centre places climate change in the "increasing risk" category while placing "nitrogen and phosphorus flows to the biosphere and oceans" in the high-risk category, which suggests that they consider the latter a graver environmental issue.

Do I misread them? If not, are their views in line with those of other scientists in their field?

If that's the consensus view, why do we have so much more debate about climate change but none about nitrogen and phosphorus flows?



Discuss

How would a person go about starting a geoengineering startup?

August 6, 2019 - 10:34
Published on August 6, 2019 7:34 AM UTC



Discuss

Diagnosis: Russell Aphasia

August 6, 2019 - 07:43
https://status451dotcom.files.wordpress.com/2019/06/russell.jpg

What are the best resources for examining the evidence for anthropogenic climate change?

August 6, 2019 - 05:53
Published on August 6, 2019 2:53 AM UTC

A while back I was researching the evidence for evolution. It's not that I didn't initially believe in evolution or understand natural selection, but it's just that I didn't really understand the full breadth of the evidence and predictions that the theory makes. Before, I had a tendency to simply assert that "the evidence is overwhelming" in discussions without really going into detail.

When I began researching the evidence, I had a few choices. I could just read basic surface arguments that I found on the internet, such as this article from Khan Academy or the Wikipedia page. While these resources are valuable, they aren't very comprehensive, and don't appear like they'd convince a hard-nosed skeptic. There are popular books, such as Jerry A. Coyne's Why Evolution Is True and Richard Dawkins's The Greatest Show on Earth. The last two sources left me feeling like I still wasn't getting the full story, since they assumed a beginner background in philosophy and science, and weren't as nuanced as I wanted them to be (although I did not read both of them cover to cover).

Eventually I stumbled across 29+ Evidences for Macroevolution: The Scientific Case for Common Descent by Douglas Theobald, which exceeded my expectations, and satisfied my desire to understand the evidence for common descent. While this last work does not assume the reader is a professional biologist, it also doesn't shy away from presenting specific technical evidences and the context they play in modern biology.

I wonder whether there is a similar publication which can satisfy my desire to understand anthropogenic climate change. My prior is that climate change is real, and primarily caused by human activity. I believe this because I generally side with the scientific consensus, and most intelligent people I know believe it. However, I am a little embarrassed by the fact that I couldn't really convincingly argue with a skeptic. I imagine a highly educated climate change skeptic like Roy Spencer could argue circles around me, which is never a good sign.

In light of the previous discussion, what are the best resources for understanding the full breadth of evidence for anthropogenic climate change?



Discuss

Do decision theories underspecify policies?

August 6, 2019 - 05:38
Published on August 6, 2019 2:38 AM UTC

(thinking within the general RL framework):

If I know what I'm optimizing over, does a decision theory tell me what my policy should do on trajectories which are known to be counterfactual according to the decision theory?

e.g. if my decision theory says "always take action1", then I will never see (partial) trajectories with action0 in them. So on the face of it, I should be able to choose the policy freely for those (partial) trajectories.
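
To make the intuition concrete, here is a minimal sketch (the toy environment, reward, and policy names are hypothetical, not taken from any formal decision theory): two policies that agree on-path but differ on histories containing action0 achieve exactly the same return, because those histories never occur.

```python
# Hypothetical toy example: if the policy always takes action1, no history
# containing action0 is ever generated, so behaviour on such histories
# cannot affect the return.

def rollout(policy, steps=2):
    """Roll out a deterministic policy; a history is the tuple of past actions."""
    history = ()
    total_reward = 0
    for _ in range(steps):
        action = policy(history)
        total_reward += 1 if action == "action1" else 0  # toy reward
        history = history + (action,)
    return total_reward

# Two policies that agree whenever the history contains no action0,
# but differ on the counterfactual histories that do contain action0.
policy_a = lambda history: "action1"
policy_b = lambda history: "action0" if "action0" in history else "action1"

assert rollout(policy_a) == rollout(policy_b)  # identical returns
```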

But I'm pretty sure that's not true, because (I think?) decision theories need to have the right counterfactuals (e.g. for Newcomb's problem).

So then the question is: (when) does a decision theory specify actions on ALL possible (partial) trajectories (including all counterfactuals)?

(and is it important or desirable to do so? etc.)



Discuss

A Survey of Early Impact Measures

August 6, 2019 - 04:22
Published on August 6, 2019 1:22 AM UTC

In the context of AI alignment, an impact penalty is one way of avoiding large negative side effects from misalignment. The idea is that rather than specifying every negative side effect we want to avoid, we can try to avoid catastrophes by avoiding large side effects altogether.

Impact measures are ways to map an action or policy to a number which is intended to correspond to "how big of an impact will this have on the world?" Using an impact measure, we can regularize any system with a lot of optimization power by adding an impact term to its utility function.
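
As a minimal sketch of that idea (the names `task_utility`, `impact`, and the trade-off constant `lam` are placeholders, not from any particular proposal): the regularized objective simply subtracts a scaled impact term from the task utility, and the agent picks the action that maximizes the combined score.

```python
# Illustrative sketch of an impact-penalized objective. `task_utility` and
# `impact` stand in for whatever utility function and impact measure one
# actually has; `lam` trades off task performance against measured impact.

def penalized_utility(action, task_utility, impact, lam=1.0):
    """Task utility minus a scaled impact penalty for a candidate action."""
    return task_utility(action) - lam * impact(action)

def choose_action(candidates, task_utility, impact, lam=1.0):
    """Pick the candidate action with the highest penalized utility."""
    return max(candidates, key=lambda a: penalized_utility(a, task_utility, impact, lam))
```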

This post records and summarizes much of the early research on impact penalties. I emphasize the aims of each work, and the problems associated with each approach, and add occasional commentary along the way. In the next post I will dive more deeply into recent research which, at least in my opinion, is much more promising.


The mathematics of reduced impact: help needed, by Stuart Armstrong (2012)

This is the first published work I could find which put forward explicit suggestions for impact measures and defined research directions. Armstrong proposed various ways that we could measure the difference between worlds, incorporate this information into a probability distribution, and then use that to compare actions.

Notably, this post put a lot of emphasis on comparing specific ontology-dependent variables between worlds in a way that is highly sensitive to our representation. This framing of low impact shows up in pretty much all of the early writings on impact measures.

One example of an impact measure is the "Twenty (million) questions" approach, where humans define a vector of variables like "GDP" and "the quantity of pesticides used for growing strawberries." We could theoretically add some L1 regularizer to the utility function, which measures the impact difference between a proposed action and the null action,
scaled by a constant factor. The AI would then be incentivized to keep these variables as close as possible to what they would have been in the counterfactual where the AI had never done anything at all.
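
A rough sketch of how such a penalty could be computed, assuming (hypothetically) that we can query the value of each monitored variable in the world following the proposed action and in the world following the null action:

```python
# Sketch of a "Twenty (million) questions"-style penalty: a scaled L1 distance
# between human-chosen world variables under the proposed action and under the
# null action. The variable names and values are purely illustrative.

def l1_impact(action_world, null_world, variables, mu=1.0):
    """Scaled L1 difference over the monitored variables."""
    return mu * sum(abs(action_world[v] - null_world[v]) for v in variables)

action_world = {"GDP": 103, "strawberry_pesticide_use": 9}
null_world   = {"GDP": 100, "strawberry_pesticide_use": 10}
print(l1_impact(action_world, null_world, ["GDP", "strawberry_pesticide_use"]))  # 4.0
```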

The flaw with this specific approach was immediately pointed out, both by Armstrong and in the comments below. Eliezer Yudkowsky objected to the approach since it implied that such an artificial intelligence would try to manage the state of affairs of the entire universe in order to keep the state of the world identical to the counterfactual where the AI never existed:

Coarse-grained impact measures end with the AI deploying massive-scale nanotech in order to try and cancel out butterfly effects and force the world onto a coarse-grained path as close as possible to what it would've had if the AI "hadn't existed" however that counterfactual was defined. [...] giving an AI a huge penalty function over the world to minimize seems like an obvious recipe for building something that will exert lots and lots of power.

I agree that this is an issue with the way that the impact measure was defined — in particular, the way that it depended on some metric comparing worlds. However, the last line sounds a bit silly to me. If your impact measure provides an artificial intelligence an incentive to "exert lots and lots of power" then it doesn't really sound like an impact measure at all.

This critique is expanded in Yudkowsky's Arbital article, which came out four years later and which I also go into below.

Low Impact Artificial Intelligences, by Stuart Armstrong and Benjamin Levinstein (2015)

This work starts from fundamentally the same place as the LessWrong post above, but expands upon it with more explicit and rigorous approaches.

One approach outlined in the article is to view impact as a form of news which informs us about the world we are living in.

The idea is simple enough: if you were notified tomorrow that Deepmind had constructed an advanced superintelligence, this would rightfully change the way that you thought about the world, and would inform you that some radical things are soon arriving which could alter the trajectory of what we value. Putting an emphasis on importance and the value of information forms the intuition behind this method of measuring impact. This also enables us to look at importance by putting us in the shoes of an agent which is given information.

How would we measure importance? One approach is to look at a large set U of different utility functions, and check that knowing X makes little expected difference for any of them.

In other words, conditional on the action taking place, how does this affect utility?
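
One way to make this concrete (a sketch, not the paper's exact formalization; `expected_utility` is a placeholder for whatever predictive model supplies these expectations): take the largest change, across the set U, in expected utility when conditioning on the news X.

```python
# Sketch: the impact of the news X as the largest shift in expected utility
# across a set of utility functions U. `expected_utility(u, given=...)` is a
# placeholder for a predictive model; `given=None` means no extra information.

def news_impact(U, expected_utility, X):
    """max over u in U of |E[u | X] - E[u]|: how much the news X matters."""
    return max(abs(expected_utility(u, given=X) - expected_utility(u, given=None))
               for u in U)
```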

A concept which neighbors this approach is to imagine that impact is a measure of how detectable an action is. Presumably, the exact way that hydrogen atoms are distributed throughout the center of the sun shouldn't affect the impact measure very much, since their arrangement gives us pretty much no actionable information here on Earth. Consequently, knowing a lot about the particular way that atoms in the sun are arranged doesn't change what we predict will happen in the world — at least unless you are so powerful that you can simulate the entire universe flawlessly and feed that knowledge in as an input to the simulation.

By shifting the focus away from particle positions and state representations, this paper gets closer to the way that I personally interpret impact measurement, and to how impact is defined in more recent research.

The paper then moves towards describing very abstract ways of measuring impact, including measuring general changes in probability distributions, such as some generalized cross-entropy between two probability distributions. The intuition here is that worlds with a low-impact artificial intelligence will have a broad probability distribution, whereas worlds with a high-impact artificial intelligence will have almost all of the probability density on a specific narrow region. If we could somehow measure the divergence between these probability distributions in a way that was natural and resilient to the choice of representation, this would provide a way of measuring impact that clearly isn't value-laden.
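
As an illustration of this distributional framing (the use of KL divergence and the toy outcome distributions here are my own choices, not the paper's): a low-impact action leaves the predicted distribution over coarse future worlds broad, a high-impact action concentrates it, and the divergence between the two captures that difference.

```python
import math

# Sketch: compare the predicted distribution over coarse future worlds if the
# AI acts versus if it takes the null action. KL divergence is used here only
# as one illustrative choice of divergence; the distributions are toy values.

def kl_divergence(p, q):
    """KL(p || q) for two dicts mapping outcomes to probabilities."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

p_null = {"world_a": 0.4, "world_b": 0.3, "world_c": 0.3}    # broad: low impact
p_act  = {"world_a": 0.9, "world_b": 0.05, "world_c": 0.05}  # narrow: high impact
print(kl_divergence(p_act, p_null))  # roughly 0.55 nats
```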

The weakness of the abstract approach is that there is no clear interpretation of what's going on. The choice of how we compare probability distributions seems a bit arbitrary, and it also doesn't seem like the type of thing a human would care about if they were naively minimizing impact. Furthermore, this approach, like the one that came before, requires some baseline weak prediction capability in order for it to be applied consistently. To see why, consider that a sufficiently advanced superintelligence will always have essentially all of its probability distribution on a single future — the actual future.

Armstrong and Levinstein include a brief discussion of how we can calibrate the impact measure. The way that machine learning practitioners have traditionally calibrated regularizers is by measuring their effect on validation accuracy. After plotting validation accuracy against some scaling factor μ, practitioners settle on the value which allows their model to generalize the best. In AI alignment we must take a different approach, since it would be dangerous to experiment with small scaling values for the impact penalty without an idea of the magnitude of the measurement.

The paper points to an additional issue: if the impact measure has sharp, discontinuous increases, then calibrating the impact measure may be like balancing a pen on the tip of a finger.

It is conceivable that we spend a million steps reducing µ through the ‘do nothing’ range, and that the next step moves over the ‘safe increase of u’, straight to the ‘dangerous impact’ area.

The above problem motivates the following criterion for impact measures: an impact measure should scale roughly linearly in the size of the effect on the world. Creating one paperclip might have some effect X on the world, and creating two paperclips might have some effect X+Y, but creating three paperclips should have some effect close to X+2Y, or else the impact measure is broken.
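
A sketch of the kind of ex ante check this criterion suggests (`impact_measure` is a placeholder for whatever measure is under test, and the tolerance is arbitrary): verify that the marginal penalty of producing one more paperclip stays roughly constant instead of jumping discontinuously.

```python
# Sketch of a linearity check: the marginal impact of one more paperclip should
# stay roughly constant. `impact_measure(n)` is a placeholder returning the
# measured impact of producing n paperclips; `tolerance` is an arbitrary bound.

def roughly_linear(impact_measure, max_n=10, tolerance=0.1):
    """True if successive marginal impacts stay within `tolerance` of the first."""
    marginals = [impact_measure(n + 1) - impact_measure(n) for n in range(max_n)]
    return all(abs(m - marginals[0]) <= tolerance for m in marginals)
```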

Since measuring impact is plausibly something that can be done without the help of a superintelligence, this provides a potential research opportunity. In other words, we can check ex ante whether any impact measure is robust to these types of linear increases in impact. On the other hand, if we find that some impact penalty requires superintelligent capabilities in order to measure, then it may cast doubt on the method of measurement, and our ability to trust it. And of course, it couldn't reflect any algorithm which humans are running in their head.

After this point, the way that impact measures are formulated shifts towards specifying desiderata in order to avoid failure modes.

Concrete Problems in AI Safety (Section 3), by Dario Amodei and Chris Olah et al. (2016)

The main contribution in this paper is that it proposes a way to learn impact. In this sense, impact is less of an actual thing that we add to the utility function, and more of something that is discovered by repeated demonstration. The intention behind this shift is to move focus away from explicit ways of representing impact, which can be brittle if we focus too much on exactly how the environment is represented. The downside is that it doesn't appear to scale to superintelligent capabilities.

If I understand this proposal correctly, an example of impact in this case would be to penalize some ML system each time it makes a large error. Over time the ML system would have an implicit penalty term for errors of that type, such that in the future it won't be very likely to do something which has a large impact on the world. Of course, if we consider that as AI systems grow in competence they are likely to try strategies which we had not even thought about before, this approach is particularly susceptible to edge instantiation.

The paper also discusses using empowerment to measure influence before immediately rejecting it as a solution. The authors consider that researching simple and already well-trodden mathematical functions could yield fruitful results, providing optimism for future research into this area.
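
For what it's worth, empowerment is usually defined as the channel capacity between an agent's action sequences and its resulting future state; in a deterministic environment this reduces to the log of the number of distinct states reachable within the horizon. A sketch under that simplifying assumption (the transition function `step` and the action set are placeholders):

```python
import math

# Sketch: in a deterministic environment, n-step empowerment reduces to
# log2(number of distinct states reachable in n steps). `step(state, action)`
# is a placeholder transition function and `actions` a placeholder action set.

def empowerment(state, step, actions, n):
    """log2 of the number of distinct states reachable from `state` in n steps."""
    frontier = {state}
    for _ in range(n):
        frontier = {step(s, a) for s in frontier for a in actions}
    return math.log2(len(frontier))
```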

Despite these issues, the example of empowerment does show that simple measures (even purely information-theoretic ones!) are capable of capturing very general notions of influence on the environment.

Low impact, by Eliezer Yudkowsky (2016)

This Arbital article summarizes impact measurements more succinctly and comprehensively than any previous work. It outlines the aim of impact research and provides useful critiques of impact penalties which are based on measuring the distance between state representations. In particular, the article expands on three foreseeable difficulties with an impact measure:

1. An artificial intelligence could try to minimize its influence by doing bad things that are low impact relative to the normal course of reality. An example would be an AI that tried to cure cancer but kept death rates constant, since constant death rates are what would have happened if we hadn't cured cancer.

2. Due to the existence of chaotic systems, an AI would be indifferent towards particular systems which are nearly impossible to control directly, such as the weather. In this case, since the AI is indifferent, it might as well deploy nanobots in the sky, since there's not much we can do to keep the weather constant anyway.

3. The AI will end up wanting to keep everything in complete stasis, which introduces the incentive of taking over the entire world in order to keep things the way they would have turned out if the AI didn't exist. For instance, if we constructed a low-impact AI in order to learn more about alignment experimentally, the low-impact AI would want us to learn as little as possible from the experiment, because every new insight we gained would be something we would not have gotten if the AI did not exist.

As I have indicated above, I think that these types of errors are quite important to consider, but I do think that impact can be framed differently in order to avoid them. In particular, there is a lot of focus on measuring the distance between worlds in some type of representation. I am skeptical that this will forever remain a problem because I don't think that humans are susceptible to this mistake, and I also think that there are agent-relative definitions of impact which are more useful to think about.

To provide one example which guides my intuitions, I would imagine that being elected president is quite impactful from an individual point of view. But when I make this judgement I don't primarily think about any particular changes to the world. Instead, my judgement of the impact of this event is focused more around the type of power I gain as president, such as being able to wield control of the military. Conversely, being in a nuclear war is quite impactful because of how it limits our situation: our power to exert influence, or to live a safe and happy life, is altered dramatically in a world affected by nuclear war.

This idea is related to instrumental convergence, which is perhaps more evidence that there is a natural core to this concept of measuring impact. In one sense, they could be two parts of the same whole: collecting money is impactful because it allows me to do more things as I become wealthier. And indeed, there may be a better word than "impact" for the exact concept which I am imagining.

Penalizing side effects using stepwise relative reachability, by Victoria Krakovna et al. (2018)

As far as I am aware, the current approaches which researchers are most optimistic about are the impact measures based on this paper. In the first version of this paper, which came out in 2018 (updated in 2019 for attainable utility), the authors define relative reachability and compare it against a baseline state, which is also defined. I will explore this paper, and the impact measures which are derivative of it, in the next post.
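
As a rough sketch of the penalty, as I understand it from the paper (the reachability function and the choice of baseline are simplified placeholders here): the agent is penalized in proportion to the average amount by which states become less reachable from its current state than they would be from the baseline state.

```python
# Rough sketch of a relative-reachability-style penalty, simplified from my
# reading of the paper: average over states s of how much less reachable s is
# from the current state than from the baseline state. `reachability(x, s)` is
# a placeholder returning a reachability score in [0, 1].

def relative_reachability_penalty(current, baseline, states, reachability):
    """Average decrease in reachability relative to the baseline state."""
    return sum(max(0.0, reachability(baseline, s) - reachability(current, s))
               for s in states) / len(states)
```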

In the last post in this sequence, I promised to "cover the basics of statistical learning theory." Despite the ease of writing those words, I found it to be much more difficult than I first imagined, delaying me a few days. In the meantime, I will focus the next post on surveying recent impact research.



Discuss

Preferences as an (instinctive) stance

August 6, 2019 - 03:43
Published on August 6, 2019 12:43 AM UTC


User Veedrac recently commented:

You have shown that simplicity cannot distinguish (p,R) from (−p,−R), but you have not shown that simplicity cannot distinguish a physical person optimizing competently for a good outcome from a physical person optimizing nega-competently for a bad outcome.

This goes to the heart of an important confusion:

  • "Agent A has preferences R" is not a fact about the world. It is a stance about A, or an interpretation of A. A stance or an interpretation that we choose to take, for some purpose or reason.

Relevant for us humans is:

  • We instinctively take a particular preference stance towards other humans, and different humans tend to take much the same stance towards one another. This makes the stance feel "natural" and intrinsic to the world, when it is not.
The intentional stance

Daniel Dennett defined the intentional stance as follows:

Here is how it works: first you decide to treat the object whose behavior is to be predicted as a rational agent; then you figure out what beliefs that agent ought to have, given its place in the world and its purpose. Then you figure out what desires it ought to have, on the same considerations, and finally you predict that this rational agent will act to further its goals in the light of its beliefs. A little practical reasoning from the chosen set of beliefs and desires will in most instances yield a decision about what the agent ought to do; that is what you predict the agent will do.

In the physical stance, we interpret something as being made of atoms and following the laws of physics. In the intentional stance, we see it as being an agent and following some goal. The first allows for good prediction of the paths of planets; the second, for the outcome of playing AlphaZero in a game of Go.

The preference/(ir)rationality stance

What is the intentional stance for?

In a sense, the intentional stance is exactly the same as a preference stance. Dennett takes an object and treats it as an agent, and splits it into preference and rationality. OK, he assumes that the agent is "rational", but allows us to "figure out what beliefs the agent ought to have." That, in practice, allows us to model a lot of irrationality if we want to. And I'm fully convinced that Dennett takes biases and other lapses of rationality into account when dealing with other humans.

So, in a sense, Dennett is already taking a preference/(ir)rationality stance[1] towards the object. And he is doing so for the express purpose of better predicting the behaviour of that object.

What is the preference stance for?

Unlike the intentional stance, the preference stance is not taken for the purpose of better predicting humans. It is instead taken for the purpose of figuring out what the human preferences are - so that we could maximise or satisfy them. The Occam's razor paper demonstrates that, from the point of view of Kolmogorov complexity, taking a good preference stance is not at all the same thing as taking a good (predictive) intentional stance.
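As a rough sketch of the underlying argument (my own gloss, not a quote from the paper): write the human policy as π = p(R) for a planner p and a reward R, and define a "nega-planner" by (−p)(R′) := p(−R′). Then

(−p)(−R) = p(R) = π,  and  K(−p, −R) ≤ K(p, R) + O(1),

since negating a reward is a constant-length transformation. Both decompositions predict exactly the same behaviour, and their description lengths differ by at most a constant, so predictive accuracy plus simplicity cannot tell the intended pair apart from the nega-pair.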

But it often feels as if it is; we seem to predict people better when we assume, for example, that they have specific biases. Why is this, and how does it seem to get around the result?

Rationality stance vs empathy machine

There are two preference stances that it is easy for humans to take. The first is to assume that an object is a rational agent with a certain preference. Then we can try and predict which action or which outcome would satisfy that preference, and then expect that action/outcome. We do this often when modelling people in economics, or similar mass models of multiple people at once.

The second is to use the empathy machinery that evolution has developed for us, and model the object as being human. Applying this to the weather and the natural world, we anthropomorphised and created gods. Applying it to other humans (and to ourselves) gives us quite decent predictive power.

I suspect this is what underlies Veedrac's intuition. For if we apply our empathy machine to fellow humans, we get something that is far closer to a "goodness optimiser", albeit a biased one, than to a "badness nega-optimiser".

But this doesn't say that the first is more likely, or more true, about our fellow humans. It says that the easiest stance for us to take is to treat other humans in this way. And this is not helpful, unless we manage to get our empathy machine into an AI. That is part of the challenge.

And this brings us back to why the empathy machine seems to make better predictions about humans. Our own internal goals, the goals that we think we have on reflection, and how we expect people (including us) to behave given those goals... all of those coevolved. It seems that it was easier for evolution to reuse our internal goals (see here for what I mean by these) and our understanding of our own rationality to make predictions, rather than to run our goals and our predictions as two entirely separate processes.

That's why, when you use empathy to figure out someone's goals and rationality, this also allows you to better predict them. But this is a fact about you (and me), not about the world. Just as "Thor is angry" is actually much more complex than electromagnetism, our prediction of other people via our empathy machine is simpler for us to do - but is actually more complex for an agent that doesn't already have this empathy machinery to draw on.

So assuming everyone is rational is a simpler explanation of human behaviour than our empathy machinery - at least, for generic non-humans.

Or, to quote myself:

A superintelligent AI could have all the world’s video feeds, all of Wikipedia, all social science research, perfect predictions of human behaviour, be able to perfectly manipulate humans... And still conclude that humans are fully rational.

It would not be wrong.

  1. I'll interchangeably call it a preference or an (ir)rationality stance, since given preferences, the (ir)rationality can be deduced from behaviour, and vice versa. ↩︎



Discuss

[AN #61] AI policy and governance, from two people in the field

August 6, 2019 - 01:59
Published on August 5, 2019 5:00 PM UTC

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Highlights

The new 30-person research group in DC investigating how emerging technologies could affect national security (Rob Wiblin and Helen Toner): This 80,000 Hours podcast with Helen Toner dives into details of AI policy, China and the new Center for Security and Emerging Technology (CSET). I'm only summarizing the parts I found most relevant.

Many of the analogies for AI are quite broken. AI is a very broad set of software technologies, unlike nuclear weapons which are very discrete. It's not feasible to use export controls to keep "AI" within the US. In addition, AI will affect war far more fundamentally than just creating lethal autonomous weapons -- Helen thinks that the biggest military impact might be on logistics. It's also weird to compare data to oil, because oil is a rival good (two people can't use the same oil), whereas data can easily be copied. In addition, one barrel of oil can replace any other barrel, but data is very specific to the particular application. Helen's preferred analogy is thinking of AI as electricity -- a very general purpose tool that will transform lots of aspects of society. However, this analogy can also break down -- for example, the AI research community seems pretty important, but there was no analog for electricity.

And now for a few random points, in no particular order. China "exports" around 50,000 inventors (patent holders) every year, while the US imports 190,000, far more than any other country, suggesting that the US is a global hub for talent. AI is hard to define, because many of its properties lie on a continuum -- for example, is a landmine a lethal autonomous weapon? The way to affect policy is to make small, targeted changes in proposed policies so that the government makes slightly better decisions -- it's far too difficult to execute on a grand plan to get the government to do some big thing. The main skills for engaging with government on technology issues: be able to speak both to scientists as well as bureaucrats, and be able to navigate the DC setting -- knowing what people are doing, what their incentives are, and how to get your thing done given their different incentives.

Rohin's opinion: I enjoyed the section on how analogies for AI are broken -- I don't usually think much about them, but they always felt a bit off, and Helen makes it very clear what the issues are. It was also interesting seeing how the perspectives on AI are quite different from those of us thinking about AGI accident risk -- we often think about single, generally intelligent AGI systems, whereas Helen emphasized how current technologies can be easily deployed in many application-specific contexts. While data for current systems is very application-specific, as Helen mentioned, if you believe the unsupervised learning story, data may be more interchangeable for AGI systems.

AI Alignment Podcast: On the Governance of AI (Lucas Perry and Jade Leung): Jade makes a lot of points in this podcast, some of which I've summarized here in no particular order.

GovAI works on lots of research topics, including analysis of the inputs to AI, understanding historical cases of competition, looking at the relationship between firms and governments, and understanding public opinion.

Governance is particularly difficult because in the current competitive environment it's hard to implement any form of "ideal" governance; we can only make changes on the margin. As a result, it is probably better if we could get to a state where we could take a long time to deliberate about what ideal governance would look like, without having to worry about competitive pressures.

The biggest risk for governments is that they will make hasty, ill-informed regulation. However, given how uncertain we are, it's hard to recommend any concrete actions right now -- but governance will happen anyway; it won't wait for more research. One useful action we can take is to correct or add nuance to inaccurate memes and information, such as the "race" between the US and China, or the performance-safety tradeoff. Plausibly we should engage with government more -- we may have been biased towards working with private organizations because they are more nimble and familiar to us.

Instead of thinking about short term vs. long term, we should be thinking about the stakes. Some issues, such as privacy or job loss, can be thought of as "short term" but their stakes could scale to be huge in the long term. Those would be good areas to think about.

Rohin's opinion: I don't have any particular thoughts on these topics, but I am glad for both this and the previous podcast, which give more of a birds-eye view of the AI governance landscape, which is hard to get from any single paper.

Technical AI alignment

Technical agendas and prioritization

On the purposes of decision theory research (Wei Dai): In this post, Wei Dai clarifies that he thinks decision theory research is important because it can help us learn about the nature of rationality, philosophy, and metaphilosophy; it allows us to understand potential AI failure modes; we can better understand puzzles about intelligence such as free will, logical uncertainty, counterfactuals and more; and it could improve human rationality. It is not meant to find the "correct" decision theory to program into an AI, nor to create safety arguments that show that an AI system is free of "decision-theoretic" flaws.

Preventing bad behavior

Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning (Jaime F. Fisac, Neil F. Lugovoy et al): Reinforcement learning is not great at enforcing constraints that hold at all times, because the agent would violate a constraint now if it would lead to higher reward later. In robust optimal control theory, we maximize the minimum of the constraint reward over time to avoid this. We can do this in the Bellman equation by taking a minimum between the current reward and estimated future value (instead of summing), but this does not uniquely define a fixed point. Just as in regular RL, we can use discounting to avoid the problem: in particular, if we interpret the discount as the probability that the episode continues, we can derive a Safety Bellman equation for which Q-learning is guaranteed to converge. They demonstrate their method in classic control environments as well as half-cheetah, with a range of RL algorithms including soft actor-critic (SAC).
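To make the backup concrete, here is a minimal tabular sketch of the idea (my own illustration, not code from the paper): the target blends the current safety margin with a discounted minimum rather than a discounted sum, with the discount read as the probability that the episode continues. The safety-margin value l_s and the array shapes are assumptions for the example.

```python
import numpy as np

def safety_q_update(Q, s, a, s_next, l_s, gamma, alpha=0.1):
    """One tabular Q-learning step with a Safety-Bellman-style target.

    l_s is the constraint/safety margin at the current state (positive when the
    constraint is satisfied). Instead of summing reward and discounted future
    value, the target takes a discounted minimum, so Q estimates the worst
    margin encountered along the safest achievable trajectory. Interpreting
    gamma as the probability the episode continues is what restores a unique
    fixed point for this backup.
    """
    target = (1.0 - gamma) * l_s + gamma * min(l_s, np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

In a deep RL implementation, the same change amounts to swapping the usual target `r + gamma * max_a' Q(s', a')` for this discounted-minimum target, which is why it can be a one-line modification.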

Rohin's opinion: I really like how simple the change is here -- it should be a one-line change for many deep RL algorithms. Previously, we had to choose between unconstrained agents for high dimensional problems, or constrained agents for low dimensional problems -- I like that this work is making progress on constrained agents for high dimensional problems, similarly to Constrained Policy Optimization. While this work doesn't involve a performance reward, you could use the resulting safe policy in order to guide a process of safe exploration to learn a policy that safely optimizes a performance metric. Of course, this is all assuming a specification for the constraint to satisfy.

Miscellaneous (Alignment)

Modeling AGI Safety Frameworks with Causal Influence Diagrams (Tom Everitt, Ramana Kumar, Victoria Krakovna et al): This paper describes several AI safety frameworks using the language of causal influence diagrams (AN #49), in order to make it easy to compare and contrast them. For example, the diagrams make it clear that while Cooperative IRL and reward modeling (AN #34) are very similar, there are significant differences: in cooperative IRL, the rewards come directly from the underlying human preferences, whereas in reward modeling, the rewards come from a reward model that depends on human feedback, which itself depends on the underlying human preferences.

Rohin's opinion: I like these diagrams as a way to demonstrate the basics of what's going on in various AI safety frameworks. Sometimes the diagrams can also show the differences in safety features of frameworks. For example, in reward modeling, the agent has an incentive to affect the human feedback in order to affect the reward model directly. (Imagine getting the human hooked on heroin, so that future feedback causes the reward model to reward heroin, which could be easy to produce.) On the other hand, in cooperative IRL, the agent only wants to affect the human actions inasmuch as the actions affect the state, which is a normal or allowed incentive. (Imagine the agent causing the human to leave their house earlier so that they get to their meeting on time.)

AI strategy and policy

Information security careers for GCR reduction (Claire Zabel and Luke Muehlhauser): This post suggests that information security could be a good career path for people looking to reduce global catastrophic risks (GCRs). For AI in particular, such experts could help mitigate attacks by malicious or incautious actors to steal AI-related intellectual property. It also reduces the risk of destabilizing AI technology races. Separately, such experts could think about the potentially transformative impact of AI on cyber offense and defense, develop or advise on credible commitment techniques (see eg. model governance (AN #38)), or apply the security mindset more broadly.

An Interview with Ben Garfinkel (Joshua Monrad, Mojmír Stehlík and Ben Garfinkel): AI seems poised to be a very big deal, possibly through the development of AGI, and it's very hard to forecast what would happen next. However, looking at history, we can see a few very large trajectory shifts, such as the Agricultural Revolution and Industrial Revolution, where everything changed radically. We shouldn't assume that such change must be for the better. Even though it's hard to predict what will happen, we can still do work that seems robustly good regardless of the specific long-term risk. For example, Ben is optimistic about research into avoiding adversarial dynamics between different groups invested in AI, research into how groups can make credible commitments, and better forecasting. However, credible commitments are probably less tractable for AI than with nukes or biological weapons because AI systems don't leave a large physical footprint, can easily proliferate, and are not a clear category that can be easily defined.

Other progress in AI

Exploration

Self-Supervised Exploration via Disagreement (Deepak Pathak, Dhiraj Gandhi et al) (summarized by Cody): For researchers who want to build a reinforcement learning system that can learn to explore its environment without explicit rewards, a common approach is to have the agent learn a model of the world, and incentivize it to explore places where its model has the highest error, under the theory that these represent places where it needs to interact more to collect more data and improve its world model. However, this approach suffers in cases when the environment is inherently stochastic, since in a stochastic environment (think: sitting in front of a static TV and trying to predict the next frame), prediction error can never be brought to zero, and the agent will keep interacting even when its world model has collected enough data to converge as much as it can. This paper proposes an alternative technique: instead of exploring in response to prediction error, learn an ensemble of bootstrapped next-state prediction models and explore in response to variance or disagreement between the models. This has a few nice properties. One is that, in cases of inherent stochasticity, all models will eventually converge to predicting the mean of the stochastic distribution, and so even though they've not brought error down to zero, the variance among models will be low, and will correctly incentivize our agent to not spend more time trying to learn. Another benefit is that since the reward is purely a function of the agent's models, it can be expressed analytically as a function of the agent's choices and trained via direct backpropagation rather than "black box reward" RL, making it more efficient.
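A minimal PyTorch sketch of the disagreement bonus (my own illustration, not the paper's code; the architecture, hyperparameters, and the omitted ensemble training loop are all assumptions):

```python
import torch
import torch.nn as nn

class DisagreementBonus(nn.Module):
    """Intrinsic reward from disagreement across an ensemble of dynamics models.

    Each model is trained elsewhere with an MSE loss to predict the next
    observation from (obs, action) on its own bootstrapped batch of
    transitions; the bonus is the variance of their predictions, which stays
    high only where the ensemble has not yet seen enough data.
    """

    def __init__(self, obs_dim, act_dim, n_models=5, hidden=64):
        super().__init__()
        self.models = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, obs_dim),
            )
            for _ in range(n_models)
        ])

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(x) for m in self.models])  # (n_models, batch, obs_dim)
        # Variance across the ensemble, averaged over observation dimensions.
        # Under pure stochasticity all models converge to the mean prediction,
        # so the bonus vanishes even though prediction error does not.
        return preds.var(dim=0).mean(dim=-1)               # (batch,)
```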

Cody's opinion: I found this approach really elegant and clever as a way of addressing the "static TV" problem in curiosity literature. I'd be curious to see more work that introduces even stronger incentives towards diversity among the ensemble models (different architectures, even more different datasets they're trained on), to see if that amplifies the cases of model disagreement.

Deep learning

Weight Agnostic Neural Networks (Adam Gaier et al) (summarized by Cody): Inspired by the ability of animals to perform some tasks at birth, before learning about the world, this paper tries to find network architectures that perform well over a wide range of possible model parameters. The idea here is that if an architecture performs well with different sampled weights and without training to update those weights, then the architecture itself is what's responsible for encoding the solution, rather than any particular weight configuration. The authors look for such architectures on both classification and reinforcement learning problems by employing NEAT, an evolutionary method from Neural Architecture Search that searches for the best-performing topologies within the space of possible node connections and activations. The authors find that they're able to construct architectures that do better than random on their test problems without training weights explicitly.
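The evaluation step can be sketched roughly as follows (my own illustration under stated assumptions: forward_fn and rollout_fn are hypothetical placeholders for the candidate topology evaluated with a single shared weight and for an environment rollout, respectively):

```python
import numpy as np

def score_architecture(forward_fn, rollout_fn,
                       shared_weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Score a fixed topology by its return across shared weight values.

    forward_fn(obs, w) is assumed to evaluate the candidate topology with every
    connection set to the single shared weight w; rollout_fn(policy) is assumed
    to return one episode's total reward for a policy obs -> action. A genuinely
    weight-agnostic architecture scores well for every w, so the search can rank
    topologies by mean (or worst-case) return without any weight training.
    """
    returns = [rollout_fn(lambda obs, w=w: forward_fn(obs, w)) for w in shared_weights]
    return float(np.mean(returns)), float(np.min(returns))
```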

Cody's opinion: I appreciate the premise of this paper, and in general feel positively towards papers that delve into a better understanding of how much of modern neural network performance is attributable to (discrete) structural architectures vs particular settings of continuous weight parameters, and I think this paper does that in a clever way by essentially marginalizing over different weight values. The framing of this paper, implicitly comparing networks used without weight training to animals with innate abilities, did make me wonder whether the architecture vs weights analogy to evolution vs learning is a sound one. Because, while it's true that the weights weren't explicitly gradient-descent trained in this paper, the network did still perform optimization based on task performance, just over a set of discrete parameters rather than continuous ones. In that context, it doesn't really seem correct to consider the resulting architectures "untrained" in a way that I think that analogy would suggest. I'd be curious to see more work in this direction that blends in ideas from meta-learning, and tries to find architectures that perform well on multiple tasks, rather than just one.

Hierarchical RL

Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning (Nirbhay Modhe et al)

Miscellaneous (AI)

Explainable AI, Sparse Representations, and Signals: So far, we have built AI systems that store knowledge symbolically or in a distributed fashion (with neural nets being the latter). While the distributed form allows us to learn knowledge and rules automatically, it is much harder to understand and interpret than symbolically represented knowledge. This post argues that the main difference is in the sparsity of the learned knowledge. Of course, with more "sparse" knowledge, it should be easier for us to understand the internal workings of the AI system, since we can ignore the pruned connections. However, the author also argues that sparse knowledge will help 'guide the search for models and agents that can be said to "learn" but also "reason"'. Given that AGI will likely involve finding good representations for the world (in the sense of unsupervised learning), sparse learning can be thought of as a bias towards finding better bases for world models, ones that are more likely to be conceptually clean and more in line with Occam's razor.
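As a toy illustration of why sparsity helps interpretation (not from the post; data and parameters are made up for the example), an L1-penalized model concentrates its "knowledge" in a handful of inspectable weights, while an L2-penalized one spreads small weights across every feature:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: only 3 of 50 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2 * X[:, 0] - 3 * X[:, 5] + 0.5 * X[:, 17] + 0.1 * rng.normal(size=200)

sparse = Lasso(alpha=0.05).fit(X, y)  # L1 penalty: a bias towards sparse knowledge
dense = Ridge(alpha=1.0).fit(X, y)    # L2 penalty: small but nonzero weights everywhere

print("nonzero weights with L1:", int(np.sum(np.abs(sparse.coef_) > 1e-6)))  # typically a handful
print("nonzero weights with L2:", int(np.sum(np.abs(dense.coef_) > 1e-6)))   # all 50
```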

In a postscript, the author considers arguments for AI risk. Notably, there isn't any consideration of goal-directedness or alignment failures; the worry is that we will start applying superhuman AI systems to superhuman tasks, and we won't know how to deal with these situations.

Rohin's opinion: Sparsity seems like a good objective to shoot for in order to ensure explainability. I'm less convinced that it's worthwhile for representation learning: I doubt humans have any sort of "sparse learning" bias; I think sparsity of knowledge is a natural consequence of having to understand a very complex world with a very small brain. (Whereas current ML systems only have to understand much simpler environments.)

News

Microsoft invests in and partners with OpenAI to support us building beneficial AGI (Greg Brockman): After moving to a capped-profit investment model (AN #52), Microsoft has invested $1 billion in OpenAI. This allows OpenAI to keep their focus on developing and sharing beneficial AGI: instead of having to create a product to cover costs, they can license their pre-AGI technologies, likely through Microsoft.

Research Associate in Paradigms of Artificial General Intelligence and Their Associated Risk (José Hernández-Orallo): CSER is hiring a postdoctoral research associate to inform the AGI safety agenda by looking at existing and possible kinds of agents; the deadline is August 26.



Discuss

How to navigate through contradictory (health/fitness) advice

August 5, 2019 - 23:58
Published on August 5, 2019 8:58 PM UTC

I will start with a brief story, but the question can be generalized.

Last year, I decided to do something for my body. I joined and regularly went to K. Training (abbreviated name), a large gym chain in German-speaking countries. Its claimed philosophy is different from that of many gyms: there is no music, no protein shakes to buy, mostly older people around, and an insistence that it is about strength, not showing off, and that strength is what keeps your spine together, etc. They have no cardio bikes and no barbells, only machines, and the high-intensity approach is that at each machine you do one continuous exercise for two minutes; if you last the full two minutes, you increase the weight next time. It all seems very serious: there is an orthopedist you talk to when you become a member, and the chain has existed for some decades. The founder writes books, of course claiming that his approach is the only one that works against pain and that the mainstream ignores him, even while the chain has contracts with many orthopedists and uses this in its marketing.

Now, a back problem. I have seen several orthopedists in my life, but the one I talked to this year (after two GPs, both clueless) is the first who seems competent and also listens. His comment about K. Training: it's fine, but the contract can be hard to get out of, and you could just as well try yoga or Pilates. Anyway, he gives me a prescription for physical therapy.

Talking about this and that, the physical therapist speaks out against K. Training because there is no warm-up or cardio (something the founder explicitly defends in his books), and says that yoga/Pilates/etc. would be better anyway.

Then I googled again. Seemingly, every gym expert has their own approach. Some agree with the high-intensity two-minute method; others disagree.

Then there is also Mr. L.-B., an anti-pain guru with a somewhat different approach I don't really understand, again positioned against the "mainstream" but also against K. Training. From a lecture of his that I watched on YouTube, he seems like a snake-oil seller; but then, he (of course) has many fans.

Now I could just randomize what to do, or try to really read up on the approaches, but ALL of them seem plausible if you listen to them. The investment necessary for an actual informed judgement would amount to studying medicine.

So long story, short question: how do you actually handle such cases of practically relevant epistemic learned helplessness?



Discuss

My recommendations for gratitude exercises

August 5, 2019 - 22:30
Published on August 5, 2019 7:04 PM UTC


Gratitude has become an increasingly important part of my life. It has also been one of my greatest sources of improvement of well-being and one of the biggest factors in lifting me out of depression. How does this work? The short answer is that I keep a gratitude journal. The rest of this post is the long answer.

I think there are some theoretical reasons why we should expect gratitude to be helpful or extremely helpful. Try to imagine a time when you were deprived of something that you now have. For example, try to imagine a time when you misplaced your wallet, phone, or passport only to later realize where it was. Think of the sense of relief you got from this. Now recognize that you could feel that way now about all of the things that you have that you could have lost.

The hedonic treadmill refers to the phenomenon of us quickly getting accustomed to any new improvements that we’ve made so that we have to keep running to stay in the same place and maintain our happiness. I think one of the ways that the hedonic treadmill works is by us almost immediately taking everything for granted. If we can stop this process to some extent, through gratitude exercises, we might be able to make large improvements to our well-being.

In my own case, this is particularly vivid. Some years ago I had very bad repetitive strain injuries and associated chronic pain. I did not know if I would ever be able to work again or do many other normal things with my arms. The prospect of improvement seemed dim and my life seemed to be utterly ruined.

It seemed to me that if I could only get the use of my arms back, life would be perfect. At that time I thought to myself: if things do ever get better, if there is anything positive I can draw from this piece of hell, it is to remember that feeling, so that if I recover, I can always feel that my life is perfect. You might be able to leverage past tragedies in your life in this way as well. You might be able to turn that darkness into light.

I still have some trouble with repetitive strain injuries and chronic pain, but my situation is now vastly improved. Do I feel perfect now? Well no, it’s hard to fight the hedonic treadmill, but I do feel a lot better because of gratitude exercises.

I think one mistake people make when it comes to gratitude is thinking too small. While it’s helpful to feel grateful for a lot of different things, and I do write down small things in my gratitude journal, there are lots of big things that we could feel grateful for. We don’t feel grateful for these things because we’ve become accustomed to having them and thoroughly take them for granted. It can take some extra work to feel grateful for these things, but it’s worth it.

Here is a short, and by no means exhaustive, list of some of the things you might want to try feeling grateful for: being alive at all, being alive at this time in history, having loved ones who are alive, being born a human, having functional limbs, being able to make a difference in the world, having access to godlike technology, and having access to a wider range of media for free than any library could hold.

To feel grateful for some of these things you might have to try to vividly imagine being without them for a time. If you are deprived of some of these things for a time (or temporarily believe you are) you can also try to remember what that feels like, so that you can recapture it later when you have them again.

The idea with these techniques is to help them become ingrained as habits. It is to train your mind to see more of the good things that you have and naturally feel more gratitude for them. You should also expect to feel good while doing the gratitude exercise, and this should help reinforce the habit. I find this technique to be less effective when I'm feeling quite bad. However, I think practising it has made me feel bad less often.

Of course, we could instead imagine things as they could be in some hypothetical utopian society 100 years from now when most forms of suffering are unknown. We could then make ourselves feel bad because things are so much worse than they would be in that society. I don’t think it makes sense to say that any of these comparisons are more correct or meaningful than any other. The only thing we can say is that some of these comparisons are more useful than others. Making comparisons that allow us to feel grateful can be useful in improving our lives.

One fear I had when starting this practice was that feeling gratitude would lead to complacency. However, I think that with some care this can be avoided and we can draw from the practice in order to be more effectively altruistic. If we have the ability to more effectively control and improve our well-being through our own thoughts, without having to spend expensive resources on it, we can allow ourselves to contribute more energy and resources to improving the lives of others. This technique may also point to a way in which we can help others without expending too many resources, since it is an inexpensive means of improving mental health that can be taught.

I think this technique may also be helpful in allowing us to reflect on the suffering of the world without being overwhelmed with grief about it, which allows us to be more motivated to improve it. Part of the practice is reflecting on the suffering, but feeling grateful that we are not going through this suffering now allows us to turn this darkness into light.

People waste a huge amount of resources pursuing ever smaller amounts of happiness as they climb the social ladder. Gratitude promises to be a way of achieving that happiness without wasting these resources, which could instead be put to far better use improving the well-being of the less fortunate.

Some people might find comparisons with other people to be insensitive or in bad taste. If this is the case for you, you can instead reflect on 'different hypothetical versions of yourself' in different states of deprivation. I do think the process can be done in an inoffensive way; you just have to have the right intentions and be tactful. Certainly there are bad ways of doing this, such as if you use the comparison to fuel a sense of superiority or if you use it to 'lord' what you have over others.

The perspective I try to approach this from is one of solidarity with all other sentient beings. We may be lucky enough to have many more resources than others and if so should draw whatever we can from those resources to help others.

I sometimes feel that using gratitude in this way is too ‘Pollyanna’ or too ‘sunshine and rainbows.’ If you feel this way, I suggest considering which life feels more lucid and clear eyed to you—one where you are preoccupied with minor details, like the last person who cut you off in traffic, or one where you are keenly aware of all that you have and all that could be taken away from you.

I haven't looked deeply at the empirical literature on the subject. I suspect that the method does have more promise than is indicated by the studies, because I suspect that many people given the task of gratitude journaling in studies may not be doing so as effectively as they could be. The tips I give in this post should be an improvement on that. In practice it will still probably not be a magic bullet or panacea, but I think the method holds a lot of promise.

It’s possible that gratitude isn’t the word I should be using in this post. Appreciation might be a better word. The word gratitude carries at least a subtle suggestion that there is someone responsible and that person deserves praise, and this isn’t necessarily the case. In particular, God doesn’t exist, and if he did exist, I don’t think he would deserve our praise. Still, gratitude is the word that usually gets used in this context and it is emotionally punchier than ‘appreciation,’ so I’ve decided to keep using it.



Discuss

DC SSC Meetup

August 5, 2019 - 19:19
Published on August 5, 2019 4:19 PM UTC

SSC meetup this Saturday, August 10th.



Discuss
