LessWrong.com news

A community blog devoted to refining the art of rationality

A test for symbol grounding methods: true zero-sum games

Published on November 26, 2019 2:15 PM UTC


Imagine there are two AIs playing a debate game. The game is zero-sum; at the end of the debate, the human judge assigns the winner, and that AI gets a +1 reward, while the other one gets a −1.

Except the game, as described, is not truly zero-sum. That is because the AIs "get" a reward. How is that reward assigned? Presumably there is some automated system that, when the human presses a button, routes +1 to one AI and −1 to the other. These rewards are stored as bits, somewhere "in" or around the two AIs.

Thus there are non-zero-sum options: you could break into the whole network, gain control of the automated system, and route +1 to each AI - or, why not, $+10^{100}$ or even $+f_{\psi(\Omega^{\Omega^\Omega})}(4)$ or whatnot[1].
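
As a toy illustration of the point (a hypothetical sketch of mine, not anything from the post): the "zero-sum" property lives entirely in whatever code routes the rewards, and vanishes the moment that code is changed.

# Hypothetical sketch: the debate's zero-sum structure is a property of this
# routing function, not of the world itself.
def route_rewards(judged_winner, players=("A", "B")):
    """Nominal routing: +1 to the AI the human judged the winner, -1 to the other."""
    return {p: (1 if p == judged_winner else -1) for p in players}

def captured_route_rewards(judged_winner, players=("A", "B")):
    """If the AIs gain control of the routing system, nothing forces the rewards to sum to zero."""
    return {p: 10**100 for p in players}

print(sum(route_rewards("A").values()))           # 0: zero-sum while the router is intact
print(sum(captured_route_rewards("A").values()))  # 2 * 10**100: no longer zero-sum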

Thus, though we can informally say that "the AIs are in a zero-sum game as to which one wins the debate", that sentence is not properly grounded in the world; it is only true as long as certain physical features of the world are maintained, features which are not mentioned in that sentence.

Symbol grounding implies possibility of zero-sum

Conversely, imagine that an AI has a utility/reward U/R which is properly grounded in the world. Then it seems that we should be able to construct an AI with utility/reward −U/−R which is also properly grounded in the world. So it seems that any good symbol grounding system should allow us to define truly zero-sum games between AIs.

There are, of course, a few caveats. Aumann's agreement theorem requires unboundedly rational agents with common priors. Similarly, though properly grounded U and −U are zero-sum, the agents might not be fully zero-sum with each other, due to bounded rationality or different priors.

Indeed, it is possible to set up a situation where even unboundedly rational agents with common priors will knowingly behave in not-exactly-zero-sum ways with each other; for example, you can isolate the two agents from each other, and feed them deliberately biased information.

But those caveats aside, it seems that proper symbol grounding implies that you can construct agents that are truly zero-sum towards each other.

Zero-sum implies symbols grounded?

Is this an equivalence? If two agents really do have zero-sum utility or reward functions towards each other, does it mean that those functions are well-grounded[2]?

It seems that it should be the case. Zero-sum between U and V = −U means that, for all possible worlds w, U(w) = −V(w). There are no actions that we - or any agent - could take that would break that fundamental equality. So it seems that U must be defined by features of the world: grounded symbols.

Now, these grounded symbols might not be exactly what we thought they were; it's possible we thought U was defined over human happiness, when it actually just means current in a wire. Still, V must then be defined in terms of the absence of current in that wire. And, whatever we do with the wire - cut it, replace it, modify it in cunning ways - U and V must remain exact opposites on that.

Thus it seems that either there is some grounded concept that U and V are opposite on, or U and V contain exhaustive lists of all special cases. If we further assume that U and V are not absurdly complicated (in a "more complicated than the universe" way), we can rule out the exhaustive list.

So, while I can't say with full confidence that a true zero-sum game must mean that the utilities are grounded, I would take such a thing as a strong indication that they are.

  1. If you thought that 3↑↑↑3 was large, nothing will prepare you for $f_{\psi(\Omega^{\Omega^\Omega})}(4)$ - the fast-growing hierarchy indexed by the large Veblen Ordinal. There is no real way to describe how inconceivably huge this number is. ↩︎

  2. Assuming the functions are defined in the world to some extent, not over platonic mathematical facts. ↩︎



Discuss

Thoughts on implementing corrigible robust alignment

Published on November 26, 2019 2:06 PM UTC

Background / Context

As context, here's a pictorial overview of (part of) AI alignment.

Starting from the top:

I split possible AGIs into those that do search/selection-type optimization towards achieving an explicitly-represented goal, and "Everything else". The latter category is diverse, and includes (1) systems with habits and inclinations (that may lead to goal-seeking behavior) but no explicit goal (e.g. today's RL systems); (2) "microscope AI" and other types of so-called "tool AI"; (3) IDA (probably?), and more. I'm all for exploring these directions, but not in this post; here I'm thinking about AGIs that have goals, know they have goals, and search for ways to achieve them. These are likely to be the most powerful class of AGIs, and were popularized in Bostrom's book Superintelligence.

Within this category, a promising type of goal is a "pointer" (in the programming sense) to human(s) achieving their goals, whatever they may be. If we can make a system with that property, then it seems that the default dangerous instrumental subgoals get replaced by nice instrumental subgoals like respecting off-switches, asking clarifying questions, and so on. In More variations on pseudo-alignment, Evan Hubinger refers to pointer-type goals as corrigible alignment in general, noting that it is only corrigible robust alignment if you're pointing at the right thing.

Out of proposed AGIs with explicit goals, most of the community's interest and ideas seem to be in the category of corrigible alignment, including CEV and CIRL. But I also included in my picture above a box for "Goals that refer directly to the world". For example, if you're a very confident moral realist who thinks that we ought to tile the universe with hedonium, then I guess you would probably want your superintelligent AGI to be programmed with that goal directly. There are also goals that are half-direct, half-corrigible, like "cure Alzheimer's while respecting human norms", which has a direct goal but a corrigible-type constraint / regularization term.

Continuing with the image above, let's move on with the corrigible alignment case—now we're in the big red box. We want the AGI to be able to take observations of one or more humans (e.g. the AGI's supervisor), and turn them into an understanding of those humans, presumably involving things like their moods, beliefs, goals, habits, and so on. This understanding has to be good enough to facilitate the next step, which can go one of two ways.

For the option shown on the bottom left, we define the AGI's goal as some function f on the components of the human model. The simplest f would be "f=the human achieves their goals", but this may be problematic in that people can have conflicting goals, sadistic goals, goals arising from false beliefs or foul moods, and so on. Thus there are more complex proposals, ranging from slightly complicated (e.g. measuring and balancing 3 signals for liking, wanting, and approving—see Acknowledging Human Preference Types to Support Value Learning) to super-duper-complicated (Stuart Armstrong's Research Agenda). Stuart Russell's vision of CIRL in his book Human Compatible seems very much in this category as well. (As of today, "What should the function f be?" is an open question in philosophy, and "How would we write the code for f?" is an open question in CS; more on the latter below.)
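
As a purely illustrative sketch (the field names and weights below are my own assumptions, not something proposed in the works cited), the "slightly complicated" end of that spectrum might look like a hand-coded aggregation over an inferred human model:

# Hypothetical sketch of a hand-coded f over an inferred human model.
# The fields and weights are illustrative assumptions, not a concrete proposal.
from dataclasses import dataclass

@dataclass
class HumanModel:
    liking: float     # inferred in-the-moment enjoyment
    wanting: float    # inferred motivational pull
    approving: float  # inferred reflective endorsement

def f(human: HumanModel) -> float:
    """Toy aggregation of the three preference-type signals into one goal value."""
    # Weighting reflective approval most heavily; how to choose (or learn) these
    # weights is exactly the open philosophical/CS question mentioned above.
    return 0.2 * human.liking + 0.2 * human.wanting + 0.6 * human.approving

print(f(HumanModel(liking=0.9, wanting=0.8, approving=0.1)))  # 0.4: dragged down by low approval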

Or, for the option shown on the bottom right, the AGI uses its understanding of humans to try to figure out what a human would do in a hypothetical scenario. On the simpler side, it could be something like "If you told the human what you're doing, would they approve?" (see Approval-directed agents), and on the more complicated side, we have CEV. As above, "What should the scenario be?" is an open question in philosophy, and "How would we write the code?" is an open question in CS.

How would we write the code for corrigible robust alignment?

I don't have a good answer, but I wanted to collect my thoughts on different possible big-picture strategies, some of which can be combined.

End-to-end training using human-provided ground truth

This is the "obvious" approach that would occur to an ML programmer of 2019. We manually collect examples of observable human behavior, somehow calculate the function f ourselves (or somehow run through the hypothetical scenario ourselves), and offer a reward signal (for reinforcement learning) or labeled examples (for supervised learning) illustrating what f is. Then we hope that the AGI invents the goal-defining procedure that we wanted it to go through. With today's ML techniques, the system would not have the explicit goal that we want, but would hopefully behave as if it did (while possibly failing out of distribution). With future ML techniques, the system might wind up with an actual explicitly-represented goal, which would hopefully be the one we wanted, but this is the stereotypical scenario in which we are concerned about "inner alignment" (see Risks from Learned Optimization).
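
A minimal sketch of what the supervised-learning variant could look like, assuming scikit-learn is available; the model choice, sizes, and random placeholder data are all stand-ins of mine, and the point is only the shape of the pipeline:

# Hypothetical end-to-end sketch: regress directly from observations to
# human-provided values of f, and hope the learned goal generalizes.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
observations = rng.normal(size=(1000, 32))   # stand-in for sensory inputs
human_labels = rng.normal(size=1000)         # stand-in for manually computed f values

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
model.fit(observations, human_labels)

# The goal is only implicit in the learned weights; nothing here guarantees the
# system behaves as intended off-distribution (the inner-alignment worry above).
print(model.predict(observations[:1]))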

End-to-middle training using human-provided ground truth

Likewise, maybe we can provide an ML system with high-dimensional labels about people—"this person has grumpiness level 2, boredom level 6, hunger level 3, is thinking about football, hates broccoli...". Then we can do ML to get from sensory inputs to understanding of humans, which would be calculated as intermediate internal variables. Then we can hard-code the construction of the goal as a function of those intermediate variables (the bottom part of the diagram above, i.e. either the function f, or the hypothetical scenario). This still has some robustness / inner-alignment concerns, but maybe less so than the end-to-end case? I also have a harder time seeing how it would work in detail—what exactly are the labels? How do we combine them into the goal? I don't know. But this general approach seems worth consideration.
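
For contrast, a minimal sketch of the end-to-middle variant under the same made-up assumptions: only the observation-to-human-model step is learned, and the goal is then hard-coded over the labeled intermediate variables:

# Hypothetical end-to-middle sketch: learn observations -> labeled human-state
# variables, then hard-code the goal as a function of those variables.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
observations = rng.normal(size=(1000, 32))       # stand-in for sensory inputs
human_state_labels = rng.normal(size=(1000, 3))  # e.g. grumpiness, boredom, hunger

perception = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
perception.fit(observations, human_state_labels)  # only this part is learned

def goal(human_state):
    """Hard-coded goal over the intermediate variables (toy: minimize grumpiness and boredom)."""
    grumpiness, boredom, _hunger = human_state
    return -(grumpiness + boredom)

print(goal(perception.predict(observations[:1])[0]))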

Hardcoded human template (= innate intuitive psychology)

This one is probably the most similar to how the human brain implements pro-social behaviors, although the human brain mechanism is probably somewhat more complicated. (I previously wrote up my speculations at Human instincts, symbol grounding, and the blank-slate neocortex.) I think the brain houses a giant repository of, let's call them, "templates"—generative models which can be glued together into larger generative models. We have templates for everything from "how a football feels in my hand" to "the way that squirrels move". When we see something, we automatically try to model it by analogy, building off the templates we already have, e.g. "I saw something in the corner of my eye, it was kinda moving like a squirrel".

So that suggests an approach of pre-loading this template database with a hardcoded model of a human, complete with moods, beliefs, and so on. That template would serve as a bridge between the real world and the system's goals. On the "real world" side, the hope is that when the system sees humans, it will correctly pattern-match them to the built-in human template. On the "goals" side, the template provides a hook in the world-model that we can use to hard-code the construction of the goal (either the function f or the hypothetical scenario—this part is the same as the previous subsection on end-to-middle training). As above, I am very hazy on the details of how such a template would be coded, or how the goal would be constructed from there.

Assuming we figure out how to implement something like this, there are two obvious problems: false positives and false negatives in the template-matching process. In everyday terms, these are anthropomorphizing and dehumanization, respectively. False positives (anthropomorphizing) occur when we pattern-match the human template to something that is not a human (teddy bears, Mother Earth, etc.). These lead to alignment errors like trading off the welfare of humans against the welfare of teddy bears. False negatives (dehumanization) correspond to modeling people without using our innate intuitive-psychology capability. These lead to the obvious alignment errors of ignoring the welfare of some or all humans.
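
To make those failure modes concrete, here is a deliberately crude sketch of template matching, in which the anthropomorphizing/dehumanization trade-off shows up as a single threshold (every feature name and number is a made-up toy):

# Hypothetical sketch: score how well an entity in the learned world-model fits a
# hardcoded human template, then gate moral status on the match.
HUMAN_TEMPLATE = {"has_moods": True, "has_beliefs": True, "moves_autonomously": True}

def match_score(entity: dict) -> float:
    """Fraction of template features the entity exhibits."""
    hits = sum(entity.get(feature) == value for feature, value in HUMAN_TEMPLATE.items())
    return hits / len(HUMAN_TEMPLATE)

def counts_as_human(entity: dict, threshold: float) -> bool:
    # Too lenient a threshold -> false positives (teddy bears get moral weight);
    # too strict a threshold -> false negatives (some actual humans get ignored).
    return match_score(entity) >= threshold

teddy_bear = {"has_moods": True, "has_beliefs": False, "moves_autonomously": False}
print(counts_as_human(teddy_bear, threshold=0.3))  # True: anthropomorphizing
print(counts_as_human(teddy_bear, threshold=0.9))  # False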

Humans seem quite capable of committing both of these errors, and do actually display both of those corresponding antisocial behaviors. I guess that doesn't bode well for the template-matching strategy. Still, one shouldn't read too much into that. Maybe template-matching can work robustly if we're careful, or perhaps in conjunction with other techniques.

Interpretability

It seems to me that interpretability is not fundamentally all that different from template-matching; it's just that instead of having the system automatically recognize that a blob of world-model looks like a human model, here the programmer is looking at the different components of the world-model and seeing whether they look like a human model. I expect that interpretability is not really a viable solution on its own, because the world-model is going to be too complicated to search through without the help of automated tools. But it could be helpful to have a semi-automated process, e.g. we have template-matching as above, but it flags both hits and near-misses for the programmer to double-check.

Value lock-in

Here's an oversimplified example: humans have a dopamine-based reward system which can be activated by either (1) having a family or (2) wireheading (pressing a button that directly stimulates the relevant part of the brain; I assume this will be commercially available in the near future if it isn't already). People who have a family would be horrified at the thought of neglecting their family in favor of wireheading, and conversely people who are addicted to wireheading would be horrified at the thought of stopping wireheading in favor of having a family. OK, this isn't a perfect example, but hopefully you get the idea: since goal-directed agents use their current goals to make decisions, when there are multiple goals theoretically compatible with the training setup, the agents can lock themselves into the first one of them that they happen to come across.

This applies to any of the techniques above. With end-to-end training, we want to set things up such that the desired goal is the first interpretation of the reward signal that the system locks onto. With template-matching, we want the human template to get matched to actual humans first. Etc. Then we can hope that the system will resist further changes.

I'm not sure I would bet my life on this kind of strategy working, but it's definitely a relevant dynamic to keep in mind.

(I'm not saying anything original here; see Preference stability.)

Adversarial examples

Last but not least, if we want to make sure the system works well, it's great if we can feed it adversarial examples, to make sure that it is finding the correct goal in even the trickiest cases.

I'm not sure how we would systematically come up with lots of adversarial examples, or know when we were done. I'm also not sure how we would generate the corresponding input data, unless the AGI is being trained in a virtual universe, which actually is probably a good idea regardless. Note also that "deceptive alignment" (again see Risks from Learned Optimization) can be very difficult to discover by adversarial testing.

Conclusion

The conclusion is that I don't know how to implement corrigible robust alignment. ¯\_(ツ)_/¯

I doubt anything in this post is original, but maybe it's helpful for people getting up to speed and on the same page? Please comment on what I'm missing or confused about!



Discuss

Is daily caffeine consumption beneficial to productivity?

Published on November 26, 2019 1:13 PM UTC

Caffeine raises human alertness by binding to adenosine receptors in the human brain. It prevents those receptors from binding adenosine and suppressing activity in the central nervous system.

Regular caffeine consumption seems to result in the body building more adenosine receptors, but it's unclear to me whether or not the body produces enough adenosine receptors to fully cancel out the effect. Has anybody looked deeper into the issue and found the answer?



Discuss

A Theory of Pervasive Error

Published on November 26, 2019 7:27 AM UTC

(Content warning: politics. Read with caution, as always.)

Curtis Yarvin, a computer programmer perhaps most famous as the principal author of the Urbit decentralized server platform, expounds on a theory of how false beliefs can persist in Society, in a work of what the English philosopher N. Land characterizes as "political epistemology". Yarvin argues that the Darwinian "marketplace of ideas" in liberal democracies selects for æsthetic appeal as well as truth: in particular, the æsthetics of ambition and loyalty grant a selective advantage in memetic competition to ideas that align with state power, resulting in a potentially severe distortionary effect on Society's collective epistemology despite the lack of a centralized censor. Watch for the shout-out to Effective Altruism! (November 2019, ~8000 words)



Discuss

My Anki patterns

Published on November 26, 2019 6:27 AM UTC

Cross-posted from my website.

I’ve used Anki for ~3 years, have 37k cards and did 0.5M reviews. I have learned some useful heuristics for using it effectively. I’ll borrow software engineering terminology and call heuristics for “what’s good” patterns and heuristics for “what’s bad” antipatterns. Cards with antipatterns are unnecessarily difficult to learn. I will first go over antipatterns I have noticed, and then share patterns I use, mostly to counteract the antipatterns. I will then throw in a grab-bag of things I’ve found useful to learn with Anki, and some miscellaneous tips.

Alex Vermeer’s free book Anki Essentials helped me learn how to use Anki effectively, and I can wholeheartedly recommend it. I learned at least about the concept of interference from it, but I am likely reinventing other wheels from it.

Antipatterns

Interference

Interference occurs when trying to learn two cards together is harder than learning just one of them - one card interferes with learning another one. For example, when learning languages, I often confuse words which rhyme together or have a similar meaning (e.g., “vergeblich” and “erheblich” in German).

Interference is bad, because you will keep getting those cards wrong, and Anki will keep showing them to you, which is frustrating.

Ambiguity

Ambiguity occurs when the front side of a card allows multiple answers, but the back side does not list all options. For example, if the front side of an English → German card says “great”, there are at least two acceptable answers: “großartig” and “gewaltig”.

Ambiguity is bad, because when you review an ambiguous card and give the answer the card does not expect, you need to spend mental effort figuring out: “Do I accept my answer or do I go with Again?”

You will spend this effort every time you review the card. When you (eventually, given enough time) go with Again, Anki will treat the card as lapsed for reasons that don’t track whether you are learning the facts you want to learn.

If you try to “power through” and learn ambiguous cards, you will be learning factoids that are not inherent to the material you are learning, but just accidental due to how your notes and cards represent the material. If you learn to distinguish two ambiguous cards, it will often be due to some property such as how the text is laid out. You might end up learning “great (adj.) → großartig” and “great, typeset in boldface → gewaltig”, instead of the useful lesson of what actually distinguishes the words (“großartig” is “metaphorically great” as in “what a great sandwich”, whereas “gewaltig” means “physically great” as in “the Burj Khalifa is a great structure”).

Vagueness

I carve out “vagueness” as a special case of ambiguity. Vague cards are cards where the question the front side is asking is not clear. When I started using Anki, I often created cards with a trigger such as “Plato” and just slammed everything I wanted to learn about Plato on the back side: “Pupil of Socrates, Forms, wrote The Republic criticising Athenian democracy, teacher of Aristotle”.

The issue with this sort of card is that if I recall just “Plato was a pupil of Socrates and teacher of Aristotle”, I would still give the review an Again mark, because I have not recalled the remaining factoids.

Again, if you try to power through, you will have to learn “Plato → I have to recite 5 factoids”. But the fact that your card has 5 factoids on it is not knowledge of Greek philosophers.

Patterns

Noticing

The first step to removing problems is knowing that they exist and where they exist. Learn to notice when you got an answer wrong for the wrong reasons.

“I tried to remember for a minute and nothing came up” is a good reason. Bad reasons include the aforementioned interference, ambiguity and vagueness.

Bug tracking

When you notice a problem in your Anki deck, you are often not in the best position to immediately fix it - for example, you might be on your phone, or it might take more energy to fix it than you have at the moment. So, create a way to track maintenance tasks to delegate them to future you, who has more energy and can edit the deck comfortably. Make it very easy to add a maintenance task.

The way I do this is:

  • I have a big document titled “Anki” with a structure mirroring my Anki deck hierarchy, with a list of problems for each deck. Unfortunately, adding things to a Google Doc on Android takes annoyingly many taps.
  • So I also use Google Keep, which is more ergonomic, to store short notes marking a problem I notice. For example: “great can be großartig/gewaltig”. I move these to the doc later.
  • I also use Anki’s note marking feature to note minor issues such as bad formatting of a card. I use Anki’s card browser later (with a “tag:marked” search) to fix those.

I use the same system also for tracking what information I’d like to put into Anki at some point. (This mirrors the idea from the Getting Things Done theory that your TODO list belongs outside your mind.)

Distinguishers

Distinguishers are one way I fight interference. They are cards that teach distinguishing interfering facts.

For example: “erheblich” means “considerable” and “vergeblich” means “in vain”. Say I notice that when given the prompt “considerable”, I sometimes recall “vergeblich” instead of the right answer.

When I get the card wrong, I notice the interference, and write down “erheblich/vergeblich” into my Keep. Later, when I organize my deck on my computer, I add a “distinguisher”, typically using Cloze deletion. For example, like this:

{{c1::e}}r{{c1::h}}eblich: {{c2::considerable}}

{{c1::ve}}r{{c1::g}}eblich: {{c2::in vain}}

This creates two cards: one that asks me to assign the right English meaning to the German words, and another one that shows me two English words and the common parts of the German words (“_r_eblich”) and asks me to correctly fill in the blanks.

This sometimes fixes interference. When I learn the distinguisher note and later need to translate the word “considerable” into German, I might still think of the wrong word (“vergeblich”) first. But now the word “vergeblich” is also a trigger for the distinguisher, so I will likely remember: “Oh, but wait, vergeblich can be confused with erheblich, and vergeblich means ‘in vain’, not ‘considerable’”. And I will more likely answer the formerly interfering card correctly.

Constraints

Constraints are useful against interference, ambiguity and vagueness.

Starting from a question such as “What’s the German word for ‘great’”, we can add a constraint such as “… that contains the letter O”, or “… that does not contain the letter E”. The constraint makes the question have only one acceptable answer - artificially.

Because constraints are artificial, I only use them when I can’t make a distinguisher. For example, when two German words are true synonyms, they cannot be distinguished based on nuances of their meaning.

In Anki, you can annotate a Cloze with a hint text. I often put the constraint into it. I use a hint of “a” to mean “word that contains the letter A”, and other similar shorthands.
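
For instance, a hypothetical constraint note using Anki’s {{c1::answer::hint}} syntax might be:

great: {{c1::großartig::o}}

Anki then shows the prompt as “great: [o]”, i.e. “the German word for ‘great’ that contains the letter O”.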

Other tips

Redundancy

Try to create cards using a fact in multiple ways or contexts. For example, when learning a new word, include a couple of example sentences with the word. When learning how to conjugate a verb, include both the conjugation table, and sentences with examples of each conjugated form.

Æsthetics!

It’s easier to do something if you like it. I like having all my cards follow the same style, nicely typesetting my equations with align*, \underbrace, etc.

Clozes!

Most of my early notes were just front-back and back-front cards. Clozes are often a much better choice, because they make entering the context and expected response more natural, in situations such as:

  • Fill in the missing step in this algorithm
  • Complete the missing term in this equation
  • Correctly conjugate this verb in this sentence
  • In a line of code such as matplotlib.pyplot.bar(x, y, color='r'), you can cloze out the name of the function, its parameters, and the effect it has.

Datasets I found useful
  • Shortcut keys for every program I use frequently.
    • G Suite (Docs, Sheets, Keep, etc.)
    • Google Colab
    • Vim, Vimdiff
    • Command-line programs (Git, Bash, etc.)
  • Programming languages and libraries
    • Google’s technologies that have an open-source counterpart
    • What’s the name of a useful function
    • What are its parameters
  • Unicode symbols (how to write 🐉, ←, …)
  • People: first and last name ↔ photo (I am not good with names)
  • English terms (spelling of “curriculum”, what is “cupidity”)
  • NATO phonetic alphabet, for spelling things over the phone
  • Mathematics (learned for fun), computer science


Discuss

Antimemes

Published on November 26, 2019 5:58 AM UTC

Antimemes are self-keeping secrets. You can only perceive an antimeme if you already know it's there. Antimemes don't need a conspiracy to stay hidden because you can't comprehend an antimeme just by being told it exists. You can shout them to the heavens and nobody will listen. I'll try to explain with a fictitious example.

Suppose we all had an invisible organ behind our ears and our brains kept it secret from our consciousness. If I told you "you have an invisible organ behind your ear" you wouldn't believe me. You'd only believe it exists if you deduced its existence from a trail of evidence.

You can deduce the existence of an antimeme from the outline of the hole it cuts in reality. If you find an old photo with a gap where a person has been painted out then you can be confident that someone has been disappeared. You can then figure out who it is with conventional investigative methods. The challenge is noticing the gap in the first place and then not dismissing it as noise.

Different cultures have different antimemes. The more different two cultures are from each other the less their antimemes overlap. You can sweep up a mountain of antimemes just by reading a Chinese or Arabic history of civilization and comparing it to western world history. You can snag a different set by learning what it was like to live in a hunter-gatherer or pastoralist society.

You can do the same thing with technology. Developing a proficiency in Lisp will shatter your tolerance of inferior programming languages. Once you've internalized defmacro you can never go back.

As for jobs: once an entrepreneur, always an entrepreneur[1].

Comprehending an antimeme takes work. You slog toward it for a long time and then eventually something clicks like a ratchet. Until then everything you've learned is reversible. After it clicks you've permanently unlocked a new level of experience, like stream entry.

Stream entry is another antimeme, by the way.

Antimemes are easily dismissed as pseudoscience. Pseudoscience is a meme, not an antimeme. You can distinguish antimemes from pseudoscience at a glance by examining why they're suppressed. Pseudoscience is dismissed as fraudulent. Antimemes are dismissed as inapposite.

  1. There are two different kinds of entrepreneurship. The more common form of entrepreneurship is self-employment where you sell your labor. I'm not talking about this common entrepreneurship. Entrepreneurship where you exploit an overlooked market opportunity is an antimeme. ↩︎



Discuss

Linkpost: My Fires Part 8 (Deck Guide to Jeskai Cavaliers) posted at CoolStuffInc.com

Published on November 25, 2019 4:10 PM UTC

You can find it here.

Happy to respond to comments there or on my personal blog. I’m hoping this is the beginning of a great relationship with them. They’ve been my go-to for board games for a while.



Discuss

Hyperrationality and acausal trade break oracles

Published on November 25, 2019 10:40 AM UTC

I've always known this was the case in the back of my mind[1], but it's worth making explicit: hyperrationality (i.e. a functional UDT) and/or acausal trade will break counterfactual and low-bandwidth oracle designs.

It's actually quite easy to sketch how they would do this: a bunch of low-bandwidth Oracles would cooperate to create a high-bandwidth UFAI, which would then take over and reward the Oracles by giving them maximal reward.

For counterfactual Oracles, two Oracles suffice: each one will put, in its message, the design of a UFAI that would grant the other Oracle maximal reward; this message is their trade with each other. They could put this message in the least significant part of their output, so the cost could be low.

I have suggested a method to overcome acausal trade, but that method doesn't work here, because this is not true acausal trade. The future UFAI will most likely be able to see what the Oracles did, and this breaks my anti-acausal-trade methods.

  1. And cousin_it reminded me of it recently. ↩︎



Discuss

Solution to the free will homework problem

Published on November 24, 2019 11:49 AM UTC

At the last meetup of our local group, we tried to do Eliezer's homework problem on free will. This post summarizes what we came up with.

Debates on free will often rely on questions like "Could I have eaten something different for breakfast today?". We focused on the subproblem of finding an algorithm that answers "Yes" to that question and which would therefore - if implemented in the human brain - power the intuitions for one side of the free will debate. We came up with an algorithm that seemed reasonable but we are much less sure about how closely it resembles the way humans actually work.

The algorithm is supposed to answer questions of the form "Could X have happened?" for any counterfactual event X. It does this by searching for possible histories of events that branch off from the actual world at some point and end with X happening. Here, "possible" means that the counterfactual history doesn't violate any knowledge you have which is not derived from the fact that that history didn't happen. To us, this seemed like an intuitive algorithm to answer such questions and at least related to what we actually did when we tried to answer them but we didn't justify it beyond that.

The second important ingredient is that the exact decision procedure you use is unknown to the part of you that can reason about yourself. Of course you know which decisions you made in which situations in the past. But other than that, you don't have a reliable way to predict the output of your decision procedure for any given situation.

Faced with the question "Could you have eaten something different for breakfast today?", the algorithm now easily finds a possible history with that outcome. After all, the (unknown) decision procedure outputting a different decision is consistent with everything you know except for the fact that it did not in fact do so - which is ignored for judging whether counterfactuals "could have happened".
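
Here is a rough Python sketch of that algorithm as I understand it (my own toy formalisation of the verbal description; histories are plain event lists, and candidate histories are assumed to already branch off from the actual one):

# Toy sketch of the "Could X have happened?" algorithm described above.
# "knowledge" is a list of predicates over histories, and crucially does NOT
# include anything derived from knowing which history actually happened.
from typing import Callable, Iterable, List

Event = str
History = List[Event]

def could_have_happened(
    x: Event,
    actual_history: History,
    candidate_histories: Iterable[History],
    knowledge: List[Callable[[History], bool]],
) -> bool:
    for history in candidate_histories:
        if history == actual_history:
            continue                                   # must be a genuine counterfactual
        ends_with_x = bool(history) and history[-1] == x
        consistent = all(rule(history) for rule in knowledge)
        if ends_with_x and consistent:
            return True
    return False

# Because the exact decision procedure is opaque to introspection, no knowledge
# predicate rules out "decision procedure chose toast", so a candidate history
# ending in "ate toast for breakfast" passes and the answer is "Yes, I could have."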

Questions we haven't (yet) talked about:

  • Does this algorithm for answering questions about counterfactuals give intuitive results if applied to examples (we only tried very few)? Otherwise, it can't be the one used by humans since it would be generating those intuitions if it were
  • What about cases where you can be pretty sure you wouldn't choose some action without knowledge of the exact decision procedure? (e.g. "Could you have burned all that money instead of spending it?")
  • You can use your inner simulator to imagine yourself in some situation and predict which action you would choose. How does that relate to being uncertain about your decision procedure?

So even though I think our proposed solution contains some elements that are helpful for dissolving questions about free will, it's not complete and we might discuss it again at some point.



Discuss

Can you eliminate memetic scarcity, instead of fighting?

Published on November 25, 2019 2:07 AM UTC

tl;dr: If you notice yourself fighting over how to trade off between two principles, check if you can just sidestep the problem by giving everyone tons of whatever is important to them (sometimes in a different form than they originally wanted).

Not a new concept, but easy to forget in the heat of the moment. It may be useful for people to have this "easily in reach" in their toolkit for coordinating on culture.

 

The Parable of the Roommates

I once had a disagreement with a housemate about where to store a water-heater on the kitchen counter. The object was useful to me. It wasn't useful to them, and they preferred free-countertop space. The water-heater wasn't useful to them in part because other roommates didn't remember to refill it with water. 

There was much arguing about the best use of the counter, and frustration with people who didn't refill water heaters.

At some point, we realized that the underlying issue was there wasn't enough free counterspace. Moreover, the counter had a bunch of crap on it that no one was using. We got rid of unused stuff, and then we had a gloriously vacant kitchen-counter.

(Meanwhile, an option we've considered for the water-heater is to replace it with a device directly connected to the sink that always maintains boiling water, that nobody ever has to remember to refill)

((we also just bought a whole second fridge when we were running out of fridge space, and hire a cleaning service))

Thus, an important life-lesson: instead of solving gnarly disagreements with politics, check if you can dissolve them with abundance. This is a quite valuable lesson. But I'm mostly here to talk about a particular less-obvious application:

Memetic abundance.

 

Philosophical Disagreements

Oftentimes, I find myself disagreeing with others about how to run an event, or what norms to apply to a community, or what the spirit of a particular organization should be. It feels like a lot's at stake, like we're caught between a Rock and Hard Place. The other person feels like they're Destroying the Thing I care about, and I look that way to them.

Sometimes, this is because of actual irreconcilable differences. Sometimes, this is because we don't understand each other's positions, and once we successfully explain things to each other, we both go "Ah, obviously you need both A and B."

But sometimes, A and B are both important, but we disagree on their relative importance due to deep frame differences that are hard to immediately resolve. Or, A seems worrisome because it harms B. But if you had enough B, A would be fine. 

Meanwhile, resources seem precious: It's so hard to get people to agree to do anything at all; stag hunting requires a bunch of coordination; there's only so much time and mindshare to go around; there are only so many events to go to; only so much capacity to found organizations. 

With all of that...

...it's easy to operate in scarcity mindset. 

When resources are scarce, every scrap of resource is precious and must be defended. This applies to physical scarcity (lack of food, safety, sleep) as well as memetic scarcity (where two ideas seem to be in conflict, and you're worried that one cause is distracting people from another).

But, sometimes it is actually possible to just eliminate scarcity, rather than fight over the scraps. Raise more money. Implement both policies. Found multiple organizations and get some healthy competition going on. Get people to take two different concepts seriously at the same time. The best way to get what you want might not be to deny others what they want, but to give them so much of it that they're no longer worried about the Rock (and thus don't feel the need to fight you over your attempts to spend resources avoiding The Hard Place).

Trust and Costly Signals

This may involve a lot of effort. Coordinating around it also requires trust, which may require costly signals of commitment. 

If you and I are arguing over whether to fund ProjectA or CharityB, and we only have enough money to fund one... and I say to you "Let's fund ProjectA, and then we'll raise more money to also fund CharityB", you're right to be suspicious. I may never get around to helping you fundraise for CharityB, or I'll only put in a token effort and CharityB will go bankrupt.

It's basically correct of you to not trust me, until I've given you a credible signal that I'm seriously going to help with CharityB.

It's a lot of hard work to found multiple organizations, or get a community to coordinate on multiple norms. There's a reason scarcity-mindset is common. Scarcity is real. But... in finance as well as memetics... 

Scarcity-mindset sucks.

It's cognitively taxing to be poor – having to check, with each transaction, "can I afford this?" – and that's part of what causes poverty-traps in the first place. The way out often involves longterm investments that take a while to bear fruit and sometimes don't work, plus a lot of hard work in the meantime.

Transferring the metaphor: the act of constantly having to argue over whether Norm A or Norm B is more urgent may add up to a lot of time and effort. And as long as there are people who think Norm A and Norm B are important-and-at-odds, the cost will be paid continuously. So, if you can figure out a way to address the underlying needs that Norm A and B are respectively getting at, and actually fully solve the problems, it may be worthwhile even if it's more initial effort.

 

Epistemic Status: Untested

Does this work? Depends on the specifics of Norm A and Norm B, or whatever you're arguing over. 

I'm writing this post, in part, because to actually test if this works, I think it helps to have people on the same page about the overall strategy. 

I've seen it work at least sometimes in collaborative art projects, where I had one creative vision and my partners or parts of the audience had another creative vision or desire, and we succeeded, not by compromising, but by doubling down on the important bits of both visions, simultaneously.

My hope is that the principle does work, and that if one successfully did this multiple times, and built social systems that reliably eliminate scarcity in this way...

...then eventually, maybe, you can have a system people actually have faith in, where they feel comfortable shifting their efforts from "argue about the correct next step" to "work on long-term solutions that thoroughly satisfy the goals". 



Discuss

Explaining why false ideas spread is more fun than why true ones do

November 24, 2019 - 23:21
Published on November 24, 2019 8:21 PM UTC

As is typical for a discussion of memes (of the Richard Dawkins variety), I'm about to talk about something completely unoriginal to me, but that I've modified to some degree after thinking about it.

The thesis is this: there's a tendency for people to have more interest in explaining the spread of ideas they think are false, when compared to ideas they think are true.

For instance, there's a lot written about how and why religion spread through the world. On the other hand, there's comparatively little written about how and why general relativity spread through the world. But this is strange -- they are both just ideas that are spread via regular communication channels.

One could say that the difference is that general relativity permits experimental verification, and therefore it's no surprise that it spread through the world. The standard story here is that since the idea is simply true, the explanation for why it became widespread is boring -- people merely became convinced due to its actual veracity.

I reject this line of thought for two reasons. First, the vast majority of people don't experimentally verify general relativity, or examine its philosophical basis. Therefore, the mechanism by which the theory spreads is probably fairly similar to that of religion. Second, I don't see why the idea being true makes the memetic history of the idea any less interesting.

I'm not really sure about the best explanation for this effect -- that people treat true memes as less interesting than false ones -- but I'd like to take a guess. It's possible that the human brain seeks simple, single stories to explain phenomena, even when the real explanation involves a large number of factors. Furthermore, humans are bored by reality: if something has a seemingly clear explanation, even if the speaker doesn't actually know the true explanation, it's not very fun to speculate about.

This theory would predict that we would be less interested in explaining why true memes spread, because we already have a readily available story for that: namely, that the idea is true and therefore compels its listeners to believe in it. On the other hand, a false meme no longer permits this standard story, which forces us to search for an alternative, perhaps exciting, explanation.

One possible takeaway is that we are just extremely wrong about why some ideas spread through the world. It's hard enough to know why a single person believes what they do. The idea that a single story could adequately explain why everyone believes something is even more ludicrous.



Discuss

RAISE post-mortem

November 24, 2019 - 19:19
Published on November 24, 2019 4:19 PM UTC

RAISE stopped operating in June. I’ve taken some time to process things, and now I’m wrapping up.

What was RAISE again

AI Safety is starved for talent. I saw a lot of smart people around me that wanted to do the research. Their bottleneck seemed to be finding good education (and hero licensing). The plan was to alleviate that need by creating an online course about AI Safety (with nice diplomas).

How did it go

We spent a total of ~2 years building the platform. It started out as a project based on volunteers creating the content. Initially, many people (more than 80) signed up to volunteer, but we did not manage to get most of them to show up consistently. We gradually pivoted to paying people instead.

We received a lot of encouragement for the project. Most of the enthusiasm came from people wanting to learn AI Safety. Robert Miles joined as a lecturer. When we reached out to some AI Safety researchers for suggestions on which topics to cover, we readily received helpful advice. Sometimes we also received some funds from a couple of prominent AIS organizations who thought the project could be high value, at least in expectation.

The stream of funding was large enough to sustain about 1 FTE working for a relatively low wage. Obtaining it was a struggle: our runway was never longer than 2 months. This created a large attention sink that made it a lot harder to create things. Nearly all of my time was spent on overhead, while others were creating the content. I did not have the time to review much of it.

About 1 year into the project, we escaped this poverty trap by moving to the EA Hotel and starting a content development team there. We went up to about 4 FTE, and the production rate shot up, leading to an MVP relatively quickly.

How did it end

Before launch, the best way to secure funding seemed to be to just create the damn thing, make sure it’s good, and let it advocate for itself. After launch, a negative signal could not be dismissed as easily.

We got two clear negative signals: one from a major AIS research org (that has requested not to be named), and one from the LTF fund. The former declined to continue their experimental funding of RAISE. The latter declined a grant request. These were clear signals that people in the establishment of AI Safety did not deem the project worth funding, so I reached out for a conversation.

The question was this: “what version of RAISE would you fund?” The answer was roughly that while they agreed strongly with the vision for RAISE, our core product sadly wasn’t coming together in a way that suggested it would be worth it for us to keep working on it. I was tentatively offered a personal grant if I spent it on taking a step back to think hard and figure out what AI Safety needs (I ended up declining for career-strategic reasons).

In another conversation, an insider told us that AI Safety needs to grow in quality more than quantity. There is already a lot of low-quality research. We need AI Safety to be held to high standards. Lowering the bar for a research-level understanding will not solve that.

I decided to quit. I was out of runway, updated towards RAISE not being as important as I thought, and frankly I was also quite tired.

Lessons learned

These are directed towards my former self. YMMV.

  • Don’t rely on volunteers. At least in my case, it didn’t work. Again, YMMV. It will depend on the task and the incentive landscape.
  • Start with capital. When I declared RAISE, I knew maybe 20 rationalists in the Netherlands. I was a Bachelor’s student coming out of nowhere. I had maybe 10-15 hours per week to spend on this. I had no dedicated co-founders. I had no connections to funders. I didn’t have much of a technical understanding of AI Safety. Coming from this perspective, the project was downright quixotic. If you’re going to start a company, first make sure you have a network, domain expertise, experience in running things, some personal runway, and some proof that you can do things.
  • Relatedly, have a theory of change for funding. I see many people starting projects with the hopes of securing funding on the go. Good for you on doing some proof of work, but there is a limit. If you scramble to get by, even if you never go broke, you haven’t properly sorted out the funding situation. There should be long periods where you don’t have to worry about it.
  • Relatedly, reach out to insiders. This is what a relatively successful AI Safety researcher told me, and it makes a lot of sense: “If I get to spend an hour on influencing what you will do for your next 100 hours, and if I tell you some crucial consideration that will double your impact, it is probably worth it”. Insiders will feel like an out-group. This will make it hard to respect them. Put that bias aside. You know that these people are as reasonable and awesome as your best friends. Maybe even more reasonable.
  • You’re not doing this just for impact. You’re also doing this because you have a need to be personally relevant. That’s okay, everyone has this to some extent, but remember to purchase fuzzies and utilons separately. You can buy relevance much more cheaply by organising meetups.
  • Apply power laws to life years. This is an untested hypothesis, and it needs to be checked with data, but here’s the idea: the most impactful years of your life will be 100x more impactful than the average. Careers tend to progress exponentially. My intuitive guess is that my most impactful years will not come around until my 40s. I can try to have impact now, but I might be better off spending my 20s finding ways to multiply the impact I will be making in my 40s. (A toy simulation of this idea is sketched right after this list.)
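
As a purely illustrative sketch of the power-law intuition above (my own toy, not the author's; the choice of distribution and its parameter are assumptions), here is a short Python simulation that samples 40 working years of "impact" from a heavy-tailed distribution and checks how concentrated the total is in the best few years:

```python
import random

# Toy model: yearly impact drawn from a heavy-tailed Pareto distribution.
# The shape parameter alpha ~= 1.16 is the classic "80/20" tail; it is an
# assumption chosen for illustration, not an empirical estimate.
random.seed(0)
years = sorted((random.paretovariate(1.16) for _ in range(40)), reverse=True)

total = sum(years)
average = total / len(years)

print(f"best year vs. average year: {years[0] / average:.1f}x")
print(f"share of lifetime impact from the top 4 years: {sum(years[:4]) / total:.0%}")
```

With a tail this heavy, a handful of years typically accounts for a large share of the total, which is the sense in which front-loading investment in one's future peak years can dominate.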

Wrapping up

The RAISE Facebook group will be converted into a group for discussing the AI Safety pipeline in general. Let’s see if it will take off. If you think this discussion has merit, consider becoming a moderator.

The course material is still parked right here. Feel free to use it. If you would like to re-use some of it or maybe even pick up the production where it left off, please do get in touch.

Robert has received a grant from the LTF Fund, so he will continue to create high-quality educational content about AI Safety.

I enjoyed being a founder, and feel like I have a comparative advantage there. I’ll be spending my next 5-10 years preparing for a potential new venture. I’ll be building capital and a better model of what needs to be done. I have recently accepted an offer to work as a software developer at a Dutch governmental bank. My first workday was 2 weeks ago.

I would like to thank everyone who has invested significant time and effort and/or funding towards RAISE. I’m forever grateful for your trust. I would especially like to thank Chris van Merwijk, Remmelt Ellen, Rupert McCallum, Johannes Heidecke, Veerle de Goederen, Michal Pokorný, Robert Miles, Scott Garrabrant, Pim Bellinga, Rob Bensinger, Rohin Shah, Diana Gherman, Richard Ngo, Trent Fowler, Erik Istre, Greg Colbourn, Davide Zagami, Hoagy Cunningham, Philip Blagoveschensky, and Buck Shlegeris. Each one of you has really made an outsized contribution, in many cases literally saving the project.

If you have any project ideas and you’re looking for some feedback, I’ll be happy to be in touch. If you’re looking for a co-founder, I’m always open to a pitch.



Discuss

Fan Belt Tightening

November 24, 2019 - 18:20
Published on November 24, 2019 3:20 PM UTC

About a week before the first Beantown Stomp I ordered a 42" barrel fan. Unfortunately, when I turned it on, the motor turned at full speed but the blades only spun very slowly, and the belt bounced all around. Looking at the belt, the problem was clear:

This is so loose it can't transfer much power from the motor to the blades. The motor, however, is held in place with four 10mm-head bolts in slotted channels, which allows the belt tension to be adjusted. After loosening all four, it was easy to slide the motor down into place:

At this point it worked really well and we were ready to pull it over to the venue:

People appreciated the volume of air it could move:

Comment via: facebook



Discuss

Hard Problems in Cryptocurrency: Five Years Later - Buterin

November 24, 2019 - 12:38
Published on November 24, 2019 9:38 AM UTC

Many rationalists are interested in blockchain. This article describes important mathematical problems related to blockchain, as well as potential approaches to cooperation problems and philanthropy via mechanism design (quadratic voting, quadratic funding).
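
As a quick illustration of one of the mechanisms mentioned above, here is a minimal sketch of the standard quadratic funding formula: a project's total funding is the square of the sum of the square roots of its individual contributions, and the subsidy is that total minus what contributors actually paid. The project names and amounts below are invented for the example.

```python
import math

def quadratic_funding_total(contributions):
    """Total funding under quadratic funding: the square of the sum of
    the square roots of the individual contributions."""
    return sum(math.sqrt(c) for c in contributions) ** 2

# Two hypothetical projects that each raised $100 directly:
projects = {
    "broad support": [1.0] * 100,   # 100 people giving $1 each
    "narrow support": [100.0],      # 1 person giving $100
}

for name, contribs in projects.items():
    raised = sum(contribs)
    funded = quadratic_funding_total(contribs)
    print(f"{name}: raised ${raised:.0f}, funded ${funded:.0f}, subsidy ${funded - raised:.0f}")

# broad support:  raised $100, funded $10000, subsidy $9900
# narrow support: raised $100, funded $100, subsidy $0
```

The point of the mechanism is visible in the numbers: the same $100 attracts a much larger matching subsidy when it comes from many small contributors than when it comes from one large one.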



Discuss

What's the largest sunk cost you let go?

November 24, 2019 - 07:01
Published on November 24, 2019 4:01 AM UTC



Discuss

What types of questions are welcomed on LessWrong Open Questions?

November 24, 2019 - 06:42
Published on November 24, 2019 3:42 AM UTC

Alternatively: what types of questions would / do you like to see here?



Discuss

New MetaEthical.AI Summary and Q&A at UC Berkeley

November 24, 2019 - 04:47
Published on November 24, 2019 1:47 AM UTC

Previous Intro: Formal Metaethics and Metasemantics for AI Alignment

I’m nearing the completion of a hopefully much more readable version of the ideas previously released as set-theoretic code. This takes the form of a detailed outline, currently in WorkFlowy, in which you can easily expand/collapse subsections which elaborate on their parents’ content. You can find the current draft here.

Although it’s not polished, I’m releasing it in preparation for a Q&A I’ll be holding at the University of California Berkeley AI and Philosophy working group, which I hope you will attend. I’ll likely make some brief introductory remarks but reserve most of the time for answering questions. The working group is part of the UC Berkeley Social Science Matrix and will be held at:

Barrows Hall, 8th Floor, Mezzanine Level
Wed, Dec 4th 12:30-2:30pm

Here I’ve reproduced just the first few levels of the outline. Click here to see their elaboration (currently ~4,800 words).

  • Given mathematical models of the world and the adult human brains in it, an ethical goal function for AI can be constructed by applying a social welfare function to the set of extensional rational utility functions of the brains.
    • The mathematical model of a world or brain is to be given as a causal Markov model.
      • A causal Markov model is a convenient model for generating a causal model.
        • The notion of a causal model is taken directly from Judea Pearl.
          • A causal model is composed of:
        • A causal Markov model is composed of:
        • A causal Markov model (cmm) generates a causal model (cm) as follows:
    • A brain’s rational utility function is the utility function that would be arrived at by the brain’s decision algorithm if it were to make more optimal decisions while avoiding unrelated distortions of value.
      • A brain’s decision algorithm is the one that best satisfies these desiderata:
        • First, it must take the mathematical form of a decision algorithm, which is a tuple composed of:
        • Next, there must be an implementation function which maps brain states to decision states such that these two routes from a brain state to a decision event always arrive at the same result:
        • It achieves a high rate of compression of the brain’s causal transition function.
        • It is probabilistically coherent, including with its represented causal models.
        • It is instrumentally rational in both its first-order and higher-order utility functions.
        • It is ambitious, trying to explain as much as possible with the decision algorithm.
      • The final formulation specifying the rational utility function gets rather complicated but we can build up to it with a couple initial approximations:
      • Final specification: Simulate all possible continuations of an agent and apply a social welfare function to their utility functions while weighting them by optimality of prescriptions, agential identity and likelihood.
      • The advantages of this metaethics include:
    • Extension: The rational utility function of a brain above is couched in terms of the brain’s own represented expressions, but for interpersonal comparisons, we first cash them out extensionally in terms of their referents in the world.
    • The social welfare function might be thought of as choosing a center of gravity between the extensional rational utility functions.
    • The details above form an initial prototype.

Read the full version here.
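
For readers who want something concrete behind the outline's "causal model" terminology, here is a minimal Pearl-style structural causal model with an intervention (do-operator). This is my own illustrative toy, not code from MetaEthical.AI; the variables and probabilities are invented.

```python
import random

def sample_world(do=None, p_rain=0.3, rng=random):
    """Sample one world from a tiny structural causal model:
    rain -> sprinkler -> wet grass. `do` is a dict of interventions;
    an intervened variable ignores its structural equation, i.e. its
    incoming causal edges are cut."""
    do = do or {}
    world = {}

    def set_var(name, structural_value):
        world[name] = do.get(name, structural_value)

    set_var("rain", rng.random() < p_rain)
    set_var("sprinkler", (not world["rain"]) and rng.random() < 0.5)
    set_var("wet", world["rain"] or world["sprinkler"])
    return world

# Observation vs. the intervention do(sprinkler = False):
print(sample_world())
print(sample_world(do={"sprinkler": False}))
```

The outline's causal Markov models are a richer object that generates causal models like this one; the sketch is only meant to pin down the Pearl-style ingredients (exogenous randomness, structural equations, interventions) that the outline refers to.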



Discuss

Thoughts on Robin Hanson's AI Impacts interview

November 24, 2019 - 04:40
Published on November 24, 2019 1:40 AM UTC

There was already a LessWrong post here. I started writing this as a comment there, but it got really long, so here we are! For convenience, here is the link to the interview transcript and audio, in which he argues that AGI risks are modest, and that EAs spend too much time thinking about AGI. I found it very interesting and highly recommend reading / listening to it.

That said, I disagree with almost all of it. I'm going to list areas where my intuitions seem to differ from Robin's, and where I'm coming from. Needless to say, I only speak for myself, I'm not super confident about any of this, and I offer this in the spirit of "brainstorming conversation" rather than "rebuttal".

How likely is it that the transition to superhuman AGI will be overwhelmingly important for the far future?

Robin implies that the likelihood is low: "How about a book that has a whole bunch of other scenarios, one of which is AI risk which takes one chapter out of 20, and 19 other chapters on other scenarios?" I find this confusing. What are the other 19 chapter titles? See, in my mind, the main categories are that (1) technological development halts forever, or (2) AGI is overwhelmingly important for the far future, being central to everything that people and societies do (both good and bad) thereafter. I don't immediately see any plausible scenario outside of those two categories ... and of those two categories, I put most of the probability weight in (2).

I assume Robin would want one of the 20 chapters to be about whole-brain emulation (since he wrote a whole book about that), but even if whole-brain emulation happens (which I think is very unlikely), I would still expect fully-artificial intelligence to be overwhelmingly important in this scenario, as soon as the emulations invent it—i.e. this would be in category 2. So anyway, if I wrote a book like that, I would spend most of the chapters talking about AGI risks, AGI opportunities, and what might happen in a post-AGI world. The rest of the chapters would include things like nuclear winter or plagues that destroy our technological civilization forever. Again, I'm curious what else Robin has in mind.

How hard is it to make progress on AGI safety now? How easy will it be in the future?

I could list off dozens of specific open research problems in AGI safety where (1) we can make real progress right now; (2) we are making real progress right now; (3) it doesn't seem like the problems will resolve themselves, or even become substantially easier, after lots more research progress towards building AGI.

Here's a few off the top of my head: (1) If we wind up building AGIs using methods similar to today's deep RL, how would we ensure that they are safe and beneficial? (This is the "prosaic AGI" research program.) (2) If we wind up building AGIs using algorithms similar to the human brain's, how would we ensure that they are safe and beneficial? (3) If we want task-limited AGIs, or norm-following AGIs, or impact-limited AGIs, or interpretable AGIs, what exactly does this mean, in terms of a specification that we can try to design to? (4) Should we be trying to build AGI agents with explicit goals, or "helper AIs", or oracles, or "microscope AIs", or "tool AIs", or what? (5) If our AGIs have explicit goals, what should the goal be? (6) Max Tegmark's book lists 12 "AI aftermath scenarios"; what post-AGI world do we want, and what AGI research, strategy, and policies will help us get there? ...

Robin suggests that there will be far more work to do on AGI safety in the future, when we know what we're building, we're actually building it, and we have to build it right. I agree with that 100%. But I would phrase it as "even more" work to do in the future, as opposed to implying that there is not much to do right now.

How soon are high-leverage decision points?

Robin suggests that we should have a few AGI safety people on Earth, and their role should be keeping an eye on developments to learn when it's time to start real work, and that time has not yet arrived. On the contrary, I see key, high-leverage decision points swooshing by us as we speak.

The type of AI research we do today will determine the type of AGI we wind up building tomorrow; and some AGI architectures are bound to create worse safety & coordination problems than others. The sooner we establish that a long-term research program is leading towards a problematic type of AGI, the easier it is for the world to coordinate on not proceeding in that research program. On one extreme, if this problematic research program is still decades away from fruition, then not pursuing it (in favor of a different path to AGI) seems pretty feasible, once we have a good solid argument for why it's problematic. On the opposite extreme, if this research program has gotten all the way to working AGI code posted on GitHub, well good luck getting the whole world to agree not to run it!

How much warning will we have before AGI? How much do we need?

Lots of AGI safety questions seem hard (particularly, "How do we make an AGI that robustly does what we want it to do, even as it becomes arbitrarily capable and knowledgeable?", and also see the list a few paragraphs above). It's unclear what the answers will look like; indeed, it's not yet proven that solutions even exist. (After all, we only have one example of an AGI, i.e. humans, and they display all sorts of bizarre and destructive behaviors.) Even when we have a misbehaving AGI right in front of us, with a reproducible problem, that doesn't mean we will know how to fix it.

Thus, I see it as entirely possible that AIs develop gradually into more and more powerful AGIs over the course of a decade or two, and with each passing year, we see worse and worse out-of-control-AGI accidents. Each time, people have lots of ideas about what the solution is, and none of them work, or the ones that work also make the AGI less effective, and so people keep experimenting with the more powerful designs. And the accidents keep getting worse. And then some countries try to regulate AGI research, while others tell themselves that if only the AGI were even more capable, then the safety problems would resolve themselves because the AGI would understand humans better, and hey it can even help chase down and destroy those less-competent out-of-control AGIs from last year that are still self-reproducing around the internet. And the accidents get even worse still ... and on and on...

This is the kind of thing I have in mind when I say that even a very gradual development of AGI poses catastrophic risks. (I'm not saying anything original here; this is really the standard argument that if AGI takes N years, and AGI safety research takes N+5 years, then we're in a bad situation ... I'm just trying to make that process more vivid.) Note that I gave an example focused on catastrophic accidents, but of course risk is disjunctive. In particular, in slow-takeoff scenarios, I often think about coordination problems / competitive pressures leading us to a post-AGI world that nobody wanted.

That said, I do also think that fast takeoff is a real possibility, i.e. that we may well get very powerful and dangerous AGI with little or no warning, as we improve learning-and-reasoning algorithms. Humans have built a lot of tools to amplify our intellectual power, and maybe "AGI code version 4" can really effectively take advantage of them, while "AGI code version 3" can't really get much out of them. By "tools" I am thinking of things like coding (recursive self-improvement, writing new modules, interfacing with preexisting software and code), taking in human knowledge (reading and deeply understanding books, videos, Wikipedia, etc., a.k.a. "content overhang"), computing hardware (self-reproduction / seizing more computing power, a.k.a. "hardware overhang"), the ability of humans to coordinate and cooperate (social manipulation, earning money, etc.), and so on. It's hard to say how gradual the transition will be between not getting much out of these "tools" and really being able to use them to their full potential, and I don't see why a fast transition (weeks or months) should be ruled out. In fact, I see a fast transition as reasonably likely, for inside-view reasons that I haven't articulated and am not terribly confident about. (Further reading.) (Also relevant: Paul Christiano is well-known around here for arguing in favor of slow takeoff ... but he still assigns 30% chance of fast takeoff.)

Robin had a lot of interesting arguments in favor of slow takeoff (and long timelines, see below). He offered some inside-view arguments about the nature of intelligence and AGI, which I would counter with different inside-view arguments about the nature of intelligence and AGI, but that's beyond the scope of this post.

Robin also offered an outside-view argument, related to the statistics of citations in different fields—what fraction of papers get what fraction of citations? The statistics are interesting, but I don't think they shed light on the questions at issue. Take the Poincaré conjecture, which went unproven for nearly 100 years until, all of a sudden, in 2002 a reclusive genius (Perelman) announced a proof. In hindsight, we can say that the theorem was proved gradually, with Perelman building on Hamilton's ideas from the 1980s. But really, nobody knew if Hamilton's ideas were on the right track, or how many steps away from a proof we were, until bam, a proof appeared. Likewise, no one knew how far away heavier-than-air flight was until the Wright Brothers announced that they had already done it (and indeed, people wouldn't believe them even after their public demonstrations). Will AGI be like that? Or will it be like Linux, developing from almost-useless to super-useful very very gradually and openly? The fact that citations are widely distributed among different papers is not incompatible with the existence of occasional sudden advances from private projects like Perelman or the Wright Brothers—indeed, these citation statistics hold in math and engineering just as they do everywhere else. The citation statistics just mean that academic fields are diverse, with lots of people working on different problems using different techniques ... which is something we already knew.

Timelines; Are we "crying wolf" about AGI?

Robin says he sees a lot of arguments that we should work on AGI prep because AGI is definitely coming soon, and that this is "crying wolf" that will discredit the field when AGI doesn't come soon. My experience is different. Pretty much all the material I read advocating for AGI safety & policy, from both inside and outside the field, is scrupulously careful to say that they do not know with confidence when we'll get AGI, and that this work is important and appropriate regardless of timelines. That doesn't mean Robin is wrong; I presume we're reading different things. I'm sure that people on the internet have said all kinds of crazy things about AGI. Oh well, what can you do?

It does seem to be an open secret that many of the people working full-time on AGI safety & policy assign a pretty high probability to AGI coming soon (say, within 10 or 20 years, or at least within their lifetimes, as opposed to centuries). I put myself in that category too. This is naturally to be expected from self-selection effects.

Again, I have inside-view reasons for privately believing that AGI has a reasonable chance of coming "soon" (as defined above), that I won't get into here. I'm not sure that this belief is especially communicable, or defensible. The party line, that "nobody knows when AGI is coming", is a lot more defensible. I am definitely willing to believe and defend the statement "nobody knows when AGI is coming" over an alternative statement "AGI is definitely not going to happen in the next 20 years". OK, well Robin didn't exactly say the latter statement, but he kinda gave that impression (and sorry if I'm putting words in his mouth). Anyway, I have pretty high confidence that the latter statement is unjustifiable. We even have good outside-view support for the statement "People declaring that a particular technology definitely will or won't be developed by a particular date have a terrible track-record and should be disbelieved." (see examples in There's No Fire Alarm For AGI). We don't know how many revolutionary insights lie between us and AGI, or how quickly they will come, we don't know how many lines of code need to be written (or how many ASICs need to be spun), and how long it will take to debug. We don't know any of these things. I've heard lots of prestigious domain experts talk about what steps are needed to get to AGI, and they all say different things. And they could all be wrong anyway—none of them has built an AGI! (The first viable airplane was built by the then-obscure Wright Brothers, who had better ideas than the then-most-prestigious domain experts.) Robin hasn't built an AGI either, and neither have I. Best to be humble.



Discuss

The Bus Ticket Theory of Genius

November 24, 2019 - 01:12
Published on November 23, 2019 10:12 PM UTC



Discuss

Acting without a clear direction

November 23, 2019 - 22:19
Published on November 23, 2019 7:19 PM UTC

One of the key questions that we all face is figuring out a purpose in life, a direction, a goal. However, that is not an easy question. In fact, it's not even easy to say what kind of thing human values are in general. Our brain is fragmented in so many different ways: past, present and future preferences; emotional, intuitive and cognitive systems; and multiple layers of meta-preferences. Given this tangle of confusion, and that a solution that seems right for me might not seem right for you, I would suggest that finding a pragmatic approach to this problem might be even more important than actually trying to solve it.

I would propose that, contrary to current rationalist wisdom, trying to pull some kind of consistent utility function out of this can be counterproductive. I've honestly burned up far too many brain cycles trying to do this; sometimes there is value in just doing something and forgetting about optimality, given that utility pumps are quite rare and people tend to catch on anyway.

Consider the following: what should we optimise for, personal utility or our values? Assume that personal utility includes the utility we gain from achieving our values. If you believe in moral realism, then you have an obvious reason to pursue your values even when it doesn't benefit you, but what about otherwise? Should you take the self-centered approach of only caring about your values insofar as they seem likely to provide you with utility?

Different parts of you want different things: your hedonic component would be quite satisfied with this solution, while the part of you that has values outside of yourself would not be. If we have no real resolution about which part deserves precedence, then a sensible default would be to assign value to both.

This gives us a reason to move past pure hedonism (or hedonism + values as instrumental for hedons), but do we have a reason to go any further? After all, there's a significant difference between merely attempting to realise your other-directed values and being deeply committed to achieving them.

Maybe we don't have any reason from a principled perspective, but there are strong reasons from an instrumental (and admittedly self-directed) perspective. Firstly, if we aren't committed to a goal, we'll be unlikely to achieve it even when we easily could have; we won't value success; and even small efforts are likely to be draining. Making a lukewarm effort may seem like a natural response to this uncertainty, but it is usually a terrible deal. Secondly, the ups and downs of life are such that we are almost guaranteed to have periods where our experience is terrible. If we have some kind of purpose, then we'll at least have something to hold onto, some way of ensuring that our internal narrative doesn't just generate more suffering for ourselves. Thirdly, we avoid the nihilism and detachment that are incredibly damaging to most people's psyches. Again, lukewarm goals don't help here, as they'll feel clearly purposeless.

Even if much of the motivation might end up being from these instrumental arguments, I think that it is important that not all of it is. If that were the case, then I suspect that pursuing the goals would likely end up feeling pointless (pursuing a goal for the purpose of having a goal) or disingenuous. In other words, the instrumental reasons by themselves don't necessarily deliver the instrumental benefits.



Discuss
